How LLMs work, in plain English
Here is the surprising truth at the heart of every AI chatbot you have ever used: underneath all the magic, a large language model (LLM) is doing one thing extraordinarily well. It predicts the next word. That's it. You give it some text, and it asks itself, "Given everything I have read across billions of pages, what word is most likely to come next?" Then it adds that word, looks at the new, slightly longer text, and predicts again. Word by word, it builds a complete answer.
Think of it like the autocomplete on your phone, but trained on a library the size of the internet and far, far better at understanding context. When you text "I'm running a little..." your phone suggests "late." An LLM does the same thing, except it can carry that prediction across whole paragraphs, hold a tone, follow instructions, and weave in facts it absorbed during training. The fluency feels like understanding, and for practical purposes it often behaves like it. But the engine is prediction, not a database lookup and not conscious thought.
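The predict, append, repeat loop described above can be sketched with a toy model. This is only an illustration under loud assumptions: the training text is made up, the "model" just counts which word tends to follow which, and it looks only at the last word. A real LLM uses a neural network that considers the whole context, but the outer loop has the same shape.

```python
from collections import Counter, defaultdict

# Tiny made-up training text (an assumption for illustration only).
TRAINING_TEXT = (
    "i am running a little late . "
    "i am running a little late today . "
    "i am running a little behind . "
    "she is running a little late ."
)

def build_counts(text):
    """Count which word follows each word in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(counts, prompt, max_words=5):
    """Predict a word, append it, then predict again from the longer text."""
    words = prompt.lower().split()
    for _ in range(max_words):
        next_word = predict_next(counts, words[-1])
        if next_word is None or next_word == ".":
            break  # nothing likely to add, or the sentence has ended
        words.append(next_word)
    return " ".join(words)

counts = build_counts(TRAINING_TEXT)
print(generate(counts, "I am running a little"))  # prints: i am running a little late
```

Notice that the toy has no idea whether you are actually late. It only knows that "late" most often followed "little" in what it read, which is the same reason a real model can sound confident and still be wrong.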
Why does this matter to you as a prompt writer? Because if the model is constantly guessing the most likely continuation of your words, then the words you choose steer everything. A clear, well-framed prompt nudges those predictions toward exactly what you want. A vague one leaves the model guessing in the dark. The whole craft of prompt engineering grows from this single idea.
Because the model predicts rather than retrieves, it can confidently state things that are wrong. This is called "hallucination." Treat AI output as a fast first draft to verify, not gospel, especially for facts, numbers, and citations.
Tokens and context windows: the model's desk space
The model doesn't read whole words the way you do. It chops text into small chunks called tokens. A token is roughly three-quarters of an English word, so "prompt engineering" is about three tokens, and a tidy rule of thumb is that 100 tokens equals around 75 words. The model counts everything in tokens: your question, the conversation history, and its own reply.
Now picture the model working at a desk. The context window is the size of that desk: the total amount of text (in tokens) the model can keep in front of it at once. This is its short-term memory for your conversation. Everything you have said so far, plus everything it has answered, has to fit on that desk. Modern models have huge desks, but they are not infinite.
Here's the catch. When a long chat overflows the desk, the oldest papers slide off the edge. The model can no longer see anything that has scrolled out of its context window, which is why a marathon conversation can suddenly "forget" a detail you mentioned an hour ago. It isn't being careless; that text is simply off the desk. Knowing this changes how you work: restate important details when they matter, or start a fresh chat with a clean summary.
Pasting a giant document into a long chat can quietly push your earlier instructions off the desk. If quality drops mid-conversation, start fresh and re-paste only what's essential.
ChatGPT, Claude & Gemini at a glance
The three models you'll use most in this course are OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini. They are built on the same next-word-prediction foundation, so they feel similar, but each has a personality and a sweet spot shaped by how it was trained and what it connects to.
ChatGPT is the versatile all-rounder. It is fast and creative, strong at brainstorming, casual drafting, and coding help, and it comes with a rich ecosystem of custom GPTs, image generation, and voice. If you want one tool to reach for first, this is a safe default.
Claude shines on long, nuanced writing and careful reasoning. It tends to produce thoughtful, well-structured prose, handles very large documents gracefully, and is a favorite for editing, analysis, and tasks where tone and care matter.
Gemini is deeply woven into Google's world. It is strong at pulling in fresh information and lives comfortably alongside Search, Docs, and Gmail, making it handy for research and anything connected to your Google workflow.
None of these is simply "the best." The differences are real but modest, and they shift with every update. The skill that lasts is knowing how to match a model to a job, which is exactly what you'll practice next.
Choosing the right model for the task
The fastest way to pick a model is to ask one question: what does this task most need? If it needs up-to-the-minute facts or ties into your Google account, lean Gemini. If it needs long, careful, well-toned writing or analysis of a big document, lean Claude. If it needs quick creative output, coding help, or a general workhorse, ChatGPT is a dependable first stop. When stakes are high, run the same prompt through two models and compare; their disagreements are often where the interesting truth hides.
Notice how a clear ask helps any model, while a fuzzy one wastes the strengths of all three. Compare these two:
Prompt 1: Tell me about renewable energy.
Prompt 2: I'm writing a 300-word intro for a high school blog. Explain how solar and wind energy work in simple, friendly language, give one real-world example of each, and end with a single encouraging sentence about the future.
The second prompt sets the audience, length, format, and tone, so every model has a clear target to predict toward. You'll go deep on this in later lessons; for now, just feel how much the framing matters.
Don't marry one model. The pros keep two or three open and route each task to the tool best suited for it. Picking the right model is itself a prompt-engineering skill.
🔧 Technical Deep Dive — How tokenization actually works
Under the hood, models split text into subword units using an algorithm like byte-pair encoding. Common words ("the", "and") usually map to a single token, but rarer or longer words get broken into pieces, so "unbelievable" might become "un", "believ", and "able." That's why tokens are not the same as words: a token can be a whole word, part of a word, a space, or even a single punctuation mark. Numbers and code often tokenize in surprising ways too.
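To make the subword idea concrete, here is a toy sketch of byte-pair-encoding-style merging. It is a simplification under stated assumptions: the six-word corpus is made up, the function names are hypothetical, and it works on characters rather than bytes. Production tokenizers are trained on enormous corpora and have many more details, but the core move is the same: repeatedly glue together the most frequent adjacent pair.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words; return the most common one."""
    pairs = Counter()
    for symbols in words:
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += 1
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged

def learn_merges(corpus, num_merges):
    """Learn an ordered list of merges from a list of words."""
    words = [list(word) for word in corpus]  # start from single characters
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = [merge_pair(symbols, pair) for symbols in words]
    return merges

def tokenize(word, merges):
    """Split a word into characters, then apply the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

# Made-up corpus in which "the" is very common.
corpus = ["the", "the", "the", "the", "then", "other"]
merges = learn_merges(corpus, num_merges=2)

print(tokenize("the", merges))   # ['the']       common word, one token
print(tokenize("they", merges))  # ['the', 'y']  unseen word, split into pieces
```

The common word ends up as a single token, while a word the toy never saw is broken into a familiar piece plus a leftover character, which mirrors the behavior described above.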
Estimating token counts: for everyday English, multiply your word count by about 1.33, or remember that 1,000 tokens is roughly 750 words (about a page and a half). If you need exact numbers, providers offer token counters, but the rule of thumb is plenty for planning prompts.
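The rule of thumb is easy to turn into a quick estimator. This sketch assumes everyday English and the 100-tokens-to-75-words ratio from above; the function names are hypothetical, and the result is a planning estimate, not an exact count from any provider's tokenizer.

```python
import math

def estimate_tokens(text):
    """Rough token estimate: about 4 tokens for every 3 words, rounded up."""
    word_count = len(text.split())
    return math.ceil(word_count * 4 / 3)

def estimate_words(token_count):
    """Rough word estimate: about 3 words for every 4 tokens."""
    return round(token_count * 3 / 4)

print(estimate_tokens("prompt engineering"))  # 3
print(estimate_words(1000))                   # 750
```

Code, numbers, and non-English text often use more tokens per word than this, so treat the estimate as a floor rather than a guarantee.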
Why limits cause truncation and forgetting: every model has a fixed maximum number of tokens it can hold across the prompt plus the reply. When a conversation grows past that ceiling, the system trims the oldest tokens to make room, so early instructions and details silently disappear from the model's view. The model isn't ignoring you; that text is no longer in its context at all. The fix is practical: keep important constraints near your latest message, summarize long threads, and split huge tasks into smaller chats.
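The trimming behavior can be sketched in a few lines. This is a simplified illustration, not how any particular chat product is implemented: it assumes whole messages are dropped oldest-first, uses the rough word-based token estimate instead of a real tokenizer, and uses an unrealistically small limit of 30 tokens so the effect is visible.

```python
import math

def estimate_tokens(text):
    """Rough token estimate: about 4 tokens for every 3 words, rounded up."""
    return math.ceil(len(text.split()) * 4 / 3)

def fit_to_window(messages, max_tokens):
    """Keep the most recent messages that fit; older ones fall off the desk."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break  # this message and everything older no longer fits
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept

conversation = [
    "Always answer in French.",
    "What is a black hole?",
    "A black hole is a region where gravity is extremely strong.",
    "How big can one get?",
]

visible = fit_to_window(conversation, max_tokens=30)
print(visible[0])  # What is a black hole?
```

The first instruction ("Always answer in French.") is no longer in the visible list, so nothing the model does next can take it into account. Repeating that instruction in the latest message would put it back on the desk.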
Time to feel the differences for yourself with a quick side-by-side test. It takes about five minutes.
- Open ChatGPT, Claude, and Gemini in three browser tabs (free versions are fine).
- Pick one simple, identical question, for example: "Explain what a black hole is to a curious 10-year-old in about four sentences."
- Paste the exact same question into all three, with no other instructions, and read each reply.
- Note one clear difference you notice, such as tone, length, choice of example, or how friendly it feels.
- Jot a one-line verdict: which answer would you actually use for that audience, and why?
📌 Save your question and the three answers in your prompt doc. It's your first personal benchmark for comparing models.
- LLMs work by predicting the next word extremely well, so the words you choose steer the result, and they can confidently get facts wrong.
- Text is measured in tokens (about 100 tokens to 75 words), and the context window is the model's limited desk space, so long chats can forget early details.
- ChatGPT is the versatile all-rounder, Claude excels at thoughtful long-form writing and reasoning, and Gemini is strong for connected, up-to-date research.
- There is no single best model; match the tool to what the task most needs, and compare two when the stakes are high.