Chatbot vs. LLM App vs. Agent · Building AI Agents with LLMs

The 5,000-ticket morning

It's Monday at Northwind Co. Over the weekend the support queue filled with 5,000 tickets. One of them reads:

"Order #4471 arrived cracked. I want a refund."

Now imagine three different systems handling that exact message:

A chatbot matches the word "refund" and replies with a canned link: "Here's our return policy 👉." It never opened the order. It can't.
An LLM app reads the message, drafts a warm, fluent apology, and explains the policy in the customer's own words. Beautifully written — but it still doesn't know whether order #4471 even exists, let alone if it qualifies.
An agent reads the ticket, decides it needs to look something up, calls the orders API, checks #4471 against the 30-day policy, sees it qualifies, issues the refund, and writes back: "Done — $89 is on its way back to your card, arriving in 3–5 days. Sorry about the damage."

Same input. Three wildly different outcomes. The difference isn't how smart the model is — all three could share the very same LLM. The difference is how much each one is allowed to decide and do on its own. That single axis — autonomy — is what this lesson is about, and it's the lens you'll use for the rest of the course.

Try it · 60 seconds

Before reading on, jot down which of the three systems above describes the last AI product you used. Was it answering you, or acting for you? Keep that example handy — you'll place it on the spectrum at the end of this lesson.

Three systems, one spectrum

People use "chatbot," "LLM app," and "agent" interchangeably, and that fuzziness causes real engineering mistakes — building a rigid pipeline when you needed an agent, or unleashing an autonomous agent on a job that only needed a script. Let's pin down each one by a single question: who decides what happens next?

Rule-based chatbot — the script decides

The classic chatbot is a decision tree. Keyword matches, intents, and hard-coded branches map inputs to canned outputs. There may be no LLM at all. It's predictable and cheap — and it falls off a cliff the moment a user phrases something the author didn't anticipate. Control lives entirely in the code.

LLM app — the model decides what to say

Wrap a single LLM call in some code — take input, build a prompt, return the model's text — and you have an LLM app. Summarizers, "rewrite this email," a Q&A box over your FAQ: these are LLM apps. The model handles language brilliantly, but the control flow is still fixed: input → one model call → output. It generates a response; it does not take an action and it cannot choose to do something you didn't pre-wire. Even a fixed multi-step pipeline (classify → look up → reply, in that exact order, every time) is still an LLM app — a workflow — because you chose the steps, not the model.
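To make the "fixed pipeline" point concrete, here is a minimal offline sketch of a classify → look up → reply workflow. The names `ORDERS`, `fake_classify`, and `workflow` are hypothetical, and `fake_classify` is a keyword stub standing in for a real model call so the example runs without an API key. The thing to notice is that the order of the steps is written in ordinary Python and never changes.

```python
import re

# Hypothetical in-memory order table, for illustration only.
ORDERS = {"4471": {"status": "delivered", "days_ago": 6, "amount_usd": 89}}

def fake_classify(message: str) -> str:
    """Stand-in for a model call that labels the message.
    Assumption: a real LLM app would invoke a model here."""
    return "refund" if "refund" in message.lower() else "other"

def workflow(message: str) -> str:
    # Step 1: classify (always runs first).
    category = fake_classify(message)
    # Step 2: look up the order (always runs second).
    match = re.search(r"\d+", message)
    order = ORDERS.get(match.group()) if match else None
    # Step 3: reply (always runs last).
    if category == "refund" and order:
        return (f"Your order was {order['status']} {order['days_ago']} days ago; "
                "a teammate will review the refund.")
    return "Thanks for reaching out. A teammate will follow up."

print(workflow("I want a refund for order 4471"))
```

Even if `fake_classify` were swapped for a real model, the model would only fill in one value inside a path the code already laid down.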

Agent — the model decides what to do

An agent puts the LLM in a loop with access to tools (functions that read or change the world), and lets the model choose its own next step — call a tool, call a different tool, or stop and answer — based on what it has learned so far. The path through the program is decided at runtime by the model, not laid down in advance by you.

Under the hood

"Autonomy" sounds philosophical, but mechanically it's mundane: it's whether a conditional edge in the control flow is resolved by your if-statement or by the model's output. When the LLM returns a request to call refund_order(...) and your runtime routes on that, the model just steered the program. When your code says if intent == "refund":, you steered it. Same LLM — totally different system. You'll wire that exact branch by hand in Section 3.
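As a preview, the sketch below shows a runtime that routes on the model's output rather than on a hand-written intent check. Everything here is an assumption made for illustration: `scripted_model` is a stub that imitates an LLM's decisions, and the `{"tool": ..., "arg": ...}` dictionary is a made-up shape, not any library's tool-call format.

```python
from typing import Callable

def get_order(order_id: str) -> str:
    return f"Order {order_id}: delivered 6 days ago, $89."

def refund_order(order_id: str) -> str:
    return f"Refund issued for order {order_id}."

# Hypothetical tool registry: names the "model" may request.
TOOLS: dict[str, Callable[[str], str]] = {
    "get_order": get_order,
    "refund_order": refund_order,
}

def scripted_model(history: list[str]) -> dict:
    """Stub standing in for an LLM: picks the next step from what it has seen."""
    if not any(h.startswith("Order ") for h in history):
        return {"tool": "get_order", "arg": "4471"}
    if not any(h.startswith("Refund ") for h in history):
        return {"tool": "refund_order", "arg": "4471"}
    return {"final": "Done: your refund is on its way."}

def run(message: str, max_steps: int = 5) -> tuple[str, list[str]]:
    history = [message]
    called: list[str] = []
    for _ in range(max_steps):
        decision = scripted_model(history)
        if "final" in decision:        # the model chose to stop
            return decision["final"], called
        name = decision["tool"]        # the model chose the next tool
        called.append(name)
        history.append(TOOLS[name](decision["arg"]))
    return "Stopped: step limit reached.", called

answer, called = run("I want a refund for order 4471")
print(called, answer)
```

The runtime contains no `if intent == "refund":`. It only dispatches whatever the model asked for, which is why the model, not the code, steers the path. The `max_steps` cap is a design choice in this sketch to keep the loop from running forever.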

The autonomy spectrum

These three aren't sharp boxes — they're points on a continuous line. As you move right, the program hands more of the "what next?" decision to the model: more flexibility and capability, but also more ways to go wrong, more cost, and harder debugging.

The autonomy spectrum: moving right hands more of the "what next?" decision from your code to the model.

Decision point · more autonomy is not always better

Further right is more powerful, not more correct. If a task's steps are always the same, as in "summarize this document," an LLM app is cheaper and faster, and with no tools to call it cannot take an unintended action. Reach for an agent only when you genuinely can't predict the steps in advance. We'll turn this into a rule of thumb in Lesson 1.4: prefer the least autonomy that solves the problem.

Common pitfall

"It uses an LLM, so it's an agent." Not even close. A summarizer calls a frontier model and is still just an LLM app — fixed flow, no decisions, no tools. And on the flip side, plenty of "AI agents" in the wild are really hard-coded workflows with a model sprinkled on top. Don't judge by the model; judge by who controls the path.

One refund, three systems

Theory sticks when you see it run. Let's route the same request, "I want a refund for order 4471," through all three system types, in code. The two model-backed systems use the same model (claude-sonnet-4-6 via LangChain's init_chat_model), so the only thing that changes is the architecture around it. Don't worry about understanding every line of the agent code yet; you'll build this loop from scratch in Section 3. For now, just watch who decides.

1 · The chatbot — pure rules, no model

# A rule-based chatbot: the CODE decides everything via keyword matching.
def chatbot(message: str) -> str:
    text = message.lower()
    if "refund" in text:
        return "I can help with refunds! Here's our return policy: northwind.co/returns"
    if "track" in text or "where" in text:
        return "Track your order here: northwind.co/track"
    return "Sorry, I didn't understand. Try rephrasing, or type 'agent' for a human."

print(chatbot("I want a refund for order 4471"))
# → "I can help with refunds! Here's our return policy: northwind.co/returns"

It saw the word refund and fired a canned branch. It never learned which order, whether it qualifies, or that the box arrived cracked. Zero autonomy — and zero real help.

2 · The LLM app — one model call, fixed flow

from langchain.chat_models import init_chat_model
from langchain.messages import SystemMessage, HumanMessage

model = init_chat_model("claude-sonnet-4-6", temperature=0)

def llm_app(message: str) -> str:
    """Fixed control flow: input → ONE model call → output. No tools, no loop."""
    messages = [
        SystemMessage("You are Aria, a friendly Northwind support assistant."),
        HumanMessage(message),
    ]
    return model.invoke(messages).content

print(llm_app("I want a refund for order 4471"))
# → Illustrative reply (exact wording will vary):
#   "I'm sorry you need a refund for order 4471! I'd be happy to help.
#    Could you confirm the order date?"

Far more human. But look closely: it asked the customer for the order date because it has no way to look it up. The flow is hard-wired — one call in, one reply out. It chose the words; it could not choose to act.

Under the hood

Notice there is no if branching and no loop in llm_app. Control enters at the top, makes exactly one model.invoke(...), and exits. That straight-line shape is the signature of an LLM app. The agent below adds the one thing that changes everything: a loop the model can keep re-entering until it's satisfied.

3 · The agent — a loop with tools, the model steers

Now we give the model tools and put it in a loop. We'll use LangChain's prebuilt create_agent, which runs on LangGraph and implements the same ReAct-style loop you'll later build by hand. The model now decides, on its own, to look up the order and then issue the refund.

from langchain.chat_models import init_chat_model
from langchain.tools import tool
from langchain.agents import create_agent   # LangChain 1.0 prebuilt ReAct-style agent

model = init_chat_model("claude-sonnet-4-6", temperature=0)

# --- Tools: the agent's hands. The model can REQUEST these; it never runs them itself.
@tool
def get_order(order_id: str) -> str:
    """Look up an order's status, date, and amount by its ID."""
    db = {"4471": {"status": "delivered", "days_ago": 6, "amount_usd": 89}}
    if order_id not in db:
        return f"No order found with id {order_id}."
    o = db[order_id]
    return f"Order {order_id}: {o['status']} {o['days_ago']} days ago, ${o['amount_usd']}."

@tool
def refund_order(order_id: str, amount_usd: int) -> str:
    """Issue a refund for an order. Only call AFTER confirming it qualifies."""
    return f"Refund of ${amount_usd} issued for order {order_id}. ✅"

# --- Assemble the agent: model + tools + a system prompt, wrapped in a reasoning loop.
agent = create_agent(
    model,
    tools=[get_order, refund_order],
    system_prompt=(
        "You are Aria, a Northwind support agent. "
        "Refunds are allowed within 30 days of delivery. "
        "Always LOOK UP the order before deciding, and never refund an unqualified order."
    ),
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "I want a refund for order 4471"}]}
)
print(result["messages"][-1].content)
# → "All set! Order 4471 was delivered 6 days ago, so it's within our 30-day window.
#    I've issued a refund of $89 — it'll be back on your card in 3–5 days."

Here's the run, hop by hop — and notice you never wrote any of this sequence:

The agent looped model → tool → model → tool → model, deciding each hop. The "look up, then refund" sequence emerged — it was never coded.

Under the hood

The agent fired two tool calls in the right order, yet no line of code says "first look up, then refund." The system prompt asks the model to look up the order first, but nothing in the program enforces it. The sequence came out of the loop: the model asked for get_order, the result flowed back into its context, and only then, seeing the order was 6 days old, did it ask for refund_order. That model-chosen ordering is the line between an LLM app and an agent.
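The hops described above can be laid out as a transcript. The sketch below uses a hand-written `trace` list of plain dictionaries as a stand-in for the agent's message history; this is a simplified shape assumed for illustration, not the message objects LangChain actually returns. The tool results are the strings the two tools in the example would produce for order 4471.

```python
# Hand-written stand-in for the run's message history (simplified shape).
trace = [
    {"role": "user", "content": "I want a refund for order 4471"},
    {"role": "model", "tool_call": {"name": "get_order", "args": {"order_id": "4471"}}},
    {"role": "tool", "content": "Order 4471: delivered 6 days ago, $89."},
    {"role": "model", "tool_call": {"name": "refund_order",
                                    "args": {"order_id": "4471", "amount_usd": 89}}},
    {"role": "tool", "content": "Refund of $89 issued for order 4471. ✅"},
    {"role": "model", "content": "All set! I've issued a refund of $89."},
]

def describe(hop: dict) -> str:
    """Render one hop: either a tool request from the model or a plain message."""
    if "tool_call" in hop:
        call = hop["tool_call"]
        args = ", ".join(f"{k}={v!r}" for k, v in call["args"].items())
        return f"model -> requests {call['name']}({args})"
    return f"{hop['role']} -> {hop['content']}"

for i, hop in enumerate(trace, start=1):
    print(f"{i}. {describe(hop)}")
```

Reading the printed hops top to bottom shows the model → tool → model → tool → model pattern: each tool result lands in the history before the model makes its next request.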

Common pitfall

Autonomy cuts both ways. Because the model decided to call refund_order, a badly worded prompt or a confusing tool could make it refund an order that shouldn't qualify, and no if-statement in the code above would stop it. That's exactly why later sections add prompting discipline (§2), careful tool design (§4.3), and a human-approval gate before risky actions (§4.6). With great autonomy comes great need for guardrails.
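One way to put a deterministic check back in the path is to enforce the policy inside the tool itself, so the refund is refused even if the model asks for it. The sketch below is one possible guardrail under stated assumptions, not necessarily the approach the later sections take: `ORDERS`, `REFUND_WINDOW_DAYS`, and `guarded_refund` are hypothetical names, and order "1002" is invented test data.

```python
REFUND_WINDOW_DAYS = 30  # the 30-day policy from the lesson

# Hypothetical order table; "1002" is invented to show a refusal.
ORDERS = {
    "4471": {"days_ago": 6, "amount_usd": 89},
    "1002": {"days_ago": 45, "amount_usd": 120},
}

def guarded_refund(order_id: str, amount_usd: int) -> str:
    """Refund tool body that re-checks the policy in code before acting."""
    order = ORDERS.get(order_id)
    if order is None:
        return f"Refused: no order found with id {order_id}."
    if order["days_ago"] > REFUND_WINDOW_DAYS:
        return f"Refused: order {order_id} is outside the {REFUND_WINDOW_DAYS}-day window."
    if amount_usd > order["amount_usd"]:
        return f"Refused: ${amount_usd} exceeds the order total of ${order['amount_usd']}."
    return f"Refund of ${amount_usd} issued for order {order_id}."

print(guarded_refund("4471", 89))   # qualifies
print(guarded_refund("1002", 120))  # too old, refused regardless of what the model wanted
```

The model still chooses when to call the tool, but the tool decides whether the action is allowed. Returning the refusal as a string lets the model read the reason and explain it to the customer.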

Place them on the spectrum

Your turn. Below are four real products. For each, decide where it sits — chatbot, LLM app, or agent — and, more importantly, write one sentence justifying it by answering the only question that matters: who decides what happens next?

An airline's "track my flight" phone menu — press 1 for status, 2 for baggage, say your confirmation code.
A "summarize this PDF" button in a note-taking app — you click it, you get three bullet points.
A coding assistant that, given "fix the failing test," reads the file, runs the test, edits the code, re-runs it, and repeats until it's green.
A customer-service bot that classifies your message, then — depending on the category — either answers from an FAQ or routes you to a human, always in that fixed order.

Try it

Write your four answers down before scrolling. Then add the example from this lesson's first "Try it" — your own most-recent AI product. Five placements total. The justifications matter more than the labels; if you can defend "who decides," you understand the spectrum.

How they actually place — and why it's a judgment call

Product	Placement	Who decides what happens next?
1 · Flight phone menu	Chatbot	The code — a fixed decision tree of menu options. No model, no flexibility.
2 · "Summarize this PDF"	LLM App	The model picks the words, but the flow is fixed: one click → one call → output. No tools, no loop.
3 · Test-fixing coding assistant	Agent	The model decides each step — read, run, edit, re-run — looping until the goal is met. Clear autonomy.
4 · Classify-then-route bot	LLM App (workflow)	The model classifies, but you hard-coded "FAQ or human, in that order." A fixed pipeline — a workflow, not an agent.

The tricky ones are #3 and #4, and they reveal the real boundary. Both feel multi-step. But #4's steps are fixed by you — the model only fills in a classification inside a path you laid down. #3's steps are chosen by the model at runtime; it might edit one file or five, re-run twice or ten times. That is the agent line: not "does it have multiple steps?" but "does the model decide the steps, or did you?"

Decision point · the placement is a judgment call

There's no universal cutoff, and reasonable engineers will disagree on borderline systems — that's fine and expected. What's not optional is being able to justify your placement by naming who holds the "what next?" decision. In interviews and design reviews, that articulation is the skill, not the label. (Lesson 1.4 sharpens the workflow-vs-agent line into a practical build rule.)

Under the hood

Real products often mix points on the spectrum: a coding agent (autonomous) might call a summarizer (an LLM app) as one of its tools, which in turn falls back to a regex (a chatbot-style rule). The spectrum classifies a system's control flow, not the whole product. Keep asking the question at each layer.
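The layering can be sketched in a few lines. In this hypothetical example, `summarize` plays the LLM-app layer (one call, fixed flow) and `rule_summary` plays the chatbot-style rule it falls back to; `broken_llm` is a stub that simulates an unavailable model. An agent could expose `summarize` as one of its tools, which is the third layer and is not shown here.

```python
import re
from typing import Callable, Optional

def rule_summary(text: str) -> str:
    """Chatbot-style rule: keep only the first sentence."""
    match = re.match(r".*?[.!?](?=\s|$)", text.strip(), flags=re.DOTALL)
    return match.group() if match else text.strip()

def summarize(text: str, llm: Optional[Callable[[str], str]] = None) -> str:
    """LLM-app layer: one model call, fixed flow, with a rule-based fallback."""
    if llm is None:
        return rule_summary(text)
    try:
        return llm(f"Summarize in one sentence: {text}")
    except Exception:
        # The code, not the model, decides to fall back.
        return rule_summary(text)

def broken_llm(prompt: str) -> str:
    """Stub that simulates a model outage."""
    raise TimeoutError("model unavailable")

print(summarize("The test fails on line 12. The fix is a missing import.", llm=broken_llm))
```

Asking "who decides what happens next?" at each layer gives a different answer: the code decides inside `rule_summary` and `summarize`, while a calling agent would decide whether to invoke `summarize` at all.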

Takeaway

You can now do the thing this lesson promised:

Separate chatbot, LLM app, and agent by asking who decides what happens next.
Place any product on the autonomy spectrum and justify the placement.
Spot the trap of calling something an "agent" just because it uses an LLM.

Where this goes next

We just said an agent "decides and acts." But to decide and act, it must have parts that do the deciding and the acting — a place to take input, something that reasons, a memory, tools, and a way to produce output. Every agent ever built, Aria included, shares the same handful of organs. Learn them once, and you'll see them everywhere.