It is not "what's the right architecture?"
The most common misconception about system design — and the reason most interview answers sound like a Wikipedia summary — is that it is a search for the correct architecture. It isn't. There is no correct architecture in the absence of constraints. You can't say "use Kafka" any more than a doctor can say "use ibuprofen" without first asking what hurts.
System design is the discipline of converting constraints into structure. That sentence will reappear in every chapter of this course. Hold it close.
The work, then, is in three movements:
- Name the constraints. What must the system do, at what scale, with what guarantees, under what failure modes, against what budget?
- Pick the binding ones. Most constraints fall away easily. A few become load-bearing — they determine the shape of the solution.
- Choose tradeoffs deliberately. Every binding constraint forces a sacrifice elsewhere. Architecture is the record of those choices.
Designing a feature vs designing a system
A useful sharpening question: what's the difference between designing a feature and designing a system?
When you design a feature, you ask: "How should this code behave?" When you design a system, you ask a different family of questions:
- How does it behave when a dependency is slow?
- How does it behave when traffic spikes 10×?
- How does it behave during a deploy, a region failover, a database upgrade?
- What's the blast radius of a single bad release?
- Who is paged when this breaks, and what do they see?
These are all questions about behavior in the presence of failure, scale, and time. Features live in the happy path; systems live in the long tail. That's why a single page of "happy-path" architecture means very little.
The three axes you're always trading against
Every system design conversation eventually collapses onto three axes. They aren't independent; pushing on one almost always bends the others. Naming them up front saves hours of arguing past each other.
1. Latency
How long does the user wait? But also, and this is where senior engineers part company with juniors, what's the tail? p50 latency is a marketing number. p99 latency is what the user actually experiences when something goes wrong. p99.9 (often written p999) is what shows up in your incident reports.
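To make the gap between the median and the tail concrete, here is a minimal Python sketch. The workload is invented for illustration (980 requests at 10 ms and 20 requests at 500 ms), and `percentile` is a hypothetical nearest-rank helper written for this example, not a library function.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    rank = min(len(ordered), max(1, rank))  # clamp to a valid 1-based rank
    return ordered[rank - 1]


# Assumed workload, in milliseconds: 98% fast requests, 2% slow ones.
latencies_ms = [10.0] * 980 + [500.0] * 20

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f} ms  p50={p50_ms:.1f} ms  p99={p99_ms:.1f} ms")
```

Under these assumptions the mean is about 20 ms and the median is 10 ms, yet one request in fifty takes 500 ms. Only the p99 makes that visible.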
2. Throughput
How many operations per second can the system absorb? Throughput and latency are siblings, not synonyms. A system can have low latency at low load and high latency at high throughput — that curve, not a single number, is what you're designing.
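One way to see that curve is a textbook M/M/1 queue, where mean time in the system is 1 / (service rate - arrival rate). The sketch below assumes that model, which real systems only approximate, and a hypothetical server that completes 1000 requests per second.

```python
def mean_latency_ms(service_rate_per_s, arrival_rate_per_s):
    """Mean time in system for an M/M/1 queue, W = 1 / (mu - lambda), in milliseconds."""
    if arrival_rate_per_s >= service_rate_per_s:
        # Offered load meets or exceeds capacity: the queue grows without bound.
        return float("inf")
    return 1000.0 / (service_rate_per_s - arrival_rate_per_s)


CAPACITY_PER_S = 1000.0  # assumption: one server completing 1000 requests/s

for utilization in (0.1, 0.5, 0.9, 0.99):
    load = utilization * CAPACITY_PER_S
    latency = mean_latency_ms(CAPACITY_PER_S, load)
    print(f"{utilization:.0%} load -> {latency:.1f} ms mean latency")
```

In this model, mean latency is roughly 1 ms at 10% load, 2 ms at 50%, 10 ms at 90%, and 100 ms at 99%. The same server gives very different answers depending on where on the curve you operate it.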
3. Durability and availability
Durability is the probability that data you wrote survives. Availability is the probability that the system answers when asked. These are not the same thing either: a system can be durable but unavailable (S3 during a regional outage), or available but lossy (a cache).
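Availability targets are easier to reason about once they are turned into a time budget. The sketch below is a back-of-the-envelope calculation, assuming a 365-day year and, for the serial case, that components fail independently (a simplification that rarely holds exactly). The function names are made up for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_minutes_per_year(availability):
    """Minutes per year a system may be unavailable while still meeting the target."""
    return (1.0 - availability) * SECONDS_PER_YEAR / 60


def serial_availability(*components):
    """Availability of a request path that needs every component to be up.

    Assumes failures are independent, so the availabilities multiply.
    """
    result = 1.0
    for availability in components:
        result *= availability
    return result


print(f"99.99% allows {downtime_minutes_per_year(0.9999):.1f} minutes of downtime per year")
print(f"three 99.9% dependencies in series: {serial_availability(0.999, 0.999, 0.999):.4%}")
```

Under these assumptions, 99.99% leaves a little under an hour of downtime per year, and a request that depends on three 99.9% components is itself only about 99.7% available.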
…and the fourth axis nobody puts on the slide: cost
You can always buy your way out of a latency or throughput problem temporarily. You cannot buy your way out of bad design. Senior engineers think about cost — money, complexity, operational burden — as a first-class constraint, not an afterthought.
A 10× cheaper architecture that meets the requirements is almost always a better answer than a "more correct" architecture at ten times the price. The right system is the simplest one that satisfies the binding constraints with margin to spare.
Functional vs non-functional, and why architects care about the second one
Functional requirements tell you what the system does: "users can post tweets," "drivers can be matched to riders." They tend to be the easy part — every system in a domain has roughly the same functional surface area.
Non-functional requirements tell you how it has to do those things: "with 50ms p99 latency at 200k QPS, 99.99% available, with read-after-write consistency for the author's own timeline."
The non-functional requirements are what differentiate Twitter from a wedding-photo website. They are also what dictate every interesting architectural choice. If you find yourself in a design conversation that's mostly about features, gently push it back toward NFRs — that's where the real design happens.
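Numbers like these translate directly into structure. As a rough sketch, Little's law (average concurrency = arrival rate × average time in the system) turns a QPS target into a count of in-flight requests. The 200k QPS figure is the one from the example requirement; the 20 ms mean latency, the 100 concurrent requests per server, and the 50% target utilization are assumptions chosen only for illustration.

```python
import math


def in_flight_requests(qps, mean_latency_ms):
    """Little's law: average concurrency = arrival rate x average time in the system."""
    return qps * mean_latency_ms / 1000


def servers_needed(qps, mean_latency_ms, slots_per_server, target_utilization=0.5):
    """Servers required to hold the in-flight requests at the target utilization."""
    usable_slots = slots_per_server * target_utilization
    return math.ceil(in_flight_requests(qps, mean_latency_ms) / usable_slots)


# Assumed inputs: 20 ms mean latency, 100 concurrent requests per server.
concurrency = in_flight_requests(200_000, 20)
fleet = servers_needed(200_000, 20, slots_per_server=100)

print(f"{concurrency:.0f} requests in flight, about {fleet} servers")
```

With these inputs the system holds about 4,000 requests in flight and needs on the order of 80 servers. Change the latency or the QPS requirement and the fleet size changes with it, which is the sense in which the non-functional requirements drive the architecture.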
Designing at the right zoom level
Beginners draw at the wrong zoom. They either draw the system as a single box ("backend") or they draw it as twelve microservices with REST contracts before anyone has agreed on what the system is supposed to do.
A useful discipline: start at level 1 — users, data, the world outside. Then descend to level 2 — major subsystems. Only descend to level 3 (services, queues, databases) once levels 1 and 2 are committed. You will catch enormous design errors at the higher levels that are nearly impossible to see once you're arguing about whether to use REST or gRPC.
Architectures are evolved, not chosen
One of the kindest things you can do for a junior engineer is to disabuse them of the idea that real systems were designed up front. They weren't. Twitter, Uber, Stripe, Netflix: all of them started far simpler, as something close to a single monolithic application. The architectures you read about today are the scar tissue of years of evolution under pressure.
That has two implications:
- You don't need to design for 100M users on day one. Designing for the load you have, plus a clear theory of where to cut next, is usually correct.
- The interesting case studies in this course aren't blueprints; they're histories. We look at why each system ended up the way it did, what its builders tried first, and what broke.
"Premature optimization is the root of all evil" applies to architecture too. The most expensive systems in the world are the ones designed for problems they never had.
A working rubric you can use today
When you walk into your next design review — or your next interview — try this rubric:
- State the goal in one sentence. "Let users post a tweet and see other people's tweets in <1s." That's it. Resist the urge to add adjectives.
- List the binding non-functional requirements. Three to five, no more. Each one is a number with a unit (200k QPS, p99 50ms, 99.99% available, 100PB storage).
- List the things you will explicitly not design for yet. "Multi-region. Compliance. Search relevance." Naming what's out of scope is half the architectural skill.
- Sketch level 1, then level 2. Stop. Ask if the binding constraints are met. Only then descend.
- For each big decision, name the alternatives you rejected and why. If you can't, you haven't made a decision yet — you've expressed a preference.
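The rubric above can also be written down as a small checklist structure. This is one possible sketch, not a standard format: `Requirement`, `Decision`, and `DesignBrief` are hypothetical names, and the decision shown in the example is a made-up placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    name: str
    value: float
    unit: str  # every binding requirement is a number with a unit


@dataclass
class Decision:
    choice: str
    rejected: dict = field(default_factory=dict)  # alternative -> reason it lost


@dataclass
class DesignBrief:
    goal: str
    binding_nfrs: list
    out_of_scope: list
    decisions: list = field(default_factory=list)

    def problems(self):
        """Return the rubric items this brief does not yet satisfy."""
        issues = []
        if not 3 <= len(self.binding_nfrs) <= 5:
            issues.append("list three to five binding NFRs")
        for req in self.binding_nfrs:
            if not req.unit:
                issues.append(f"{req.name}: missing unit")
        if not self.out_of_scope:
            issues.append("name what is out of scope")
        for decision in self.decisions:
            if not decision.rejected:
                issues.append(f"{decision.choice}: no rejected alternatives, so it is a preference")
        return issues


brief = DesignBrief(
    goal="Let users post a tweet and see other people's tweets in <1s.",
    binding_nfrs=[
        Requirement("throughput", 200_000, "QPS"),
        Requirement("p99 latency", 50, "ms"),
        Requirement("availability", 99.99, "%"),
    ],
    out_of_scope=["Multi-region", "Compliance", "Search relevance"],
    # Placeholder decision with no rejected alternatives recorded yet.
    decisions=[Decision("use a message queue")],
)

for issue in brief.problems():
    print(issue)
```

Here the brief passes every check except the last one: the placeholder decision names no rejected alternatives, so the checklist flags it as a preference rather than a decision.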
Key takeaways
- System design is converting constraints into structure, not searching for a correct architecture.
- Features live in the happy path; systems live in the long tail.
- You are always trading latency, throughput, durability/availability, and cost.
- Non-functional requirements dictate every interesting architectural choice.
- Design at the right zoom level; commit before descending.
- Architectures are evolved under pressure, not chosen up front.