It's a Tuesday. A junior data engineer opens their laptop: a pipeline failed overnight, so the morning dashboard is stale. They read the error log, find a malformed file from a supplier, patch the load, and rerun it — dashboard's green by 9:30. Then they write some SQL to answer an analyst's question, review a teammate's code, and ship a small new table. That's the job. Let's map out the skills that get you to that Tuesday.
The five skill pillars
Job ads vary wildly in wording but ask for the same handful of skills underneath. Picture them as five pillars holding up your career — and notice they have a natural order, because each one leans on the one before it.
- SQL. The non-negotiable foundation. You'll query, filter, join, and aggregate data all day. Roughly Sections 4–6. If you learn nothing else well, learn this.
- Data modeling. Designing how tables relate — keys, normalization, star schemas — so data is clean and efficient to query. Sections 7 and 10.
- Python. The glue for automation, file handling, and moving data in ways SQL alone can't. Section 8 onward.
- Pipelines & ETL/ELT. Building the automated movers that run on a schedule and recover from failure. Sections 9 and 13.
- Warehousing & orchestration. Storing analysis-ready data at scale and coordinating the jobs that fill it. Sections 10 and 13.
The order matters more than it looks. You can't model data you can't query, can't automate a transformation you can't write, can't orchestrate pipelines you haven't built. This is precisely why a beginner who rushes off to learn Spark or Kafka before they're fluent in SQL ends up frustrated — they're trying to stack the roof before the walls. The course is sequenced to let your skills compound: each section assumes the last, so by the capstone you're combining all five pillars without thinking about it.
Analogy: learning to cook. SQL is knife skills: boring to drill, but everything depends on them. Modeling is understanding ingredients and how they combine. Python is using the whole kitchen. Pipelines are running service night after night without burning anything. You don't open a restaurant on day one; you build the fundamentals until the advanced stuff feels easy.
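To make the Python and pipelines pillars a little more concrete, here is a minimal sketch of the kind of load the opening story describes: read a supplier CSV, keep the rows that parse, and set aside the malformed ones instead of failing the whole run. The `load_orders` function, the column names (`order_id`, `channel`, `amount`), and the sample data are assumptions made up for illustration, not ShopStream's actual schema.

```python
import csv
import io
import sqlite3


def load_orders(csv_text, conn):
    """Load valid rows into `orders`; return (loaded_count, rejected_rows)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, channel TEXT NOT NULL, amount REAL NOT NULL)"
    )
    loaded, rejected = 0, []
    reader = csv.DictReader(io.StringIO(csv_text))
    # row_no is the 1-based position among data rows (the header is excluded).
    for row_no, row in enumerate(reader, start=1):
        try:
            record = (int(row["order_id"]), row["channel"].strip(), float(row["amount"]))
            if not record[1]:
                raise ValueError("empty channel")
        except (KeyError, TypeError, ValueError, AttributeError) as exc:
            # Malformed row: remember why it failed and keep going.
            rejected.append((row_no, str(exc)))
            continue
        # INSERT OR REPLACE keeps a rerun from duplicating rows already loaded.
        conn.execute("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", record)
        loaded += 1
    conn.commit()
    return loaded, rejected


# Hypothetical supplier file: row 2 has a bad amount, row 3 is missing a column.
SAMPLE = """order_id,channel,amount
1,web,19.99
2,mobile,not-a-number
3,web
4,store,5.00
"""

conn = sqlite3.connect(":memory:")
loaded, rejected = load_orders(SAMPLE, conn)
print(loaded, [row_no for row_no, _ in rejected])  # 2 [2, 3]
```

Two design choices are worth noticing: bad rows are collected rather than raised, so one broken line does not block the good data, and the insert is safe to rerun, which is what makes "patch the load and rerun it" a low-risk fix.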
The three roles you'll see in job ads
"Data engineer" is an umbrella, and nearby titles overlap. Here's how to read them so a job posting tells you what the day-to-day really is:
- Data Engineer (DE). Owns the plumbing end-to-end: ingestion, pipelines, the warehouse, reliability. Heaviest on Python, pipelines, and orchestration. This is the spine of the course.
- Analytics Engineer (AE). A newer, hugely in-demand role that sits between DE and analyst: takes raw warehouse data and builds clean, documented, reusable models (often with dbt) for analysts to use. Heavy on SQL and modeling. That's Section 11.
- BI Developer / Analyst. Lives in the dashboards and metrics layer, turning modeled data into reports decision-makers read. Owns the last layer of the data stack, the one closest to the people reading the numbers. Sections 11–12 touch this.
You don't have to pick today. The beauty of starting with the data engineer skill set is that it's the broadest foundation — from here you can slide toward analytics engineering, lean into platform/infrastructure work, or specialize later. Right now your only job is to set realistic expectations: you are at the very beginning, and comparing your week-one self to a senior engineer with five years of incidents under their belt is a guaranteed way to feel bad about steady, real progress.
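The "model" an Analytics Engineer builds is, at its core, a saved and named SELECT statement that analysts query instead of the raw tables. The sketch below uses a plain SQLite view as a stand-in for that idea. It is not dbt, and the table `raw_orders`, the view `completed_orders`, and the sample rows are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Raw data as it might land in a warehouse: inconsistent casing, mixed statuses.
    CREATE TABLE raw_orders (order_id INTEGER, channel TEXT, status TEXT, amount REAL);
    INSERT INTO raw_orders VALUES
        (1, 'Web',    'completed', 20.0),
        (2, 'web ',   'cancelled', 15.0),
        (3, 'MOBILE', 'completed', 30.0);

    -- The "model": one cleaned, reusable definition of a completed order.
    CREATE VIEW completed_orders AS
    SELECT order_id,
           LOWER(TRIM(channel)) AS channel,
           amount
    FROM raw_orders
    WHERE status = 'completed';
    """
)

# Analysts query the view and never repeat the cleaning logic themselves.
rows = conn.execute(
    "SELECT order_id, channel, amount FROM completed_orders ORDER BY order_id"
).fetchall()
print(rows)  # [(1, 'web', 20.0), (3, 'mobile', 30.0)]
```

The point of the pattern is that the cleaning rules live in one place: if the definition of "completed" changes, it changes in the model, not in every dashboard query.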
Walkthrough: ShopStream
Let's make the roadmap concrete by mapping ShopStream tasks to the pillar — and the course section — each one exercises. The query below is the kind a hiring manager loves to see: it touches joins, aggregation, and filtering, all skills you'll have by Section 6.
```sql
-- A "junior DE on a Tuesday" query: which channels drove the most completed revenue?
SELECT o.channel,
       COUNT(DISTINCT o.order_id)       AS orders,
       SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.status = 'completed'
GROUP BY o.channel
ORDER BY revenue DESC;
```

| ShopStream task | Pillar | Course section |
|---|---|---|
| Query revenue by channel (above) | SQL | Sections 4–6 |
| Design keys linking orders → customers | Data modeling | Section 7 |
| Script the nightly CSV import | Python | Section 8 |
| Automate the load on a schedule | Pipelines / ETL | Sections 9, 13 |
| Build the analysis-ready warehouse | Warehousing | Section 10 |
| Model clean metrics for analysts | SQL and data modeling (analytics engineering) | Section 11 |
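To see the channel-revenue query run end to end, the sketch below executes it against an in-memory SQLite database using Python's built-in `sqlite3` module. The table layouts and the handful of rows are made up for illustration; ShopStream's real schema may differ, and only the columns the query references are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, channel TEXT, status TEXT);
    CREATE TABLE order_items (
        order_id   INTEGER REFERENCES orders(order_id),
        quantity   INTEGER,
        unit_price REAL
    );
    -- Hypothetical sample data: three completed orders and one cancelled.
    INSERT INTO orders VALUES
        (1, 'web',    'completed'),
        (2, 'web',    'completed'),
        (3, 'mobile', 'completed'),
        (4, 'mobile', 'cancelled');
    INSERT INTO order_items VALUES
        (1, 2, 10.0),
        (1, 1, 5.0),
        (2, 1, 20.0),
        (3, 3, 4.0),
        (4, 1, 100.0);
    """
)

QUERY = """
SELECT o.channel,
       COUNT(DISTINCT o.order_id)       AS orders,
       SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.status = 'completed'
GROUP BY o.channel
ORDER BY revenue DESC;
"""

results = conn.execute(QUERY).fetchall()
for channel, orders, revenue in results:
    print(channel, orders, revenue)
# web 2 45.0
# mobile 1 12.0
```

Order 1 has two line items, which is why the query uses `COUNT(DISTINCT o.order_id)`: after the join, a plain `COUNT(*)` would count that order twice. The cancelled order's 100.0 never reaches the total because the `WHERE` clause filters it out before aggregation.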
Self-assess. For each of the five pillars (SQL, modeling, Python, pipelines, warehousing), rate yourself 0–3 today and jot down one sentence on why. Save it somewhere you'll find it later.
Expected: a short honest snapshot — likely lots of 0s and 1s now. Revisit it after the capstone; the jump is the whole point.
Sample solution:
SQL 1 — can SELECT/FROM, shaky on joins
Modeling 0 — don't know what a key is yet
Python 1 — basic scripts, never used pandas
Pipelines 0 — never built one
Warehousing 0 — heard the word, that's it
=> Starting point logged 2026-06-10. Revisit after Section 14.
There's no wrong answer — this is a baseline, not a test. The win is being able to measure your own growth.
One of the fastest-rising titles of the last few years is Analytics Engineer. Companies discovered that the gap between "raw data in a warehouse" and "metrics an analyst trusts" is huge and needs a dedicated owner. It's a great on-ramp: heavy on the SQL and modeling you're about to learn, lighter on infrastructure.
Common pitfalls
- Chasing Spark/Kafka before SQL is solid. Flashy tools impress nobody if you can't write a clean join. Build the foundation first.
- Comparing your week-1 self to a senior. They have years of broken pipelines behind them. Measure yourself against last week, not against them.
- Collecting tools instead of skills. Knowing ten tool names is worth less than deeply understanding one pillar you can demonstrate in a project.
Five pillars — SQL, modeling, Python, pipelines, warehousing — compound in order. Master them in sequence and the senior-level work stops looking like magic.