
DATA 503: Fundamentals of Data Engineering
January 12, 2026
If you had to define data engineering in one sentence, what would you say?
Examples include:
Which role is primarily responsible for making raw data reliable and accessible for others?
A. Data Analyst
B. Data Scientist
C. Data Engineer
D. Product Manager
Answer: C
Michael wants a dashboard in 10 minutes.
Your job is to make those questions answerable without manual spreadsheet heroics.
Common failure modes:
Prompt:
Directions:
A pipeline is a repeatable path from sources to usable outputs.

Data engineering exists because:
Which “V” is usually hardest in your experience?
Raise a hand for:
A simple example:
Now “top customers” depends on spelling.
Also think about:
ETL:
ELT:
ETL is often:
ELT is often:
Batch:
Streaming:
Scenario:
Questions:
A pipeline is production when:

During break:
Relational databases remain a core tool because:
We will pretend we have these tables:
Columns:
Note
A primary key (PK) is a column, or set of columns, that uniquely identifies each row in a table. More on this later.
| employee_id | full_name | role | branch | hire_date |
|---|---|---|---|---|
| 1 | Michael Scott | Regional Manager | Scranton | 1992-03-15 |
| 2 | Dwight Schrute | Assistant Regional Manager | Scranton | 1995-04-01 |
| 3 | Jim Halpert | Sales Representative | Scranton | 1999-08-01 |
| 4 | Pam Beesly | Receptionist | Scranton | 2000-01-03 |
| 5 | Stanley Hudson | Sales Representative | Scranton | 1990-09-10 |
| 6 | Phyllis Vance | Sales Representative | Scranton | 2000-02-14 |
| 7 | Kevin Malone | Accountant | Scranton | 1998-06-15 |
| 8 | Oscar Martinez | Accountant | Scranton | 1996-11-20 |
| 9 | Angela Martin | Head of Accounting | Scranton | 1994-05-05 |
| 10 | Creed Bratton | Quality Assurance | Scranton | 1993-12-01 |
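To make the primary key idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. It assumes `employee_id` is the primary key of `employees`; the column types are also assumptions, since the slides only show the data. The second insert reuses an existing `employee_id`, so the database rejects it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,  -- assumed to be the PK
        full_name   TEXT NOT NULL,
        role        TEXT,
        branch      TEXT,
        hire_date   TEXT                  -- ISO dates stored as text (assumption)
    )
    """
)
conn.execute(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    (1, "Michael Scott", "Regional Manager", "Scranton", "1992-03-15"),
)

# Reusing employee_id = 1 violates the primary key constraint.
try:
    conn.execute(
        "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
        (1, "Dwight Schrute", "Assistant Regional Manager", "Scranton", "1995-04-01"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(duplicate_rejected, row_count)
```

Only the first row is stored; the database, not the application code, is what guarantees uniqueness.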
Columns:
orders:
order_items:
Note
A foreign key (FK) is a column that references the primary key of another table. More on this later.
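A minimal sketch of a foreign key in action, again with Python's `sqlite3`. The column names (`customer_id`, `order_id`, `name`) and the sample customer are hypothetical, chosen only for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other databases typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- hypothetical column names
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
    );
    """
)

conn.execute("INSERT INTO customers VALUES (1, 'Example Customer')")  # made-up row
conn.execute("INSERT INTO orders VALUES (100, 1)")  # customer 1 exists: accepted

# customer 999 does not exist, so this order would be an orphan.
try:
    conn.execute("INSERT INTO orders VALUES (101, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(orphan_rejected, order_count)
```

The rejected insert shows the point of the constraint: an order can only point at a customer row that actually exists.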

Which table is the best place to add a second address field so that we store both a billing and a shipping address?
A. customers
B. orders
C. order_items
D. products
Answer: A
Note
But we should ask whether there is a better way to approach this problem.
Normalization is a way to reduce duplication by storing each fact in one place.
Prompt:
Directions:
SQL lets you:
Today we focus on SELECT fundamentals.
All employees:
| employee_id | full_name | role | branch | hire_date |
|---|---|---|---|---|
| 1 | Michael Scott | Regional Manager | Scranton | 1992-03-15 |
| 2 | Dwight Schrute | Assistant Regional Manager | Scranton | 1995-04-01 |
| 3 | Jim Halpert | Sales Representative | Scranton | 1999-08-01 |
| 4 | Pam Beesly | Receptionist | Scranton | 2000-01-03 |
| 5 | Stanley Hudson | Sales Representative | Scranton | 1990-09-10 |
| 6 | Phyllis Vance | Sales Representative | Scranton | 2000-02-14 |
| 7 | Kevin Malone | Accountant | Scranton | 1998-06-15 |
| 8 | Oscar Martinez | Accountant | Scranton | 1996-11-20 |
| 9 | Angela Martin | Head of Accounting | Scranton | 1994-05-05 |
| 10 | Creed Bratton | Quality Assurance | Scranton | 1993-12-01 |
Note
In SELECT *, the * is a wildcard that selects every column. Avoid * in production queries; list the columns you need instead.
Only names and roles:
| full_name | role |
|---|---|
| Michael Scott | Regional Manager |
| Dwight Schrute | Assistant Regional Manager |
| Jim Halpert | Sales Representative |
| Pam Beesly | Receptionist |
| Stanley Hudson | Sales Representative |
| Phyllis Vance | Sales Representative |
| Kevin Malone | Accountant |
| Oscar Martinez | Accountant |
| Angela Martin | Head of Accounting |
| Creed Bratton | Quality Assurance |
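The two queries above (all columns, then only names and roles) can be sketched as follows. This is an illustrative setup, not the course's database: it loads the first three rows of the employees table into an in-memory SQLite database, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER PRIMARY KEY, full_name TEXT, role TEXT, "
    "branch TEXT, hire_date TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Michael Scott", "Regional Manager", "Scranton", "1992-03-15"),
        (2, "Dwight Schrute", "Assistant Regional Manager", "Scranton", "1995-04-01"),
        (3, "Jim Halpert", "Sales Representative", "Scranton", "1999-08-01"),
    ],
)

# Wildcard: every column comes back, in table-definition order.
all_columns = conn.execute("SELECT * FROM employees").fetchall()

# Explicit column list: only what the consumer needs.
names_and_roles = conn.execute("SELECT full_name, role FROM employees").fetchall()

print(len(all_columns[0]), "columns with *;", len(names_and_roles[0]), "when listed")
```

Listing columns keeps the result stable if someone later adds a column to the table, which is one reason to prefer it over `*` in production.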
Unique branches:
| branch |
|---|
| Scranton |
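A sketch of the unique-branches query, assuming it uses `SELECT DISTINCT`. To keep the example short, the table here holds only names and branches for three of the employees above; that trimmed schema is a simplification, not the course table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (full_name TEXT, branch TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Pam Beesly", "Scranton"),
        ("Stanley Hudson", "Scranton"),
        ("Phyllis Vance", "Scranton"),
    ],
)

# Without DISTINCT: one row per employee.
every_branch = conn.execute("SELECT branch FROM employees").fetchall()

# With DISTINCT: duplicate values collapse into a single row.
branches = conn.execute("SELECT DISTINCT branch FROM employees").fetchall()

print(len(every_branch), len(branches))  # prints: 3 1
```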
All Scranton employees:
| full_name | role |
|---|---|
| Michael Scott | Regional Manager |
| Dwight Schrute | Assistant Regional Manager |
| Jim Halpert | Sales Representative |
| Pam Beesly | Receptionist |
| Stanley Hudson | Sales Representative |
| Phyllis Vance | Sales Representative |
| Kevin Malone | Accountant |
| Oscar Martinez | Accountant |
| Angela Martin | Head of Accounting |
| Creed Bratton | Quality Assurance |
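A sketch of the Scranton filter, assuming it is written with a `WHERE` clause. The branch value is passed as a bound parameter (`?`) rather than pasted into the SQL string. The second lookup uses "Stamford", a made-up value that does not appear in the table, to show that a non-matching filter returns no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (full_name TEXT, role TEXT, branch TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Kevin Malone", "Accountant", "Scranton"),
        ("Oscar Martinez", "Accountant", "Scranton"),
        ("Angela Martin", "Head of Accounting", "Scranton"),
    ],
)

query = "SELECT full_name, role FROM employees WHERE branch = ?"

scranton = conn.execute(query, ("Scranton",)).fetchall()
elsewhere = conn.execute(query, ("Stamford",)).fetchall()  # hypothetical branch

print(len(scranton), len(elsewhere))  # prints: 3 0
```

Because every employee in the sample data works in Scranton, the filtered result matches the unfiltered one; the filter only becomes visible once other branches exist.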
Newest hires first:
| full_name | hire_date |
|---|---|
| Phyllis Vance | 2000-02-14 |
| Pam Beesly | 2000-01-03 |
| Jim Halpert | 1999-08-01 |
| Kevin Malone | 1998-06-15 |
| Oscar Martinez | 1996-11-20 |
| Dwight Schrute | 1995-04-01 |
| Angela Martin | 1994-05-05 |
| Creed Bratton | 1993-12-01 |
| Michael Scott | 1992-03-15 |
| Stanley Hudson | 1990-09-10 |
Top 5 newest hires:
| full_name | hire_date |
|---|---|
| Phyllis Vance | 2000-02-14 |
| Pam Beesly | 2000-01-03 |
| Jim Halpert | 1999-08-01 |
| Kevin Malone | 1998-06-15 |
| Oscar Martinez | 1996-11-20 |
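The two results above can be reproduced with `ORDER BY ... DESC` and `LIMIT`. This sketch assumes hire dates are stored as ISO-formatted text (`YYYY-MM-DD`), which sorts correctly as plain strings, and keeps only the two columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (full_name TEXT, hire_date TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Michael Scott", "1992-03-15"),
        ("Dwight Schrute", "1995-04-01"),
        ("Jim Halpert", "1999-08-01"),
        ("Pam Beesly", "2000-01-03"),
        ("Stanley Hudson", "1990-09-10"),
        ("Phyllis Vance", "2000-02-14"),
        ("Kevin Malone", "1998-06-15"),
        ("Oscar Martinez", "1996-11-20"),
        ("Angela Martin", "1994-05-05"),
        ("Creed Bratton", "1993-12-01"),
    ],
)

# Newest hires first, then keep only the first five rows.
top_five = conn.execute(
    "SELECT full_name, hire_date FROM employees "
    "ORDER BY hire_date DESC LIMIT 5"
).fetchall()

for full_name, hire_date in top_five:
    print(full_name, hire_date)
```

`LIMIT` without `ORDER BY` returns an arbitrary five rows, so the two clauses usually travel together.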
Question:
We define line value as:
If you filter rows, which clause do you use?
A. FROM
B. WHERE
C. ORDER BY
D. LIMIT
Answer: B
We are going to build a single SELECT statement for a given table.
Table:
Columns:
Write a query to list:
Conditions:
Output:
Limit:
Directions:
Modify your query to break ties by episode_number ascending.
Most real questions require combining tables.
Example:

What is the main purpose of a foreign key?
A. Make queries faster
B. Guarantee a relationship points to an existing row
C. Store text efficiently
D. Replace the need for indexes
Answer: B
Write down:
Submit your answers on Canvas under the Week 1 Participation Activity.