
DATA 503: Fundamentals of Data Engineering
January 26, 2026
Blockbuster Bend is the final video rental store. Today we connect data across the store’s database.
Run these commands to load the data into PostgreSQL:
The ERD shows how tables connect through primary and foreign keys.
Crows-Foot Notation
The Blockbuster Bend database contains several interconnected table groups:
film - Movie titles and detailslanguage - Available languagescategory - Film genres (Action, Comedy, etc.)film_category - Links films to categoriesactor - Actor namesfilm_actor - Links actors to filmscustomer - Customer informationstaff - Employee recordsinventory - Physical copies of filmsrental - Rental transactionspayment - Payment recordsstore - Store locationsaddress - Street addressescity - City namescountry - Country namesMost real questions span multiple tables:
Joins let us answer these questions by connecting tables.
Consider these two tables from our database:
| film_id | title | language_id |
|---|---|---|
| 1 | ACADEMY DINOSAUR | 1 |
| 2 | ACE GOLDFINGER | 1 |
| 3 | ADAPTATION HOLES | 2 |
| language_id | name |
|---|---|
| 1 | English |
| 2 | Italian |
| 3 | Japanese |
| 4 | Mandarin |
| 5 | French |
| 6 | German |
How do we see film titles with their language names in a single result?

Primary Key: Uniquely identifies each row in a table (e.g., language_id in language)
Foreign Key: References a primary key in another table (e.g., language_id in film)
An inner join returns only rows where the join condition is satisfied in both tables.
Only matching rows are returned
language_idLet’s trace through an inner join step by step:
| film_id | title | language_id |
|---|---|---|
| 1 | ACADEMY DINOSAUR | 1 |
| 2 | ACE GOLDFINGER | 1 |
| 3 | ADAPTATION HOLES | 2 |
| 4 | AFFAIR PREJUDICE | 6 |
| 5 | AFRICAN EGG | 4 |
| language_id | name |
|---|---|
| 1 | English |
| 2 | Italian |
| 3 | Japanese |
| 4 | Mandarin |
| 5 | French |
| 6 | German |
The database compares each film’s language_id to the language table:
| title | name |
|---|---|
| ACADEMY DINOSAUR | English |
| ACE GOLDFINGER | English |
| ADAPTATION HOLES | Italian |
| AFFAIR PREJUDICE | German |
| AFRICAN EGG | Mandarin |
Note: Japanese and French have no films, so they do not appear.
SQL Query Structure
title | language
---------------------+----------
ACADEMY DINOSAUR | English
ACE GOLDFINGER | English
ADAPTATION HOLES | Italian
AFFAIR PREJUDICE | German
AFRICAN EGG | Mandarin
JOIN is shorthand for INNER JOINAS f and AS l create table aliasesON specifies the join conditionWithout aliases, queries become verbose and harder to read:
title | release_year | language
-------------------+--------------+----------
ACADEMY DINOSAUR | 2012 | English
AGENT TRUMAN | 2010 | English
ALASKA PHANTOM | 2016 | English
Films connect to categories through the film_category bridge table:

| title | category |
| ---------------- | -------- |
| ACE GOLDFINGER | Action |
| ADAPTATION HOLES | Action |
| AIRPLANE SIERRA | Action |
| ALASKA PHANTOM | Action |
| ANGELS LIFE | Action |
film to film_categoryfilm_category to categoryThe film_actor bridge table connects films and actors:
| title | first_name | last_name |
| ---------------- | ---------- | --------- |
| ACADEMY DINOSAUR | JOHNNY | CAGE |
| ACADEMY DINOSAUR | ROCK | DUKAKIS |
| ACADEMY DINOSAUR | CHRISTIAN | GABLE |
| ACADEMY DINOSAUR | PENELOPE | GUINESS |
| ACADEMY DINOSAUR | MARY | KEITEL |
| ACADEMY DINOSAUR | OPRAH | KILMER |
| ACADEMY DINOSAUR | WARREN | NOLTE |
| ACADEMY DINOSAUR | SANDRA | PECK |
| ACADEMY DINOSAUR | MENA | TEMPLE |
| ACADEMY DINOSAUR | LUCILLE | TRACY |

Write a query that returns:
customer_idfirst_namelast_namecityJoin customer to address to city. Order by last_name, then first_name. Limit to 5 rows.
| customer_id | first_name | last_name | city |
| ----------- | ---------- | --------- | ----------------------- |
| 505 | RAFAEL | ABNEY | Talavera |
| 504 | NATHANIEL | ADAM | Joliet |
| 36 | KATHLEEN | ADAMS | Arak |
| 96 | DIANA | ALEXANDER | Augusta-Richmond County |
| 470 | GORDON | ALLARD | Hodeida |
Use ON for join conditions and WHERE for row filters:
| title | rating | category |
| ---------------- | ------ | -------- |
| ALI FOREVER | PG | Comedy |
| BLACKOUT PRIVATE | PG | Comedy |
| CAROL TEXAS | PG | Comedy |
| CHARADE DUFFEL | PG | Comedy |
| DISCIPLE MOTHER | PG | Comedy |
ON defines how tables relateWHERE filters the joined resultON can produce unexpected results with outer joinsInner joins exclude rows without matches. Sometimes we need to see unmatched rows:
Outer joins preserve unmatched rows.
A left join keeps all rows from the left table, even without matches.
All left table rows returned
NULL for right table columnsLet’s say that we wish to list all films that we do not have a copy of in our inventory. In other words, we want to find all films that are not in the inventory table.
film (left table)
| film_id | title |
|---|---|
| 1 | ACADEMY DINOSAUR |
| 2 | ACE GOLDFINGER |
| 3 | ADAPTATION HOLES |
… (1000 rows)
inventory (right table)
| inventory_id | film_id |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
… (4581 rows)
| film_id | title | inventory_id |
|---|---|---|
| 14 | ALICE FANTASIA | NULL |
| 33 | APOLLO TEEN | NULL |
| 36 | ARGONAUTS TOWN | NULL |
| 38 | ARK RIDGEMONT | NULL |
… (42 rows)
LEFT JOIN keeps all filmsNULL valuesWHERE i.inventory_id IS NULL filters to only unmatched rowsA right join keeps all rows from the right table, even without matches.
All right table rows returned
A full outer join keeps all rows from both tables.
All rows from both tables
Table A
| id | value_a |
|---|---|
| 1 | Apple |
| 2 | Banana |
| 3 | Cherry |
Table B
| id | value_b |
|---|---|
| 2 | Two |
| 3 | Three |
| 4 | Four |
Table A
| id | value_a |
|---|---|
| 1 | Apple |
| 2 | Banana |
| 3 | Cherry |
Table B
| id | value_b |
|---|---|
| 2 | Two |
| 3 | Three |
| 4 | Four |
| a.id | value_a | b.id | value_b |
|---|---|---|---|
| 1 | Apple | NULL | NULL |
| 2 | Banana | 2 | Two |
| 3 | Cherry | 3 | Three |
| NULL | NULL | 4 | Four |
| Join Type | Left Table | Right Table | Use Case |
|---|---|---|---|
| INNER JOIN | Only matched | Only matched | Standard queries |
| LEFT JOIN | All rows | Only matched | Find unmatched in right |
| RIGHT JOIN | Only matched | All rows | Find unmatched in left |
| FULL OUTER JOIN | All rows | All rows | Find all unmatched |
Find inventory items that have never been rented.
Return:
inventory_idfilm_idtitlestore_idOrder by store_id, then inventory_id. Limit to 5 rows.
inventory_id | film_id | title | store_id
-------------+---------+-------------------+----------
1 | 1 | ACADEMY DINOSAUR | 1
2 | 1 | ACADEMY DINOSAUR | 1
...
A cross join (Cartesian product) returns every combination of rows from both tables.
No join condition
store
| store_id |
|---|
| 1 |
| 2 |
category (partial)
| category_id | name |
|---|---|
| 1 | Action |
| 5 | Comedy |
| 7 | Drama |
| store_id | name |
|---|---|
| 1 | Action |
| 1 | Comedy |
| 1 | Drama |
| 2 | Action |
| 2 | Comedy |
| 2 | Drama |
Every store paired with every category (2 x 3 = 6 rows).
Generate a planning grid for all store-category combinations:
store_id | category | planned_inventory
---------+------------+------------------
1 | Action | 0
1 | Animation | 0
1 | Children | 0
1 | Classics | 0
1 | Comedy | 0
1 | Documentary| 0
1 | Drama | 0
1 | Family | 0
1 | Foreign | 0
1 | Games | 0
Cross joins can create enormous result sets:
| Table A Rows | Table B Rows | Result Rows |
|---|---|---|
| 100 | 100 | 10,000 |
| 1,000 | 1,000 | 1,000,000 |
| 10,000 | 10,000 | 100,000,000 |
Always use WHERE or LIMIT when exploring cross joins.
A self join joins a table to itself. This is useful when:

customer_1 | first_1 | last_name | customer_2 | first_2
-----------+-----------+-----------+------------+---------
318 | BRIAN | WYMAN | 412 | JOHN
...
c1 and c2c1.customer_id < c2.customer_id prevents duplicate pairsSelf joins work well for parent-child relationships:
Tracking a rental requires joining multiple tables:

SELECT
c.first_name || ' ' || c.last_name AS customer,
f.title,
r.rental_date::date AS rented,
r.return_date::date AS returned,
p.amount
FROM customer AS c
JOIN rental AS r
ON c.customer_id = r.customer_id
JOIN inventory AS i
ON r.inventory_id = i.inventory_id
JOIN film AS f
ON i.film_id = f.film_id
JOIN payment AS p
ON r.rental_id = p.rental_id
ORDER BY r.rental_date DESC
LIMIT 5;customer | title | rented | returned | amount
------------------+--------------------+------------+------------+-------
AUSTIN CINTRON | SOMETHING DUCK | 2022-07-27 | 2022-08-02 | 4.99
AUSTIN CINTRON | TITANS JERK | 2022-07-27 | 2022-08-01 | 4.99
AUSTIN CINTRON | SUNRISE LEAGUE | 2022-07-27 | 2022-07-28 | 2.99
...
customer to rental via customer_idrental to inventory via inventory_idinventory to film via film_idrental to payment via rental_idSELECT
a.first_name,
a.last_name,
f.title,
f.release_year,
c.name AS category
FROM actor AS a
JOIN film_actor AS fa
ON a.actor_id = fa.actor_id
JOIN film AS f
ON fa.film_id = f.film_id
JOIN film_category AS fc
ON f.film_id = fc.film_id
JOIN category AS c
ON fc.category_id = c.category_id
WHERE a.last_name = 'GUINESS'
ORDER BY f.release_year, f.title;first_name | last_name | title | release_year | category
-----------+-----------+--------------------+--------------+----------
PENELOPE | GUINESS | ACADEMY DINOSAUR | 2012 | Documentary
PENELOPE | GUINESS | ANACONDA CONFESSIONS| 2020 | Animation
...
Five tables joined through their foreign key relationships.
Sometimes joins need multiple columns to match correctly:
payment_id | customer_id | rental_id | amount | rental_date
-----------+-------------+-----------+--------+------------
17503 | 1 | 76 | 2.99 | 2022-05-25
17504 | 1 | 573 | 0.99 | 2022-05-28
17505 | 1 | 1185 | 5.99 | 2022-06-15
...
When two tables have the same column name:
ERROR: column reference "customer_id" is ambiguous
Forgetting the ON clause creates a cross join:
Every film paired with every category (1000 x 16 = 16,000 rows).
Adding more tables can multiply result sizes:

Prevention:
COUNT(*) before SELECT *LIMIT during developmentBefore running a complex join:
LIMIT first?For each store, find total revenue by film category.
Return:
store_idcategorytotal_revenueOrder by store_id, then total_revenue descending.
SELECT
i.store_id,
c.name AS category,
SUM(p.amount) AS total_revenue
FROM payment AS p
JOIN rental AS r ON p.rental_id = r.rental_id
JOIN inventory AS i ON r.inventory_id = i.inventory_id
JOIN film AS f ON i.film_id = f.film_id
JOIN film_category AS fc ON f.film_id = fc.film_id
JOIN category AS c ON fc.category_id = c.category_id
GROUP BY i.store_id, c.name
ORDER BY i.store_id, total_revenue DESC;store_id | category | total_revenue
---------+-------------+--------------
1 | Sports | 4892.19
1 | Sci-Fi | 4756.98
1 | Animation | 4656.30
...
Find any actors who have no films in the database.
Return:
actor_idfirst_namelast_nameOrder by last_name, first_name.
actor_id | first_name | last_name
---------+------------+----------
(0 rows - all actors have films in this database)
Create a rental history for customer MARY SMITH (customer_id = 1).
Return:
rental_datetitlecategoryamountOrder by rental_date descending. Limit to 10 rows.
SELECT
r.rental_date::date,
f.title,
c.name AS category,
p.amount
FROM customer AS cu
JOIN rental AS r ON cu.customer_id = r.customer_id
JOIN payment AS p ON r.rental_id = p.rental_id
JOIN inventory AS i ON r.inventory_id = i.inventory_id
JOIN film AS f ON i.film_id = f.film_id
JOIN film_category AS fc ON f.film_id = fc.film_id
JOIN category AS c ON fc.category_id = c.category_id
WHERE cu.customer_id = 1
ORDER BY r.rental_date DESC
LIMIT 10;| Join Type | Returns | NULL Handling |
|---|---|---|
| INNER JOIN | Only matching rows | No NULLs from join |
| LEFT JOIN | All left + matched right | NULLs for unmatched right |
| RIGHT JOIN | All right + matched left | NULLs for unmatched left |
| FULL OUTER JOIN | All rows from both | NULLs for unmatched on both sides |
| CROSS JOIN | All combinations | No join condition |
| SELF JOIN | Table joined to itself | Depends on join type used |
Write a query that answers:
Which films were rented in 2022 by customers from store 1?
Return the customer name, film title, and rental date.
Be ready to share your join path and key columns.