DATA 351: Data Management with SQL
Before we begin, load the database schema and data:
From command line (import the schema and data using the files from Canvas for Assignment 3):
Run these queries to confirm your database is ready:
If you see these counts, you’re ready to go!
┌─────────────────────────────────────────────────────┐
│ JOIN Types │
├─────────────────────────────────────────────────────┤
│ INNER JOIN │ Only matching rows from both tables │
│ LEFT JOIN │ All from left + matches from right │
│ RIGHT JOIN │ All from right + matches from left │
│ FULL JOIN │ All rows from both tables │
└─────────────────────────────────────────────────────┘
Today we’ll practice INNER JOIN and LEFT JOIN.
Task: Display film titles with the language name.
Requirements:
title from filmname as language_name from languagelanguage_idtitle, limit 10
Step 1: Identify the tables and their relationship
┌──────────────────┐ ┌──────────────────┐
│ film │ │ language │
├──────────────────┤ ├──────────────────┤
│ film_id (PK) │ │ language_id (PK) │
│ title │ │ name │
│ language_id (FK) │────────▶│ │
└──────────────────┘ └──────────────────┘
The language_id in film references the language table.
INNER JOIN keeps only rows where both tables match:
film language
┌────┬───────────┬────┐ ┌────┬─────────┐
│ id │ title │lang│ │ id │ name │
├────┼───────────┼────┤ ├────┼─────────┤
│ 1 │ Ace Gold │ 1 │──┐ │ 1 │ English │
│ 2 │ Academy │ 1 │──┼───▶│ 2 │ French │
│ 3 │ Affair │ 1 │──┘ └────┴─────────┘
└────┴───────────┴────┘
Result: title + language.name
Key Points:
JOIN (same as INNER JOIN)f for film, l for languageON clause specifies the matching conditionProblem 1 output screenshot
Task: Show customer names with their city.
Requirements:
first_name and last_name as customer_namecity from the city tablecustomer → address → citylast_name, then first_name, limit 10
Step 1: Map out the path between tables
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ customer │ │ address │ │ city │
├──────────────┤ ├──────────────┤ ├──────────────┤
│ customer_id │ │ address_id │ │ city_id (PK) │
│ first_name │ │ address │ │ city │
│ last_name │ │ city_id (FK) │─────▶│ │
│ address_id │─────▶│ │ └──────────────┘
└──────────────┘ └──────────────┘
We need two JOINs to connect customer to city.
Think of it as stepping stones:
Step 1: Start at customer
customer ─────┐
│ JOIN on address_id
Step 2: Land on address
address ──────┤
│ JOIN on city_id
Step 3: Arrive at city|
city ─────────┘
Each JOIN connects to the next table in the chain.
Key Points:
|| operator with ' ' spaceci for city (avoid conflict with c for customer)Problem 2 output screenshot
Task: Count how many rentals each customer has made.
Requirements:
customer_namerental_countcustomer to rental
Concept: Combine rows first, then aggregate
customer rental
┌────┬──────────┐ ┌────────┬──────────┐
│ id │ name │ │rent_id │ cust_id │
├────┼──────────┤ ├────────┼──────────┤
│ 1 │ Mary │◀───│ 101 │ 1 │
│ │ │◀───│ 102 │ 1 │
│ │ │◀───│ 103 │ 1 │
│ 2 │ John │◀───│ 104 │ 2 │
└────┴──────────┘ └────────┴──────────┘
After JOIN: Mary appears 3 times, John appears 1 time
After GROUP BY + COUNT: Mary=3, John=1
Important: When using aggregates, all non-aggregated columns must be in GROUP BY.
SELECT name, COUNT(*) ← name is not aggregated
FROM ...
GROUP BY name ← so name must be here
For customer_id + first_name + last_name:
Key Points:
COUNT(r.rental_id) counts matching rentalsORDER BY rental_count DESC puts highest firstProblem 3 output screenshot
Task: Find films that have never been rented.
Requirements:
film_id and titlefilm → inventory → rentalrental_id IS NULLtitle
LEFT JOIN keeps all rows from the left table:
film inventory
┌────┬───────────┐ ┌─────┬────────┐
│ id │ title │ │inv_id│film_id│
├────┼───────────┤ ├─────┼────────┤
│ 1 │ Ace Gold │◀──────│ 101 │ 1 │
│ 2 │ Academy │◀──────│ 102 │ 2 │
│ 3 │ Affair │ └─────┴────────┘
└────┴───────────┘
LEFT JOIN Result:
┌────┬───────────┬─────────┐
│ id │ title │ inv_id │
├────┼───────────┼─────────┤
│ 1 │ Ace Gold │ 101 │ ← has inventory
│ 2 │ Academy │ 102 │ ← has inventory
│ 3 │ Affair │ NULL │ ← NO inventory!
└────┴───────────┴─────────┘
The “anti-join” pattern:
LEFT JOIN + WHERE right_side IS NULL
= rows with NO match
┌────────────────────────────────────────────┐
│ LEFT JOIN keeps unmatched rows │
│ WHERE ... IS NULL filters TO them │
│ │
│ Result: Only rows that DON'T have │
│ a matching record │
└────────────────────────────────────────────┘
This is how we find “films without rentals.”
film ──LEFT JOIN──▶ inventory ──LEFT JOIN──▶ rental
│
▼
WHERE rental_id IS NULL
A film might have inventory but never been rented, so we need to check all the way to rental.
Key Points:
LEFT JOIN preserves all filmsIS NULL filter finds the unmatched rowsProblem 4 output screenshot
Task: Calculate total revenue for each film category.
Requirements:
nameSUM(amount) as total_revenueCOUNT(rental_id) as rental_count
Map the complete path from payment to category:
payment ──▶ rental ──▶ inventory ──▶ film ──▶ film_category ──▶ category
│ │ │ │ │ │
└──rental_id │ │ │ │
└──inventory_id │ │ │
└──film_id─┴──film_id───┘ │
└──category_id──┘
Each arrow = one JOIN operation.
┌─────────┐ ┌─────────┐ ┌────────────┐ ┌───────┐
│ payment │──▶│ rental │──▶│ inventory │──▶│ film │
│ │ │ │ │ │ │ │
│ amount │ │rental_id│ │inventory_id| │film_id|
│rental_id│ │ │ │film_id │ │ │
└─────────┘ └─────────┘ └────────────┘ └──┬────┘
│
┌───────────────────┘
▼
┌──────────────┐ ┌──────────┐
│film_category │──▶│ category │
│ │ │ │
│film_id │ │ name │
│category_id │ │ │
└──────────────┘ └──────────┘
-- Start with payment (has the money!)
FROM payment p
-- Connect to rental to know what was rented
JOIN rental r ON p.rental_id = r.rental_id
-- Connect to inventory to know which copy
JOIN inventory i ON r.inventory_id = i.inventory_id
-- Connect to film to know which movie
JOIN film f ON i.film_id = f.film_id
-- Connect to junction table for categories
JOIN film_category fc ON f.film_id = fc.film_id
-- Finally get the category name
JOIN category cat ON fc.category_id = cat.category_idSELECT cat.name,
SUM(p.amount) AS total_revenue,
COUNT(r.rental_id) AS rental_count
FROM payment p
JOIN rental r ON p.rental_id = r.rental_id
JOIN inventory i ON r.inventory_id = i.inventory_id
JOIN film f ON i.film_id = f.film_id
JOIN film_category fc ON f.film_id = fc.film_id
JOIN category cat ON fc.category_id = cat.category_id
GROUP BY cat.name
ORDER BY total_revenue DESC;Why start from payment?
amount we need to SUMJunction table pattern:
film_category connects films to categories (many-to-many)film_id AND category_id as composite keyProblem 5 output screenshot
| Pattern | Use Case |
|---|---|
| Simple JOIN | Connect 2 related tables |
| Join Chain | Navigate through multiple tables |
| JOIN + GROUP BY | Aggregate across relationships |
| LEFT JOIN + IS NULL | Find unmatched records |
Take time to practice these patterns.
Join mastery = SQL mastery!