
DATA 351: Data Management with SQL
January 26, 2026
PostgreSQL can do math! Let’s start simple:
| ?column? |
|---|
| 4 |
No table needed. SQL evaluates the expression and returns the result.
| Operator | Operation | Example | Result |
|---|---|---|---|
+ |
Addition | 5 + 3 |
8 |
- |
Subtraction | 10 - 4 |
6 |
* |
Multiplication | 6 * 7 |
42 |
/ |
Division | 15 / 4 |
3 |
% |
Modulo (remainder) | 15 % 4 |
3 |
Wait, what? 👀
15 / 4 = 3? d Yes! Integer division truncates. 🤯
Note
Rule: If both operands are integers, result is integer.
Make at least one operand a decimal to get decimal results.
Calculate annual bonus (10% of salary):
| full_name | salary_usd | annual_bonus |
|---|---|---|
| Michael Scott | 75000.00 | 7500.0000 |
| Dwight Schrute | 62000.00 | 6200.0000 |
| Pam Beesly | 42000.00 | 4200.0000 |
The formula: ((new - old) / old) * 100
Example: If salary goes from $50,000 to $55,000:
| pct_increase |
|---|
| 10.0 |
A 10% raise. (Nice!)
How much does each employee earn per year of experience?
| full_name | salary_usd | years_experience | salary_per_year |
|---|---|---|---|
| Pam Beesly | 42000.00 | 8 | 5250.00 |
| Jim Halpert | 61000.00 | 10 | 6100.00 |
| Dwight Schrute | 62000.00 | 12 | 5166.67 |
ROUND(value, decimal_places) rounds to specified precision:
Note
Always ROUND money calculations for clean output!
ABS() returns the absolute value (distance from zero):
Use case: Finding differences regardless of direction:
Let’s find how far each employee’s salary is from $55,000:
| full_name | salary_usd | difference | absolute_difference |
|---|---|---|---|
| Michael Scott | 75000.00 | 20000.00 | 20000.00 |
| Pam Beesly | 42000.00 | -13000.00 | 13000.00 |
| Dwight Schrute | 62000.00 | 7000.00 | 7000.00 |
| Function | Description | Example | Result |
|---|---|---|---|
^ |
Exponentiation | 2 ^ 3 |
8 |
\|/ |
Square root | \|/ 16 |
4 |
\|\|/ |
Cube root | \|\|/ 27 |
3 |
SQRT() |
Square root (function) | SQRT(16) |
4 |
Sales employees have a commission rate. Calculate their potential commission on a $10,000 sale:
Hint: Commission = sale_amount * commission_rate
Take 2 minutes.
| full_name | commission_rate | commission_on_10k |
|---|---|---|
| Dwight Schrute | 0.08 | 800.00 |
| Jim Halpert | 0.07 | 700.00 |
| Stanley Hudson | 0.05 | 500.00 |
Dwight’s higher rate reflects his #1 salesman status. Obviously.
Aggregate functions compute a single result from multiple rows.

| Function | Description | Example |
|---|---|---|
SUM() |
Total of all values | Total payroll |
AVG() |
Average (mean) | Average salary |
COUNT() |
Number of rows | How many employees |
MIN() |
Smallest value | Lowest salary |
MAX() |
Largest value | Highest salary |
SUM(): Total ValuesWhat is our total payroll?
| total_payroll |
|---|
| 452000.00 |
We spend $452,000 on salaries. (Michael probably thinks he deserves half.)
AVG(): Calculate the MeanWhat is the average salary?
| avg_salary |
|---|
| 56500.000000 |
That is a lot of decimal places. Let’s fix that:
| avg_salary |
|---|
| 56500.00 |
Three ways to count:
| count(*) | count(commission) | count(distinct dept) |
|---|---|---|
| 8 | 3 | 4 |
Only 3 employees have commission rates!
Salary range:
| lowest_salary | highest_salary | salary_range |
|---|---|---|
| 42000.00 | 75000.00 | 33000.00 |
Pam makes the least, Michael makes the most. Shocking.
You can calculate several aggregates in one query:
| num_employees | total_payroll | avg_salary | min_salary | max_salary |
|---|---|---|---|---|
| 8 | 452000.00 | 56500.00 | 42000.00 | 75000.00 |
What if we want average salary BY DEPARTMENT?
| department | num_employees | avg_salary |
|---|---|---|
| Management | 1 | 75000.00 |
| Sales | 3 | 60333.33 |
| Accounting | 3 | 51333.33 |
| Reception | 1 | 42000.00 |

GROUP BY splits data into buckets, then aggregates each bucket separately.
Show only departments with more than 1 employee:
| department | num_employees | avg_salary |
|---|---|---|
| Sales | 3 | 60333.33 |
| Accounting | 3 | 51333.33 |
WHERE filters rows BEFORE grouping. HAVING filters AFTER grouping.
Write a query that shows for each department:
Only include departments where average performance rating > 3.5
Take 4 minutes.
| department | num_employees | total_salary | avg_rating |
|---|---|---|---|
| Accounting | 3 | 154000.00 | 3.8 |
| Sales | 3 | 181000.00 | 3.6 |
Management and Reception did not make the cut!
When we return: Statistical functions and percentiles!
Up next: Finding the median (and why it matters more than average).
Pop quiz: A company has 5 employees with these salaries:
$40,000, $42,000, $45,000, $48,000, $500,000
What is the average salary?
Is $135,000 a good representation of “typical” salary? No!
The CEO’s salary skews the average dramatically.
The median is the middle value when data is sorted.
$40,000, $42,000, $45,000, $48,000, $500,000
The median is $45,000, which better represents “typical.”
Tip
When to use which:
PostgreSQL does not have a built-in MEDIAN() function, but we can use:
| median_salary |
|---|
| 55000 |
The 0.5 means “50th percentile” which is the median.
Let’s break this down:
| Part | Meaning |
|---|---|
percentile_cont(0.5) |
Find the 50th percentile (median) |
WITHIN GROUP |
Required syntax for ordered-set aggregates |
ORDER BY salary_usd |
Which column to calculate percentile on |
Compare median and average for our employee salaries:
| mean_salary | median_salary |
|---|---|
| 56500.00 | 55000 |
Pretty close! Our data does not have extreme outliers.
A percentile tells you what percentage of values fall below a point.
If you score in the 90th percentile on a test, you beat 90% of test-takers.
Quartiles divide data into four equal parts:
| q1 | q2_median | q3 |
|---|---|---|
| 49000 | 55000 | 61250 |
Instead of separate function calls, use an array:
| quartiles |
|---|
| {49000,55000,61250} |
The result is an array. Curly braces indicate array values.
Two versions exist:
| Function | Behavior | Best For |
|---|---|---|
percentile_cont |
Interpolates between values | Continuous data |
percentile_disc |
Returns actual value from data | Discrete data |
Tip
For salaries, percentile_cont is usually more appropriate.
Write a query that shows:
Use a single percentile_cont call with an array.
Take 2 minutes.
| salary_percentiles |
|---|
| {43400,55000,69550} |
Interpretation:
PostgreSQL makes date math intuitive:
| days_between |
|---|
| 365 |
Subtracting dates gives you the number of days between them.
How long has each employee been with the company?
| full_name | hire_date | today | days_employed |
|---|---|---|---|
| Stanley Hudson | 2004-11-20 | 2026-01-19 | 7730 |
| Michael Scott | 2005-03-24 | 2026-01-19 | 7606 |
| Jim Halpert | 2005-10-05 | 2026-01-19 | 7411 |
EXTRACT(part FROM date) gets specific components:
| full_name | hire_date | hire_year | hire_month | day_of_week |
|---|---|---|---|---|
| Michael Scott | 2005-03-24 | 2005 | 3 | 4 |
DOW = Day of Week (0=Sunday, 1=Monday, etc.)
DATE_PART('part', date) does the same thing:
| full_name | hire_year | hire_quarter |
|---|---|---|
| Michael Scott | 2005 | 1 |
| Angela Martin | 2006 | 3 |
Use whichever syntax you prefer. They are equivalent.
How many employees were hired each year?
| hire_year | num_hired |
|---|---|
| 2004 | 1 |
| 2005 | 3 |
| 2006 | 2 |
| 2007 | 2 |
You can add intervals to dates:
| hire_date | one_year_later | ninety_days_later |
|---|---|---|
| 2005-10-05 | 2006-10-05 | 2006-01-03 |
Write a query that shows:
Only include employees who HAVE logged in (not NULL).
Order by most recent login first.
Take 3 minutes.
| full_name | last_login | days_since_login |
|---|---|---|
| Dwight Schrute | 2026-01-16 08:01:00 | 3 |
| Jim Halpert | 2026-01-16 08:03:00 | 3 |
| Kevin Malone | 2026-01-16 09:30:00 | 3 |
Note: I cast last_login to DATE to get whole days.
Homework 2 covers everything we learned today:
\COPYYou will use the stackoverflow database with three tables:
currency_rates (CSV). Create table and import a CSV.country_stats (SQL to execute)response_timeline (SQL to execute)You will define a currency_rates table based on sample CSV data.
Sample data you will see:
rate_date,currency_code,currency_name,exchange_rate,is_major_currency
2025-01-01,USD,US Dollar,1.000000,true
2025-01-01,EUR,Euro,0.8523,true
Choose types carefully:
rate_date needs a DATE typecurrency_code is always 3 charactersexchange_rate needs decimal precisionQ3 asks for ABS():
Remember that ABS gives the distance from zero:
Q4 asks for ROUND() and percentages:
Watch for integer division! Cast if needed.
You will group by currency and filter with HAVING.
Common mistake: Using WHERE instead of HAVING for aggregate conditions.
-- WRONG: WHERE cannot filter on aggregates
SELECT currency_code, COUNT(*)
FROM currency_rates
WHERE COUNT(*) = 12 -- ERROR!
GROUP BY currency_code;
-- RIGHT: Use HAVING for aggregate conditions
SELECT currency_code, COUNT(*)
FROM currency_rates
GROUP BY currency_code
HAVING COUNT(*) = 12; -- Correct!Q6: Finding the median
Do not forget WITHIN GROUP! It is required.
Q7: Using arrays for quartiles
The ARRAY keyword is essential.
Q8: Subtracting dates gives days:
Q9: EXTRACT for grouping by time parts:
Remember to GROUP BY the same expression you SELECT.
The hardest question! You need to compare January and December rates.
Strategy: Join the table to itself:
Use table aliases (jan, dec) to distinguish the two instances.
Before submitting:
Pro tip: Start simple, then add complexity.
Topics we covered today:
\COPYWhat questions do you have?
