Syllabus

Modification Log

Date        Modification
1/12/2025   Initial publishing

Course Information

Course Name

DATA 503 Fundamentals of Data Engineering

Location

Campus     Room       Day
Portland   GPC 210    Monday
Salem      Ford 102   Wednesday

Professor

Lucas Cordova, Ph.D.
Email: LPCordova@willamette.edu
Office: Ford 210 (Salem)

Office Hours

Office hours are offered as 15-minute appointments, booked through the Office Hours page, or on a drop-in basis when I am free. Multiple modalities are offered (in-person, phone, Google Meet). If the scheduled times do not align with your availability, please don’t hesitate to contact me.

Monday Tuesday Wednesday Thursday Friday
10:20–11:20 (Salem)

4:30–5:30 (Portland)
10:20–11:20 (Salem)

4:30–5:30 (Salem)

Course Details

Catalog Description

Data management is core to applied computer science and data science. This course introduces relational databases, file-based databases, cloud storage, and data streaming, with an emphasis on selecting appropriate architectures for different data problems.

Course Description

The course focuses on data acquisition, storage, and organization. Topics include the data engineering pipeline and relational database design and querying using PostgreSQL.

Learning Outcomes

Upon completion, students will be able to:

  1. Understand the role of a data engineer.
  2. Query relational databases using SQL.
  3. Analyze database design decisions.
  4. Execute advanced SQL queries.
  5. Evaluate alternatives to relational databases.

Learning Objectives

Students will be able to:

  1. Explain relational database principles.
  2. Design and implement relational schemas.
  3. Write advanced SQL queries.
  4. Use SQL for text and spatial data.
  5. Implement a full data pipeline.

Textbook and Materials

  • Practical SQL (2nd ed.), Anthony DeBarros (required)
  • CodeGrade Subscription, from the bookstore or self-purchased (required)
  • Fundamentals of Data Engineering, Reis & Housley (optional)
  • The Data Warehouse Toolkit (3rd ed.), Kimball & Ross (optional)

Assessments

Grade Distribution

Percentage Grade
≥92 A
90–91.99 A-
88–89.99 B+
82–87.99 B
80–81.99 B-
78–79.99 C+
72–77.99 C
70–71.99 C-
68–69.99 D+
62–67.99 D
60–61.99 D-
≤59.99 F
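
As an illustration of how the table above maps a percentage to a letter, here is a minimal Python sketch. The names `GRADE_FLOORS` and `letter_grade` are hypothetical, and the sketch assumes no rounding: a score that falls between two listed bands (for example 91.995) is placed in the lower band. Actual grading practice may differ.

```python
# Lower bound of each letter band, copied from the Grade Distribution table.
# Assumption: scores are not rounded before the lookup.
GRADE_FLOORS = [
    (92.0, "A"), (90.0, "A-"),
    (88.0, "B+"), (82.0, "B"), (80.0, "B-"),
    (78.0, "C+"), (72.0, "C"), (70.0, "C-"),
    (68.0, "D+"), (62.0, "D"), (60.0, "D-"),
]


def letter_grade(percentage: float) -> str:
    """Return the letter for a course percentage (0-100)."""
    # Bands are listed from highest to lowest, so the first match wins.
    for floor, letter in GRADE_FLOORS:
        if percentage >= floor:
            return letter
    return "F"


if __name__ == "__main__":
    for score in (95.0, 89.5, 71.0, 59.99):
        print(score, letter_grade(score))
```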

Assessment Breakdown

Deliverable          Individual Weight   Total   Description
In-Class Exercises   ~1% each            10%     Attendance and participation based.
Homework             ~5% each            30%     Design, coding, and analysis assignments.
Midterm              15%                 15%     Culmination of in-class exercises and homework assignments.
Project              -                   30%     Semester-long team project with milestones.
Final Exam           15%                 15%     Comprehensive exam covering course material.
Total                                    100%
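
The category totals above combine into a course percentage as a weighted average. The following Python sketch shows that arithmetic. The names `WEIGHTS` and `course_percentage` are hypothetical, and the sketch assumes each category score is already an average on a 0-100 scale; how items within a category are averaged is not specified here.

```python
# Category weights in percent, copied from the Total column above.
WEIGHTS = {
    "in_class": 10,
    "homework": 30,
    "midterm": 15,
    "project": 30,
    "final_exam": 15,
}


def course_percentage(scores: dict[str, float]) -> float:
    """Weighted average of category scores, each given on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {
        "in_class": 100.0,
        "homework": 90.0,
        "midterm": 80.0,
        "project": 85.0,
        "final_exam": 75.0,
    }
    print(course_percentage(example))
```

With the example scores shown, the weighted sum is 10(100) + 30(90) + 15(80) + 30(85) + 15(75) = 8575, which gives 85.75%.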

Course Schedule

The schedule is tentative.

Week  Date    Topics                      Homework       Project
1     Jan 12  Course overview, pipelines  -              -
2     Jan 19  Modeling, normalization     Assignment 1   -
3     Jan 26  SQL basics                  Assignment 2   Matchmaking Workshop
4     Feb 2   SQL joins                   Assignment 3   -
5     Feb 9   SQL constraints             Assignment 4   Proposal
6     Feb 16  Shell, grouping             -              -
7     Feb 23  Midterm                     -              -
8     Mar 2   Data generation             Assignment 6   Milestone Checkpoint 1
9     Mar 9   Docker, JSON                Assignment 7   -
10    Mar 16  Subqueries, windows         Assignment 8   -
11    Mar 23  Spring break                -              Milestone Checkpoint 2
12    Mar 30  Regex, text                 -              -
13    Apr 6   APIs, views                 Assignment 10  -
14    Apr 13  PostGIS, MongoDB            -              Presentation
15    Apr 20  Project presentations       -              Final Write-Up

Course Policies

Attendance

Consistent attendance and participation are expected. In-class activities cannot be made up.

Late Work

Students receive three late tokens for assignments. Projects incur a 5% per-day penalty up to five days. Late tokens cannot be used for projects. Late token requests are made through Canvas.
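
To make the project penalty concrete, here is a small Python sketch of one possible reading of the policy above. It assumes the 5% is deducted as percentage points from the earned score for each day late, and it returns `None` beyond five days because the policy text does not say what happens then. The function name and both assumptions are illustrative only; the instructor's interpretation governs.

```python
from typing import Optional

PENALTY_PER_DAY = 5.0  # percentage points per day late (assumed interpretation)
MAX_LATE_DAYS = 5      # the stated penalty applies for up to five days


def late_project_score(earned: float, days_late: int) -> Optional[float]:
    """Apply the per-day late penalty to a project score on a 0-100 scale.

    Returns None when the submission is more than five days late,
    since the policy text does not cover that case.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > MAX_LATE_DAYS:
        return None
    # Never let the penalized score drop below zero.
    return max(0.0, earned - PENALTY_PER_DAY * days_late)


if __name__ == "__main__":
    for days in range(0, 7):
        print(days, late_project_score(90.0, days))
```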

Academic Honesty

All work must be your own. Limited collaboration and the use of AI for idea generation are permitted, but submitting AI-generated work is not allowed.

Willamette Policies

These policies cover inclusive classroom practices, accessibility, Title IX, religious accommodations, land acknowledgement, SOAR Center resources, and intellectual property guidelines.