These projects were created as part of Codecademy's Data Engineer Career Path course. (Latest projects are added to the top of the list.)
There were several portfolio projects created as part of this course. As they were larger and more extensive programs they can be found in their own repositories:
- Portfolio Project: GitHub Data Engineering Portfolio (TODO)
- Portfolio Project: Subscriber Cancellations Data Pipeline (TODO)
- Portfolio Project: Bike Rental Data Management (TODO)
I have previously completed a number of the projects within Python Fundamentals and all of the projects within SQL Fundamentals as part of Codecademy's Computer Science Career Path course. For completeness, I have included the projects here as well, and have marked them with a *.
- Learn MongoDB
- Learn Git II: Git For Deployment
- Learn Git: Introduction To Version Control
- Advanced Python
- Learn The Command Line
- Intermediate Python
- Introduction To Big Data With PySpark
- Getting Started Off-Platform
- Data Wrangling, Cleaning And Tidying
- Advanced SQL
- Python Pandas
- SQL Fundamentals *
- Python Fundamentals
- Other
This project was to create a shopping cart system for a surf shop, designed to exercise knowledge of errors and unit testing in Python. The goal was to build tests for the shopping cart software, apply more advanced testing techniques, and then identify and fix any bugs exposed by the tests.
The main ShoppingCart class defines the core properties and behaviours.
num_surfboards(integer) β number of surfboards in the cart.checkout_date(datetime) β the date the cart is checked out.locals_discount(boolean) β whether a localsβ discount has been applied.
add_surfboards(quantity=1)β adds surfboards to the cart, raisesTooManyBoardsErrorif the maximum of 4 is exceeded.set_checkout_date(date)β sets the checkout date, raisesCheckoutDateErrorif the date is not in the future.apply_locals_discount()β sets the localsβ discount flag toTrue.
TooManyBoardsErrorβ raised when attempting to add more than 4 surfboards.CheckoutDateErrorβ raised when attempting to set a checkout date in the past or present.
Both exceptions override __str__ to provide clear, user-friendly error messages.
This project demonstrates how Python exceptions and unit testing can be used to enforce business rules and ensure code correctness.
- Unit Testing: Writing tests with unittest ensures each feature works as expected.
- Advanced Testing Features: Skipped tests (
@unittest.skip), expected failures (@unittest.expectedFailure), and parameterised subtests (self.subTest) were used to improve test coverage and clarity. - Encapsulation: Internal state (
num_surfboards,checkout_date,locals_discount) is controlled via methods and not modified directly. - Error Handling: Custom exceptions provide meaningful error messages, improving maintainability and debugging.
- Test-Driven Bug Fixing: Tests highlighted errors such as invalid checkout date handling and surfboard quantity limits, which were then fixed.
The project includes a full suite of unit tests using Pythonβs built-in unittest framework. To run the tests:
python sams_surf_shop.pyYou should see output similar to:
Ran 6 tests in 0.001s
OK (skipped=1, expected failures=1)
@unittest.skipmarks tests that are intentionally skipped.@unittest.expectedFailuremarks tests that are known to fail, for demonstration purposes.self.subTestis used to run the same test with multiple inputs while keeping them separate in the output.
This ensures that all core functionality of the shopping cart, including error handling and state changes, is verified.
- Solution URL:
- β
Refactor:
- Added descriptive messages for custom exceptions.
- Used type hints for clarity (
Optional[datetime]).
- β
Unit Testing Improvements:
- Used
self.subTestfor parameterised checks of multiple surfboard quantities. - Applied decorators:
@classmethodfor shared test setup.@unittest.skipfor tests intentionally skipped.@unittest.expectedFailurefor known failing behaviour.
- Corrected assertions to verify state changes (
cart.locals_discount) rather than method objects.
- Used
This project was to create a digital school catalogue built using object-oriented programming. The goal of the project is to model different types of schools and to provide quick reference information about each.
The catalogue supports three types of schools:
- Primary
- Middle
- High
All inheriting structure and behaviours from a shared base School class.
The parent School class defines the core properties and behaviour.
- Properties:
name(string),level(string: 'primary', 'middle', or 'high'),number_of_students(integer). - Representation Method:
A
__repr__method returning a text description of the school. - Getters and Setters:
Getters for all properties, plus a setter for
number_of_studentswith input validation.
Each class builds on this:
- PrimarySchool: adds a
pickup_policyproperty and extends the description output. - MiddleSchool: inherits the base behaviour without any changes.
- HighSchool: adds
sports_teamsand extends the description output.
Here is the parent class structure:
class School:
def __init__(self, name, level, number_of_students):
levels = ['primary', 'middle', 'high']
if level not in levels:
raise ValueError(f'Level must be one of {levels}')
if not isinstance(name, str):
raise TypeError('Name must be a string.')
if not isinstance(number_of_students, int):
raise TypeError('Number of students must be an integer.')
self._name = name
self._level = level
self._number_of_students = number_of_students
def __repr__(self):
return f'A {self.level} school named {self.name} with {self.number_of_students} students.'
@property
def name(self):
return self._name
@property
def level(self):
return self._level
@property
def number_of_students(self):
return self._number_of_students
@number_of_students.setter
def number_of_students(self, new_number_of_students):
if not isinstance(new_number_of_students, int):
raise TypeError('Number of students must be an integer.')
self._number_of_students = new_number_of_studentsThis project demonstrates how inheritance can reduce repeated logic while allowing each child class to include its own unique features. The base School class defines the common structure, and the child classes extend it with additional information where needed.
- Inheritance: the child classes extend the parent class, reducing duplication and creating a clear hierarchy.
- Encapsulation: access to internal data is controlled using getter and setter methods, and validation prevents invalid values being assigned.
- Overriding Methods: child classes override
__repr__where necessary to include additional details specific to that school type. - Flexible Arguments: the high school class uses
*sports_teamsto allow any number of sports team names to be supplied. - Input Validation: constructors validate input types, ensuring that school objects are initialised with valid data.
- Solution URL: School Catalogue
- β
Refactor:
- Added descriptive error messages for all
TypeErrorandValueErrormessages. - Simplified validation by inverting logic to use
if not. - Changed attribute names to use _ prefix (e.g.
_name,_level) to indicate they're internal and should be accessed via properties.
- Added descriptive error messages for all
- β
Use decorators for properties, getters and setters:
- Replaced getters with
@propertydecorators allowing access to attributes likemy_school.nameinstead ofmy_school.get_name(). - Replaced setter with
@number_of_students.setterso can now usemy_school.number_of_students = 200instead ofmy_school.set_number_of_students(200). - Test code updated to use property-based syntax.
- Replaced getters with
This project focused on building functions that calculate shipping costs, identify the cheapest available driver, and total the revenue from completed trips. The main learning goal was practising how to work with and manipulate different types of arguments β unpacking tuples, using default parameters, and working with *args and **kwargs.
Coordinates needed to be unpacked so latitude and longitude values could be passed to calculate the distance:
from_lat, from_long = from_coords
to_lat, to_long = to_coords
distance = get_distance(from_lat, from_long, to_lat,to_long)
# Equivalent shorthand:
distance = get_distance(*from_coords, *to_coords)Default arguments ensure functions behave sensibly when optional information isnβt supplied. In the shipping cost calculator, the default shipping_type was 'Overnight':
def calculate_shipping_cost(from_coords, to_coords, shipping_type='Overnight'):Knowing when to use each is key:
*argsβ any number of positional, unnamed values β stored as a tuple. Use when you don't know how many unnamed values will be passed.def calculate_driver_cost(distance, *drivers): for driver in drivers: ...
**kwargsβ any number of keyword arguments, named values β stored as a dictionary. Use when values are passed with names (keys).def calculate_money_made(**trips): for trip_id, trip in trips.items(): ...
The aim of this project was to use Pandas to analyse data regarding a websites funnel for visits.csv, cart.csv, checkout.csv and purchase.csv. The funnel describes the process of:
- A user visits the website
- A user adds a product to the cart
- A user clicks "checkout"
- A user completes a purchase
Analysis was done on each step of the funnel to see how many people continue through the funnel and ultimately make a purchase.
-
merge()was used to help merge various DataFrames together e.g.visits_cart = visits.merge(cart, how='left')
-
isnull()was used to check which timestamps hadnullfor that particular column, i.e. the user never went to the next funnel stage e.g.null_purchases = len(checkout_purchase[checkout_purchase.purchase_time.isnull()])
- Solution URL: Page Visits Funnel
- Other files:
The aim of this project was to use Pandas to analyse data from adclicks.csv. It is an A/B testing on the number of people clicking on the ads of a hypothetical website called ShoeFly.com They have two different versions of an ad, which they have placed in emails, on Facebook, Twitter, and Google. Analysis of the data for the two ads was done to see how they are performing on each of the different platforms and on each day of the week.
-
groupby()was used to help organise the data for analysis e.g.ad_clicks_by_source = ad_clicks.groupby('utm_source').user_id.count().reset_index()
-
pivot()was used to pivot the data in order to make it more readable e.g.clicks_pivot = clicks_by_source.pivot( columns='is_click', index='utm_source', values='user_id' ).reset_index() print(clicks_pivot)
-
Solution URL: A/B Testing For ShoeFly.com
-
Other files:
-
I rounded the
percent_clickedto 1dp in all the pivot tables where it was used e.g.a_clicks_grouped['percent_clicked'] = round((a_clicks_grouped['user_id'] / a_clicks_grouped.groupby('day')['user_id'].transform('sum') * 100),1)
The aim of this project was to use Pandas to analyse data from inventory.csv. A number of columns were added to the data to enhance the information that could be extracted from the data, and also making use of lambda functions:
in_stockcolumn: makes use ofquantitycolumn to work out whether in stock i.e.TrueorFalse.total_valuecolumn: makes use ofpriceandquantitycolumns to calculate the total value.full_descriptioncolumn: makes use of and combinesproduct_typeandproduct_descriptioncolumns to create a full description.
inventory['in_stock'] = inventory.quantity.apply(lambda quantity: True if quantity > 0 else False)
inventory['total_value'] = inventory.price * inventory.quantity
combine_lambda = lambda row: '{} - {}'.format(row.product_type, row.product_description)
inventory['full_description'] = inventory.apply(combine_lambda, axis=1)- Solution URL: Petal Blossom
- Other files:
The aim of this project was to set constraints on a database used for keeping track of parts, their manufacturer, location in storeroom, inventory etc, to enable data quality checks to ensure that only valid data can be entered into the database.
The database had the initial following schema:
To ensure data quality the following were utilised on the tables as appropriate:
ADD UNIQUE (column_name)ALTER COLUMN column_name SET NOT NULLADD CHECK (appropriate limitation)ADD PRIMARY KEY (column_name)ADD FOREIGN KEY (column_name) REFERENCES table_name(column_name)
- Solution URL: Building An Inventory Database With PostgreSQL
The aims of this project were to:
- Design a database schema on any topic.
- Implement the schema using Postbird (which is an open source PostgreSQL GUI client).
I chose to design a database around a hypothetical UK secondary school. I focused the database around people who would be closely associated with the school and how they are linked together to decide on what information to add and how to organise it. I designed the following schema for the database (I've included the DBML database markup language file):
I got ChatGPT to generate hypothetical data based of the schema:
While using ChatGPT helped speed up the generation of data, it didn't keep the data consistent to what it was supposed to add for each individual. For instance, it added a staff member with a note to state they were an art teacher, but never added art teacher as a job. So there were some inconsistencies when trying to add the data that I needed to clear up when inserting data into the table.
-
Solution URL: Designing A Database From Scratch
-
I added a few test queries to check that I could pull out information appropriately and as expected.
The aim of this project was to design a database schema based around a fictional restaurant "Bytes of China" and perform the following tasks:
- Create tables
- Define relationships between tables
- Designate appropriate columns as keys
- Insert sample data and
- Make queries from the database
The database has the following schema:
I learnt how to use dbdiagram.io to create an accurate schema and also how to validate and check keys and relationships in the database using information_schema.key_column_usage:
SELECT
constraint_name,
table_name,
column_name
FROM information_schema.key_column_usage
WHERE table_name = 'restaurant';- Solution URL: Build A Menu For Bytes Of China
The aim of this project was make queries to a database containing multiple tables of Lyft trip data information using SQL commands knowledge to date and using JOIN and UNION commands. The database tables have the following schema:
- Example use of
LEFT JOINwhere it was used to create a trip log with the trips and its users:SELECT trips.date, trips.pickup, trips.dropoff, trips.type, trips.cost, riders.first, riders.last, riders.username FROM trips LEFT JOIN riders ON trips.rider_id = riders.id;
- Solution URL: Lyft Trip Data
The aim of this project was make queries to a database table of Hacker News stories information using SQL commands knowledge to date and along with strftime() function. The database table has the following schema:
- Example use of
strftime()where it was used to find the best time for users to post news stories to get the best scores:SELECT strftime('%H', timestamp) AS 'Hour', ROUND(AVG(score), 1) AS 'Average Score', COUNT(*) AS 'Number of Stories' FROM hacker_news WHERE timestamp IS NOT NULL GROUP BY 1 ORDER BY 2 DESC;
- Solution URL: Analyse Hacker News Trends
The aim of this project was make queries using aggregate functions to a database table of startup companies information using SQL commands. The database table has the following schema:
- Aggregate functions used:
COUNT(),SUM(),MAX(),MIN(),AVG(),ROUND() GROUP BY column_name;HAVING aggregrate function conditon;
- Solution URL: Trends In Startups
The aim of this project was make queries to a database table of restaurant information using SQL commands. The database table has the following schema:
SELECT DISTINCT column_name FROM table_name;WHERE column_name condition;WHERE column_name LIKE pattern;WHERE column_name condition AND or OR column_name condition;WHERE column_name IS NULL;ORDER BY column_name DESC LIMIT number;- Also used
CASE,WHEN,THEN,ELSE,END ASSELECT name, CASE WHEN review > 4.5 THEN 'Extraordinary' WHEN review > 4 THEN 'Excellent' WHEN review > 3 THEN 'Good' WHEN review > 2 THEN 'Fair' ELSE 'Poor' END AS 'Review' FROM nomnom;
- Solution URL: New York Restaurants
The aim of this project was to create a friends table and add/delete data to it using basic SQL commands.
CREATE TABLE table_name (column_name data_type);INSERT INTO table_name (column_name) VALUES (data);SELECT column_name or * FROM table_name;UPDATE table_name SET column_name = data;ALTER TABLE table_name ADD COLUMN column_name data_type;DELETE FROM table_name WHERE column_name = data;
- Solution URL: Create A Table
The aim of this project was to create a Python program that reads and writes to files. Extracting Username information from passwords.csv to create a list of user names in compromised_user.csv. Creating boss_message.json and new_passwords.csv.
- Imported
csv,jsonandos. with open("file_name", "w")used to write files, "w" omitted when using only for reading.csv.DictReader()used to read passwords.csv file.json.dump(dictionary_name, json_file_name)used to write information to json file from Python dictionary.file_object.write("text_to_add")used to write information to file.
- Solution URL: Hacking The Fender
- Other files:
The aim of this project was to write Python functions using Jupyter Notebook to code and decode messages using a Caesar Cipher and a Vigenère Cipher.
The Caesar Cipher is a simple offset cipher. However, I found the Vigenère Cipher more difficult to understand how to code as each letter has a different offset depending upon the key word used.
def vigenere_decode(message, key):
decoded_message = ""
key_phrase = ""
key_index = 0
for char in message:
if key_index >= len(key):
key_index = 0
if char in alphabet:
key_phrase += key[key_index]
key_index += 1
else:
key_phrase += char
for i in range(len(message)):
if message[i] in alphabet:
old_char_index = alphabet.index(message[i])
offset_index = alphabet.index(key_phrase[i])
new_char = alphabet[(old_char_index + offset_index) % 26]
decoded_message += new_char
else:
decoded_message += message[i]
return decoded_message- Solution URL: Coded Correspondence
The aim of this project was to create a Python program that takes a list of sales information in a string format (customer name, price, colour(s) of thread purchased and date), and then use a variety of techniques to clean up the data into easier-to-access information.
string.replace()used to help with clarifying appropriate sales transaction.string.split()used to help split up string into appropriate sections.string.strip()used to clear up and remove whitespace in transaction information.forloops used to iterate through lists of transactions.list.appendto add information into smaller appropriate groups of lists e.g. customers, sales price and colour of thread purchased.- Function defined to calculate the total numbers sold for each colour thread.
printand.format()used to print out a formatted string of the number of each colour thread purchased.
- Solution URL: Thread Shed
The aim of this project was to install Jupyter Notebook and use it to write a program to work out which night the most people can attend for a gamers night.
pip install notebookand to run the notebook, I didcd file_directory_nameand thenjupyter notebookin the command line.
- Solution URL: Abruptly Goblins
The aim of this project was to create a Python program that processes some data from a group of friends playing scrabble. Dictionaries are used to organise players, words and points.
-
List comprehension used to create a dictionary from two provided lists of letters and their points.
letters_to_points = {key:value for key, value in zip(letters, points)}
-
Functions defined to:
-
play_word()adds a new word played by a player.- Sets
word.upper()as letters in letters_to_points dictionary are all uppercase. - Sets
player.title()so that if names are entered differently with lowercase or uppercase letters they will still match when compared toplayer_to_wordsdictionary. - If player already exists, word is added to their played list in player_to_words dictionary.
- If player doesn't exist, then the new player along with their word is added to player_to_words dictionary.
- Calls
update_point_totals()function.
- Sets
-
update_point_totals()updates the total points scored for the player.- Calls
score_word()function. - If player already exists, points are added to their total score in player_to_points dictionary.
- If player doesn't exist, then the new player along with their score is added to player_to_points dictionary.
- Calls
-
score_word()calculates and returns the points score of a word. -
play_round()initialises the program.- Gets player name and word from user.
- Calls
play_word()function. - Calls
another_round()function.
-
another_round()asks user whether they wish to enter another player's word.- Gets Y/N input from user. Input changed to uppercase.
- If response is
YorYES, then callsplay_round()function. - If response not
YorYES, then callsshow_results()function.
-
show_results()iterates throughplayer_to_pointsdictionary to print out names and total scores of each player.
-
- Solution URL: Scrabble
- Remove hardcoded player's data and ask for input of name and word from user - ADDED TO CODE.
- Check whether more words to be added and scored - ADDED TO CODE.
- Show overall results of players in formatted strings - ADDED TO CODE.
The aim of this project was to create a Python program that uses classes and functions to work out what menus are available at different times of day and to calculate bill costs. Classes defined for:
- Menu:
- String representation for name of menu and when it is available
calculate_bill()calculates the cost of a particular meal
- Franchise:
- String representation for name of restaurant
available_menus()works out which menus are currently available depending upon the time of day
- Business takes the name of the business and a list of its franchises.
- Solution URL: Basta Fazoolin'
The aim of this project was to create a Python program that uses functions and return to help calculate some fundamental physics properties.
- Functions defined to calculate:
- Fahrenheit to Celsius temperatures
- Celsius to Fahrenheit temperatures
- Force
- Energy
- Work
- Values to test the program are hardcoded into the program.
- Solution URL: Getting Ready For Physics Class
- Have the user choose which physics property to calculate.
- Have the user enter appropriate values for the physics property they have chosen to calculate.
The aim of this project was to create a Python program that uses imports and work with datetime to simulate time travel and to perform calculations using dates.
-
Created custom module that was imported into the main script.
-
Used list for possible destinations.
-
Use of
random,decimal,datetimeto carry out calculations:base_cost = Decimal('1000.00') current_year = current_date.year start_year = -2000 end_year = 3500 target_year = randint(start_year, end_year) cost_multiplier = abs(current_year - target_year) final_cost = round(base_cost * cost_multiplier, 2)
- Solution URL: Time Travelers Toolkit
- Other files:
The aim of this project was to create a Python program that calculates some metrics from lists of data.
- Lists of hairstyles, prices and last week's sales are hardcoded into the program.
for, list comprehensions,range(),len()andifare used to calculate average prices, decreased prices, total revenue, average daily revenue and types of haircuts that cost less than Β£30.
- Solution URL: Carly's Clippers
The aim of this project was to create a Python program that takes pizzas and their prices using lists and alters the lists to organise the data.
- 2D lists of pizzas and prices are hardcoded into the program.
list.count(item)used to count how many pizzas are $2.len(list)used to count the number of different kinds of pizzas.list.sort()to sort the list in ascending order of price.list.pop()to remove the most expensive pizza.list.insert(index, item)to add a new pizza in appropriate position to keep price sorted in list.list[:3]to find the cheapest three pizzas.
- Solution URL: Len's Slice
The aim of this project was to create a Python program that takes student data and organizes subjects and grades using lists.
- 2D lists of subjects and grades are hardcoded into the program.
list.append(item),list[index].remove(item)are used to alter subjects and grades andprint()out gradebook information to the user.
- Solution URL: Gradebook
- Have the user input the initial subjects and grades.
- Have the user be able to alter subjects and grades.
The aim of this project was to create a Python program that asks the user for the weight of their package and then tells them which method of shipping is cheapest and how much it will cost to ship their package using Salβs Shippers.
weightvariable is hardcoded into the program.if,elifandelseused to calculate cost shipping andprint()out costs.
- Solution URL: Sal's Shipping
- Have the user input the packages weight.
The aim of this project was to create a Python program that can answer any "Yes" or "No" question with a different fortune each time it executes.
nameandquestionvariables are hardcoded into the program.randommodule withrandint()used to generate a random number within a specified range.if,elifandelseused to select answers andprint()out appropriate response.
- Solution URL: Magic 8-Ball
- Have the user input their name and question.
- V. Tickner







