In today's fast-changing world we are inundated with large volumes of data, which drives a growing demand for people who can process it. The vast majority of companies and institutions could not operate without efficient database systems, which makes the ability to create and maintain them a valuable skill. To gain that skill, we developed this project during a databases course at Wroclaw University of Science and Technology.
The project consists of generating a database and a report for Gamestalgia, a fictional board game rental store. Gamestalgia is a unique place: not only does it rent games, it also sells them and organizes tournaments. Moreover, the most active customers are rewarded with VIP status and hence extra discounts on rentals and purchases.
The schema above presents the tables included in Gamestalgia's database and the relations between them.
The types of relations are symbolized by the endings of the lines connecting the tables:
- two vertical lines - one and only one
- a circle and three branched lines - zero or many
- one vertical line and three branched lines - one or many
Making this project involved the following technologies:
- Python - applied to generate the data for all tables. All dependencies between the tables were taken into consideration to ensure data integrity. Python was then used to fill the database with the previously generated values (see the sketch after this list).
- MySQL - used to create the database schema and to analyse the data.
- knitr - this tool was applied to generate the report.
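As a taste of the data-generation approach, here is a minimal sketch of a generator function in the spirit of those used in the project; the function name, column names and value pools are hypothetical, not taken from the repository:

```python
import random

import pandas as pd

# Hypothetical sketch: generate rows for an "address"-style table.
# Column names and value pools are illustrative assumptions,
# not the actual schema used in this repository.
CITIES = ["Wroclaw", "Warsaw", "Krakow", "Gdansk"]

def generate_addresses(n: int, seed: int = 42) -> pd.DataFrame:
    """Return a DataFrame with n synthetic address rows."""
    rng = random.Random(seed)
    rows = [
        {
            "address_id": i,
            "city": rng.choice(CITIES),
            "street": f"Street {rng.randint(1, 200)}",
            "postal_code": f"{rng.randint(10, 99)}-{rng.randint(100, 999)}",
        }
        for i in range(1, n + 1)
    ]
    return pd.DataFrame(rows)

if __name__ == "__main__":
    # Each generator saves its dataset to a CSV file, which is
    # later inserted into the matching table.
    generate_addresses(100).to_csv("address.csv", index=False)
```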
Every underlined .csv file contains a complete dataset that is later used to fill the table of a similar name. Every underlined .py file consists of functions that generate data, process it and return it as a dataframe (e.g. address.py provides data for the address table, customer_rental.py for the customers and rentals tables).
- generate_data:
  - connection.py - connects to the database via SQLAlchemy, reads create_tables.sql and runs all the queries stored in that file, then inserts the data from the saved CSV files (a minimal sketch of this flow appears after this list).
  - const.py - stores the constants used in the project.
  - create_tables.sql - stores the SQL code responsible for creating the tables.
  - main_execute.py - the main function of the program. It runs the programs that generate the data, writes the results to CSV files, then runs the programs responsible for connecting to the database via SQLAlchemy and inserting the data into the tables.
  - address
  - create_staff_schedule
  - customer_rental
  - game
  - inventory_buy_purchase
  - inventory_rent
  - staff
  - tournament
  - tournament_rental_consistency
    - tournament_rental_consistency.py - prevents a shortage of games during a tournament.
  - tournament_results
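For illustration, here is a minimal sketch of the flow implemented by connection.py and main_execute.py, assuming a local MySQL instance; the connection string, credentials, database name and the CSV-to-table mapping below are assumptions, not values taken from the repository:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Illustrative connection string: the credentials and database
# name are assumptions, not taken from the repository.
engine = create_engine("mysql+pymysql://user:password@localhost/gamestalgia")

# Run every statement from the schema file. Splitting on ";" is a
# simplification that works for plain CREATE TABLE scripts.
with engine.begin() as conn:
    with open("create_tables.sql") as f:
        for statement in f.read().split(";"):
            if statement.strip():
                conn.execute(text(statement))

# Insert the previously generated CSV datasets; the file-to-table
# mapping here is illustrative.
for table in ["address", "customers", "games", "rentals"]:
    df = pd.read_csv(f"{table}.csv")
    df.to_sql(table, engine, if_exists="append", index=False)
```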
- Clone the repository with `git clone https://github.com/Staszek15/store-database.git` or download its zip file instead.
- Open the console/terminal.
- Navigate to the generate_data folder.
- Run the main_execute.py file (e.g. `python main_execute.py`) to create the database.
- Open report.Rmd (e.g. in RStudio), located in the store-database/generate_report folder.
- Change the user in the Python path according to the hints provided near that line of code.
- Knit the report.
- The saved report is located in the store-database/generate_report folder.
1NF:
- every table has its own primary key
- every cell contains a single value: one number, one date or one string
- every cell describes only one object
- the order of rows does not matter
2NF:
- the database is in 1NF
- every non-key attribute in every table depends on the entire primary key
3NF:
- the database is in 2NF
- no non-key attribute depends on another non-key attribute within the same table
EKNF:
- the database is in 3NF
- no relation is inconsistent with the EKNF definition, i.e. every non-trivial functional dependency either begins at a superkey or ends at an elementary key attribute
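Because the datasets are generated in Python before insertion, these conditions can be sanity-checked directly on the CSV files. Below is a minimal sketch, with hypothetical file and column names, of testing whether one non-key attribute functionally determines another, which would hint at a 3NF violation:

```python
import pandas as pd

def functional_dependency_holds(df: pd.DataFrame, lhs: str, rhs: str) -> bool:
    """Return True if every value of `lhs` maps to exactly one value of `rhs`."""
    return bool((df.groupby(lhs)[rhs].nunique() <= 1).all())

# Hypothetical example: in a 3NF table, a non-key attribute such as
# `city` should not determine another non-key attribute such as
# `postal_code`; only the key should determine the other columns.
df = pd.read_csv("address.csv")
if functional_dependency_holds(df, "city", "postal_code"):
    print("Warning: city -> postal_code holds; possible 3NF violation.")
```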
This project is complex and combines knowledge from various areas, so encountering some problems was unavoidable. Most of them were easy to solve, so there is no point in elaborating on them, but some difficulties consumed a lot of time and required extensive research.
- Providing data integrity
It was a priority to prepare a correct dataset. Taking care of consistency between the tables and keeping them realistic made it possible to discuss and modify the tables not only at the beginning but also during further work. Some contradictions in seemingly correct datasets were only noticed while generating other, dependent tables; sometimes this forced changes in the schema of more than one table and modifications of the code (a sketch of such a cross-table check appears below).
- Connecting to the server and filling the database.
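The tournament_rental_consistency step mentioned earlier is one example of such a cross-table check. Below is a minimal sketch, with hypothetical file and column names, of how one might detect rentals that collide with a tournament that needs the same game; it is simplified in that it flags any overlapping rental rather than counting the remaining available copies:

```python
import pandas as pd

# Hypothetical column names; the real CSV schemas may differ.
rentals = pd.read_csv("rentals.csv", parse_dates=["rental_date", "return_date"])
tournaments = pd.read_csv("tournaments.csv", parse_dates=["tournament_date"])

# Pair every rental with every tournament of the same game and
# flag rentals whose period overlaps the tournament date.
merged = rentals.merge(tournaments, on="game_id")
conflict = (
    (merged["rental_date"] <= merged["tournament_date"])
    & (merged["return_date"] >= merged["tournament_date"])
)

if conflict.any():
    # In the real pipeline such rows would be regenerated or fixed;
    # here we only report them.
    print(merged.loc[conflict, ["game_id", "rental_date", "tournament_date"]])
```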