Objective of this repository

The objective of this repository is to offer a "quick-and-dirty" way to generate random datasets for a database.

Structure of the created database

The file main.py will create (with the default settings) the SQL code to create the following tables:

Table customers
Table products
Table invoices
Table product_categories

The table names can be changed in the parameters in the file main.py

Structure of table customers

customer_id (INT, not null) - Primary Key
first_name (String, not null)
last_name (String, not null)
street (String, not null)
housenumber (String, not null)
po (int, not null)
city (String, not null)
state (String, not null)

The names and street addresses are completely randomly generated (partly with the help of ChatGPT).

Structure of table products

product_id (INT, not null) - Primary Key
product_kategory_id (INT, not null) - Foreign key to table product_categories
product_name (String, not null)

Structure of table product_categories

product_category_id (INT, not null) - Primary Key
product_category_name (String, not null)

Structure of table invoices

id (INT, not null) - Primary key
date (String, not null)
invoice_id (INT, not null)- Invoice id can be repeated as one invoice id can appear multiple times if multiple items have been purchased)
product_id (INT, not null) - Foreign key to table products
number_units_purchased (INT, not null) - Number of items purchased (referring to a specific product_id)
price_per_unit (FLOAT, not null) - The price for a single unit of a specific product

How to run

To generate the datasets, you simply run python main.py.

The SQL code to create the database can be found in database_files/create_database_customers.sql

The SQL code to create the tables and insert the datasets can be found in database_files/customers.sql.

Parameters

The following parameters can be adjusted in main.py
- database_system: Currently only postgres is supported, but the generated queries should be usuable on other database systems as well with minimal adjustments. For each table creation/dataset insert operation, a template can be found in the folder templates. Each template name ends with the database system name as its suffix.
- output_dir: The name of the output directory for the SQL statements to create the database, the tables and insert the data (default: database_files)
- output_file_create_database_statement: The file name for the SQL to create the database (default: create_database_customers.sql)
- output_file_sql_statements: The file name for the SQL statements to create the tables and insert the data (default: customers.sql)
- database_name: The name of the database to create (default: db_customers)
- table_name_customers: The table name for the customers table (default: customers)
- table_name_invoices: The table name for the invoices table (default: invoices)
- table_name_products: The table name for the products table (default: products)
- table_name_product_categories: The table name for the product categories table (default: product_categories)
- customer_start_value_primary_key: The start value of the primary key in customers table (customer_id is the primary key), default is 1
- number_of_customer_entries_to_generate: How many random customer datasets to create (default: 500).
- start_date_invoice_generation: The minimal date for the randomly generated invoices data (default: '2022-01-01')
- end_date_invoice_generation: The maximum date for the randomly generated invoices data (default: '2023-12-31')
- probability_invoice_per_customer_id_per_day: The probability that for a given customer_id for a given day an invoice dataset will be generated (default: 0.002, so 0.2%). So for each single day between start_date_invoice_generation and end_date_invoice_generation (inclusive), there will be (with the default value) a 0.2% chance that a random invoice dataset will be generated. With this parameter, you can finetune the number of invoices datasets generated.
- minimum_number_different_products_per_invoice: The minimum number of unique products per invoice (default: 1)
- maximum_number_different_products_per_invoice: The maximum number of unique products per invoice (default: 5). The actual number of different products per invoice dataset will be randomly chosen between the minimum and maximum number parameter
- minimum_number_items_per_product: The minimum number of items in the invoice dataset per different product (default: 1)
- maximum_number_items_per_product: The maximum number of items in the invoice dataset per different product (default: 10). The actual number will be randomly chosen between the minimum and maximum number parameter.
- show_stats: Whether to show stats about the created datasets when running the file main.py (default: True)

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
database_files		database_files
templates		templates
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
customers.py		customers.py
customers.txt		customers.txt
customers_params.py		customers_params.py
customers_table_test.py		customers_table_test.py
helpers.py		helpers.py
helpers_test.py		helpers_test.py
invoices.py		invoices.py
invoices.txt		invoices.txt
invoices_test.py		invoices_test.py
main.py		main.py
products.py		products.py
products.txt		products.txt
products_params.py		products_params.py
products_table_test.py		products_table_test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Objective of this repository

Structure of the created database

Structure of table customers

Structure of table products

Structure of table product_categories

Structure of table invoices

How to run

Parameters

About

Uh oh!

Releases

Packages

Languages

License

andreashuebner/create-database-for-teaching

Folders and files

Latest commit

History

Repository files navigation

Objective of this repository

Structure of the created database

Structure of table customers

Structure of table products

Structure of table product_categories

Structure of table invoices

How to run

Parameters

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages