PyDataFaker is a Python package for creating fake data with relationships between tables. Fake data is useful in many applications, such as building product demos or testing software.
Python already has a great package for creating fake data called Faker (https://faker.readthedocs.io/en/master/). Faker works well for creating individual units of fake data, but building more complicated fake data sets whose records are actually related to one another can be time-consuming.
Imagine you are developing new enterprise resource planning (ERP) software to challenge SAP. You may need to create some fake data to test your application. You will need an invoice table, a vendor listing, a purchase order table, and more. PyDataFaker allows you to quickly create these tables and generates the relationships between them!
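To make "relationships between tables" concrete, here is a minimal sketch of hand-rolling two related tables with only the standard library. It is not how PyDataFaker is implemented; the function names `make_vendors` and `make_invoices`, the ID formats, and the amount range are assumptions chosen for illustration. The point is that every foreign key has to be drawn from the keys of the parent table, which is the bookkeeping PyDataFaker is meant to take off your hands.

```python
import random


def make_vendors(n):
    """Build n vendor rows, each with a unique vendor_id (hypothetical helper)."""
    return [
        {"vendor_id": f"vendor_{i:05d}", "vendor_name": f"Vendor {i}"}
        for i in range(1, n + 1)
    ]


def make_invoices(n, vendors, rng):
    """Build n invoice rows whose vendor_id always refers to an existing vendor."""
    vendor_ids = [v["vendor_id"] for v in vendors]
    return [
        {
            "invoice_id": f"inv_{i:05d}",
            "amount": rng.randint(1_000, 100_000),  # illustrative range
            "vendor_id": rng.choice(vendor_ids),    # foreign key into vendors
        }
        for i in range(1, n + 1)
    ]


rng = random.Random(42)  # seeded so the fake data is reproducible
vendors = make_vendors(5)
invoices = make_invoices(20, vendors, rng)

# Every invoice should point at a vendor that exists.
known_ids = {v["vendor_id"] for v in vendors}
orphans = [inv for inv in invoices if inv["vendor_id"] not in known_ids]
print(f"{len(invoices)} invoices, {len(orphans)} without a matching vendor")
```

Each additional table (purchase orders, line items, timesheets) adds another set of keys to keep consistent, which is where doing this by hand becomes slow.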
PyDataFaker is currently under development. At this time it is possible to create the following entities:
- Business: create a fake business with common ERP-like tables
- School: create a fake school
More entities are currently being developed. If you have ideas for additional entities that should be included, please submit an issue here: https://github.com/SamEdwardes/pydatafaker/issues.
pip install pydatafaker
Documentation can be found at https://pydatafaker.readthedocs.io/en/latest/index.html. The package is distributed through PyPI at https://pypi.org/project/pydatafaker/.
The business module allows you to create fake business data. Calling business.create_business() returns a dictionary of related tables.
import pandas as pd
from pydatafaker import business
biz = business.create_business()
biz.keys()
dict_keys(['vendor_table', 'po_table', 'invoice_summary_table', 'invoice_line_item_table', 'employee_table', 'contract_table', 'rate_sheet_table', 'timesheet_table'])
Each value inside the dictionary contains a Pandas DataFrame.
biz['invoice_summary_table']
| | invoice_id | amount | invoice_date | po_id | vendor_id |
---|---|---|---|---|---|
0 | inv_00001 | 59157 | 2011-01-20 | po_00001 | vendor_00001 |
1 | inv_00002 | 87796 | 2007-09-06 | po_00002 | vendor_00002 |
2 | inv_00003 | 57963 | 2000-03-06 | po_00003 | vendor_00003 |
3 | inv_00004 | 59409 | 2001-03-31 | po_00004 | vendor_00004 |
4 | inv_00005 | 86614 | 2002-01-12 | po_00005 | vendor_00005 |
... | ... | ... | ... | ... | ... |
445 | inv_00446 | 83316 | 2012-09-02 | po_00087 | vendor_00087 |
446 | inv_00447 | 45707 | 2008-07-10 | po_00101 | vendor_00098 |
447 | inv_00448 | 111932 | 2002-09-26 | po_00158 | vendor_00012 |
448 | inv_00449 | 35104 | 2012-09-21 | po_00133 | vendor_00075 |
449 | inv_00450 | 15397 | 2015-12-15 | po_00054 | vendor_00054 |
450 rows × 5 columns
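Because every value in the dictionary is a DataFrame, a quick way to get an overview is to loop over the dictionary and collect each table's shape. The sketch below uses a small hand-built dictionary as a stand-in so that it runs without PyDataFaker installed; `summarize_tables` is a hypothetical helper, not part of the package, and the stand-in rows are illustrative.

```python
import pandas as pd


def summarize_tables(tables):
    """Return {table name: (row count, column count)} for a dict of DataFrames."""
    return {name: df.shape for name, df in tables.items()}


# Stand-in for the dictionary returned by business.create_business().
tables = {
    "vendor_table": pd.DataFrame(
        {
            "vendor_id": ["vendor_00001", "vendor_00002"],
            "vendor_name": ["Smith-Scott", "Walker-Morgan"],
        }
    ),
    "invoice_summary_table": pd.DataFrame(
        {
            "invoice_id": ["inv_00001", "inv_00002", "inv_00003"],
            "amount": [59157, 87796, 57963],
            "vendor_id": ["vendor_00001", "vendor_00002", "vendor_00001"],
        }
    ),
}

for name, shape in summarize_tables(tables).items():
    print(name, shape)
```

With the package installed, the same helper should work on the real result, for example `summarize_tables(business.create_business())`.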
Tables can be joined together to add additional details.
invoice_summary = biz['invoice_summary_table']
vendors = biz['vendor_table']
pd.merge(invoice_summary, vendors, how='left', on='vendor_id')
| | invoice_id | amount | invoice_date | po_id | vendor_id | vendor_name | vendor_description | address | phone | |
---|---|---|---|---|---|---|---|---|---|---|
0 | inv_00001 | 59157 | 2011-01-20 | po_00001 | vendor_00001 | Smith-Scott | Front-line multimedia emulation | 75343 Harper Corners Suite 581\nJuanberg, AK 0... | (193)898-1652x129 | ftodd@example.org |
1 | inv_00002 | 87796 | 2007-09-06 | po_00002 | vendor_00002 | Walker-Morgan | Cross-platform radical solution | 941 Susan Isle\nThorntonberg, KS 82841 | +1-636-744-9620x3991 | rdunn@example.com |
2 | inv_00003 | 57963 | 2000-03-06 | po_00003 | vendor_00003 | Noble and Sons | Configurable demand-driven emulation | 1442 Jason Rapid Apt. 409\nEast Jade, RI 44983 | 477-214-2021x973 | tinaschmidt@example.com |
3 | inv_00004 | 59409 | 2001-03-31 | po_00004 | vendor_00004 | Baker, Walker and Davenport | Focused analyzing synergy | 89120 Kimberly Extensions\nSouth Annettetown, ... | (643)621-7544x290 | sarahstephenson@example.com |
4 | inv_00005 | 86614 | 2002-01-12 | po_00005 | vendor_00005 | Patterson LLC | Profound maximized productivity | 880 Bryan Tunnel Apt. 542\nKaylabury, AK 50221 | 586-422-7311x0127 | littleyesenia@example.net |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
445 | inv_00446 | 83316 | 2012-09-02 | po_00087 | vendor_00087 | Wagner-Gutierrez | Multi-lateral motivating projection | 8771 Roger Road Suite 781\nDanielton, ID 88428 | 001-023-820-3050x78454 | colliernicole@example.net |
446 | inv_00447 | 45707 | 2008-07-10 | po_00101 | vendor_00098 | Simmons-Leonard | Focused reciprocal secured line | 9010 Ashley Mountains\nMarthaton, VT 68298 | 391-162-6024 | serranonancy@example.org |
447 | inv_00448 | 111932 | 2002-09-26 | po_00158 | vendor_00012 | Welch LLC | Versatile methodical interface | 4016 Brianna Road\nPort Andrealand, AR 22214 | +1-837-862-5571x172 | williamoliver@example.com |
448 | inv_00449 | 35104 | 2012-09-21 | po_00133 | vendor_00075 | Franklin-Bennett | Digitized holistic methodology | 68125 Vega Plains Apt. 062\nEast Emily, OK 80097 | 001-979-468-2358x530 | leroymoore@example.org |
449 | inv_00450 | 15397 | 2015-12-15 | po_00054 | vendor_00054 | Barton-Oneill | Mandatory 4thgeneration hierarchy | 107 Julie Passage Suite 904\nSouth George, OH ... | (491)397-7771x41615 | jacksonrachel@example.com |
450 rows × 10 columns
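When joining related tables, it can be worth confirming that the relationship behaves as expected: each invoice should match exactly one vendor. A minimal sketch of that check uses the `validate` and `indicator` arguments of `pd.merge`. The two small DataFrames are hand-built stand-ins with illustrative values; with PyDataFaker installed you would pass `biz['invoice_summary_table']` and `biz['vendor_table']` instead.

```python
import pandas as pd

# Stand-in tables with the same key column as the examples above.
invoice_summary = pd.DataFrame(
    {
        "invoice_id": ["inv_00001", "inv_00002", "inv_00003"],
        "amount": [59157, 87796, 57963],
        "vendor_id": ["vendor_00001", "vendor_00002", "vendor_00001"],
    }
)
vendors = pd.DataFrame(
    {
        "vendor_id": ["vendor_00001", "vendor_00002"],
        "vendor_name": ["Smith-Scott", "Walker-Morgan"],
    }
)

merged = pd.merge(
    invoice_summary,
    vendors,
    how="left",
    on="vendor_id",
    validate="many_to_one",  # raises MergeError if vendor_id repeats in vendors
    indicator=True,          # adds a _merge column: both / left_only / right_only
)

# Invoices whose vendor_id had no match in the vendor table.
unmatched = merged[merged["_merge"] == "left_only"]
print(f"{len(merged)} rows after merge, {len(unmatched)} unmatched invoices")
```

A left join with `validate="many_to_one"` keeps the row count equal to the number of invoices, so a change in row count or a non-empty `unmatched` frame signals a broken relationship.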
import pandas as pd
from pydatafaker import school
skool = school.create_school()
skool.keys()
skool['student_table']
| | student_id | name | grade | teacher_id |
---|---|---|---|---|
0 | student_0001 | Tyler Campbell | 1 | teacher_0007 |
1 | student_0003 | Melissa Coleman | 1 | teacher_0010 |
2 | student_0011 | Crystal Church | 1 | teacher_0014 |
3 | student_0017 | Paul Gray | 1 | teacher_0007 |
4 | student_0023 | Joshua Morales | 1 | teacher_0010 |
... | ... | ... | ... | ... |
31 | student_0258 | Nicole Hoffman | 7 | teacher_0015 |
32 | student_0261 | Joseph Lewis | 7 | teacher_0009 |
33 | student_0294 | Susan Jacobs | 7 | teacher_0015 |
34 | student_0299 | Mark Whitehead | 7 | teacher_0009 |
35 | student_0300 | Melissa Sosa | 7 | teacher_0015 |
300 rows × 4 columns
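The teacher_id column links each student to a teacher, so the student table can be summarized per teacher with an ordinary pandas groupby. The sketch below rebuilds the first five rows shown above as a stand-in so that it runs without PyDataFaker installed; with the package installed you would use `skool['student_table']` instead. Only the column names visible in the output above are assumed.

```python
import pandas as pd

# Stand-in for skool['student_table'], using the first rows shown above.
students = pd.DataFrame(
    {
        "student_id": [
            "student_0001",
            "student_0003",
            "student_0011",
            "student_0017",
            "student_0023",
        ],
        "name": [
            "Tyler Campbell",
            "Melissa Coleman",
            "Crystal Church",
            "Paul Gray",
            "Joshua Morales",
        ],
        "grade": [1, 1, 1, 1, 1],
        "teacher_id": [
            "teacher_0007",
            "teacher_0010",
            "teacher_0014",
            "teacher_0007",
            "teacher_0010",
        ],
    }
)

# Number of students assigned to each teacher, largest classes first.
class_sizes = (
    students.groupby("teacher_id")["student_id"]
    .count()
    .sort_values(ascending=False)
)
print(class_sizes)
```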
Please see docs/source/contributing.rst.
Developed by:
- Sam Edwardes
Logo:
- Icon made by Freepik from www.flaticon.com
- Font from fontmeme.com/retro-fonts/
- Logo generated using logomakr.com