Data eng assessment nuthan #59

Open: wants to merge 3 commits into base: main
The diff you're trying to view is too large. We only load the first 3000 changed files.
9 changes: 9 additions & 0 deletions Dockerfile
@@ -0,0 +1,9 @@
FROM python:3.9-slim

WORKDIR /app
COPY app/ /app/
COPY config.json /app/

RUN pip install -r /app/requirements.txt

CMD ["python", "/app/parse_fixed_width.py"]
150 changes: 35 additions & 115 deletions README.md
@@ -1,125 +1,45 @@
# Exercise
# Data Engineering Challenge (Complete Version)

The goal of the project is to build a simple business loan application system.

The system consists of the following:

- Frontend
- Backend

The backend would integrate with third-party providers such as:

- Decision engine - This is where the final application will be
submitted to present the outcome of the application.
- Accounting software providers will provide a balance sheet for a selected business of the user.

Below is a sequence diagram to help visually understand the flow.

```mermaid

sequenceDiagram
Actor User as User
participant FE as Frontend
participant BE as Backend
participant ASP as Accounting Software
participant DE as Decision Engine

User ->> FE: Start Application

FE ->> BE: Initiate Application
BE ->> FE: Initiate Complete

User ->> FE: Fill Business Details & Loan amount
User ->> FE: Select Accounting provider
User ->> FE: Request Balance Sheet
FE ->> BE: Fetch Balance Sheet
BE ->> ASP: Request Balance Sheet
ASP ->> BE: Return Balance Sheet
BE ->> FE: Return Details for Review

User --> FE: Review Complete
User ->> FE: Submit Application

FE ->> BE: Request outcome
BE ->> BE: Apply Rules to summarise application
BE ->> DE: Request Decision
DE ->> BE: Returns outcome

BE ->> FE: Application Result
FE ->> User: Final Outcome
## Setup Instructions

### 1. Clone the Repository
```sh
git clone <your-repository-url>
cd data-engineering-project
```

Assumptions:

- You may choose from the following language: Javascript, Typescript, Python, Golang / HTML, CSS.
- For frontend, you could use a framework such as React / Vue, though basic HTML is also acceptable.
- The accounting software and decision engine are already implemented. The backend should provide a simulation of the above.
- The frontend can be very basic.
- The accounting provider option on frontend would include Xero, MYOB and more in future.
- A sample balance sheet received from Accounting provider:

```json

sheet = [
{
"year": 2020,
"month": 12,
"profitOrLoss": 250000,
"assetsValue": 1234
},
{
"year": 2020,
"month": 11,
"profitOrLoss": 1150,
"assetsValue": 5789
},
{
"year": 2020,
"month": 10,
"profitOrLoss": 2500,
"assetsValue": 22345
},
{
"year": 2020,
"month": 9,
"profitOrLoss": -187000,
"assetsValue": 223452
}
]
### 2. Set Up Virtual Environment & Install Dependencies
```sh
python -m venv venv
source venv/bin/activate # On Windows use venv\Scripts\activate
pip install -r app/requirements.txt
```

## Rules to be applied before sending to Decision Engine

- If a business has made a profit in the last 12 months. The final value to be sent with a field `"preAssessment": "60"` which means the Loan is favored to be approved 60% of the requested value.
If the average asset value across 12 months is greater than the loan amount then `"preAssessment": "100"`
- Default value to be used `20`
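
The rules above can be sketched in Python as follows. This is only an illustration: the function name `pre_assessment`, the reading of "made a profit" as a positive sum of `profitOrLoss`, and the choice to let the asset rule take precedence over the profit rule are all assumptions, not part of the exercise text.

```python
from statistics import mean

# Sample balance sheet from the exercise description (newest month first).
SAMPLE_SHEET = [
    {"year": 2020, "month": 12, "profitOrLoss": 250000, "assetsValue": 1234},
    {"year": 2020, "month": 11, "profitOrLoss": 1150, "assetsValue": 5789},
    {"year": 2020, "month": 10, "profitOrLoss": 2500, "assetsValue": 22345},
    {"year": 2020, "month": 9, "profitOrLoss": -187000, "assetsValue": 223452},
]


def pre_assessment(sheet, loan_amount):
    """Hypothetical helper: return the preAssessment value as a string."""
    last_12 = sheet[:12]  # assumes entries are ordered newest first
    if not last_12:
        return "20"  # no data, so fall back to the default
    # Assumption: the asset rule is checked first and overrides the profit rule.
    if mean(entry["assetsValue"] for entry in last_12) > loan_amount:
        return "100"
    # Assumption: "made a profit" means the summed profitOrLoss is positive.
    if sum(entry["profitOrLoss"] for entry in last_12) > 0:
        return "60"
    return "20"


print(pre_assessment(SAMPLE_SHEET, 50000))
```

Returning strings mirrors the `"preAssessment": "60"` form used in the rules.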

## The Final output to be sent to the decision engine would contain minimum details such as

- Business Details such as:
- Name
- Year established
- Summary of Profit or loss by the year
- preAssessment value as per the rules

## Judging Criteria

- Engineering principles & standards
- System extensibility & Scalability
- Testability
- Brevity and Simplicity

## Bonus Points

- Docker

## FAQ
### 3. Run Fixed Width File Scripts
```sh
python app/generate_fixed_width.py
python app/parse_fixed_width.py
```

### What is the time-limit on exercise ?
### 4. Run CSV Anonymization (Streaming for 2GB+ Files)
```sh
python app/anonymize_csv.py
```

There is none, ensure you submit your best attempt and as soon as you possibly can.
### 5. Run Tests
```sh
python -m unittest discover -s app -p "test_*.py"
```

### How to submit ?
### 6. Run in Docker
```sh
docker build -t data-engineering .
docker run --rm -v $(pwd):/app data-engineering
```

Submit a GitHub / Bitbucket repo for review. No ZIP files!
### 7. Troubleshooting
- **Ensure you have Docker installed before running Docker commands.**
- **If running tests, execute from the root directory:**
```sh
python -m unittest discover -s app -p "test_*.py"
```
Binary file added app/__pycache__/parse_fixed_width.cpython-39.pyc
Binary file not shown.
Binary file added app/__pycache__/test_parser.cpython-39.pyc
Binary file not shown.
41 changes: 41 additions & 0 deletions app/anonymize_csv.py
@@ -0,0 +1,41 @@
import os
import csv
import hashlib

INPUT_FILE = "app/input.csv"
OUTPUT_FILE = "app/anonymized_output.csv"
LARGE_FILE_MODE = True  # Set to True for handling 2GB+ files efficiently

def hash_value(value):
    return hashlib.sha256(value.encode()).hexdigest()[:10]

def generate_sample_csv():
    """Creates input.csv if it doesn't exist."""
    if not os.path.exists(INPUT_FILE):
        with open(INPUT_FILE, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["first_name", "last_name", "address", "date_of_birth"])
            writer.writerow(["John", "Doe", "123 Main St", "1990-01-01"])
            writer.writerow(["Jane", "Smith", "456 Elm St", "1992-05-10"])
        print("Generated sample input.csv")

def anonymize_large_csv(input_file, output_file):
    """Handles large CSVs using streaming (row-by-row processing)."""
    generate_sample_csv()  # Ensure input file exists

    with open(input_file, "r", encoding="utf-8") as infile, open(output_file, "w", newline="", encoding="utf-8") as outfile:

        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
        writer.writeheader()

        for row in reader:
            row["first_name"] = hash_value(row["first_name"])
            row["last_name"] = hash_value(row["last_name"])
            row["address"] = hash_value(row["address"])
            writer.writerow(row)

    print(f"Anonymized CSV generated: {output_file}")

if __name__ == "__main__":
    anonymize_large_csv(INPUT_FILE, OUTPUT_FILE)
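
The row-by-row approach in `anonymize_large_csv` can be exercised without touching the filesystem. The sketch below is not part of this PR: `anonymize_stream` and `PII_COLUMNS` are hypothetical names, and it reuses the same truncated SHA-256 scheme as `hash_value` on in-memory streams.

```python
import csv
import hashlib
import io

# Columns hashed by the script in this PR.
PII_COLUMNS = ("first_name", "last_name", "address")


def hash_value(value):
    # Same scheme as the PR: first 10 hex characters of the SHA-256 digest.
    return hashlib.sha256(value.encode()).hexdigest()[:10]


def anonymize_stream(infile, outfile, columns=PII_COLUMNS):
    """Hypothetical helper: copy CSV rows from infile to outfile, hashing the given columns."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:  # only one row is held in memory at a time
        for col in columns:
            row[col] = hash_value(row[col])
        writer.writerow(row)


source = io.StringIO(
    "first_name,last_name,address,date_of_birth\n"
    "John,Doe,123 Main St,1990-01-01\n"
)
sink = io.StringIO()
anonymize_stream(source, sink)
print(sink.getvalue())
```

Taking file-like objects instead of paths lets the same function run against real files, sockets, or test fixtures.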
3 changes: 3 additions & 0 deletions app/anonymized_output.csv
@@ -0,0 +1,3 @@
first_name,last_name,address,date_of_birth
a8cfcd7483,fd53ef835b,948595bd04,1990-01-01
4f23798d92,9f54259010,9da203103d,1992-05-10
10 changes: 10 additions & 0 deletions app/config.json
@@ -0,0 +1,10 @@
{
  "ColumnNames": [
    "f1", "f2", "f3", "f4", "f5",
    "f6", "f7", "f8", "f9", "f10"
  ],
  "Offsets": ["5", "12", "3", "2", "13", "7", "10", "13", "20", "13"],
  "FixedWidthEncoding": "windows-1252",
  "IncludeHeader": "True",
  "DelimitedEncoding": "utf-8"
}
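
The `Offsets` values read as field widths rather than start positions. `app/parse_fixed_width.py` is not visible in this diff view, so the following is only a sketch of how such a config might drive parsing; `split_fixed_width` is a hypothetical helper, and trimming right-hand padding is an assumption based on the generator's use of `ljust`.

```python
import csv
import io

# Values copied from app/config.json.
COLUMN_NAMES = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10"]
OFFSETS = ["5", "12", "3", "2", "13", "7", "10", "13", "20", "13"]


def split_fixed_width(line, widths):
    """Hypothetical helper: slice one fixed-width record into its fields."""
    fields = []
    start = 0
    for width in widths:
        # Assumption: fields are left-aligned, so only right padding is trimmed.
        fields.append(line[start:start + width].rstrip())
        start += width
    return fields


widths = [int(w) for w in OFFSETS]  # the config stores widths as strings

# Build a demo record by padding each column name to its field width.
record = "".join(name.ljust(w) for name, w in zip(COLUMN_NAMES, widths))

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(COLUMN_NAMES)  # "IncludeHeader" is "True" in the config
writer.writerow(split_fixed_width(record, widths))
print(out.getvalue())
```

With these widths every record is 98 characters long, which is a cheap check to run before slicing.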
1 change: 1 addition & 0 deletions app/fixed_width_generator.log
@@ -0,0 +1 @@
2025-02-24 13:16:37,194 - INFO - Generated fixed-width file: app/fixed_width_input.txt with 100 rows.
100 changes: 100 additions & 0 deletions app/fixed_width_input.txt
@@ -0,0 +1,100 @@
REZ5NJ48HIF3X2QC5NJ08T8DO70VYWVITVTHQ1HXDH8QSND5X4K50DF8X90T69ZHQ3A1SDXCH1Y9JIGDN0W4JGFJ7AEL397F2J
KWS5KFCQ4T0CNIA2V9POVJKV50UHAKO5391YCV9JLV9EL37DYRQJYV37DUTXJRDXRVO8N0HWCGK8C2XFZVE1NTNJSLN4U4ZPMP
ULZURENV4K53B4MJOZ2CKMYIL537AJQABDY2TJFC2PQLOHQ7U0GWGZ7I0SJGFX7YST56221AQSVSL02JHKOVGLBOXGG1QBZ6BE
HNTJXVJQ8YC8LVPXYONYKVG1BHYV54O69L1AJ2FT4W4ZTYJHBRDC0NENCQ4329IILERGM163YHQYBR7TQ77M8XVMP2J4G80GAP
IPLUI2NC412V8EFMZHRAEAWUWA2R1DPKIV5Y1X6U4O19ORHBAHHK6LY009FJR1PSE2LUMJVMY6LQBXICLJI2OLHGDIGS5CT1WJ
791PCW1IH5C2IJAPP3KD5V4UGS9522HYDLUB99IK26PNTIN25OGWDOM4MQNGTP6CO55WMFRXZ2J7ETRJK06YQI458MBO6ZBATZ
FMD2JPGKZO3TZLHI6XP18OML3TSZ0YT2IWP9HUH7OH2CL0TOP137WU5EGSS4WNM5XF0JHI7HDT283FULE9PUSI7PGN7FIZJ72Q
10Q84UORSQMBY8ROXWY9I1H3HMXHAVZ1GW83TWUWVGUDGC1G6R4PQ52161HHYWAFD3ZUKTWLHPE5DZ8IOSBC71AH2TF5LWE8RA
VJVETD9VKLF3G3FIBEPV71MLH8AX09MBPVIJ3PJHG3S5D90345E4T4PB60WQ0FDHWZAELNPOP8LACXJNTAW7NSAJHREFUI8HSG
NQNSVGKPQKR9BGGU66A27DHH32NKDGE5AIR3UHVD4OYO83AC3E362B55FV7BQG172PXQ6GH1L7728I9RG2D38WJNRCMO7XHU9E
JIAALW91MHLG7MRMT9TDFO9DAM3PQNTSU4X16MCYBEN3QZG4MMFWMUCHCIW4YD9HOJAR72ECVNOQXWEUXPP3ZJ6F4UOHWRIDJV
ZG3JTFLI5OOSOTSM8RXBHP3CBEZ6IOHE2GMWIQLASOYB4RVX4WMXA1YH98L0GE2QQLY77GGCK7WAQ6ZZO3UKZ8SDOHVKA9BMQG
PBU1L6QZ0TBTH4IVMI758HX23RE8T5UTY1NFG48XBJBL2YROPSATMVA7UQ2OMFU9AGCDNFVBEAFYHXAFIT9F7BMN3R23803JT3
J5VY5E18GHEVS062Q1TEE2C45CDPC05EP7Z0FPWFEWWK1WYSF8QX3H2R3HFZIE8COH02DOEVP8EQUQ1N6OJHW7P37D2PG1NBZH
40LBAHZ4RCK36ERZ9LBGPZTWIQ763LHQ50UR1WBRD7HSQARILU4EEKGGHUVBYJEYV5V4NHUWAHCRJYUEXFQAWW34BHMN98TDA5
KY7CRWFLW2VIZ4OGI31JI1L3M8DC4Q8AAJPD6SY5JLQZFH3A5P4002CMS0OK32PZ5VX4FBTQ08WUFBCS3SJ8M3SWL5I7652SNG
C8COI9A43901LOZVKU2X0FLJVGGBV6CM87D2YHCFAHF1KZ1CRW9QU9WQ9FZ1OEOJ3SHVLLU0LR41ZNXIWPOP1A9XOFMNGP4TTO
J13XV54YREUEE6T3NO0QT4PZSTSOPXT2NIH8H13SLXQJZHUNPELR1ML0567DSDINUOGYH6EAM32VM0C92ZH0REK3CFVHTRRASV
V85XDV7EP6NUXWG8I2627H12KS3LZJ729OC91LZCQ3Z8P5LJSOD5NAFIZ5Z3G7KVSWKB8SEKRH8IAG26GADP6PIIM893G7P9H4
FJX7VGI7FODRLQI7OL42SAUW4J49R8TCRCRE2UYU410YITOMMBTWDEZDLOPJ1WDZV3KUBK6DP6ZT3242RFQ7OGCXYC3H4IPEAC
NO9VAXXC2V923LSJ3UVWDJ3S0TB9WVAOL6W8G75EDKW2WI6DU70B0O10G8KL0GWELK4HYAIIJG8L7FV8L83HSNHZH783XJ29Q3
N6BOHFZQ1FZX88GVB1HS6Z0W5OXXRQNO1RYCN9N0FE9B1RQXDCOWCLKN8YKGD9T2KT0LZXYAGNT5KKKW410BZJPL3A7M2M8Q7E
9R1PIJHEDGMFVB1LBQ0BZSMOR7YR6NA84Y0FVLME37MHCKB6YWCLAPFDERB047GZ96LENNX6DDY3FD5OAM28O5HSPYF3KDOG5B
1K0RFYTE34BDPDDZLD7IJVACWTU4J8FKO4CDJVL426SCEPYXAC3G2IDGFHJS8JTE5WHULD7HC7WY8HZUVHXKUBGNYD9IK501E6
TL1MRQQD4ZD8S13D5WBSCA37G2BXPTYFSZTQLEJNJ8GBZ4Y0GABSB380F4Y6YMV8VSZT16CVWRDKGIFQ1TYFW0W27IIJROLQ3Z
R1QW85AH17EWWQGGOHLIHECHXO4JNHDEJIOD99U53SKJKO4M85Q7ECDHFGEFPN6JMSXZW5QH3DOD0QGZYP83171ZV0YWWRUXUM
V1HQX24I2HQ5QU6XLOWTXY8SMAG1W5UT9KAL1CC4SF6B3NG08HV81OKC9J11SF520WWIA8VD025M1V6GSZUCLLBLO8YL0MG9QK
WH6P5UU6Y43JHA9Y5ZRQ29A0IO2MBSWLJURVKXAEU7C7MNG259J4BISEN726JS2D3ALO11KQZU43S895AKIP1N7K3VGS1NHE44
H5B0XYTB73LVUFXMYJ8QOM42NKST3NSORUN70UOHZ5APGHVGQ9SLQBB9HJI6YI0ODQZSXE0LFBHCK2203FNT7IYUWKR10YWJEC
QZ3Z8X6OT1HRK9VY93YJUE3VF67M8XQR2Y2DWOA5RN4U26O1LYZZDZMHDS1RQWKTECZDEBEDPVJD4BJ68ALFAGYV6WHM88WRL5
SW4H33LWMCQFR3G172B0ZJ2814MP5URJ5PZRL7923A8L8PF5IRJ8MARU4AQZPV14B1B465PM6E61JHUFV0A4R3SV5J7CX4KABJ
66732D935CAC9H7KN3DZQQBITO2XX11CAXBIF26FVZ61VXXLLOKR4T565W09HV71YKA07BZNVVDGCIHAL2ZK7CPJ9F0AH9I3Z4
YBFVBLR4ZH4CX4BY9YZK3LH8Q90BDJEOTPYYIECKJ3OE6Y6OX5JGM1CFQ0LVFOJS0O1O7KY6ZYU0GB3S7I72C15M4OPAJTE5KU
7DME64CFX64K9R1H5H6X4Z07E0PFYBRCLY48Z2GNVWJCNPIXICB84E9X6LIEGX7HEOGXSEHTNKT9URHOTLJ9GD3O2B2VRVLOH1
CQ20VRO28537LJCL8OKTIRJ2RJ0YA9YXMAB8EPP29EUCS4BT4649BZ0OW8316QB0YR6ZL7EPOBFBB7EO3J10WFB78EYT4FGC36
0U9POLYFLX7977MIC8LMCWLI5PF07JJYJFANWWC4YDO2597OVBF4PUTID3YDAGO0IWZ74G59KKNIB553A0H0UHX5R9K9B4HHCY
ASVE3W3S3UQER9CF3ZEHCNHG72LTFVWO6NALTV6KU5F20F30N449B210OGZD6OKH4T3GO4ZJMOCU6DTHB8CNGIF3YKMY72K41F
WLCOOXQ8L7CN3QM7ZBLDJPHA1XIC3OW7G71ATX5QMEA9KF8EZY1FCD462SZQL6ED1F5TBWXVDCRQ82J1038GUPZV5N6BOETVVQ
V467Z3JH9I7AVJ8DTJXDF8FR7DFUP7D56EJZUXKVB5PYLP6EUVUYPNR4XTC4PEM70WONPVU4XPRR0PR7BGO53Q34V6C8STEVN3
UZ06G65FVAEW5Z6RUUOZSC1ADZCBG54MVC09ZVKF5PKAS3TI9NE10TLRVEW0C0LFUS335U73E1DPSWF7552MK6IJA5ZJ84VLJE
K7K84Y1FHNKMRD7D6MR7PP2VUYD0GDNPQZYGXCMUXL6IGK6SVGRJILY4KMDC2CN1C5TRJN1F8XPFCO6SY3THPA030UVZD62ZKY
JHSY2RK9UUH69H0EMN0TIZG8WFMXNSVLLN81DMLXPZ55MNN4W4ZCRJ4H8BX4JWLIS27YG61BSVLDVYGNXN2STQ0FN82CBMCSM6
YC2HQHRNPFPIBHNH2K6EPCR2UJL13LTKWKO0PRCN9U9RC38FEZAAUPPLAZIPB0RNF0NLJA1P7L080A4XVH76HDFRNP6APN4ALI
T9OUYEH8XKONUFL8QH6TU4S6TPUAFESTFG559VJNZR8ZH2U7SN9CGAXL6ZSO6VD32NSOSVNECL0A7XVQ354X0OMIYLIP11K244
TS9SEQH0M2L83LE2U9EA8VCIU3T7GJK2GHR5RS1O7D4KSYZ5NGBW48CMZ5FI8ZBWDF375YM8YXI2HZ8PM7W0GBXDXMW12ZT6GB
Q156ASZC1VG2TZQE3QN75X9924JXUE1P80PPUYWCX5JQF5KHEXKJ67C34T7KWWCM5DW36J5C1P44LSX9K3ISFSUT9NKNSG36T4
GXKWJKT3TEWX396CADOK9PYRSP3TOP1C2OW3D8UQEXS4JOU6D3B34QFXP7ZTGGW6XEFVDWHXEIJKI2F888L5NKE2EXD7K8534P
UZLI3ITAVQXAD6AFT93TG1CTMHPQUIC3VM2JSM9PY8RPFSFH1U349F1N05AJMQNOFTAGZQJTQ26MJU5SPYBDBFKJVHN0ZWEQ0S
BXPI87NZM1MK392237P2TB8D7YN809U3JTIRTSL16NOFVMOHZEMFYCT7SUB2H8MQKVE2Y9F6JRFDBXKHJ1ARQ22H15GLM9PJ45
XMMUQX7E7XI1VZHI1L0TD1E429MKCY68VQK19WQ9RI5RRUKKHSUYE2L4B1A1P4S30PGOXRSNOPK8RSJ92WAB1H31LGF86DF7U1
TOF297DO77CFTT2K3E2KPZQJPYVF5MCFEXV9AK3K9GZA05O0F2UI1Q1X75PS89KJBP8B2PMU5NF18S6CWVK4EGQYKES55QA2OQ
GEB0XBGTFCPDQ0KNODCVZ2077968D0H1FR60V9SJ8AF75S4ZDX8ALJ5IBTL5IH7MA2D783BY0BUNEEY5C48MCSU2RKR36F53V9
J60LBDPUYXZPSPXDGA94L3AMM5FVFZ70EMHLDB6H3OUSH629LPSNDYBDPC67KRWFBKGG2SNTL8QXDVT60JUNRHDHJPRG2BBRHV
OAU3DWZXFTIO6XV4IOJ83L1ZW12JPYG1S106D71US7W36Q5F82L79QIOURCFC4O4QAHEU29RLKSX14DWRG4CMRHENVIZ63VT4V
MLM8KB8RNJ9ARA8ELX26SD63DEMGGEWG3VSUPJPWDD3RZMHKAKU78AWF9QBYFMIKQRC80AGIL9N9OCLQSWUAHIXFYN86KRJKSU
LOQ43U499S6HS31CAZCMN97V1EOWTX0I1UEPD3NNM95K5QTK685Z9L6ULOR1BLYYQWNC8K26VSZ08CU6QS2ZRKOTMBV4F5T8JK
W1WJSEXSW61ISUV3NRF3DVEIY96MWIFCD0RY9G8TBLA4T2BPUXG8PKKFL42HMIKFMNWDDR58UHI93DCKOWG4ZN46SG68BI6NDV
7J0837PKC2K4HNQA7H4MXUXC7C4LMEFU5BHU1D0AX0F0U59LG0X4T804M1EDP9RYQRQ8H8KBA0T7CTXR2KQE9BN1X7NW91OFG0
YIG64TNE1664J4OHOE25BSHZ3MQMXDER3GUPAI2QDD6XTKOLK36US8JJLGTOMCICW4ABA4YM3L4MEIWG69NG3A55IZYTKCHVMO
YWZYMEZX7P3PD6CWQMONNVSA31FYKYZISVVPBMTWBRGII1MT3EG8CPZFCEYIPA55830S0DH8X8WTE27R7TY7VMOSAS5XFG3Q1U
N5OXDAJMJELBD443879V35ZRGQT7CMZ7U3V9DPPVBH5UT8VPJ9SZI1W26J2GQLLCCIGJBN195COY9I36RIFEA7V0NMH8GOAQVB
GYYLMPUR44PPC2ZWUUYA6LNPNP7C5BBBBJMGBATWQRRZD5UJOEZX1NZ89KTE3EXJ6C8LVTUQEGFLD84L286N07L7029N7H909A
XFQ75APVF5A1CB4JA8XZ8KFQO01C95T5F7TOHTTB4X9QCCFUVH4JSLLJ8FK9U0COX4G5FSZQLJUL5IB61KGPWAN753YQB7AM06
DY843H04T0TBO0BA1ZLMQWSW5QYMIOZ5CGEQ3CRCJLLJZ83ESWJ5D60FWL1D0A63D9NM4RYWVWH4EIGKG1ZE4J47W6YQMOOWWK
GGNZ215GYXH2P4O5RF9J5L7PTTZQO4IC7Q47ZB6TBMVY8USPIAA9TE942IRV671UTPDDXVQOJMO087NC17J6X50QGW41D3ZQCL
Z90QAX13Z1MYG59HPRBL5RJNYPT24QWJ9ACAGY9KB1U6H26OP945UCV3K0YPE47M37XP8NVN1DOS1BNFUWQCKOO5ZLL3EKDA8O
32SAP8BK6QD8ZGT890M7I5JODEAHWNU7P6OI1CWN1KTWVURKOIEIEKW2COD08EG7AEMNYQD6DRWLT9OIZ7PMKI030TY32HTFTW
MXAC92EZSJIL87CL8GO5MQ11B9E13LXV9EGDETONOU1H8ZVQI548MTDJTNDT5ODR20NWAVET5SDMNDYCATOZU4CE5VVDY0D1LR
XWO05CA18AHJ0F3U0OQGISUHCNH8MYAJKKLJ3OSO6UUWOC2BTZIGFV5Y4DL8ADVLPW6TYNUWKYWPRB6SU8Q9YQ2T6WLPBUC3SX
XCU2TK1WPE2IFWW76R9RJEV5SYI22EO4P8AC6OUMZQCO89XOED7MO0JMN0EVKMTGSPE2DXKONCH2PL85A5XII3AXRG2U5HPNDE
726C7MV1O3VE5H7LHWEWQRN786GEEZOOJHE5KXPRSFPVS46D0UU2640Q7Q151EX2SJQZNEELP4SBGZ7DZV7OFLGNNZLOF5KHOV
AGMNR4VXYUSIWYEO3XLN3577RLCZ23FAD6FWZPTAZNBHQLGXSLWXGKDOVWIQ00NIWCTI4PZWPYBTKK53B47WUQ4MT6JI99CL5W
7C413983MYEMT0QMLT5BML76KCMZ6M65QIYJ6M1FMF8RNXB7Z2FILBSCBT33XSTDGFCX0T9CMAC5IHUHT5OBRTBKJNK6UUMEMP
IUYB2K04WSZN5CBFNUPVNWD9SI4FH47RPOLJVWX78XHCZTHYAJ41HQ37AG4OSM3804FMKWNM37MZ5F03CO06KJUV91QTRVD3O1
GY6YUBECKGGKZF6NCW2LHKSZDEZMVUE7OY353HPEAPUBASVNMOY7CRSMZ7GHVN2RBPAQ5AAKTJ8QPNQEATDJP68PV2J8PV938J
KSDAX1AD3QDZAUITY88HJIVY11C09EWOHW9KG2OAUJR0MDD1C4MM53LZG5BXG6TTYWOT6M4C1AVNIFMQTEENVZXRC7AGU1KUB1
52BSHSHJ3S8IXPGYAX9N906CFJFQIIPHI767IS21BD4OMUTSXJJH1BMZ0A1NE1S0ZDLE0XCXVI57UCIWCDIJ64M49SSH6Q2223
JSHB2QYSVLUG49BY4X6MD13KW9UVM5JB6ZB0XSZB2EXHQ2QUWERUMAWLW0Z78NCYTSSKNQQ0Q8JHSJN07H4GCVBXJJI9Z2ISP7
YUHPIO58KKNVLC3I889CWKZPBPXCX9NK3S6LGPKWUS2RXHNOU7MIBR2K21I7DWSK5KKP4W4652Y2ADBDTMDD3AW72EACWUTN6A
5TXCTROAGGZ8LIOPZYZYTKZJ8PXT7I2VAY3XO7UFVF55G8GNMCIZD00U2IHAU1IU39P433T38RVQ0HL2CD5SM69ABOWUT4FF0L
JFX5A92V05N8LCM39PHZT3SJERM0FB82JQFBV2GEPJOFYGSP6R0TB6QYN009M0KRNNBKG9BIBN0RKBXZ3MTY9SXPATSWMUAKS6
4SWIXUD0NYY2KXZH8D1ATO27LVNKFNEQGKAGKS8AF8L9BWPHZ5V91HQ5JP6268KIRF6843A8D7JDB1ZLQHNA6EI1BBAUOPV7DY
PR8TO4VQMM5XFXIY91LCDVPCAL5TNFAE2I0QWWC94EF72DEZF40GXTUFSX2TX9BMMIB1R6JMOF46EK0N1Z71DZA5ILM0ACQNMZ
DNJ7MX0NU93Y2PKKZXGNPXTLVVSN6G20KSM3MOTGUITR3J8JA2EYU1WY5XN1TFY4ESWS95TR6NQNU46F7UFI793RWHZKS2JQV1
B1DORSA3ONHMTWPXXY3TQPY3KCW1NHHBN7AN04A9KG7WU1C14TBLZSL9760E8LUFH9BLSK8P8RDPUT34AZJ8YVAHWUNSYZMKEK
6L9XOB2K83VC2XXNR7FAMRDS097JSWKMXVG86WBN82GFX4KHSE1F3QHXXCIS6U10EF500QUILQ1W3TGOZHQCM03CWPIN8PEBEL
6XGYWDCLSQ3FCNMWW9TISERSH36EQWB12J8ZV46SUYPKJQAFLZH7QIQ8UZO91EWW71XNZ3CU70LJTSTR83ZCBDFR3BPPE1284J
WZJVKDLJSPQCFK8KUZ7N1I4YFP66K35QTSAWLRDWLXZAI34VMS9GNUUT3BNKZIX5XYJ9D6O9K9VIVZRG62II6PM43ACCEYZYQ4
1LJG59ASXCMX4GPATGWSCB3OMLNZEURWP67WEVIIY48PKE9I9VO92FDVKNELZACRKBXWCXL6ELKPEYUZTFZOABW20262U9CSC4
5BNKSFRHTK67JCOFUPEF83OOFVV5T3YAKIR5N849W2HDHM558LJU2BFJSRPBQWWDZHMPDG40LKLLSUGA3CL7L2YOMSI01F05K6
R0K568H7CH2DIZUMVM7DZ5JXD59MUEMABY31YN909FCOCFBTYFNG9LZRS22PTRN3ZOJKDKU3JEGYA3EH0S2JNY55CJET6LNYJA
GX7KMQ20IH80OXROK15STSDCAMHS53MTJIJA0V3E5F006ADRCOK25J0C59B161CXWHY1PWO1VONR5FJDHHYYDJNSSK1QH7S7P2
9P9LRL3NIJO4I9CPYDKD3C7V514X6ITE3OBR9J11V2KF3YQXWUYM5UPNQFLD00NM2S77OFRPJ56RYZPD9JT6WUXMTXHUXH1I4K
STZY8TNP7XFTFU6GKRI34VBJLBMADEYNT2MUW37SH9N1OX325E4ISU7C5JFMYGT53M8GBR7GUXFR47G13HOW7C78UPIX5W8S1H
92JI03A47UII2VXQ6QDXO7MNX59U7WK72KB4CRE9RXQVCP7QW9HW4FQU0ILB066FNLVSXR1J3KVE3Z02T333YL477ED7XR4KLS
BAJM4M1TLKBCSXCTRZ1AAVTUU6C3QIHGJQLRGNM1A5P1DTLKD1602EOBC8RGTPUCD8DAZVYOFHAERNNVQPQY1P3U9K3L2XIXL2
F8BXGY7BQKBHKP09MXJ2Y2JSOCX3LQXD6Y04M37P3YIP4IMMBQCIF8BZRTJZ3DCOCT9X666OQB4M79CWY65TL0E62524Y8O18B
7HC07QYW90PJWI4OFDMG0BKY3IPP0QKXC5I3YAPROSUASLD9O9ZQQO74KFGS9L8AZ294LKQBPICARZQ9IRRLM6D88YRLKB429U
XRO5A0AFMWIN3VAFKC11ZIVZEVJMMBDE3N3BIFAW3Z2J1EQV13CZ2ADDR5JDQLED7SNCHGR8C80ZO68PXBIN1VV7G1FW3Q4Y03
B5CIJUYW1IY53AIX0946EXQUB11JSD1IOYDWQGPXI12AOBMMDTLU3STI7R3UALMR9TF4CJIM2NMS36FBHKDIPCJ8ASJRMXUWJ4
28 changes: 28 additions & 0 deletions app/generate_fixed_width.py
@@ -0,0 +1,28 @@
import json
import random
import string
import logging

logging.basicConfig(filename="app/fixed_width_generator.log", level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

CONFIG_FILE = "app/config.json"

def generate_random_string(length):
    return ''.join(random.choices(string.ascii_uppercase + string.digits, k=length))

def generate_fixed_width_file(output_file, num_rows=100):
    with open(CONFIG_FILE, "r") as file:
        config = json.load(file)

    offsets = list(map(int, config["Offsets"]))

    with open(output_file, "w", encoding=config["FixedWidthEncoding"]) as fw_file:
        for _ in range(num_rows):
            row = "".join(generate_random_string(width).ljust(width) for width in offsets)
            fw_file.write(row + "\n")

    logging.info(f"Generated fixed-width file: {output_file} with {num_rows} rows.")

if __name__ == "__main__":
    generate_fixed_width_file("app/fixed_width_input.txt", num_rows=100)
    print("Fixed width file generated: app/fixed_width_input.txt")
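
A quick sanity check for the generator is that every row it produces is exactly as wide as the configured fields combined. The sketch below is not part of this PR: `generate_row` is a hypothetical stand-in that builds rows the same way but from a caller-supplied, seeded RNG so the check is repeatable.

```python
import random
import string

# "Offsets" from app/config.json, converted to ints.
WIDTHS = [5, 12, 3, 2, 13, 7, 10, 13, 20, 13]


def generate_row(widths, rng):
    """Hypothetical helper: build one row of random fields, one per width."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join("".join(rng.choices(alphabet, k=w)).ljust(w) for w in widths)


rng = random.Random(0)  # fixed seed is an assumption, chosen for repeatability
rows = [generate_row(WIDTHS, rng) for _ in range(100)]

# Collect the indices of any rows whose length differs from the expected total.
bad = [i for i, row in enumerate(rows) if len(row) != sum(WIDTHS)]
print(f"expected width {sum(WIDTHS)}, rows with wrong width: {len(bad)}")
```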
3 changes: 3 additions & 0 deletions app/input.csv
@@ -0,0 +1,3 @@
first_name,last_name,address,date_of_birth
John,Doe,123 Main St,1990-01-01
Jane,Smith,456 Elm St,1992-05-10