An accompanying user guide is available at [https://www.commonapproach.org/wp-content/uploads/2024/07/Guide-for-Excel-to-JSON-ETL-data-utility.pdf]
Input CSV/Excel files are stored in the ./data folder.
Generated JSON files are stored in the ./output folder.
- Create conda environment
conda env create -f environment.yml
In ./src/main.py, you can configure the source files and properties:
INPUT_FILE
: local path to input Excel file, e.g../data/input.xlsx
.OUTPUT_FILE
: local path to output file, e.g../output/output.json
.CIDS_URL
: URL for the CIDS OWL file.CONTEXT_PATH
: URL for the context file.REPLACE_PREFIX
: How should context replacements be done. Options: 'context_only', 'all', 'label_only', 'none'
-
Activate conda environment
conda activate PyExcelToJSONETL
-
Run ETL Pipeline
python -m src.main
Update the input file defined by INPUT_FILE
.
-
Any Indicators in the
Indicators - formatted
sheet are imported. Only those with values are associated with measures and reports. -
Any instances in the
Additional URIs - formatted
sheet are imported. -
Default classes required for the Impact Model are defined in src/generators/organization.py
generate_organization()
.- To change the use case from the defaalt "loan" use case, concepts in must be updated. Program and Service instances added to the URIS will not be associated with the organization.
- If an Organization is provided in the sheet, the default one is not used.
- If an Outcome is provided in the URIs sheet, the default one is not used.
- If an Activity is provided in the URIs sheet, the default one is not used.
- If a Program instance is provided in the URIs sheet, the default one is still used.
- If a Service instances is provided in the URIs sheet, the default one is still used.
-
To epxlictly define the range and restrictions of a class property, the
restrictions
dictionary in ./src/map_data.py is used.
Main flow of the program is the following:
- main.py
- Load URIs from spreadsheet:
load_uris(input_path=INPUT_FILE)
- Load Default organization and related concepts:
generate_organization(input_path=INPUT_FILE)
- Load Indicator instances:
load_indicators(input_path=INPUT_FILE)
- Export data to JSON:
export_json(out_path=OUTPUT_FILE)
- Load URIs from spreadsheet:
Unit tests validate two inputs from the test file tests/ExceltoJSONTemplate-test.xlsx. If the format of the main input file is changed, this file should also be updated for tests.
- tests
- tests/test_excel_indicators.py tests the concepts in the
Indicators - formatted
sheet. - tests/test_excel_uris.py tests the concepts in the
Additional URIs - formatted
sheet. - tests/test_excel_organizations.py tests organizations from two sources:
- Default organization and related concepts (src/generators.organization.py)[src/generators.organization.py]
generate_organization()
. - Organization defined in the
Additional URIs - formatted
sheet. If it defined, those in generate_organization() are not created.
- Default organization and related concepts (src/generators.organization.py)[src/generators.organization.py]
- tests/test_excel_indicators.py tests the concepts in the
- tests/test_excel_context_context_only.py tests the context file configurations.
- Assumes that only defintions in the context file that match namespaces prefixes are renamed.
CONTEXT_PATH
stores the base URL for the first context file.- The test only checks for the
org:hasLegalName
on the first Organization to ensure the context file was read in correctly.
- Activate conda environment
conda activate PyExcelToJSONETL
- Run tests Pipeline
python -m unittest