Note
🇫🇮 Instructions in Finnish (Suomenkielinen ohjeistus): read the instructions in Finnish here (Lue ohjeet suomeksi tästä)
A Python tool to parse and consolidate HL7 CDA (Clinical Document Architecture) XML files exported from the Maisa patient portal (used by Apotti in Finland).
It extracts key health information into a structured, machine-readable JSON format (patient_history.json).
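To make the input format concrete: a CDA document is namespaced XML (`urn:hl7-org:v3`) whose body is divided into titled sections. The sketch below reads section titles and narrative text from a tiny hand-written sample. It uses the standard library's `xml.etree.ElementTree` instead of `lxml` so that it runs without dependencies. The sample document and the `section_texts` helper are illustrative assumptions, not the parser's own code or a real Maisa export.

```python
import xml.etree.ElementTree as ET

# Namespace used by HL7 CDA documents.
NS = {"hl7": "urn:hl7-org:v3"}

# Hand-written stand-in for a DOC*.XML file (assumption, heavily simplified).
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><structuredBody>
    <component><section>
      <title>Päivittäismerkinnät</title>
      <text><paragraph>Example note.</paragraph></text>
    </section></component>
  </structuredBody></component>
</ClinicalDocument>"""


def section_texts(xml_string):
    """Map each section title to its narrative text (hypothetical helper)."""
    root = ET.fromstring(xml_string)
    result = {}
    for section in root.iterfind(".//hl7:section", NS):
        title = section.findtext("hl7:title", default="", namespaces=NS).strip()
        text_el = section.find("hl7:text", NS)
        # Join all nested text nodes and collapse runs of whitespace.
        body = " ".join("".join(text_el.itertext()).split()) if text_el is not None else ""
        result[title] = body
    return result


print(section_texts(SAMPLE))
```

Real exports contain many more sections and structured entries, so a sketch like this only shows where the narrative text lives.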
- Consolidated Patient History: Merges data from multiple `DOC*.XML` files into a single chronological timeline.
- Narrative Extraction: Extracts free-text clinical notes ("Päivittäismerkinnät" (daily notes), "Hoidon tarpeen arviointi" (assessment of the need for care)) while filtering out redundant structured lists (medications, labs) to reduce noise.
- Structured Data Parsing:
  - Patient Profile: Demographics, contact info.
  - Medications: Active list and history with dates and dosage.
  - Lab Results: Test names, values, units, and timestamps.
  - Diagnoses: Active problems with ICD-10/SNOMED codes (from Problem List section).
  - Procedures: Medical procedures with Finnish national codes (lumbar puncture, ENMG, OCT, etc.).
  - Immunizations: Vaccination records with ATC codes and dates.
  - Social History: Tobacco use, alcohol consumption status.
  - Allergies: Status and substances.
- Deduplication: Handles duplicate entries across multiple documents.
- Clean Output: Produces a clean `patient_history.json` file.
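The consolidation and deduplication features listed above can be pictured with a short sketch. It is not the parser's actual code: the helper name `merge_encounters` and the choice of `(date, type, notes)` as the duplicate key are assumptions made for illustration, and the sample records are invented.

```python
from datetime import datetime


def merge_encounters(documents):
    """Merge per-document encounter lists into one deduplicated timeline.

    `documents` holds one list of encounter dicts per DOC*.XML file.
    Hypothetical helper: the duplicate key below is an assumption.
    """
    seen = set()
    merged = []
    for encounters in documents:
        for enc in encounters:
            key = (enc.get("date"), enc.get("type"), enc.get("notes"))
            if key in seen:
                continue  # the same entry already came from another document
            seen.add(key)
            merged.append(enc)
    # Sort oldest first; entries without a date are placed at the start.
    merged.sort(
        key=lambda e: datetime.fromisoformat(e["date"]) if e.get("date") else datetime.min
    )
    return merged


docs = [
    [{"date": "2024-10-10T12:00:00", "type": "visit", "notes": "B", "source_file": "DOC0002.XML"}],
    [
        {"date": "2023-05-10T00:00:00", "type": "visit", "notes": "A", "source_file": "DOC0001.XML"},
        {"date": "2024-10-10T12:00:00", "type": "visit", "notes": "B", "source_file": "DOC0003.XML"},
    ],
]
timeline = merge_encounters(docs)
print([e["notes"] for e in timeline])
```

Keeping the first occurrence of a duplicate is one possible policy; a real implementation might instead prefer the copy from the most recent document.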
- Python 3.8 or higher
- `pip` (Python package installer)
- Clone this repository or download the script.

- Install the required dependencies:

      pip install -r requirements.txt

  (The primary dependency is `lxml`, for efficient XML parsing.)
- Export Data: Download your health data dump from Maisa ("Tilanneyhteenveto" or similar export). After extracting the ZIP file, you'll see a folder structure like this:

      Tilanneyhteenveto_DD_Month_YYYY/
      ├── HTML/
      │   ├── IMAGES/
      │   └── STYLE/
      ├── IHE_XDM/
      │   └── <PatientFolder>/    ← This folder contains the XML files!
      │       ├── DOC0001.XML
      │       ├── DOC0002.XML
      │       ├── ...
      │       ├── METADATA.XML
      │       └── STYLE.XSL
      ├── INDEX.HTM
      └── README - Open for Instructions.TXT

  [!IMPORTANT] Point the parser to the `IHE_XDM/<PatientFolder>/` directory that contains the `DOC*.XML` files, not the root extracted folder.

- Run the Parser:

      python src/maisa_parser.py /path/to/IHE_XDM/<PatientFolder>/

  For example:

      python src/maisa_parser.py ~/Downloads/Tilanneyhteenveto_16_joulu_2025/IHE_XDM/Ilias1/

  If you run the script from inside the data folder, you don't need arguments:

      cd ~/Downloads/Tilanneyhteenveto_16_joulu_2025/IHE_XDM/Ilias1/
      python /path/to/maisa-parser/src/maisa_parser.py

- View Output: The script generates a `patient_history.json` file in your current working directory.
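Because the parser expects the patient folder rather than the export root, it can help to locate that folder programmatically. The following is a minimal sketch, assuming the folder layout shown above; `find_patient_folders` is a hypothetical helper and not part of this repository.

```python
import sys
from pathlib import Path


def find_patient_folders(export_root):
    """Return folders under IHE_XDM/ that contain DOC*.XML files.

    Hypothetical helper; assumes the export layout described in this README.
    """
    ihe_dir = Path(export_root) / "IHE_XDM"
    if not ihe_dir.is_dir():
        return []
    folders = []
    for child in sorted(ihe_dir.iterdir()):
        # A patient folder is any subdirectory holding at least one DOC*.XML.
        if child.is_dir() and any(child.glob("DOC*.XML")):
            folders.append(child)
    return folders


if __name__ == "__main__":
    # Pass the extracted export root as the first argument (defaults to ".").
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for folder in find_patient_folders(root):
        print(folder)
```

Each printed path is a candidate argument for `src/maisa_parser.py`.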
The generated JSON contains:
{
  "patient_profile": {
    "full_name": "...",
    "dob": "1990-01-15T00:00:00",
    "gender": "...",
    "address": "...",
    "phone": "...",
    "email": "..."
  },
  "clinical_summary": {
    "allergies": [ ... ],
    "active_medications": [ ... ],
    "medication_history": [ ... ]
  },
  "diagnoses": [
    { "code": "G35", "code_system": "ICD10", "display_name": "Multiple sclerosis", "status": "active" }
  ],
  "procedures": [
    { "code": "TAB00", "name": "Lumbar puncture", "date": "2023-05-10T00:00:00" }
  ],
  "immunizations": [
    { "vaccine_name": "COVID-19 Pfizer", "vaccine_code": "J07BN01", "date": "2021-08-13T00:00:00" }
  ],
  "social_history": {
    "tobacco_smoking": "Ex-smoker",
    "alcohol": "Current drinker"
  },
  "lab_results": [ ... ],
  "encounters": [
    {
      "date": "2024-10-10T12:00:00",
      "type": "Hoito- ja palveluyhteenveto",
      "provider": "Dr. Name",
      "notes": "Narrative text of the visit...",
      "source_file": "DOC0018.XML"
    }
  ]
}
This tool processes sensitive personal health information.
- Do not commit your XML data files or the generated JSON output to GitHub or any public repository.
- A `.gitignore` file is included to help prevent accidental commits of `.XML` and `.json` files.
- Always handle your medical data with care.
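For working with the generated file locally, here is a minimal sketch of reading `patient_history.json` with the standard library. The key names follow the output format shown earlier; `summarize` and `latest_encounter` are hypothetical helpers, not part of the tool. Any script like this, and whatever it prints, deserves the same care as the data itself.

```python
import json
from pathlib import Path


def summarize(history):
    """Return a few headline counts from a parsed patient_history dict."""
    return {
        "diagnoses": len(history.get("diagnoses", [])),
        "lab_results": len(history.get("lab_results", [])),
        "encounters": len(history.get("encounters", [])),
    }


def latest_encounter(history):
    """Return the most recent dated encounter, or None if there is none."""
    encounters = [e for e in history.get("encounters", []) if e.get("date")]
    # Timestamps in one fixed ISO 8601 format sort chronologically as strings.
    return max(encounters, key=lambda e: e["date"], default=None)


if __name__ == "__main__":
    path = Path("patient_history.json")
    if path.exists():
        history = json.loads(path.read_text(encoding="utf-8"))
        print(summarize(history))
        print(latest_encounter(history))
```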
- Log in to Maisa.fi.
- Go to Menu > Sharing > Download My Record (Lataa tietoni).
- Select "Lucy XML" (or "Everything").
- Download the ZIP file and unzip it.
- You will see a folder `IHE_XDM`. The `DOC*.XML` files are in its `<PatientFolder>` subfolder, and that subfolder is the one you process.
Disclaimer: This software is for educational and informational purposes only. It is not a medical device and should not be used for diagnosis or treatment. Always consult a professional for medical advice. The authors are not responsible for any errors in parsing or data representation.
By using this tool, you agree that you are solely responsible for safeguarding your own medical data.
Feel free to submit issues or pull requests if you find bugs or want to improve the parsing logic for different types of Maisa documents.
This project is licensed under the MIT License. See the LICENSE file for details.