xml.etree.ElementTree.ParseError: syntax error on import #14

Closed
fabiosangregorio opened this issue Jul 8, 2023 · 10 comments

@fabiosangregorio

Hi! First of all, thanks for the project, that's amazing! ❤️

I'm running into a crash on the ingester when importing my health data. Please find the logs below:

docker compose up ingester
[+] Running 1/0
 ✔ Container apple-health-grafana-ingester-1  Created  0.0s
Attaching to apple-health-grafana-ingester-1
apple-health-grafana-ingester-1  | Unzipping the export file...
apple-health-grafana-ingester-1  | Export file unzipped!
apple-health-grafana-ingester-1  | Influx is ready.
apple-health-grafana-ingester-1  | Loading workout routes ...
apple-health-grafana-ingester-1  | Opening Route 2022-03-07 2:14pm
apple-health-grafana-ingester-1  | Opening Route 2021-10-10 3:57pm
...
apple-health-grafana-ingester-1  | Opening Route 2022-11-18 8:07am
apple-health-grafana-ingester-1  | Opening Route 2021-04-20 7:14pm
apple-health-grafana-ingester-1  | Export file is /export/apple_health_export/export.xml
apple-health-grafana-ingester-1  | Traceback (most recent call last):
apple-health-grafana-ingester-1  |   File "//app.py", line 185, in <module>
apple-health-grafana-ingester-1  |     process_health_data(client)
apple-health-grafana-ingester-1  |   File "//app.py", line 140, in process_health_data
apple-health-grafana-ingester-1  |     for _, elem in etree.iterparse(export_file):
apple-health-grafana-ingester-1  |   File "/usr/local/lib/python3.11/xml/etree/ElementTree.py", line 1249, in iterator
apple-health-grafana-ingester-1  |     yield from pullparser.read_events()
apple-health-grafana-ingester-1  |   File "/usr/local/lib/python3.11/xml/etree/ElementTree.py", line 1320, in read_events
apple-health-grafana-ingester-1  |     raise event
apple-health-grafana-ingester-1  |   File "/usr/local/lib/python3.11/xml/etree/ElementTree.py", line 1292, in feed
apple-health-grafana-ingester-1  |     self._parser.feed(data)
apple-health-grafana-ingester-1  | xml.etree.ElementTree.ParseError: syntax error: line 156, column 0
apple-health-grafana-ingester-1 exited with code 1

Would you be able to take a look please?

@k0rventen
Owner

Hey @fabiosangregorio!

It looks a lot like #4, which was caused by malformed XML. I still have no clue why this happens, and I've never been able to reproduce it on my side, despite a lot of exports.

As a first step, could you load the file in an XML parser to verify its validity? I'll keep the issue open, keep me posted 😄!
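
For anyone following along, here is a minimal sketch of such a check using only Python's standard library. The function name find_parse_error is made up for illustration and is not part of the ingester. It only tests well-formedness (it does not validate against the DTD) and reports the position of the first error, the same line/column pair shown in the traceback above.

```python
import io
import xml.etree.ElementTree as ET


def find_parse_error(source):
    """Return None if source is well-formed XML, else (line, column, message).

    source may be a file path or a binary file object.
    """
    try:
        for _, elem in ET.iterparse(source):
            elem.clear()  # discard parsed content so large exports stay cheap in memory
    except ET.ParseError as exc:
        line, column = exc.position  # position of the first error found
        return line, column, str(exc)
    return None


# Tiny in-memory examples; for a real export, pass the path to export.xml instead.
print(find_parse_error(io.BytesIO(b"<a><b/></a>")))  # well-formed input
print(find_parse_error(io.BytesIO(b"<a><b></a>")))   # mismatched closing tag
```

If the function returns a tuple, the reported line is a good place to start looking in a text editor.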

@fabiosangregorio
Author

Yep, looks like it!


Here are the culprit lines according to the screenshot:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE HealthData [
<!-- HealthKit Export Version: 12 -->
<!ELEMENT HealthData (ExportDate,Me,(Record|Correlation|Workout|ActivitySummary|ClinicalRecord|VisionPrescription)*)>
<!ATTLIST HealthData
  locale CDATA #REQUIRED
>
<!ELEMENT ExportDate EMPTY>
<!ATTLIST ExportDate
  value CDATA #REQUIRED
>
<!ELEMENT Me EMPTY>
<!ATTLIST Me
  HKCharacteristicTypeIdentifierDateOfBirth         CDATA #REQUIRED
  HKCharacteristicTypeIdentifierBiologicalSex       CDATA #REQUIRED
  HKCharacteristicTypeIdentifierBloodType           CDATA #REQUIRED
  HKCharacteristicTypeIdentifierFitzpatrickSkinType CDATA #REQUIRED
>
<!ELEMENT Record ((MetadataEntry|HeartRateVariabilityMetadataList)*)>
<!ATTLIST Record
  type          CDATA #REQUIRED
  unit          CDATA #IMPLIED
  value         CDATA #IMPLIED
  sourceName    CDATA #REQUIRED
  sourceVersion CDATA #IMPLIED
  device        CDATA #IMPLIED
  creationDate  CDATA #IMPLIED
  startDate     CDATA #REQUIRED
  endDate       CDATA #REQUIRED
>
<!-- Note: Any Records that appear as children of a correlation also appear as top-level records in this document. -->
<!ELEMENT Correlation ((MetadataEntry|Record)*)>
<!ATTLIST Correlation
  type          CDATA #REQUIRED
  sourceName    CDATA #REQUIRED
  sourceVersion CDATA #IMPLIED
  device        CDATA #IMPLIED
  creationDate  CDATA #IMPLIED
  startDate     CDATA #REQUIRED
  endDate       CDATA #REQUIRED
>
<!ELEMENT Workout ((MetadataEntry|WorkoutEvent|WorkoutRoute)*)>
<!ATTLIST Workout
  workoutActivityType   CDATA #REQUIRED
  duration              CDATA #IMPLIED
  durationUnit          CDATA #IMPLIED
  totalDistance         CDATA #IMPLIED
  totalDistanceUnit     CDATA #IMPLIED
  totalEnergyBurned     CDATA #IMPLIED
  totalEnergyBurnedUnit CDATA #IMPLIED
  sourceName            CDATA #REQUIRED
  sourceVersion         CDATA #IMPLIED
  device                CDATA #IMPLIED
  creationDate          CDATA #IMPLIED
  startDate             CDATA #REQUIRED
  endDate               CDATA #REQUIRED
>
<!ELEMENT WorkoutActivity EMPTY>
<!ATTLIST WorkoutActivity
  uuid                 CDATA #REQUIRED
  startDate            CDATA #REQUIRED
  endDate              CDATA #IMPLIED
  duration             CDATA #IMPLIED
  durationUnit         CDATA #IMPLIED
>
<!ELEMENT WorkoutEvent EMPTY>
<!ATTLIST WorkoutEvent
  type                 CDATA #REQUIRED
  date                 CDATA #REQUIRED
  duration             CDATA #IMPLIED
  durationUnit         CDATA #IMPLIED
>
<!ELEMENT WorkoutStatistics EMPTY>
<!ATTLIST WorkoutStatistics
  type                 CDATA #REQUIRED
  startDate            CDATA #REQUIRED
  endDate              CDATA #REQUIRED
  average              CDATA #IMPLIED
  minimum              CDATA #IMPLIED
  maximum              CDATA #IMPLIED
  sum                  CDATA #IMPLIED
>
<!ELEMENT WorkoutRoute ((MetadataEntry|FileReference)*)>
<!ATTLIST WorkoutRoute
  sourceName    CDATA #REQUIRED
  sourceVersion CDATA #IMPLIED
  device        CDATA #IMPLIED
  creationDate  CDATA #IMPLIED
  startDate     CDATA #REQUIRED
  endDate       CDATA #REQUIRED
>
<!ELEMENT FileReference EMPTY>
<!ATTLIST FileReference
  path CDATA #REQUIRED
>
<!ELEMENT ActivitySummary EMPTY>
<!ATTLIST ActivitySummary
  dateComponents           CDATA #IMPLIED
  activeEnergyBurned       CDATA #IMPLIED
  activeEnergyBurnedGoal   CDATA #IMPLIED
  activeEnergyBurnedUnit   CDATA #IMPLIED
  appleMoveTime            CDATA #IMPLIED
  appleMoveTimeGoal        CDATA #IMPLIED
  appleExerciseTime        CDATA #IMPLIED
  appleExerciseTimeGoal    CDATA #IMPLIED
  appleStandHours          CDATA #IMPLIED
  appleStandHoursGoal      CDATA #IMPLIED
>
<!ELEMENT MetadataEntry EMPTY>
<!ATTLIST MetadataEntry
  key   CDATA #REQUIRED
  value CDATA #REQUIRED
>
<!-- Note: Heart Rate Variability records captured by Apple Watch may include an associated list of instantaneous beats-per-minute readings. -->
<!ELEMENT HeartRateVariabilityMetadataList (InstantaneousBeatsPerMinute*)>
<!ELEMENT InstantaneousBeatsPerMinute EMPTY>
<!ATTLIST InstantaneousBeatsPerMinute
  bpm  CDATA #REQUIRED
  time CDATA #REQUIRED
>
<!ELEMENT ClinicalRecord EMPTY>
<!ATTLIST ClinicalRecord
  type              CDATA #REQUIRED
  identifier        CDATA #REQUIRED
  sourceName        CDATA #REQUIRED
  sourceURL         CDATA #REQUIRED
  fhirVersion       CDATA #REQUIRED
  receivedDate      CDATA #REQUIRED
  resourceFilePath  CDATA #REQUIRED
>
<!ELEMENT Audiogram EMPTY>
<!ATTLIST Audiogram
  type          CDATA #REQUIRED
  sourceName    CDATA #REQUIRED
  sourceVersion CDATA #IMPLIED
  device        CDATA #IMPLIED
  creationDate  CDATA #IMPLIED
  startDate     CDATA #REQUIRED
  endDate       CDATA #REQUIRED
>
<!ELEMENT SensitivityPoint EMPTY>
<!ATTLIST SensitivityPoint
  frequencyValue   CDATA #REQUIRED
  frequencyUnit    CDATA #REQUIRED
  leftEarValue     CDATA #IMPLIED
  leftEarUnit      CDATA #IMPLIED
  rightEarValue    CDATA #IMPLIED
  rightEarUnit     CDATA #IMPLIED
>
<!ELEMENT VisionPrescription EMPTY>
<!ATTLIST VisionPrescription
  type             CDATA #REQUIRED
  dateIssued       CDATA #REQUIRED
  expirationDate   CDATA #REQUIRED
  brand            CDATA #IMPLIED
<!ELEMENT RightEye EMPTY>
<!ATTLIST RightEye
  sphere           CDATA #IMPLIED
  sphereUnit       CDATA #IMPLIED
  cylinder         CDATA #IMPLIED
  cylinderUnit     CDATA #IMPLIED
  axis             CDATA #IMPLIED
  axisUnit         CDATA #IMPLIED
  add              CDATA #IMPLIED
  addUnit          CDATA #IMPLIED
  vertex           CDATA #IMPLIED
  vertexUnit       CDATA #IMPLIED
  prismAmount      CDATA #IMPLIED
  prismAmountUnit  CDATA #IMPLIED
  prismAngle       CDATA #IMPLIED
  prismAngleUnit   CDATA #IMPLIED
  farPD            CDATA #IMPLIED
  farPDUnit        CDATA #IMPLIED
  nearPD           CDATA #IMPLIED
  nearPDUnit       CDATA #IMPLIED
  baseCurve        CDATA #IMPLIED
  baseCurveUnit    CDATA #IMPLIED
  diameter         CDATA #IMPLIED
  diameterUnit     CDATA #IMPLIED
>
<!ELEMENT LeftEye EMPTY>
<!ATTLIST LeftEye
  sphere           CDATA #IMPLIED
  sphereUnit       CDATA #IMPLIED
  cylinder         CDATA #IMPLIED
  cylinderUnit     CDATA #IMPLIED
  axis             CDATA #IMPLIED
  axisUnit         CDATA #IMPLIED
  add              CDATA #IMPLIED
  addUnit          CDATA #IMPLIED
  vertex           CDATA #IMPLIED
  vertexUnit       CDATA #IMPLIED
  prismAmount      CDATA #IMPLIED
  prismAmountUnit  CDATA #IMPLIED
  prismAngle       CDATA #IMPLIED
  prismAngleUnit   CDATA #IMPLIED
  farPD            CDATA #IMPLIED
  farPDUnit        CDATA #IMPLIED
  nearPD           CDATA #IMPLIED
  nearPDUnit       CDATA #IMPLIED
  baseCurve        CDATA #IMPLIED
  baseCurveUnit    CDATA #IMPLIED
  diameter         CDATA #IMPLIED
  diameterUnit     CDATA #IMPLIED
>
  device           CDATA #IMPLIED
<!ELEMENT MetadataEntry EMPTY>
<!ATTLIST MetadataEntry
  key              CDATA #IMPLIED
  value            CDATA #IMPLIED
>
>
]>

The last few lines look weird:

  device           CDATA #IMPLIED
<!ELEMENT MetadataEntry EMPTY>
<!ATTLIST MetadataEntry
  key              CDATA #IMPLIED
  value            CDATA #IMPLIED
>
>
]>

@k0rventen
Owner

The only explanation is that the XML export process on the iPhone is borked and sometimes doesn't produce a valid XML file. Unfortunately there is nothing I can do to prevent it.

I'll check whether the xml module allows reading invalid files, because from the snippet you sent, the error is located before the health records. If the module can skip that part and start reading the next section directly, it might work around this issue.

I'll keep the issue open until I have more info on the matter. Thx a lot for reporting 👍!

@k0rventen
Owner

Hey @fabiosangregorio!

I released v0.0.5, which should handle malformed XML like yours.
I've done some tests emulating your file, but could you test it on your side too before closing the issue?

Just do a quick docker-compose pull to grab the latest image (or change the tag to :v0.0.5) before launching the ingester.
Thx for your help!

@fabiosangregorio
Author

Hi @k0rventen! Now the ingester doesn't crash but I get the following output:

Opening Route 2021-10-19 4:43pm
apple-health-grafana-ingester-1  | Opening Route 2022-11-18 8:07am
apple-health-grafana-ingester-1  | Opening Route 2021-04-20 7:14pm
apple-health-grafana-ingester-1  | Export file is /export/apple_health_export/export.xml
apple-health-grafana-ingester-1  | Total number of records: 0
apple-health-grafana-ingester-1  | All done! You can now check grafana.

(and Grafana shows no data).

I also tried:

  • using the DTD from one of your tests
  • modifying the DTD of the export XML to repair it, but I can't seem to find a DTD that does the job
  • exporting again from the Health app, to no avail

😢

@k0rventen
Owner

I reproduced the same behaviour.

Depending on how malformed the XML is, sometimes lxml is able to reconstruct and parse it properly, and sometimes it can't.

I guess the last resort is to discard the first section altogether and start reading from the HealthData section onwards.
I'll see what I can do to resolve this problem once and for all, and will update this issue.

@fabiosangregorio
Author

fabiosangregorio commented Jul 14, 2023

Looks like it's a known issue. I'll try to follow the steps there and see if it fixes the XML.
If it does, maybe you could add a preprocessing step for the XML before using it in the ingester 👀

@k0rventen
Owner

I played around this morning, and after discarding the whole first section the XML appears valid and lxml can parse it.
I've added a step (not very clean, but hey, it might work) that does this after unzipping the file.
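
To illustrate the idea (this is not the code that went into the ingester, just a sketch under stated assumptions), the step could look roughly like the following. The helper names strip_doctype and count_records are hypothetical. The sketch assumes the DTD block starts on a line beginning with <!DOCTYPE and ends on a line containing only ]>, as in the export pasted above, and it uses the standard library parser rather than lxml to stay self-contained.

```python
import io
import xml.etree.ElementTree as ET


def strip_doctype(src, dst):
    """Copy src to dst line by line, dropping the <!DOCTYPE ... ]> block.

    Assumes the block closes with a line containing only "]>".
    Both arguments are binary file objects.
    """
    in_doctype = False
    for line in src:
        stripped = line.strip()
        if not in_doctype and stripped.startswith(b"<!DOCTYPE"):
            in_doctype = True   # start skipping at the DOCTYPE line
        if not in_doctype:
            dst.write(line)
        elif stripped == b"]>":
            in_doctype = False  # closing line of the DTD block; skipped as well


def count_records(source):
    """Stream the cleaned XML and count <Record> elements."""
    total = 0
    for _, elem in ET.iterparse(source):
        if elem.tag == "Record":
            total += 1
        elem.clear()  # free the element's contents to keep memory usage low
    return total


# Miniature stand-in for a malformed export (made-up data, not a real file).
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE HealthData [
<!ELEMENT HealthData (Record)*>
  device           CDATA #IMPLIED
>
>
]>
<HealthData locale="en_US">
 <Record type="StepCount" value="12"/>
 <Record type="StepCount" value="30"/>
</HealthData>
"""

cleaned = io.BytesIO()
strip_doctype(io.BytesIO(sample), cleaned)
cleaned.seek(0)
print("Total number of records:", count_records(cleaned))
```

Working line by line avoids loading a multi-gigabyte export into memory. The trade-off is that it relies on the layout assumption above: if the closing ]> line is missing, everything after <!DOCTYPE is dropped.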

If you could test with your original export.xml, that would be great!
To grab the image with the fix, change the ingester image in the docker-compose.yml file to:

ingester:
    image: k0rventen/apple-health-grafana-ingester:rolling

Then run docker-compose pull to make sure you have the latest one before launching the ingester.

Please test it out and report back how it went, I'm hoping this will work 🤞

@fabiosangregorio
Author

Amazing, it works! 😍 Thanks a lot 🙏🏻

@k0rventen
Owner

Awesome! I'll clean up a bit and make a proper release.

Thx for your help!
