clp-s: Report exactly where parsing error occurs when parsing JSON (fixes #514). #503

Merged
merged 2 commits into from
Aug 12, 2024

gibber9809
Contributor

Description

This PR adds reporting for where a parsing error occurs within a JSON file, and fixes an issue where parsing exceptions were sometimes not caught.

We add code to the main parsing loop that tracks how many bytes of the current file have been consumed up to the previous record. When an error occurs, we can then report how many bytes were parsed successfully.
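
The byte-tracking idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it assumes newline-delimited records, and `ParseResult`, `parse_records`, and the toy validator `looks_like_object` are hypothetical names standing in for the real simdjson-based loop.

```cpp
#include <cstddef>
#include <functional>
#include <string_view>

// Hypothetical result type (not from the clp-s sources).
struct ParseResult {
    bool ok;
    // Bytes consumed up to the end of the last record that parsed successfully.
    std::size_t bytes_consumed;
};

// Toy stand-in for a real JSON parser: accepts any record that, after
// trimming line endings, starts with '{' and ends with '}'.
bool looks_like_object(std::string_view record) {
    while (false == record.empty() && ('\n' == record.back() || '\r' == record.back())) {
        record.remove_suffix(1);
    }
    return record.size() >= 2 && '{' == record.front() && '}' == record.back();
}

// Walks newline-delimited records and only advances the byte counter after a
// record has parsed successfully, so on failure the counter is the offset of
// the offending record.
ParseResult parse_records(
        std::string_view input,
        std::function<bool(std::string_view)> const& parse_record
) {
    std::size_t bytes_consumed_up_to_prev_record{0};
    while (bytes_consumed_up_to_prev_record < input.size()) {
        auto const newline_pos = input.find('\n', bytes_consumed_up_to_prev_record);
        auto const record_end
                = (std::string_view::npos == newline_pos) ? input.size() : newline_pos + 1;
        auto const record = input.substr(
                bytes_consumed_up_to_prev_record,
                record_end - bytes_consumed_up_to_prev_record
        );
        if (false == parse_record(record)) {
            return {false, bytes_consumed_up_to_prev_record};
        }
        bytes_consumed_up_to_prev_record = record_end;
    }
    return {true, bytes_consumed_up_to_prev_record};
}
```

Under these assumptions, calling `parse_records(data, looks_like_object)` on input whose first record is invalid yields `bytes_consumed == 0`, which matches the first validation case listed below.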

We also wrap the main "parse_line" function in a try/catch block. Even after we obtain a valid iterator to a record, it seems that certain kinds of invalid fields can throw latent errors in simdjson, which previously went uncaught. Rather than adding error handling at every field access within parse_line, we catch the thrown errors in one place.
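
A rough sketch of that pattern, under stated assumptions: the `parse_line` below is a hypothetical stand-in that throws a `std::runtime_error` where simdjson would raise its own exception during lazy field access, and `Outcome` and `parse_all` are illustrative names rather than the actual clp-s interfaces.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical stand-in for the real parse_line: it throws only when it
// reaches a "bad" field, mimicking an error that surfaces lazily during
// field access rather than when the record iterator is first obtained.
void parse_line(std::string_view record) {
    if (std::string_view::npos != record.find("\"bad\"")) {
        throw std::runtime_error("invalid field");
    }
}

// Hypothetical outcome type (not from the clp-s sources).
struct Outcome {
    bool ok;
    std::size_t bytes_parsed;  // bytes consumed before the failing record
    std::string message;       // empty on success
};

// Wraps each parse_line call in a single try/catch instead of checking every
// field access, and reports how far parsing got before the exception.
Outcome parse_all(std::string_view input) {
    std::size_t offset{0};
    while (offset < input.size()) {
        auto const newline_pos = input.find('\n', offset);
        auto const record_end
                = (std::string_view::npos == newline_pos) ? input.size() : newline_pos + 1;
        try {
            parse_line(input.substr(offset, record_end - offset));
        } catch (std::exception const& e) {
            return {false, offset, e.what()};
        }
        offset = record_end;
    }
    return {true, offset, {}};
}
```

Catching at the call site keeps the body of `parse_line` free of per-field checks, at the cost of abandoning the whole record as soon as any field access throws.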

Validation performed

  • Validated that we correctly report 0 bytes parsed when the first record is invalid
  • Validated that we report an accurate offset to the invalid record when we have malformed JSON several MB into a file
  • Validated that exceptions are caught and logged correctly for a variety of invalid JSON log files

@gibber9809 gibber9809 requested a review from wraymo July 31, 2024 14:35
Contributor

@wraymo wraymo left a comment

The PR title looks good to me.

@gibber9809 gibber9809 linked an issue Aug 12, 2024 that may be closed by this pull request
@gibber9809 gibber9809 changed the title from "clp-s: Report exactly where parsing error occurs when parsing JSON" to "clp-s: Report exactly where parsing error occurs when parsing JSON (fixes #514)" Aug 12, 2024
@gibber9809 gibber9809 changed the title from "clp-s: Report exactly where parsing error occurs when parsing JSON (fixes #514)" to "clp-s: Report exactly where parsing error occurs when parsing JSON (fixes #514)." Aug 12, 2024
@gibber9809 gibber9809 merged commit f05264e into y-scope:main Aug 12, 2024
13 checks passed
Development

Successfully merging this pull request may close these issues.

Unhandled exceptions for some kinds of invalid JSON