mongo: handle inf/-inf/NaN in connector #3319

Merged
merged 3 commits into from
Aug 7, 2025

Conversation

jgao54
Contributor

@jgao54 jgao54 commented Aug 7, 2025

Use the jsoniter library instead of the default encoding/json, and add a custom extension that handles the NaN/Inf/-Inf cases by writing them as String values. As pointed out by @ilidemi, the ClickHouse JSON type does not preserve NaN/Infinity/-Infinity:

SELECT CAST('{"a": "NaN"}' AS JSON(a Float))
SETTINGS enable_json_type=1; -- returns {"a": null}

I ended up using the default ConfigCompatibleWithStandardLibrary because ConfigFastest results in precision loss. We could consider enabling it as an option in the future for perf-sensitive users.
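
To make the precision concern concrete, here is a rough sketch. My understanding is that ConfigFastest marshals floats with six digits of precision; the code below does not call jsoniter at all and only imitates that effect with `strconv`. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// sixDigits formats f with six digits after the decimal point, imitating
// (as an assumption) what a 6-digit float marshaller would emit.
func sixDigits(f float64) string {
	return strconv.FormatFloat(f, 'f', 6, 64)
}

// shortest formats f with the fewest digits that still parse back to f.
func shortest(f float64) string {
	return strconv.FormatFloat(f, 'f', -1, 64)
}

// roundTrips reports whether parsing s yields exactly want.
func roundTrips(s string, want float64) bool {
	got, err := strconv.ParseFloat(s, 64)
	return err == nil && got == want
}

func main() {
	f := 0.123456789
	fmt.Println(sixDigits(f), roundTrips(sixDigits(f), f)) // truncated form does not round-trip
	fmt.Println(shortest(f), roundTrips(shortest(f), f))   // shortest form does
}
```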

Test:

  • add unit tests and e2e tests
  • ran flow locally

@jgao54 jgao54 requested review from ilidemi and serprex August 7, 2025 00:34
@jgao54 jgao54 changed the title mongo: handle inf/-inf/Nan in connector mongo: handle inf/-inf/NaN in connector Aug 7, 2025
@jgao54 jgao54 force-pushed the handle-nan-inf branch 3 times, most recently from ec6b16b to 8c00ae7 Compare August 7, 2025 03:36
Comment on lines 29 to 30
DocumentKey bson.D `bson:"documentKey,omitempty"`
FullDocument bson.D `bson:"fullDocument,omitempty"`
Contributor Author

I changed this to bson.D so that the CDC document type is consistent with the initial load; otherwise I would have to implement redundant logic for the map[string]any datatype in the BsonExtension implementation.

I also realized that the bson decoding library can deserialize to either bson.D or bson.M.

I'm not sure whether there's a big performance difference between the two (given that bson.D is ordered); we can change this to bson.M later if it turns out to be faster.

@jgao54 jgao54 force-pushed the handle-nan-inf branch 2 times, most recently from 84f81f6 to 163fde2 Compare August 7, 2025 04:19
@jgao54 jgao54 enabled auto-merge (squash) August 7, 2025 07:00
@jgao54 jgao54 merged commit bc96d62 into main Aug 7, 2025
17 of 20 checks passed
@jgao54 jgao54 deleted the handle-nan-inf branch August 7, 2025 08:15
4 participants