Fix result scan JSON Parsing from ResultSet #1057

Merged
osipovartem merged 1 commit into main from issues/1055_fix_result_scan
Jun 11, 2025

Conversation

osipovartem
Contributor

Fix read_query_batches for Correct JSON Parsing from ResultSet

Summary of Changes:

  • Fixed JSON parsing logic:
    • Extracted the rows field from ResultSet.
    • Combined it with column names from ResultSet.columns to generate line-delimited JSON objects (Arrow-compatible).
    • Replaced usage of the original result_json with the newly constructed Arrow-compatible JSON string during schema inference and batch reading.
  • Added helper function convert_resultset_to_arrow_json_lines:
    • Converts the column names and row values from ResultSet (two Vecs) into newline-delimited JSON records:
      {"a": 1, "b": "2", "c": true}
      {"a": 2.0, "b": "4", "c": false}
  • Added a check related to [BUG] Result_scan should be skipped by relations visitor #1055 to skip fully qualified names for table functions. This change fixes flatten, result_scan, and any other registered table function.
  • Added more tests
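
The conversion described above can be sketched as follows. This is a minimal, standard-library-only illustration, not the code from this PR: the `Cell` enum and the `rows_to_json_lines` function are hypothetical stand-ins (the actual helper is `convert_resultset_to_arrow_json_lines`, whose exact signature is not shown here, and it presumably works with a JSON library's value type rather than a hand-rolled enum). The sketch pairs each row value with its column name and emits one JSON object per line. Its output is compact, without the spaces shown in the example records above.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for a single result cell.
/// The real ResultSet rows likely hold a JSON library's value type instead.
#[derive(Debug, Clone)]
enum Cell {
    Null,
    Bool(bool),
    Number(f64),
    Text(String),
}

/// Escape a string so it can be placed between double quotes in JSON.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => {
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out
}

/// Render one cell as a JSON value.
fn cell_to_json(cell: &Cell) -> String {
    match cell {
        Cell::Null => "null".to_string(),
        Cell::Bool(b) => b.to_string(),
        // NaN and infinities have no JSON representation; emit null for them.
        Cell::Number(n) if n.is_finite() => n.to_string(),
        Cell::Number(_) => "null".to_string(),
        Cell::Text(s) => format!("\"{}\"", escape_json(s)),
    }
}

/// Combine column names with each row to build newline-delimited JSON,
/// one object per row. Values beyond the number of columns are ignored,
/// and rows shorter than the column list produce fewer keys.
fn rows_to_json_lines(columns: &[&str], rows: &[Vec<Cell>]) -> String {
    rows.iter()
        .map(|row| {
            let fields: Vec<String> = columns
                .iter()
                .zip(row.iter())
                .map(|(name, cell)| {
                    format!("\"{}\":{}", escape_json(name), cell_to_json(cell))
                })
                .collect();
            format!("{{{}}}", fields.join(","))
        })
        .collect::<Vec<String>>()
        .join("\n")
}
```

Calling `rows_to_json_lines(&["a", "b", "c"], &rows)` with one row holding `Cell::Number(1.0)`, `Cell::Text("2".to_string())`, and `Cell::Bool(true)` yields a single line with keys `a`, `b`, and `c`. A string in this shape is what a line-delimited JSON reader can use for schema inference and batch reading.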

Contributor

SQL Logic Tests Results ❌

Coverage by SLT File

  • select: 19/20 (95.0%)
  • from: 2/2 (100.0%)
  • result_scan: 0/3 (0.0%)

Overall: 21/25 (84.0%)

@DanCodedThis
Contributor

Running dbt seed after a script like the one in #1056 fails with the error below; this is probably unrelated to this PR:

11:45:54  Running with dbt=1.9.6
11:45:54  Registered adapter: snowflake=1.9.4
11:45:55  Found 18 models, 103 data tests, 3 seeds, 2 operations, 8 sources, 771 macros
11:45:55  
11:45:55  Concurrency: 1 threads (target='dev')
11:45:55  
11:45:56  
11:45:56  Finished running  in 0 hours 0 minutes and 1.21 seconds (1.21s).
11:45:56  Encountered an error:
Database Error
  000200: DataFusion error: Schema error: No field named all_objects.name. Valid fields are all_tables.table_catalog, all_tables.table_schema, all_tables.table_name, all_tables.table_type, all_tables.is_iceberg.

@osipovartem
Contributor Author

This is related to SHOW OBJECTS and the fields it returns. The Snowplow dbt seed runs SHOW OBJECTS and then calls result_scan, joining on the returned columns.

Contributor

@DanCodedThis DanCodedThis left a comment

LGTM

@osipovartem osipovartem merged commit d105c0a into main Jun 11, 2025
6 checks passed
@osipovartem osipovartem deleted the issues/1055_fix_result_scan branch June 11, 2025 16:43