Conversation


@codeagent-art codeagent-art commented May 21, 2025

… doing datacontract import followed by datacontract test after adding server information

  • Tests pass
  • ruff format
  • README.md updated (if relevant)
  • CHANGELOG.md entry added

BigQuery type fixes:

  • TIME (config solution)
  • DATETIME alignment between import and export
  • ARRAY types now supported
  • automatic bigint when an INTEGER is inside an ARRAY or STRUCT, since BigQuery's internal representation for those is INT64
  • add support for importing JSON and GEOGRAPHY types from BigQuery (aligned with the format expected by the export fixtures)
  • align the parsing of BigQuery STRUCT fields to generate a data contract with a struct element
  • recurse when tracking whether an integer field is inside an array, so that it is mapped to bigint
  • add bigqueryType when importing: 1) string with maxLength, 2) numeric or bignumeric with precision and scale (so that import and export round-trip)
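The last bullet can be sketched as follows. This is a hedged illustration, not the PR's actual code: the function name `map_bigquery_field` is hypothetical, while the field keys (`maxLength`, `precision`, `scale`, `config.bigqueryType`) follow the Data Contract Specification.

```python
# Hypothetical sketch: attach enough physical-type detail on import that a
# later export reproduces the original BigQuery column definition.
def map_bigquery_field(name, bq_type, max_length=None, precision=None, scale=None):
    field = {}
    if bq_type == "STRING":
        field["type"] = "string"
        if max_length is not None:
            field["maxLength"] = max_length  # preserve STRING(n) length
    elif bq_type in ("NUMERIC", "BIGNUMERIC"):
        field["type"] = "decimal"  # assumed logical type for NUMERIC
        if precision is not None:
            field["precision"] = precision
        if scale is not None:
            field["scale"] = scale
        # record the physical type so export can distinguish NUMERIC/BIGNUMERIC
        field["config"] = {"bigqueryType": bq_type.lower()}
    else:
        field["type"] = bq_type.lower()
    return {name: field}
```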

elif field_type.lower() == "date":
    return "DATE"
elif field_type.lower() == "timestamp_tz":
    return "TIME"
Contributor

This is not correct, I think. See https://datacontract.com/#data-types — timestamp and timestamp_tz are the same (basically aliases for each other).

We do not have a time datatype. We could add one to the spec. Until then, we would need to resort to the config options.
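A possible shape for that workaround, shown here as a Python dict mirroring the contract YAML. This is an assumption based on the suggestion above, not a confirmed spec feature: the field name `order_time` and the placeholder logical type are invented for illustration; only `config.bigqueryType` is the mechanism being proposed.

```python
# Hypothetical field definition: keep a spec-supported logical type and
# record the physical BigQuery type in config, so export can emit TIME.
time_field = {
    "order_time": {
        "type": "string",  # placeholder logical type (assumption)
        "config": {
            "bigqueryType": "time",  # physical type to restore on export
        },
    }
}
```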

Author

@codeagent-art codeagent-art May 22, 2025


Hello, ok, I have rolled back that part to your original mapping.
It would be great to add time to the list to arrive at a more complete set of types.
As for completeness, it would be great to achieve full type coverage and, most importantly, a valid full circle (meaning that when importing a table from BigQuery, I expect datacontract test to work after adding the server information), and I expect the datacontract import to stay stable and not change too much.

We are currently automating the generation of new data contracts per table based on the import-from-source functionality (comparing the models section of a newly imported data contract with the latest previous version to determine whether the schema has evolved, automatically bumping the version, etc.). We would like to avoid generating unnecessary data contracts due to underlying changes in the way types are mapped.
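That automation can be sketched in a few lines. This is a hypothetical illustration of the workflow described above, not our actual pipeline; `models_changed` and `next_version` are invented names.

```python
import json

def models_changed(previous: dict, current: dict) -> bool:
    # Canonical JSON makes the comparison insensitive to dict key order,
    # so only real schema changes register as a difference.
    return json.dumps(previous, sort_keys=True) != json.dumps(current, sort_keys=True)

def next_version(version: str, changed: bool) -> str:
    # Bump the minor version only when the models section actually changed.
    major, minor, patch = (int(p) for p in version.split("."))
    return f"{major}.{minor + 1}.0" if changed else version
```

A mapping change in the importer (e.g. int silently becoming bigint) would flip `models_changed` to True and trigger a spurious version bump, which is why stable type mapping matters for this workflow.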

It would be great to include tests ensuring that an imported data contract, once refined (with valid server information added), executes successfully.

return "bytes"
elif bigquery_type_str == "INTEGER":
    return "bigint" if in_array else "int"
Contributor


hm, why does the type change to bigint when being inside an array? Not clear to me.


Hi, regarding that INT64 type: this is internal to BigQuery. The data type of a BigQuery table column inside a STRUCT or an ARRAY is represented as INT64, not INTEGER.

  1. create or replace table project_id.dataset_id.simple as (
    select * from
    (select [1,2,3] as array_int64)
    left join
    (select struct(1 as elem) as struct_int64)
    on true
    )

  2. when looking at the table in the BigQuery UI, you see the fields as INTEGER
    (screenshot omitted)

  3. but when looking at the information schema: SELECT * FROM project_id.dataset_id.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'simple'
    you see that the BigQuery internal representation is INT64
    (screenshot omitted)
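The recursion described above can be illustrated like this. A hedged sketch, not the PR's exact code: `map_type` is a hypothetical name, and the input shape mimics a BigQuery schema field dict (`type`, `mode`, `fields`) as returned by the BigQuery API.

```python
# Hypothetical sketch: walk a BigQuery schema field and map nested INTEGERs
# to bigint, since BigQuery stores them as INT64 inside ARRAY/STRUCT.
def map_type(bq_field: dict, in_array: bool = False) -> dict:
    if bq_field.get("mode") == "REPEATED":  # ARRAY column: recurse on element
        inner = dict(bq_field, mode="NULLABLE")
        return {"type": "array", "items": map_type(inner, in_array=True)}
    bq_type = bq_field["type"]
    if bq_type == "INTEGER":
        # top-level INTEGER stays int; nested ones become bigint (INT64)
        return {"type": "bigint" if in_array else "int"}
    if bq_type in ("RECORD", "STRUCT"):  # STRUCT column: recurse on members
        return {
            "type": "record",
            "fields": {
                f["name"]: map_type(f, in_array=True)
                for f in bq_field.get("fields", [])
            },
        }
    return {"type": bq_type.lower()}
```

Applied to the `simple` table above, `array_int64` and `struct_int64.elem` both come out as bigint, while a plain INTEGER column stays int.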

… target when doing datacontract import followed by datacontract test after adding server information
…nd numeric with precision and scale so that it can be executed against bigquery when doing the export /creating the checks