resolve issue #737 fix type conversion issue for bigquery target #769
base: main
Conversation
elif field_type.lower() == "date":
    return "DATE"
elif field_type.lower() == "timestamp_tz":
    return "TIME"
This is not correct, I think. See https://datacontract.com/#data-types: timestamp and timestamp_tz are the same (basically aliases for each other).
We do not have a time data type. We could add one to the spec; until then, we would need to resort to the config options.
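A minimal sketch of what such a config-based override could look like, assuming a per-field config dict; the `bigqueryType` key and the fallback mapping are illustrative assumptions, not the exporter's actual implementation:

```python
def map_type_to_bigquery(field_type, config=None):
    """Map a data contract logical type to a BigQuery type.

    If the field carries a physical-type override in its config
    (the "bigqueryType" key here is an assumption for illustration),
    prefer it over the generic mapping.
    """
    if config and "bigqueryType" in config:
        return config["bigqueryType"]
    mapping = {
        "date": "DATE",
        # timestamp and timestamp_tz are spec aliases, so both map
        # to the same BigQuery type rather than TIME
        "timestamp": "TIMESTAMP",
        "timestamp_tz": "TIMESTAMP",
    }
    return mapping.get(field_type.lower(), "STRING")
```

A field that really holds a time-of-day could then set the override in its config until the spec gains a time type.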
Hello, OK, I have rolled back that part to your original mapping.
It would be great to add time to the list to arrive at a more complete one.
As far as completeness goes, it would be great to achieve full type coverage and, most importantly, a valid full circle: when importing a table from BigQuery, I expect the datacontract test to pass after adding the server information, and I expect the datacontract import to stay stable rather than change too much.
We are currently automating the generation of new data contracts per table based on the import-from-source functionality (comparing the models section of a newly imported data contract with the latest previous version to determine whether the schema has evolved, automatically bumping the version, etc.). We would like to avoid generating unnecessary data contracts caused by underlying changes in the way types are mapped.
It would be great to include tests ensuring that an imported data contract, once refined (by adding valid server information), executes successfully.
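The version-bump workflow described above could be sketched roughly like this; the function names and contract layout are assumptions for illustration, not the actual automation:

```python
def schema_changed(previous_contract, new_contract):
    """Return True when the models section of a freshly imported
    data contract differs from the latest stored version, i.e. the
    underlying schema (or the way types are mapped) has changed."""
    return previous_contract.get("models") != new_contract.get("models")


def next_version(version, changed):
    """Bump the minor version only when the schema actually evolved."""
    major, minor, _patch = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.0" if changed else version
```

This is why mapping stability matters: if the importer starts emitting a different logical type for the same column, `schema_changed` reports a spurious evolution and an unnecessary contract version is generated.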
    return "bytes"
elif bigquery_type_str == "INTEGER":
-   return "int"
+   return "bigint" if in_array else "int"
Hm, why does the type change to bigint when it is inside an array? Not clear to me.
Hi, so regarding that INT64 type: this is internal to BigQuery, where the data type of a column nested in a struct or an array is represented as INT64 rather than INTEGER.
For example:

create or replace table project_id.dataset_id.simple as (
  select * from
    (select [1, 2, 3] as array_int64)
  left join
    (select struct(1 as elem) as struct_int64)
  on true
)

But when looking at the information schema:

SELECT *
FROM project_id.dataset_id.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'simple'

you see that the BigQuery internal representation is INT64.
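A hedged sketch of the normalization this implies for the importer; the function name and the fallback are assumptions, and the `bigint`-inside-array choice is the PR's proposal under discussion, not settled behavior:

```python
def map_bigquery_type(bigquery_type_str, in_array=False):
    """Map a BigQuery column type to a data contract logical type.

    BigQuery's INFORMATION_SCHEMA reports integer columns nested in
    a STRUCT or ARRAY as INT64 rather than INTEGER, so both
    spellings are normalized before mapping.
    """
    normalized = "INTEGER" if bigquery_type_str == "INT64" else bigquery_type_str
    if normalized == "BYTES":
        return "bytes"
    if normalized == "INTEGER":
        # PR proposal: nested integers map to bigint (under review)
        return "bigint" if in_array else "int"
    # illustrative fallback, not the importer's real default
    return "string"
```

Normalizing INT64 to INTEGER up front keeps top-level and nested columns on the same code path, regardless of how the array question is resolved.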
Force-pushed from 37ffaa8 to 6346e62.
Commits:
… target when doing datacontract import followed by datacontract test after adding server information
…TEGER> and add test cases
…match export format.
…nd numeric with precision and scale so that it can be executed against bigquery when doing the export / creating the checks
… doing datacontract import followed by datacontract test after adding server information
Bigquery Type fixing.