Autodetect for BQ happening automatically even with schema defined. 

## Issue

I'm currently using the python sdk to transform and load data into BQ. One of the fields I have is supposed to be a string but the data will come in as either a long float number (ex: `75257229615211513551680283362634662820`), int number ( ex: `4` ), or string (ex: `http://some-string-url` ). 

So the data representation looks like 
```
 {
            "count": "6",
            "value": "75257229615211513551680283362634662820"
          },
          {
            "count": "4",
            "value": "5"
          },
          {
            "count": "4",
            "value": "http://some-string-url"
          }
```

Since BQ does the auto detection, these values can be recognized as either a FLOAT, INT, or STRING. I followed docs, and stack recommendations to assign a hard schema to the config which looks like

```
        table = bigquery.table.Table(table_ref)

        job_config = bigquery.LoadJobConfig()
        if table_name == self.incident_table_name:
            job_config.schema = self.incidents_schema

        if table_name == self.kpi_table_name:
            job_config.schema = self.kpi_schema
        
        job_config.autodetect = False  # should already be assigned to False in the SDK ( just for a hard assurance )
```
The schema is read in from a json file that looks like 
```
  "fields": [
          {
            "mode": "NULLABLE",
            "name": "count",
            "type": "INTEGER"
          },
          {
            "mode": "NULLABLE",
            "name": "value",
            "type": "STRING"
          }
        ],
 ```
 
 and parsed into the SchemaField() object.. 
 ```
     field_schema = bigquery.SchemaField(name=name,
        field_type=field_type,
        mode=mode,
        fields=subschema
    )
```

Just to be clear i do believe that this is apart of a `subschema` and assigned to the `fields` as it's an array of objects.

then i submit the data through the SDK via the `load_table_from_json` which looks like 
```
            job = params.bq_client.load_table_from_json([incident],
                                                        job_config=job_config,
                                                        destination=table)
            result = job.result()
```

All fairly straightforward, and follows closely with what is documented and shared by others. BUT for whatever reason, the API request will fail and continue to try parse `value` as either a STRING, FLOAT or INT even with the schema defined. 

So a couple questions here... is this a bug with BQ, or the SDK not accepting the schema? Am i doing something wrong here? Does the autodetect feature always do its thing even when turned off and a schema defined? 

Any communication here would be greatly appreciated.

Cheers

## info

*sdk version:* `google-cloud-bigquery    2.20.0`
*os:* MacOS
*environment:* Cloud Function
*python version:* 3.9


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Autodetect for BQ happening automatically even with schema defined. #847

Issue

info

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Autodetect for BQ happening automatically even with schema defined. #847

Description

Issue

info

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions