Description
In many cases INTEGER columns in BigQuery are used to store IDs. Such IDs are then likely to be silently corrupted when the query result is converted to a DataFrame, since INTEGER columns come back as float64. There seems to be no good way around this, even when it's clear a priori that the query will only return non-null INTEGER columns.
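To make the corruption concrete, here is a minimal illustration (with a made-up ID value) of what happens once an integer ID above 2**53 passes through float64, which is effectively what the current conversion does:

```python
import numpy as np

# Hypothetical 64-bit ID, one above 2**53 (the largest range in which
# float64 can represent every integer exactly).
original_id = 9007199254740993  # 2**53 + 1

# Round-trip through float64, as happens when the INTEGER column is
# returned as a float column.
as_float = np.float64(original_id)
roundtripped = int(as_float)

print(original_id)   # 9007199254740993
print(roundtripped)  # 9007199254740992 -- silently off by one
```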
I understand the motivation (the ability to represent NULL values easily and efficiently), but this seems both an undesirable situation (at the very least pandas should warn about integers that are not exactly convertible to floats) and inconsistent with how SQL queries behave in pandas.
I tried to see whether it would be possible to consider the 'mode' of the field (NULLABLE or REQUIRED) when doing the conversion, but that part of the schema information turns out to be of very limited value in BigQuery, since even trivial projections turn REQUIRED fields into NULLABLE.
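For reference, the check I had in mind was roughly the following (a sketch only; the schema fragment and field names are invented, though they follow the name/type/mode shape of the fields reported in the BigQuery schema), and it is exactly the 'mode' value here that comes back NULLABLE almost everywhere:

```python
import numpy as np

# Invented schema fragment in the shape the BigQuery API reports
# ('fields' entries carrying name/type/mode).
schema = {
    'fields': [
        {'name': 'user_id', 'type': 'INTEGER', 'mode': 'REQUIRED'},
        {'name': 'score',   'type': 'INTEGER', 'mode': 'NULLABLE'},
    ]
}

def dtype_for(field):
    # Only keep exact int64 when the field is guaranteed non-null;
    # otherwise fall back to the current float behaviour.
    if field['type'] == 'INTEGER' and field['mode'] == 'REQUIRED':
        return np.dtype('int64')
    return np.dtype('float64')

print({f['name']: str(dtype_for(f)) for f in schema['fields']})
# {'user_id': 'int64', 'score': 'float64'}
```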
I'm not sure what the best solution is, but I think gbq and sql should behave identically with respect to NULLs, and that at the very least there should be a warning on information-losing integer->float conversions.
So my preference would be to check the returned JSON query data for NULLs and use np.dtype('int64') if there are none in the INTEGER column, and 'object' otherwise (sketched below). Where that's deemed too inefficient, I think it's better to add a read_csv-style dtype argument that lets users explicitly request the float representation for a particular column (or they can simply cast in the pseudo-SQL query itself). I'd rather have my code be slower (and possibly see a runtime warning to that effect) than unwittingly corrupt data without any warning.
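A minimal sketch of the null-check idea, operating on already-parsed column values (the column contents here are made up, and the helper name is purely illustrative):

```python
import numpy as np
import pandas as pd

def integer_column_dtype(values):
    # Proposal sketch: use 'object' only when NULLs are actually
    # present, so non-null INTEGER columns stay exact int64.
    if any(v is None for v in values):
        return np.dtype('object')
    return np.dtype('int64')

# Made-up INTEGER column data as it might be parsed from the JSON
# query response.
ids_without_nulls = [9007199254740993, 42, 7]
ids_with_nulls = [9007199254740993, None, 7]

print(pd.Series(ids_without_nulls, dtype=integer_column_dtype(ids_without_nulls)))
print(pd.Series(ids_with_nulls, dtype=integer_column_dtype(ids_with_nulls)))
```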