Description
Course
data-engineering-zoomcamp
Question
After loading taxi data into BigQuery, I see unexpected years like 2008, 2041, or 2062 in lpep_pickup_datetime.
Why is this happening?
Answer
This is usually the result of a corrupted or incorrect load process.
Common causes:
- CSV schema autodetect misinterpreting timestamp format
- Mixing Parquet and CSV loads into the same table
- Appending instead of replacing during reload
- Partial failed loads
First, verify the date range:
SELECT
  MIN(lpep_pickup_datetime) AS earliest_pickup,
  MAX(lpep_pickup_datetime) AS latest_pickup
FROM `project.dataset.table`;
If the values fall outside the expected years (e.g., 2019–2020), reload the table using:
- A clean source
- The --replace flag, so the reload overwrites the table instead of appending to it
- Parquet format rather than CSV, where possible
Example:
bq load \
  --source_format=PARQUET \
  --replace \
  dataset.table \
  "gs://bucket/path/*.parquet"
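Before reloading, it can help to confirm that the source file itself is clean, since a reload cannot remove bad years that are already present in the source. The following is a minimal sketch, not part of the original answer: it counts rows per pickup year in a CSV and reports the years outside an expected range. The function names, the timestamp format, the 2019 to 2020 range, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Assumption: the data should only cover 2019 and 2020; adjust as needed.
EXPECTED_YEARS = range(2019, 2021)


def count_pickup_years(csv_file, column="lpep_pickup_datetime",
                       fmt="%Y-%m-%d %H:%M:%S"):
    """Count rows per pickup year in an open CSV file object."""
    years = Counter()
    for row in csv.DictReader(csv_file):
        years[datetime.strptime(row[column], fmt).year] += 1
    return years


def unexpected_years(year_counts, expected=EXPECTED_YEARS):
    """Return only the years (with row counts) outside the expected range."""
    return {y: n for y, n in sorted(year_counts.items()) if y not in expected}


# Made-up in-memory sample standing in for a real source file.
SAMPLE = io.StringIO(
    "lpep_pickup_datetime,trip_distance\n"
    "2019-01-01 00:10:00,1.2\n"
    "2008-12-31 23:59:00,0.5\n"
    "2019-06-15 08:30:00,3.4\n"
    "2041-03-02 12:00:00,2.0\n"
)

counts = count_pickup_years(SAMPLE)
print(unexpected_years(counts))  # {2008: 1, 2041: 1}
```

For a real file, open it with `open(path, newline="")` and pass the file object to `count_pickup_years`. If unexpected years show up here, the problem is in the source rather than in the load step.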
Checklist
- I have searched existing FAQs and this question is not already answered
- The answer provides accurate, helpful information
- I have included any relevant code examples or links