[FAQ] Strange years (2008, 2041, 2062) appearing in taxi pickup timestamps #193

@AsherJD-io

Course

data-engineering-zoomcamp

Question

After loading taxi data into BigQuery, I see unexpected years like 2008, 2041, or 2062 in lpep_pickup_datetime.

Why is this happening?

Answer

This is usually caused by a faulty or incomplete load process, not by the query itself.

Common causes:

  • CSV schema autodetect misinterpreting the timestamp format
  • Mixing Parquet and CSV loads into the same table
  • Appending instead of replacing during a reload
  • Partial failed loads

First, verify the date range:

SELECT
  MIN(lpep_pickup_datetime) AS min_pickup,
  MAX(lpep_pickup_datetime) AS max_pickup
FROM `project.dataset.table`;
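
The MIN/MAX check shows only the extremes. A per-year row count may help tell a handful of stray rows apart from a systematically bad load. The sketch below wraps such a query in a shell function. `year_histogram` is a hypothetical helper name, the table name in the usage comment is a placeholder, and the sketch assumes the bq CLI is installed and authenticated.

```shell
# Sketch: count trips per pickup year for a fully qualified table.
# year_histogram is a hypothetical helper name, not part of the bq CLI.
year_histogram() {
  table="$1"   # placeholder, e.g. my-project.my_dataset.green_tripdata
  # The single-quoted format string keeps the backticks literal,
  # so the shell does not treat them as command substitution.
  sql=$(printf 'SELECT EXTRACT(YEAR FROM lpep_pickup_datetime) AS pickup_year, COUNT(*) AS trips FROM `%s` GROUP BY pickup_year ORDER BY pickup_year' "$table")
  bq query --use_legacy_sql=false "$sql"
}

# Usage:
#   year_histogram my-project.my_dataset.green_tripdata
```

If only a few rows fall outside the expected years, filtering them in downstream queries may be enough. If many do, a reload is the safer option.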

If values fall outside the expected years (e.g., 2019–2020), reload the table using:

  • A clean source
  • The --replace flag, so the load overwrites the table instead of appending to it
  • Preferably Parquet format instead of CSV

Example:

bq load \
  --source_format=PARQUET \
  --replace \
  dataset.table \
  "gs://bucket/path/*.parquet"
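
If only CSV files are available, passing an explicit schema instead of relying on autodetect may avoid the timestamp misinterpretation listed among the causes. The following is a minimal sketch, not a definitive recipe. `reload_csv` is a hypothetical helper name, the schema file path and CSV location are placeholders, and `--skip_leading_rows=1` assumes each file has a single header row.

```shell
# Sketch: replace a table from CSV files using an explicit schema file
# instead of --autodetect. reload_csv is a hypothetical helper name.
reload_csv() {
  table="$1"    # placeholder, e.g. dataset.table
  source="$2"   # placeholder, e.g. 'gs://bucket/path/*.csv'
  schema="$3"   # path to a JSON schema file declaring lpep_pickup_datetime as TIMESTAMP
  bq load \
    --source_format=CSV \
    --skip_leading_rows=1 \
    --replace \
    "$table" "$source" "$schema"
}

# Usage:
#   reload_csv dataset.table 'gs://bucket/path/*.csv' ./schema.json
```

Quoting the `gs://` wildcard keeps the local shell from trying to expand it before bq sees it.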

Checklist

  • I have searched existing FAQs and this question is not already answered
  • The answer provides accurate, helpful information
  • I have included any relevant code examples or links
