Link Airtable data about datasets to GTFS feed data.... or retire agencies.yml in favor of airtable data? #775

Closed

Description

User Story

As a research data analyst,
I want to maintain data about GTFS URLs in only one place
so that I don't need to update two databases and the pipeline always stays in sync with the source of truth

Acceptance Criteria

Given the decided-upon place to maintain GTFS URLs (which should probably be airtable)
When (data pipeline) the data pipeline decides which feeds to download and stores data about which feeds are associated with which agencies and services
When (GTFS Realtime Archiver) the GTFS Realtime archiver data pipeline decides which feeds to download
Then (data pipeline) the data pipeline should use data from the decided-upon place to maintain GTFS URLs (which should probably be airtable)
Then (GTFS Realtime Archiver) the GTFS Realtime archiver should get the GTFS Realtime URLs of the feeds it downloads from the decided-upon place to maintain GTFS URLs (which should probably be airtable)
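
To illustrate the criteria above, here is a minimal sketch of both consumers selecting feeds from one shared list. Everything in it is an assumption made for illustration: the `FeedRecord` fields, the `feed_type` values, the `active` flag, and the example URLs are hypothetical and do not reflect the actual airtable schema or pipeline code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedRecord:
    """One row from the single source of truth (hypothetical fields)."""
    gtfs_dataset_id: str
    url: str
    feed_type: str  # assumed values: "schedule" or "realtime"
    active: bool = True


def select_feeds(records, feed_type):
    """Return the active feeds of one type from the shared list."""
    return [r for r in records if r.active and r.feed_type == feed_type]


# Stand-in for rows fetched from the decided-upon place (made-up data).
RECORDS = [
    FeedRecord("rec001", "https://example.com/gtfs.zip", "schedule"),
    FeedRecord("rec002", "https://example.com/vehicle_positions", "realtime"),
    FeedRecord("rec003", "https://example.com/old.zip", "schedule", active=False),
]

# Both consumers read the same list, so they cannot drift apart.
schedule_feeds = select_feeds(RECORDS, "schedule")  # data pipeline
realtime_feeds = select_feeds(RECORDS, "realtime")  # GTFS Realtime archiver
```

Because both the data pipeline and the archiver call the same selection function on the same records, a URL change made in one place reaches both.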

Not recommended alternative: keep agencies.yml and link existing ITP ID + Url Number to airtable data

Airtable already holds a lot of data about GTFS feeds, and that data should serve as the ultimate source of truth for our list of active feeds. However, the airtable data is identified by a gtfs_dataset_id, whereas data in the warehouse is currently identified by calitp_id and url_number. We need to decide how to link the two. Two possible solutions are as follows:

  1. Add the gtfs_dataset_id to each feed entry within agencies.yml
  2. Refactor most of the data warehouse and run data migrations so that gtfs_dataset_id replaces the calitp_id and url_number key.
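
As a rough sketch of option 1, the snippet below builds a lookup from the existing (calitp_id, url_number) key to gtfs_dataset_id. It assumes agencies.yml has already been parsed into a dict, that each agency has an `itp_id` and a `feeds` list, and that url_number is the feed's position in that list. The agency name, URLs, and record IDs are made up, and `build_key_map` is a hypothetical helper, not existing pipeline code.

```python
# Hypothetical, already-parsed agencies.yml content with gtfs_dataset_id added
# to each feed entry (all values are made up for illustration).
AGENCIES = {
    "example-agency": {
        "itp_id": 1,
        "feeds": [
            {"gtfs_schedule_url": "https://example.com/a.zip",
             "gtfs_dataset_id": "recAAA"},
            {"gtfs_schedule_url": "https://example.com/b.zip",
             "gtfs_dataset_id": "recBBB"},
        ],
    },
}


def build_key_map(agencies):
    """Map (calitp_id, url_number) -> gtfs_dataset_id.

    Assumes url_number is the index of the feed within the agency's list.
    Feeds without a gtfs_dataset_id are skipped.
    """
    key_map = {}
    for agency in agencies.values():
        for url_number, feed in enumerate(agency["feeds"]):
            dataset_id = feed.get("gtfs_dataset_id")
            if dataset_id is not None:
                key_map[(agency["itp_id"], url_number)] = dataset_id
    return key_map


KEY_MAP = build_key_map(AGENCIES)
```

A map like this would let warehouse rows keyed by calitp_id and url_number be joined to airtable records without migrating the warehouse itself, which is the trade-off that separates option 1 from option 2.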

Metadata

Assignees

Labels

airtable (Items related to pulling data from Cal-ITP's airtable database. Evan Siroky is product owner.)
blocked
epic (feature implementation or project)
product: transit-data-quality (Items that are a part of the Transit Data Quality Product of which @evansiroky is the product owner.)
project-msd (Issues related to the mobility services data project)
size: XL (two-week effort (one sprint))
