GTFSTK is a Python 3.5 tool kit for processing General Transit Feed Specification (GTFS) data in memory without a database. It is mostly for computing statistics, such as daily service distance per route and daily number of trips per stop. It uses Pandas and Shapely to do the heavy lifting.
Create a Python 3.5 virtual environment and pip install gtfstk.
You can play with ipynb/examples.ipynb in a Jupyter notebook.
Documentation is in docs/ and also on RawGit.
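To give a flavour of the kind of statistic mentioned above (number of trips per stop), here is a minimal sketch in plain Pandas. It does not use GTFSTK's own functions. The tiny stop_times table is made-up data, and the sketch assumes that every trip in it runs on the day of interest.

```python
import pandas as pd

# Hypothetical stop_times data using GTFS column names
stop_times = pd.DataFrame({
    "trip_id": ["t1", "t1", "t2", "t2", "t3"],
    "stop_id": ["s1", "s2", "s1", "s3", "s1"],
})

# Count the distinct trips that visit each stop
trips_per_stop = stop_times.groupby("stop_id")["trip_id"].nunique()
print(trips_per_stop)
```

On a real feed, the stop times would first have to be restricted to the trips active on the chosen date.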
- Development status is Alpha
- This project uses semantic versioning
- Thanks to MRCagney for partially funding this project
- Constructive feedback is welcome and is best placed in this repository's issues section with an appropriate label, e.g. 'feature request'.
- Alex Raichev (2014-05)
- Changed feed.read_gtfs to unzip to temporary directory
- Enabled feed.write_gtfs to write to a directory
- Improved function names, e.g. compute_trips_stats -> compute_trip_stats
- Added functions to cleaner.py and changed cleaning function outputs to feed instances
- Made feed.copy a method
- Simplified Feed objects and added auto-updates to private attributes
- Changed the signatures of a few functions, e.g. calculator.append_dist_to_shapes now returns a feed instead of a shapes data frame
- Fixed formatting of properties field in calculator.trip_to_geojson and calculator.route_to_geojson
- Bugfix: Added 'from_stop_id' and 'to_stop_id' to list of string data types in constants.py. Previously, they were sometimes getting interpreted as floats, which stripped leading zeros from the IDs, which then did not match the IDs in the stops data frame
- Added trip ID parameter to calculator.get_stops
- Created calculator.trip_to_geojson
- Added whitespace stripping to cleaner.clean_route_short_names
- Renamed the function calculator.get_feed_intersecting_polygon to calculator.restrict_by_polygon
- Added the function calculator.restrict_by_routes
- Added the function calculator.get_start_and_end_times
- Added the functions calculator.compute_center, calculator.compute_bounds, calculator.route_to_geojson
- Extended the function calculator.get_stops to accept an optional route ID
- Extended the function calculator.build_geometry_by_shape to accept an optional set of shape IDs
- Extended the function calculator.build_geometry_by_stop to accept an optional set of stop IDs
- Improved distance sanity checks in calculator.compute_trip_stats and calculator.append_dist_to_stop_times
- Fixed feed.copy so that the dist_units_in of the copy equals dist_units_out of the original
- Added some more distance sanity checks to calculator.compute_trip_stats and calculator.append_dist_to_stop_times
- Improved cleaner.clean_route_short_names
- Removed utilities.clean_series
- Improved cleaner.aggregate_routes
- Removed some unnecessary print statements
- Deleted an extraneous print statement in calculator.create_shapes
- Added utilities.is_not_null
- Changed calculator.shapes_to_geojson to return a dictionary instead of a string
- Upgraded to Pandas 0.18.1 and fixed calculator.downsample accordingly
- Added cleaner.aggregate_routes
- Bugfix: formatted parent_station as a string in constants.DTYPE
- Changed signature and behavior of create_shapes
- Added duplicate route short name count to assess
- Changed the behavior of clean_route_short_names
- Changed INT_COLS to INT_COLUMNS
- Moved some functions
- Added some functions, such as a function to copy feeds
- Added more functions to calculator.py, some of which are optional and depend on GeoPandas
- Documented more
- Made read_gtfs raise a more helpful error when an input path does not exist
- Made Matplotlib import optional
- Updated plotter function chart colors
- Moved the Feed class into a separate file
- Fixed a fatal bug in plot_routes_time_series and renamed it plot_feed_time_series
- Added route_type to trips stats and routes stats
- Added more functions to the cleaner module
- Modularized more
- Refactored the Feed class, exporting most methods to functions
- Changed function names, favoring a compute_ prefix over a get_ prefix for complex functions
- Bug fix: in INT_COLUMNS changed 'dropoff_type' to 'drop_off_type'.
- Changed to return empty data frames instead of None where appropriate
- Added Feed.clean_route_short_names
- Changed the inputs and outputs of get_stops_stats and get_stops_time_series
- Replaced assert statements with exceptions
- Changed name to gtfstk
- Added route_short_name and min_headway to trips stats and routes stats
- Changed the default handling of distance units in Feed
- Assembled feed.py and utils.py into a unified top-level package by tweaking __init__.py
- Renamed get_linestring_by_shape and get_point_by_stop to get_geometry_by_shape and get_geometry_by_stop, respectively
- Added min_transfer_time to INT_COLUMNS
- Fixed get_route_timetable sort order
- Added data frame empty checks to Feed.__init__, because I was getting errors on feeds with empty calendar.txt files
- Removed parent_station from INT_COLUMNS, which should never have been there in the first place
- Now you can specify the output distance units
- Changed most functions to return an empty data frame instead of None
- Fixed export so that integer columns, such as 'bike_allowed', that have at least one NaN value no longer get formatted as floats in the output CSVs
- Reduced columns in get_trips_activity
- Added clean_series
- Fixed a bug/typo in the computation of the service_distance and service_duration columns of feed stats
- Fixed a bug in the computation of the peak_start_time and peak_end_time columns of routes stats and feed stats
- Added more columns to get_routes_stats
- Added get_feed_stats and get_feed_time_series and removed the similar agg_routes_stats and agg_routes_time_series
- Removed dump_all_stats, because it wasn't very useful
- Replaced get_busiest_date_of_first_week with get_busiest_date
- Cleaned code slightly
- Added 'speed' column in trips stats
- Added 'is_loop' column in trips stats and routes stats
- Added more tests
- Added route and stop timetable methods
- Improved tests slightly
- Tidied code slightly
- Changed occurrences of 'vehicle' to 'trips', because that's clearer
- Updated some packages
- Changed name to gtfs-tk
- Added get_shapes_geojson
- Renamed get_active_trips and get_active_stops to get_trips and get_stops
- Upgraded to Pandas 0.15.2
- Scooped out main logic from Feed.get_stops_stats and Feed.get_stops_time_series and put it into top level functions for the sake of greater flexibility. Similar to what I did for Feed.get_routes_stats and Feed.get_routes_time_series
- Fixed a bug in computing the last stop of each trip in get_trips_stats
- Improved the accuracy of trip distances in get_trips_stats
- Upgraded to Pandas 0.15.1
- Added fill_nan_route_short_names
- Switched back to version numbering in the style of major.minor.micro, because that seems more useful
- Fixed a bug in Feed.get_routes_stats that modified the input data frame and therefore affected the same data frame outside of the function (dumb Pandas gotcha). Changed it to operate on a copy of the data frame instead.
- Speeded up time series computations by at least a factor of 10
- Switched from representing dates as datetime.date objects to '%Y%m%d' strings (the GTFS way of representing dates), because that's simpler and faster. Added an export method to feed objects
- Minor tweaks to append_dist_to_stop_times.
- Scooped out main logic from Feed.get_routes_stats and Feed.get_routes_time_series and put it into top level functions for the sake of greater flexibility. I at least need that flexibility to plug into another project.
- Simplified methods to accept a single date instead of a list of dates.
- Whoops, lost track of the changes for this version.
- Changed seconds_to_time to timestr_to_seconds. Added get_busiest_date_of_first_week.
- Converted headways to minutes
- Added option to change headway start and end time cutoffs in get_stops_stats and get_stations_stats
- Fixed a bug in get_trips_stats that caused a failure when a trip was missing a shape ID
- Switched from major.minor.micro version numbering to major.minor numbering
- Added get_vehicle_locations.
- Added append_dist_to_stop_times and append_dist_to_shapes
- Changed get_xy_by_stop name and output type
- Changed from period indices to timestamp indices for time series, because the latter are better supported in Pandas.
- Upgraded to Pandas 0.14.1.
- Restructured modules
- Created stats and time series aggregating functions
- Added get_dist_from_shapes keyword to get_trips_stats
- Fixed some typos and cleaned up the directory
- Changed get_routes_stats headway calculation
- Fixed inconsistent outputs in time series functions.
- Minor tweak to downsample
- Improved get_trips_stats and cleaned up code
- Changed time series format
- Added documentation
- Upgraded to Python 3.4
- Created utils.py and updated Pandas to 0.14.0
- Minor refactoring and tweaks to packaging
- Minor tweaks to packaging
- Initial version
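
The leading-zero bug fix noted in the changes above ('from_stop_id' and 'to_stop_id' being read as numbers) can be reproduced with plain Pandas. The following is a minimal sketch using a made-up transfers table. It only illustrates the general technique of forcing ID columns to strings and is not necessarily how constants.py is used inside GTFSTK.

```python
import io
import pandas as pd

# Hypothetical transfers.txt content; the second row has no to_stop_id
csv_text = (
    "from_stop_id,to_stop_id,min_transfer_time\n"
    "0012,0450,120\n"
    "0013,,60\n"
)

# Default type inference treats the IDs as numbers, so leading zeros are lost;
# the missing value additionally turns to_stop_id into a float column
inferred = pd.read_csv(io.StringIO(csv_text))

# Declaring the ID columns as strings keeps them exactly as written
as_str = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"from_stop_id": str, "to_stop_id": str},
)

print(inferred.dtypes)
print(as_str["from_stop_id"].tolist())
```

With the inferred types, an ID such as 0012 would no longer match the string '0012' in a stops data frame, which is the mismatch described in the bug fix.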