epw1624/transit-data-analysis

Route and arrival data used in this product or service is provided by permission of TransLink. TransLink assumes no responsibility for the accuracy or currency of the Data used in this product or service.

To Use:

Modify the filepaths on lines 35 and 37 of map_bus_stops.py to point to the stops data available from TransLink (downloadable from here) and a .csv file containing some Compass Card usage data, respectively. Alternatively, use generate_random_trips.py to create some fake data to map.

python map_bus_stops.py

Open the generated file "map.html" in your browser to see the results. An example can be viewed interactively by opening "sample_map.html" in a browser.
The sample map was generated using random data created with generate_random_trips.py, so don't come looking for me at these bus stops...

Project Details

compass_card_data.py

This file defines the CompassCardData class, which is essentially a wrapper around a pandas.DataFrame object containing the data from a Compass Card usage .csv. The purpose of this class is to perform some common operations that need to happen every time this data gets loaded in:

  • split the "Transaction" column into two more usable columns: "Action" and "Location"
    • "Action" denotes the card action (one of "Tap in", "Tap out", "Transfer", "Loaded", "Refund")
    • "Location" denotes where the transaction occurred (e.g. a bus stop code or SkyTrain station name)
  • remove unneeded columns (see CompassCardData.COLS_TO_DROP)

The class also contains some helpful methods for filtering the data down to bus trips, which are the main focus of all the subsequent scripts (for now).
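As a rough illustration of the "Transaction" split described above, here is a minimal sketch that works on a single string. The raw format "<Action> at <Location>" is an assumption about the Compass Card export, and `split_transaction` is a hypothetical helper, not a method of CompassCardData.

```python
import re

# The five card actions named in this README.
ACTIONS = ("Tap in", "Tap out", "Transfer", "Loaded", "Refund")

# Assumed raw format: "<Action> at <Location>", with the location optional.
_PATTERN = re.compile(
    r"^(?P<action>" + "|".join(re.escape(a) for a in ACTIONS) + r")"
    r"(?:\s+at\s+(?P<location>.+))?$"
)


def split_transaction(raw):
    """Return (action, location); either may be None if it cannot be parsed."""
    match = _PATTERN.match(raw.strip())
    if match is None:
        return None, None
    return match.group("action"), match.group("location")


action, location = split_transaction("Tap in at Bus Stop 51234")
```

The same rule could be applied to a whole column at once with a vectorised string operation; the per-string version is only meant to make the rule easy to read.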

bus_stops.py

BusStops is a similar wrapper class for the bus stop data from the TransLink dataset linked above. The constructor filters out SkyTrain stations and removes unnecessary columns.

map_bus_stops.py

Defines the MapPin class. Pins on the folium map require a latitude and longitude. MapPin.name is the description that will appear in the pin's pop-up on the final map. The frequency is stored to allow for choosing the pin colour based on how often a stop was visited (see MapPin.construct_folium_icon).
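To make the idea of a frequency-based pin colour concrete, here is a small stand-in for such a class. `MapPinSketch`, its `colour` method, the thresholds, and the colour names are all assumptions made for illustration; the actual logic lives in MapPin.construct_folium_icon and may differ.

```python
from dataclasses import dataclass


@dataclass
class MapPinSketch:
    """Hypothetical stand-in holding the four values described above."""

    name: str         # text shown in the pin's pop-up
    latitude: float
    longitude: float
    frequency: int    # number of times the stop was visited

    def colour(self):
        # Thresholds are made up for this sketch: busier stops get warmer colours.
        if self.frequency >= 20:
            return "red"
        if self.frequency >= 5:
            return "orange"
        return "blue"


pin = MapPinSketch("Example stop", 49.2827, -123.1207, frequency=7)
```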

Constructs CompassCardData and BusStops objects, and merges them on the stop_code. The result of this operation is a pandas.DataFrame that contains each transaction, including the coordinates of the associated bus stop. A MapPin object is then generated for each stop, and all the pins are added to a folium.Map that is saved to the file "map.html".
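The merge step can be sketched with two tiny tables. Everything here is illustrative: the rows are made up, the column names other than stop_code follow the GTFS stops.txt convention as an assumption, and the choice of an inner merge is a guess rather than a description of the script.

```python
import pandas as pd

# Hypothetical bus stop table.
stops = pd.DataFrame({
    "stop_code": [50001, 50002, 50003],
    "stop_name": ["Stop A", "Stop B", "Stop C"],
    "stop_lat": [49.2800, 49.2700, 49.2600],
    "stop_lon": [-123.1200, -123.1000, -123.0800],
})

# Hypothetical transactions, already reduced to one stop_code per row.
trips = pd.DataFrame({
    "Action": ["Tap in", "Tap in", "Tap in", "Tap in"],
    "stop_code": [50001, 50002, 50001, 59999],  # 59999 has no matching stop
})

# Inner merge: keeps only transactions whose stop_code exists in the stops table,
# and attaches that stop's name and coordinates to each transaction.
merged = trips.merge(stops, on="stop_code", how="inner")

# Collapse to one row per stop with a visit count.
visits = (
    merged.groupby(["stop_code", "stop_name", "stop_lat", "stop_lon"])
    .size()
    .reset_index(name="frequency")
)
```

Each row of `visits` then carries what a pin needs: a name, a latitude and longitude, and a frequency.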

generate_random_trips.py

I wanted to be able to show an example of the map that gets generated by map_bus_stops.py without using my own Compass Card data, so I wrote this script to generate some sample data with a few constraints:

date_times

Random datetime values between 01/31/2023 and 01/31/2024
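One way to draw such values is to pick a random offset in seconds from the start date. This is a sketch only: `random_datetime` is a hypothetical helper, the one-second resolution and inclusive endpoints are assumptions, and the seed is fixed just to make the example repeatable.

```python
import random
from datetime import datetime, timedelta

START = datetime(2023, 1, 31)
END = datetime(2024, 1, 31)


def random_datetime(rng, start=START, end=END):
    """Return a datetime chosen uniformly (to the second) in [start, end]."""
    span_seconds = int((end - start).total_seconds())
    return start + timedelta(seconds=rng.randrange(span_seconds + 1))


rng = random.Random(0)
date_times = [random_datetime(rng) for _ in range(500)]
```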

transactions

Sample data includes only Zone 1 bus stops. Unfortunately for me, the TransLink data does not include the zone for each bus stop, so I had to approximate Zone 1 using some landmarks:

  • Eastern boundary: Boundary Road
  • Southern boundary: YVR Airport
  • Northern boundary: Stanley Park

I took all bus stops and kept only those whose coordinates fall within these boundaries.
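The landmark-based filter amounts to a simple coordinate check. In the sketch below the boundary values are rough figures chosen for illustration, not the ones used in generate_random_trips.py, and the stop list is made up. No western limit is applied because none is listed above.

```python
# Rough, illustrative boundary values (assumptions, not taken from the script).
EAST_LON = -123.023   # approximately Boundary Road
SOUTH_LAT = 49.19     # approximately YVR Airport
NORTH_LAT = 49.31     # approximately Stanley Park


def in_zone_1(lat, lon):
    """True if the point lies inside the approximate Zone 1 box."""
    return SOUTH_LAT <= lat <= NORTH_LAT and lon <= EAST_LON


# Hypothetical stops as (name, latitude, longitude).
all_stops = [
    ("Inside the box", 49.2827, -123.1207),
    ("Too far east", 49.2257, -123.0030),
    ("Too far north", 49.3500, -123.1000),
    ("Too far south", 49.1500, -123.1500),
]

zone_1_stops = [stop for stop in all_stops if in_zone_1(stop[1], stop[2])]
```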

From this filtered list, 30 stops are randomly selected to be included in the data. The final dataset contains 500 transactions. Since a real person would tend to visit some bus stops more than others (home, work, etc) rather than an even distribution over all 30, I used a normal distribution to select the stops.
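Here is one possible way to get an uneven, roughly bell-shaped spread of visits over 30 stops. How the script maps a normal draw onto a stop is not stated here, so the approach below (draw an index centred on the middle of the list, then round and clamp it) is an assumption, as are the helper name `pick_stop`, the spread, and the made-up stop codes.

```python
import random

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Hypothetical pool of filtered stop codes.
candidate_stops = [str(50000 + i) for i in range(200)]

# Choose 30 distinct stops from the pool.
chosen = rng.sample(candidate_stops, 30)


def pick_stop(rng, stops):
    """Pick a stop using a normally distributed index, clamped to the list."""
    mean = (len(stops) - 1) / 2
    std = len(stops) / 6  # assumed spread: most draws land near the middle
    index = round(rng.gauss(mean, std))
    index = min(max(index, 0), len(stops) - 1)
    return stops[index]


transactions = [pick_stop(rng, chosen) for _ in range(500)]
```

Stops near the middle of `chosen` are picked far more often than those at either end, which mimics a rider with a few regular stops.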

Everything else

None of the other lists required any significant wrangling.

The random lists are combined into a pandas.DataFrame, which is then saved to a .csv. The sample map described above features data generated in this way.
