Datasets - Added Chicago Taxi Trips dataset (#3775)
* Datasets - Added Chicago Taxi Trips dataset

* Added details to the description

* Improved querying by URL-encoding all query parameters

* Renamed the directory to datasets

* Fixed the container image

I thought alpine had curl

* Renamed the components file

* Fixed the quoting

* Renamed the directory
Ark-kun committed May 27, 2020
1 parent 76f7476 commit 5e3d9aa
Showing 1 changed file with 41 additions and 0 deletions.
41 changes: 41 additions & 0 deletions components/datasets/Chicago_Taxi_Trips/component.yaml
@@ -0,0 +1,41 @@
name: Chicago Taxi Trips dataset
description: |
  City of Chicago Taxi Trips dataset: https://data.cityofchicago.org/Transportation/Taxi-Trips/wrvz-psew
  The input parameters configure the SQL query to the database.
  The dataset is pretty big, so limit the number of results using the `Limit` or `Where` parameters.
  Read [Socrata dev](https://dev.socrata.com/docs/queries/) for the advanced query syntax
metadata:
  annotations:
    author: Alexey Volkov <alexey.volkov@ark-kun.com>
inputs:
- {name: Where, type: String, default: 'trip_start_timestamp>="1900-01-01" AND trip_start_timestamp<"2100-01-01"'}
- {name: Limit, type: Integer, default: '1000', description: 'Number of rows to return. The rows are randomly sampled.'}
- {name: Select, type: String, default: 'trip_id,taxi_id,trip_start_timestamp,trip_end_timestamp,trip_seconds,trip_miles,pickup_census_tract,dropoff_census_tract,pickup_community_area,dropoff_community_area,fare,tips,tolls,extras,trip_total,payment_type,company,pickup_centroid_latitude,pickup_centroid_longitude,pickup_centroid_location,dropoff_centroid_latitude,dropoff_centroid_longitude,dropoff_centroid_location'}
- {name: Format, type: String, default: 'csv', description: 'Output data format. Supports csv, tsv, xml, rdf, json'}
outputs:
- {name: Table, description: 'Result type depends on format. CSV and TSV have header.'}
implementation:
  container:
    image: curlimages/curl
    command:
    - sh
    - -c
    - |
      set -e -x -o pipefail
      output_path="$0"
      select="$1"
      where="$2"
      limit="$3"
      format="$4"
      mkdir -p "$(dirname "$output_path")"
      curl --get 'https://data.cityofchicago.org/resource/wrvz-psew.'"${format}" \
          --data-urlencode '$limit='"${limit}" \
          --data-urlencode '$where='"${where}" \
          --data-urlencode '$select='"${select}" \
          | tr -d '"' > "$output_path" # Removing unneeded quotes around all numbers
    - {outputPath: Table}
    - {inputValue: Select}
    - {inputValue: Where}
    - {inputValue: Limit}
    - {inputValue: Format}
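
Note on argument order: the command items listed after the script are passed as positional arguments to `sh -c`, and the first of them lands in `$0` rather than `$1`. That is why the script reads the output path from `$0` and `Select` from `$1`. A minimal sketch of that mapping, using a stand-in script that only echoes its arguments (the path and the query values below are hypothetical, not taken from the component):

```shell
# Stand-in for the component's script: prints $0, $1 and $2 separated by "|".
script='printf "%s|%s|%s\n" "$0" "$1" "$2"'

# With `sh -c`, the first argument after the script becomes $0, the next $1, and so on.
# The output path and query values here are made up for illustration.
mapped="$(sh -c "$script" /tmp/outputs/Table/data 'trip_id,fare' 'fare>10')"
echo "$mapped"
```

In the component, the same mechanism delivers `{outputPath: Table}` as `$0`, followed by the `Select`, `Where`, `Limit` and `Format` values as `$1` through `$4`.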

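
Note on the `tr -d '"'` step: it deletes every double quote in the response, not only those around numbers. The sketch below runs the same filter on made-up CSV rows (not real dataset records), assuming the API quotes every field as the script comment suggests. It shows that a text field containing a comma can no longer be told apart from two fields once the quotes are gone.

```shell
# Made-up sample rows in a fully quoted CSV layout (an assumption about the API output).
sample='"trip_id","fare","company"
"abc","12.5","Taxi Co"
"def","7.25","Cab, Inc."'

# Same filter as in the component: remove all double quotes.
stripped="$(printf '%s\n' "$sample" | tr -d '"')"
echo "$stripped"

# Count comma-separated fields per line after stripping.
field_counts="$(printf '%s\n' "$stripped" | awk -F, '{print NF}' | tr '\n' ' ')"
echo "$field_counts"
```

If downstream steps need an exact column count, restricting `Select` to numeric columns or parsing the quoted output directly may be safer; this is a suggestion, not something the commit does.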