-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Datasets - Added Chicago Taxi Trips dataset (#3775)
* Datasets - Added Chicago Taxi Trips dataset * Added details to the description * Improved querying by URL-encoding all query parameters * Renamed teh directory to datasets * Fixed the container image I though alpine had curl * Renamed the components file * Fixed the quoting * Renamed the directory
- Loading branch information
Showing
1 changed file
with
41 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
name: Chicago Taxi Trips dataset | ||
description: | | ||
City of Chicago Taxi Trips dataset: https://data.cityofchicago.org/Transportation/Taxi-Trips/wrvz-psew | ||
The input parameters configure the SQL query to the database. | ||
The dataset is pretty big, so limit the number of results using the `Limit` or `Where` parameters. | ||
Read [Socrata dev](https://dev.socrata.com/docs/queries/) for the advanced query syntax | ||
metadata: | ||
annotations: | ||
author: Alexey Volkov <alexey.volkov@ark-kun.com> | ||
inputs: | ||
- {name: Where, type: String, default: 'trip_start_timestamp>="1900-01-01" AND trip_start_timestamp<"2100-01-01"'} | ||
- {name: Limit, type: Integer, default: '1000', description: 'Number of rows to return. The rows are randomly sampled.'} | ||
- {name: Select, type: String, default: 'trip_id,taxi_id,trip_start_timestamp,trip_end_timestamp,trip_seconds,trip_miles,pickup_census_tract,dropoff_census_tract,pickup_community_area,dropoff_community_area,fare,tips,tolls,extras,trip_total,payment_type,company,pickup_centroid_latitude,pickup_centroid_longitude,pickup_centroid_location,dropoff_centroid_latitude,dropoff_centroid_longitude,dropoff_centroid_location'} | ||
- {name: Format, type: String, default: 'csv', description: 'Output data format. Suports csv,tsv,cml,rdf,json'} | ||
outputs: | ||
- {name: Table, description: 'Result type depends on format. CSV and TSV have header.'} | ||
implementation: | ||
container: | ||
image: curlimages/curl | ||
command: | ||
- sh | ||
- -c | ||
- | | ||
set -e -x -o pipefail | ||
output_path="$0" | ||
select="$1" | ||
where="$2" | ||
limit="$3" | ||
format="$4" | ||
mkdir -p "$(dirname "$output_path")" | ||
curl --get 'https://data.cityofchicago.org/resource/wrvz-psew.'"${format}" \ | ||
--data-urlencode '$limit='"${limit}" \ | ||
--data-urlencode '$where='"${where}" \ | ||
--data-urlencode '$select='"${select}" \ | ||
| tr -d '"' > "$output_path" # Removing unneeded quotes around all numbers | ||
- {outputPath: Table} | ||
- {inputValue: Select} | ||
- {inputValue: Where} | ||
- {inputValue: Limit} | ||
- {inputValue: Format} |