Final revision for review?
thomcom committed May 19, 2020
1 parent 70a41a6 commit 6c3d853
Showing 1 changed file with 25 additions and 59 deletions.
84 changes: 25 additions & 59 deletions notebooks/nyc_taxi_years_correlation.ipynb
@@ -44,7 +44,7 @@
},
{
"cell_type": "code",
"execution_count": 103,
"execution_count": 2,
"metadata": {},
"outputs": [
{
@@ -73,7 +73,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
@@ -90,7 +90,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"outputs": [
{
@@ -99,7 +99,7 @@
"{'DOLocationID', 'PULocationID'}"
]
},
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -119,17 +119,17 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"tzones = gpd.GeoDataFrame.from_file('tzones_lonlat.json')\n",
"tzones.to_file('cu_taxi_zones.shp')"
"# tzones = gpd.GeoDataFrame.from_file('tzones_lonlat.json')\n",
"# tzones.to_file('cu_taxi_zones.shp')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
@@ -141,24 +141,21 @@
"metadata": {},
"source": [
"## Converting lon/lat coordinates to LocationIDs with cuSpatial\n",
"Looking at the taxi zones and the taxi2015 data, you can see that\n",
"- 12.7 million pickup locations\n",
"- 12.7 million dropoff locations\n",
"The taxi zones and the taxi2016 data contain:\n",
"- 10.09 million pickup locations\n",
"- 10.09 million dropoff locations\n",
"- 263 LocationID features\n",
"- 354 LocationID rings\n",
"- 98,192 LocationID coordinates\n",
"\n",
"Now that we've collected the set of pickup locations and dropoff locations, we can use `cuSpatial.point_in_polygon` to quickly determine which pickups and dropoffs occur in each borough. That is, 353 LocationID rings composed of a total of 263 LocationID features.\n",
"Now that we've collected the set of pickup locations and dropoff locations, we can use `cuspatial.point_in_polygon` to quickly determine which pickups and dropoffs occur in each taxi zone.\n",
"\n",
"To do this in a memory efficient way, instead of creating two massive 12.7 million x 263 arrays, we're going to use the 31 polygon limit to our advantage and map the resulting true values in the array a new `PULocationID` and `DOLocationID`, matching the 2017 schema. Two things to note:\n",
"\n",
"1. we had to go in a reversed order for this to work. \n",
"1. locations outside of the `LocationID` areas are `264` and `265`. We'll be using 264 to indicate our out-of-bounds zones as no guidance was given on how to decide between the two."
"To do this in a memory-efficient way, instead of creating two massive 10.09 million x 263 arrays, we're going to use the 31-polygon limit to our advantage and map the resulting true values in the array to a new `PULocationID` and `DOLocationID`, matching the 2017 schema. Locations outside of the `LocationID` areas are `264` and `265`. We'll be using 264 to indicate our out-of-bounds zones."
]
},
{
"cell_type": "code",
"execution_count": 31,
"execution_count": 7,
"metadata": {},
"outputs": [
{
@@ -177,15 +174,15 @@
},
{
"cell_type": "code",
"execution_count": 65,
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 10.5 s, sys: 6.28 s, total: 16.7 s\n",
"Wall time: 16.9 s\n"
"CPU times: user 11.2 s, sys: 6.27 s, total: 17.5 s\n",
"Wall time: 17.7 s\n"
]
}
],
@@ -206,40 +203,9 @@
},
{
"cell_type": "code",
"execution_count": 66,
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 605\n",
"1 1\n",
"2 32\n",
"3 34894\n",
"4 3\n",
" ... \n",
"259 143\n",
"260 7187\n",
"261 47599\n",
"262 133634\n",
"264 179098\n",
"Name: PULocationID, Length: 259, dtype: int32\n",
"0 15546\n",
"1 1\n",
"2 611\n",
"3 58248\n",
"4 56\n",
" ... \n",
"259 1464\n",
"260 14324\n",
"261 46783\n",
"262 144201\n",
"264 188379\n",
"Name: DOLocationID, Length: 261, dtype: int32\n"
]
}
],
"outputs": [],
"source": [
"del pickups\n",
"del dropoffs"
@@ -254,14 +220,14 @@
},
{
"cell_type": "code",
"execution_count": 75,
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.10632302994054166\n"
"0.10632302994054163\n"
]
}
],
@@ -278,7 +244,7 @@
},
{
"cell_type": "code",
"execution_count": 84,
"execution_count": 11,
"metadata": {},
"outputs": [
{
@@ -302,7 +268,7 @@
},
{
"cell_type": "code",
"execution_count": 93,
"execution_count": 12,
"metadata": {},
"outputs": [
{
@@ -329,7 +295,7 @@
},
{
"cell_type": "code",
"execution_count": 94,
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
@@ -346,7 +312,7 @@
},
{
"cell_type": "code",
"execution_count": 99,
"execution_count": 14,
"metadata": {},
"outputs": [
{
Expand Down
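
The markdown cell in this diff describes assigning each pickup and dropoff a `LocationID` by testing points against the zone polygons in batches of at most 31, with `264` as the out-of-bounds value. The following is a minimal CPU-only sketch of that batching idea, not the notebook's actual code and not the cuSpatial API: `point_in_ring` and `assign_location_ids` are hypothetical helpers written in plain Python, each zone is assumed to be a single ring, and a point is assumed to take the ID of the first zone that contains it.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]

OUT_OF_BOUNDS_ID = 264  # value the notebook text uses for out-of-bounds points
BATCH_SIZE = 31         # polygon limit mentioned in the notebook text


def point_in_ring(pt: Point, ring: Ring) -> bool:
    """Ray-casting test: count crossings of a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # The edge straddles the horizontal line through pt (so y1 != y2).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_location_ids(
    points: Sequence[Point],
    zones: Sequence[Ring],
    zone_ids: Sequence[int],
    batch_size: int = BATCH_SIZE,
) -> List[int]:
    """Assign one zone ID per point, testing zones batch by batch.

    Only a single list of IDs is kept, rather than a full
    points-by-zones boolean matrix.
    """
    ids = [OUT_OF_BOUNDS_ID] * len(points)
    for start in range(0, len(zones), batch_size):
        batch = zones[start:start + batch_size]
        batch_ids = zone_ids[start:start + batch_size]
        for i, pt in enumerate(points):
            if ids[i] != OUT_OF_BOUNDS_ID:
                continue  # already matched in an earlier batch
            for ring, zid in zip(batch, batch_ids):
                if point_in_ring(pt, ring):
                    ids[i] = zid
                    break
    return ids


# Two adjacent unit squares standing in for taxi zones 1 and 2.
zones = [
    [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
]
points = [(0.5, 0.5), (1.5, 0.5), (5.0, 5.0)]
print(assign_location_ids(points, zones, [1, 2]))  # [1, 2, 264]
```

In the notebook the per-batch containment test would be done on the GPU by `cuspatial.point_in_polygon`; the sketch only illustrates how batching keeps memory bounded by the number of points rather than points times zones.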
