Commit 6661732

version 1.4
1 parent 5f30171 commit 6661732

14 files changed

+1762
-149
lines changed

.ipynb_checkpoints/OOP Toolkit-checkpoint.ipynb

+790
Large diffs are not rendered by default.

.ipynb_checkpoints/PCA-Preprocessing Toolkit-checkpoint.ipynb

+29-1
Original file line numberDiff line numberDiff line change
@@ -520,6 +520,34 @@
520520
" perm.fit(X_train, y_train)"
521521
]
522522
},
523+
{
524+
"cell_type": "markdown",
525+
"metadata": {},
526+
"source": [
527+
"## Geospatial Data\n",
528+
"\n",
529+
"### Convert start and end longitude/latitude to distance\n",
530+
"\n",
531+
" '''Haversine distance for lat/long data, adapted from a Stack Overflow answer by user Michael0x2a. \n",
531+
" Updated into a function that returns the distance in miles'''\n",
533+
" # End points are fixed; to use different end points, change lat2 and lon2\n",
534+
" lat2 = np.array(clean.Latitude)\n",
535+
" lon2 = np.array(clean.Longitude)\n",
536+
" latr = np.radians(lat2)\n",
536+
" lonr = np.radians(lon2)\n",
538+
" def distance(lat1,lon1):\n",
539+
" lat1 = np.radians(lat1)\n",
540+
" lon1 = np.radians(lon1)\n",
541+
" dlon = lonr - lon1\n",
541+
" dlat = latr - lat1\n",
542+
" a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(latr) * np.sin(dlon/2)**2\n",
544+
" c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))\n",
545+
" # 6373.0 represents earth radius in kilometers\n",
546+
" kilo = 6373.0 * c\n",
547+
" miles = kilo * 0.62137119\n",
548+
" return miles\n"
549+
]
550+
},
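The added Geospatial cell can be sketched as a standalone, vectorized function. This is a minimal sketch, not the notebook's code: the name `haversine_miles` and its four-argument signature are assumptions for illustration, while the radius (6373.0 km) and the km-to-miles factor are the values from the cell. Note that every angle, including the end latitude, is converted to radians before use.

```python
import numpy as np

EARTH_RADIUS_KM = 6373.0    # radius used in the notebook cell
KM_TO_MILES = 0.62137119    # conversion factor used in the notebook cell


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between (lat1, lon1) and (lat2, lon2).

    Inputs are in degrees and may be scalars or NumPy arrays; arrays are
    combined using normal NumPy broadcasting. (Hypothetical helper name.)
    """
    # Convert every input to radians once, up front.
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula.
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    return EARTH_RADIUS_KM * c * KM_TO_MILES


# One start point against an array of end points.
end_lats = np.array([34.0522, 40.7128])
end_lons = np.array([-118.2437, -74.0060])
distances = haversine_miles(40.7128, -74.0060, end_lats, end_lons)
```

Passing the end points as arguments, rather than reading them from a module-level DataFrame, keeps the function reusable across datasets.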
523551
{
524552
"cell_type": "markdown",
525553
"metadata": {},
@@ -826,7 +854,7 @@
826854
"name": "python",
827855
"nbconvert_exporter": "python",
828856
"pygments_lexer": "ipython3",
829-
"version": "3.7.3"
857+
"version": "3.7.4"
830858
},
831859
"toc": {
832860
"base_numbering": 1,

.ipynb_checkpoints/Pandas Toolkit-checkpoint.ipynb

+16-3
Original file line numberDiff line numberDiff line change
@@ -114,7 +114,9 @@
114114
"- df.groupby(col1)[col2].mean() | Returns the mean of the values in col2, grouped by the values in col1 (mean can be replaced with almost any function from the statistics module)\n",
115115
"- df.pivot_table(index=col1,values=[col2,col3],aggfunc=mean) | Create a pivot table that groups by col1 and calculates the mean of col2 and col3\n",
116116
"- df.groupby(col1).agg(np.mean) | Find the average across all columns for every unique col1 group\n",
117-
"- df.apply(np.mean) | Apply the function np.mean() across each column\n",
117+
"- df.apply(function, axis=) | Apply the function along an axis of the DataFrame\n",
118+
" - axis defaults to 0 (function is applied to each column); use axis=1 to apply it to each row\n",
119+
" - can be used with a lambda function\n",
118120
"- df.applymap() | Apply function element-wise\n",
119121
"- df.rank() | assign ranks to entries"
120122
]
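A short illustration of the `df.apply` entry above, assuming a small made-up DataFrame (the column names `a` and `b` are hypothetical). With `axis=0` the function receives each column as a Series; with `axis=1` it receives each row.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (the default): the function is called once per column.
col_means = df.apply(lambda col: col.mean(), axis=0)

# axis=1: the function is called once per row.
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)
```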
@@ -153,13 +155,24 @@
153155
"#### Datetime\n",
154156
"\n",
155157
"- pd.to_datetime - converts a date to a datetime object\n",
156-
"- pd.to_local - "
158+
"- dt.tz_localize('America/New_York', ambiguous='NaT') - attaches a timezone to naive datetimes; ambiguous='NaT' replaces ambiguous times (e.g. the repeated hour at the end of DST) with NaT (not a time)\n",
159+
"- dt.tz_convert('Europe/London') - converts to stated timezone\n",
160+
"- dt.day_name() - returns the day-of-week name for each datetime (replaces dt.weekday_name, which was removed in pandas 1.0)"
157161
]
158162
},
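The Datetime entries above can be chained as in the sketch below. The sample timestamps are made up for illustration; the second one falls in the repeated hour when US daylight saving time ended on 2019-11-03, so it is ambiguous. The sketch uses `dt.day_name()`, which assumes pandas 1.0 or later, where `dt.weekday_name` is no longer available.

```python
import pandas as pd

# Naive (timezone-unaware) datetimes; the second is ambiguous in New York.
s = pd.Series(pd.to_datetime(["2019-07-01 12:00", "2019-11-03 01:30"]))

# Attach the New York timezone; ambiguous times become NaT.
localized = s.dt.tz_localize("America/New_York", ambiguous="NaT")

# Convert the timezone-aware values to London time.
converted = localized.dt.tz_convert("Europe/London")

# Day-of-week name for each datetime (NaT yields a missing value).
day_names = converted.dt.day_name()
```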
159163
{
160164
"cell_type": "markdown",
161165
"metadata": {},
162-
"source": []
166+
"source": [
167+
"#### Iterating\n",
168+
"\n",
169+
"- iterrows() - iterates over the rows, yielding each one as an (index, Series) pair\n",
170+
"- itertuples() - like iterrows, but yields each row as a named tuple whose fields are accessed by attribute, e.g. row.Index, row.Col1, row.Coln\n",
171+
"\n",
172+
"##### including Numpy to iterate\n",
173+
"- pandas is built on NumPy, so column operations can use vectorized, broadcast arithmetic instead of looping over rows\n",
174+
"- df['column'].values returns a NumPy array of that column's values\n"
175+
]
163176
},
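The three approaches in the Iterating cell can be compared on a toy DataFrame. The frame below, with columns `Col1` and `Col2`, is a made-up example; all three compute the same row-wise sum.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2, 3], "Col2": [10, 20, 30]}, index=["x", "y", "z"])

# iterrows(): each item is an (index, Series) pair.
totals_rows = [row["Col1"] + row["Col2"] for idx, row in df.iterrows()]

# itertuples(): each item is a named tuple with an Index field plus one field per column.
totals_tuples = [t.Col1 + t.Col2 for t in df.itertuples()]
first = next(df.itertuples())

# NumPy arrays: the whole column is added at once, with no Python-level loop.
totals_vec = df["Col1"].values + df["Col2"].values
```

On large frames the array version is generally the fastest of the three, since the loop runs inside NumPy rather than in Python.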
164177
{
165178
"cell_type": "code",
