Jason-M-Richards
diff --git a/‎.ipynb_checkpoints/Pandas Toolkit-checkpoint.ipynb
+206 b/‎.ipynb_checkpoints/Pandas Toolkit-checkpoint.ipynb
+206
@@ -0,0 +1,206 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Importing Data\n",
+    "- pd.read_csv(filename) | From a CSV file\n",
+    "- pd.read_table(filename) | From a delimited text file (like TSV)\n",
+    "- pd.read_excel(filename) | From an Excel file\n",
+    "- pd.read_sql(query, connection_object) | Read from a SQL table/database\n",
+    "- pd.read_json(json_string) | Read from a JSON formatted string, URL or file.\n",
+    "- pd.read_html(url) | Parses an html URL, string or file and extracts tables to a list of dataframes\n",
+    "- pd.read_clipboard() | Takes the contents of your clipboard and passes it to read_table()\n",
+    "- pd.DataFrame(dict) | From a dict, keys for columns names, values for data as lists"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Exporting Data\n",
+    "- df.to_csv(filename) | Write to a CSV file\n",
+    "- df.to_excel(filename) | Write to an Excel file\n",
+    "- df.to_sql(table_name, connection_object) | Write to a SQL table\n",
+    "- df.to_json(filename) | Write to a file in JSON format"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Create Test Objects\n",
+    "Useful for testing code segements\n",
+    "\n",
+    "- pd.DataFrame(np.random.rand(20,5)) | 5 columns and 20 rows of random floats\n",
+    "- pd.Series(my_list) | Create a series from an iterable my_list\n",
+    "- df.index = pd.date_range('1900/1/30', periods=df.shape[0]) | Add a date index"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Viewing/Inspecting Data\n",
+    "- df.head(n) | First n rows of the DataFrame\n",
+    "- df.tail(n) | Last n rows of the DataFrame\n",
+    "- df.shape | Number of rows and columns\n",
+    "- df.info() | Index, Datatype and Memory information\n",
+    "- df.describe() | Summary statistics for numerical columns\n",
+    "- s.value_counts(dropna=False) | View unique values and counts\n",
+    "- df.apply(pd.Series.value_counts) | Unique values and counts for all columns"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Selection\n",
+    "- df[col] | Returns column with label col as Series\n",
+    "- df[[col1, col2]] | Returns columns as a new DataFrame\n",
+    "- s.iloc[0] | Selection by position\n",
+    "- s.loc['index_one'] | Selection by index\n",
+    "- df.iloc[0,:] | First row\n",
+    "- df.iloc[0,0] | First element of \n",
+    "- df.iat([0],[0]) | row & column\n",
+    "- df.at([0], ['Country']) | row & column\n",
+    "- df.ix[2] | Select single row in subset of rows\n",
+    "- df.ix[:,'Capital'] | Select a single column of subset of columns\n",
+    "- df.ix[1,'Capital'] | Select rows and columns\n",
+    "- s[~(s > 1)] | Series s where value is not >1\n",
+    "- s[(s < -1) | (s > 2)] | s where value is <-1 or >2\n",
+    "- df[df['Population']>1200000000] | Use filter to adjust DataFrame\n",
+    "- s['a'] = 6 | set index a of Series s to 6"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Data Cleaning\n",
+    "- df.columns = ['a','b','c'] | Rename columns\n",
+    "- pd.isnull() | Checks for null Values, Returns Boolean Arrray\n",
+    "- pd.notnull() | Opposite of pd.isnull()\n",
+    "- df.dropna() | Drop all rows that contain null values\n",
+    "- df.dropna(axis=1) | Drop all columns that contain null values\n",
+    "- df.dropna(axis=1,thresh=n) | Drop all rows have have less than n non null values\n",
+    "- df.fillna(x) | Replace all null values with x\n",
+    "- s.fillna(s.mean()) | Replace all null values with the mean (mean can be replaced with almost any function from the statistics module)\n",
+    "- s.astype(float) | Convert the datatype of the series to float\n",
+    "- s.replace(1,'one') | Replace all values equal to 1 with 'one'\n",
+    "- s.replace([1,3],['one','three']) | Replace all 1 with 'one' and 3 with 'three'\n",
+    "- df.rename(columns=lambda x: x + 1) | Mass renaming of columns\n",
+    "- df.rename(columns={'old_name': 'new_ name'}) | Selective renaming\n",
+    "- df.set_index('column_one') | Change the index\n",
+    "- df.rename(index=lambda x: x + 1) | Mass renaming of index\n",
+    "- df.drop(['a', 'c']) | Drop value from row (axis=0)\n",
+    "- df.drop('Col_name', axis=1) | Drop values from columns (axis=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Filter, Sort, and Groupby\n",
+    "- df[df[col] > 0.5] | Rows where the column col is greater than 0.5\n",
+    "- df[(df[col] > 0.5) & (df[col] < 0.7)] | Rows where 0.7 > col > 0.5\n",
+    "- df.sort_values(col1) | Sort values by col1 in ascending order\n",
+    "- df.sort_values(col2,ascending=False) | Sort values by col2 in descending order\n",
+    "- df.sort_values([col1,col2],ascending=[True,False]) | Sort values by col1 in ascending order then col2 in descending order\n",
+    "- df.sort_index | Sort by labels along an axis\n",
+    "- df.groupby(col) | Returns a groupby object for values from one column\n",
+    "- df.groupby([col1,col2]) | Returns groupby object for values from multiple columns\n",
+    "- df.groupby(col1)[col2] | Returns the mean of the values in col2, grouped by the values in col1 (mean can be replaced with almost any function from the statistics module)\n",
+    "- df.pivot_table(index=col1,values=[col2,col3],aggfunc=mean) | Create a pivot table that groups by col1 and calculates the mean of col2 and col3\n",
+    "- df.groupby(col1).agg(np.mean) | Find the average across all columns for every unique col1 group\n",
+    "- df.apply(np.mean) | Apply the function np.mean() across each column\n",
+    "- df.applymap() | Apply function element-wise\n",
+    "- df.rank() | assign ranks to entries"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Join/Combine\n",
+    "- df1.append(df2) | Add the rows in df1 to the end of df2 (columns should be identical)\n",
+    "- pd.concat([df1, df2],axis=1) | Add the columns in df1 to the end of df2 (rows should be identical)\n",
+    "- df1.join(df2,on=col1,how='inner') | SQL-style join the columns in df1 with the columns on df2 where the rows forcol have identical values. 'how' can be one of 'left', 'right', 'outer', 'inner'"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Statistics\n",
+    "These can all be applied to a series as well.\n",
+    "\n",
+    "- df.describe() | Summary statistics for numerical columns\n",
+    "- df.mean() | Returns the mean of all columns\n",
+    "- df.corr() | Returns the correlation between columns in a DataFrame\n",
+    "- df.count() | Returns the number of non-null values in each DataFrame column\n",
+    "- df.max() | Returns the highest value in each column\n",
+    "- df.min() | Returns the lowest value in each column\n",
+    "- df.median() | Returns the median of each column\n",
+    "- df.std() | Returns the standard deviation of each column"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Datetime\n",
+    "\n",
+    "- pd.to_datetime - converts a date to a datetime object\n",
+    "- pd.to_local - "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.9"
+  },
+  "toc": {
+   "base_numbering": 1,
+   "nav_menu": {},
+   "number_sections": true,
+   "sideBar": true,
+   "skip_h1_title": false,
+   "title_cell": "Table of Contents",
+   "title_sidebar": "Contents",
+   "toc_cell": false,
+   "toc_position": {},
+   "toc_section_display": true,
+   "toc_window_display": true
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}