Clean up

palewire · palewire · commit ccde06208af8 · 2023-03-14T11:21:46.000-07:00
diff --git a/docs/src/charts.md b/docs/src/charts.md
@@ -130,7 +130,7 @@ What important facet of the data is this chart *not* showing? There are two Robi
 We have that `latimes_make` column in our original dataframe, but it got lost when we created our ranking because we didn't include it in our `groupby` command. We can fix that by scrolling back up our notebook and adding it to the command. You will need to replace what's there with a list containing both columns we want to keep.
 
 ```{code-cell}
-accident_counts = accident_list.groupby(["latimes_make", "latimes_make_and_model"]).size().reset_index()
+accident_counts = accident_list.groupby(["latimes_make", "latimes_make_and_model"]).size().rename("accidents").reset_index()
 ```
 
 Rerun all of the cells below to update everything you're working with. Now if you inspect the ranking you should see the `latimes_make` column included.
diff --git a/docs/src/columns.md b/docs/src/columns.md
@@ -16,7 +16,7 @@ kernelspec:
 
 # Columns
 
-We’ll begin with the `latimes_make_and_model` column, which records the standardized name of each helicopter that crashed. To access its contents separate from the rest of the DataFrame, append a period to the variable followed by the column’s name. 
+We’ll begin with the `latimes_make_and_model` column, which records the standardized name of each helicopter that crashed. To access its contents separate from the rest of the DataFrame, append a pair of flat brackets with the column’s name in quotes inside. 
 
 ```{code-cell}
 :tags: [hide-cell]
@@ -27,13 +27,13 @@ accident_list = pd.read_csv("https://raw.githubusercontent.com/palewire/first-py
 
 ```{code-cell}
 :tags: [show-input]
-accident_list.latimes_make_and_model
+accident_list['latimes_make_and_model']
 ```
 
 That will list the column out as a `Series`, just like the ones we created from scratch earlier. Just as we did then, you can now start tacking on additional methods that will analyze the contents of the column.
 
 ````{note}
-You can also access columns a second way, like this: `accident_list['latimes_make_and_model']`. This method isn’t as pretty, but it’s required if your column has a space in its name, which would break the simpler dot-based method.
+You can also access columns a second way, like this: `accident_list.latimes_make_and_model`. This method is quicker to type, but it won't work if your column has a space in its name. So we're teaching the universal bracket method instead.
 ````
 
 ## Count a column's values
@@ -44,7 +44,7 @@ There’s another built-in pandas tool that will total up the frequency of value
 
 ```{code-cell}
 :tags: [show-input]
-accident_list.latimes_make_and_model.value_counts()
+accident_list['latimes_make_and_model'].value_counts()
 ```
 
 Congratulations, you've made your first finding. With that little line of code, you've calculated an important fact: During the period being studied, the Robinson R44 had more fatal accidents than any other helicopter.
@@ -55,7 +55,7 @@ You may notice that even though the result has two columns, pandas did not retur
 
 ```{code-cell}
 :tags: [show-input]
-accident_list.latimes_make_and_model.value_counts().reset_index()
+accident_list['latimes_make_and_model'].value_counts().reset_index()
 ```
 
 Why does a Series behave differently than a DataFrame? Why does `reset_index` have such a weird name?
diff --git a/docs/src/compute.md b/docs/src/compute.md
@@ -32,14 +32,14 @@ In many cases, it’s no more complicated than combining two series using a math
 
 ```{code-cell}
 :tags: [show-input]
-merged_list.accidents / merged_list.total_hours
+merged_list['accidents'] / merged_list['total_hours']
 ```
 
 The resulting series can be added to your dataframe by assigning it to a new column. You name your column by providing it as a quoted string inside of flat brackets. Let's call this column something brief and clear like `per_hour`.
 
 ```{code-cell}
 :tags: [show-input]
-merged_list['per_hour'] = merged_list.accidents / merged_list.total_hours
+merged_list['per_hour'] = merged_list['accidents'] / merged_list['total_hours']
 ```
 
 Which, like everything else, you can inspect with the `head` command.
@@ -53,5 +53,5 @@ You can see that the result is in [scientific notation](https://en.wikipedia.org
 
 ```{code-cell}
 :tags: [show-input]
-merged_list['per_100k_hours'] = (merged_list.accidents / merged_list.total_hours) * 100_000
+merged_list['per_100k_hours'] = merged_list['per_hour'] * 100_000
 ```
diff --git a/docs/src/filters.md b/docs/src/filters.md
@@ -40,14 +40,14 @@ In the next cell we will ask pandas to narrow down our list of accidents to just
 
 ```{code-cell}
 :tags: [show-input]
-accident_list[accident_list.state == my_state]
+accident_list[accident_list['state'] == my_state]
 ```
 
 Now we should save the results of that filter into a new variable separate from the full list we imported from the CSV file. Since it includes only the sites for the state we want, let’s call it `my_accidents`.
 
 ```{code-cell}
 :tags: [show-input]
-my_accidents = accident_list[accident_list.state == my_state]
+my_accidents = accident_list[accident_list['state'] == my_state]
 ```
 
 To check our work and find out how many records are left after the filter, let's run the DataFrame inspection commands we learned earlier.
diff --git a/docs/src/groupby.md b/docs/src/groupby.md
@@ -43,7 +43,7 @@ The result is much like `value_counts`, but we're allowed run to all kinds of st
 
 ```{code-cell}
 :tags: [show-input]
-accident_list.groupby("latimes_make_and_model").total_fatalities.sum()
+accident_list.groupby("latimes_make_and_model")['total_fatalities'].sum()
 ```
 
 Again our data has come back as an ugly Series. To reformat it as a pretty DataFrame use the `reset_index` method again.
@@ -53,18 +53,18 @@ Again our data has come back as an ugly Series. To reformat it as a pretty DataF
 accident_list.groupby("latimes_make_and_model").size().reset_index()
 ```
 
-Now save that as a variable.
+You can clean up the `0` column name assigned by pandas with the `rename` method.
 
 ```{code-cell}
 :tags: [show-input]
-accident_counts = accident_list.groupby("latimes_make_and_model").size().reset_index()
+accident_list.groupby("latimes_make_and_model").size().rename("accidents").reset_index()
 ```
 
-You can clean up the `0` column name assigned by pandas with the `rename` method. The `inplace` option, found on many pandas methods, will save the change to your variable automatically.
+Now save that as a variable.
 
 ```{code-cell}
 :tags: [show-input]
-accident_counts.rename(columns={0: "accidents"}, inplace=True)
+accident_counts = accident_list.groupby("latimes_make_and_model").size().rename("accidents").reset_index()
 ```
 
 The result is a DataFrame with the accident totals we'll want to merge with the FAA survey data to calculate rates.
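
For readers checking the pattern this diff settles on, here is a minimal sketch, not part of the commit itself. It assumes a tiny hand-made `accident_list` in place of the tutorial's CSV, and the column `total fatalities` (with a space) is hypothetical, included only to show where dot access stops working. It illustrates bracket-style column access and naming the grouped Series before `reset_index`, so no `0` column ever appears.

```python
import pandas as pd

# Hypothetical stand-in data; the tutorial itself loads a CSV of helicopter accidents.
accident_list = pd.DataFrame(
    {
        "latimes_make": ["ROBINSON", "ROBINSON", "BELL", "ROBINSON"],
        "latimes_make_and_model": [
            "ROBINSON R44",
            "ROBINSON R44",
            "BELL 206",
            "ROBINSON R22",
        ],
        # Assumed column name with a space, to show why brackets are the universal form.
        "total fatalities": [1, 2, 1, 3],
    }
)

# Bracket access works for any column name, including names containing spaces.
fatalities = accident_list["total fatalities"]

# Dot access returns the same Series, but only when the name is a valid Python identifier.
same_series = accident_list.latimes_make.equals(accident_list["latimes_make"])

# Naming the Series produced by size() before reset_index() gives the count
# column a readable name instead of the default 0.
accident_counts = (
    accident_list.groupby(["latimes_make", "latimes_make_and_model"])
    .size()
    .rename("accidents")
    .reset_index()
)

print(accident_counts)
```

Chaining `rename("accidents")` ahead of `reset_index()` keeps the whole transformation in one expression, which is why the later `rename(columns={0: "accidents"}, inplace=True)` step is no longer needed.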