Commit ccde062

Clean up
1 parent 6a963c8 commit ccde062

5 files changed, with 16 additions and 16 deletions

docs/src/charts.md

Lines changed: 1 addition & 1 deletion
@@ -130,7 +130,7 @@ What important facet of the data is this chart *not* showing? There are two Robi
We have that `latimes_make` column in our original dataframe, but it got lost when we created our ranking because we didn't include it in our `groupby` command. We can fix that by scrolling back up our notebook and adding it to the command. You will need to replace what's there with a list containing both columns we want to keep.

```{code-cell}
-accident_counts = accident_list.groupby(["latimes_make", "latimes_make_and_model"]).size().reset_index()
+accident_counts = accident_list.groupby(["latimes_make", "latimes_make_and_model"]).size().rename("accidents").reset_index()
```

Rerun all of the cells below to update everything you're working with. Now if you inspect the ranking you should see the `latimes_make` column included.
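
As a rough illustration of what the revised cell produces, here is a minimal sketch that uses a small made-up DataFrame in place of the tutorial's `accident_list`. The rows are invented for illustration; only the column names and the method chain come from the diff.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's accident_list; the rows are invented.
accident_list = pd.DataFrame(
    {
        "latimes_make": ["ROBINSON", "ROBINSON", "ROBINSON", "BELL"],
        "latimes_make_and_model": [
            "ROBINSON R44",
            "ROBINSON R44",
            "ROBINSON R22",
            "BELL 206",
        ],
    }
)

# Group on both columns so latimes_make survives into the result, count the
# rows in each group, name the resulting Series, then flatten the index.
accident_counts = (
    accident_list.groupby(["latimes_make", "latimes_make_and_model"])
    .size()
    .rename("accidents")
    .reset_index()
)

print(accident_counts)
```

Naming the Series before `reset_index` means the count column arrives labelled `accidents` rather than `0`.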

docs/src/columns.md

Lines changed: 5 additions & 5 deletions
@@ -16,7 +16,7 @@ kernelspec:

# Columns

-We’ll begin with the `latimes_make_and_model` column, which records the standardized name of each helicopter that crashed. To access its contents separate from the rest of the DataFrame, append a period to the variable followed by the column’s name.
+We’ll begin with the `latimes_make_and_model` column, which records the standardized name of each helicopter that crashed. To access its contents separate from the rest of the DataFrame, append a pair of flat brackets with the column’s name in quotes inside.

```{code-cell}
:tags: [hide-cell]
@@ -27,13 +27,13 @@ accident_list = pd.read_csv("https://raw.githubusercontent.com/palewire/first-py

```{code-cell}
:tags: [show-input]
-accident_list.latimes_make_and_model
+accident_list['latimes_make_and_model']
```

That will list the column out as a `Series`, just like the ones we created from scratch earlier. Just as we did then, you can now start tacking on additional methods that will analyze the contents of the column.

````{note}
-You can also access columns a second way, like this: `accident_list['latimes_make_and_model']`. This method isn’t as pretty, but it’s required if your column has a space in its name, which would break the simpler dot-based method.
+You can also access columns a second way, like this: `accident_list.latimes_make_and_model`. This method is quicker to type, but it won't work if your column has a space in its name. So we're teaching the universal bracket method instead.
````

## Count a column's values
@@ -44,7 +44,7 @@ There’s another built-in pandas tool that will total up the frequency of value

```{code-cell}
:tags: [show-input]
-accident_list.latimes_make_and_model.value_counts()
+accident_list['latimes_make_and_model'].value_counts()
```

Congratulations, you've made your first finding. With that little line of code, you've calculated an important fact: During the period being studied, the Robinson R44 had more fatal accidents than any other helicopter.
@@ -55,7 +55,7 @@ You may notice that even though the result has two columns, pandas did not retur

```{code-cell}
:tags: [show-input]
-accident_list.latimes_make_and_model.value_counts().reset_index()
+accident_list['latimes_make_and_model'].value_counts().reset_index()
```

Why does a Series behave differently than a DataFrame? Why does `reset_index` have such a weird name?
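
The switch from dot access to bracket access in this file can be sketched with a small invented DataFrame. The data below, and the `total fatalities` column with a space in its name, are assumptions made up to show the case the note describes; they are not taken from the tutorial's CSV.

```python
import pandas as pd

# Hypothetical data; "total fatalities" deliberately contains a space.
accident_list = pd.DataFrame(
    {
        "latimes_make_and_model": ["ROBINSON R44", "BELL 206", "ROBINSON R44"],
        "total fatalities": [1, 2, 1],
    }
)

# Both forms return the same Series when the name is a valid Python identifier.
by_bracket = accident_list["latimes_make_and_model"]
by_dot = accident_list.latimes_make_and_model
print(by_bracket.equals(by_dot))

# Only the bracket form can reach a column whose name has a space;
# writing `accident_list.total fatalities` would be a SyntaxError.
print(accident_list["total fatalities"].sum())

# value_counts returns a Series indexed by the values, most frequent first.
counts = accident_list["latimes_make_and_model"].value_counts()
print(counts)

# reset_index turns that Series into a two-column DataFrame. The exact
# column labels depend on the pandas version, so they are not relied on here.
counts_table = counts.reset_index()
print(type(counts_table).__name__)
```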

docs/src/compute.md

Lines changed: 3 additions & 3 deletions
@@ -32,14 +32,14 @@ In many cases, it’s no more complicated than combining two series using a math

```{code-cell}
:tags: [show-input]
-merged_list.accidents / merged_list.total_hours
+merged_list['accidents'] / merged_list['total_hours']
```

The resulting series can be added to your dataframe by assigning it to a new column. You name your column by providing it as a quoted string inside of flat brackets. Let's call this column something brief and clear like `per_hour`.

```{code-cell}
:tags: [show-input]
-merged_list['per_hour'] = merged_list.accidents / merged_list.total_hours
+merged_list['per_hour'] = merged_list['accidents'] / merged_list['total_hours']
```

Which, like everything else, you can inspect with the `head` command.
@@ -53,5 +53,5 @@ You can see that the result is in [scientific notation](https://en.wikipedia.org

```{code-cell}
:tags: [show-input]
-merged_list['per_100k_hours'] = (merged_list.accidents / merged_list.total_hours) * 100_000
+merged_list['per_100k_hours'] = merged_list['per_hour'] * 100_000
```
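
To see the two computed columns side by side, here is a minimal sketch on an invented `merged_list`. The model names and numbers are made up; the column names and arithmetic follow the diff.

```python
import pandas as pd

# Hypothetical stand-in for merged_list; the figures are invented.
merged_list = pd.DataFrame(
    {
        "latimes_make_and_model": ["MODEL A", "MODEL B"],
        "accidents": [2, 5],
        "total_hours": [1_000_000, 500_000],
    }
)

# Dividing one Series by another works element by element, row by row.
merged_list["per_hour"] = merged_list["accidents"] / merged_list["total_hours"]

# Reuse the stored rate rather than repeating the division.
merged_list["per_100k_hours"] = merged_list["per_hour"] * 100_000

print(merged_list)
```

Building `per_100k_hours` from `per_hour` keeps the division in one place, so a later change to how the rate is computed only has to be made once.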

docs/src/filters.md

Lines changed: 2 additions & 2 deletions
@@ -40,14 +40,14 @@ In the next cell we will ask pandas to narrow down our list of accidents to just

```{code-cell}
:tags: [show-input]
-accident_list[accident_list.state == my_state]
+accident_list[accident_list['state'] == my_state]
```

Now we should save the results of that filter into a new variable separate from the full list we imported from the CSV file. Since it includes only the sites for the state we want, let’s call it `my_accidents`.

```{code-cell}
:tags: [show-input]
-my_accidents = accident_list[accident_list.state == my_state]
+my_accidents = accident_list[accident_list['state'] == my_state]
```

To check our work and find out how many records are left after the filter, let's run the DataFrame inspection commands we learned earlier.
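
The filter expression in this file does two things at once, which the following sketch pulls apart. The rows and the intermediate `mask` variable are assumptions added for illustration; the tutorial writes the comparison inline.

```python
import pandas as pd

# Hypothetical rows; only the `state` column name comes from the diff.
accident_list = pd.DataFrame(
    {
        "state": ["CA", "TX", "CA", "AK"],
        "total_fatalities": [1, 2, 3, 1],
    }
)
my_state = "CA"

# The comparison yields a boolean Series, one True/False per row.
mask = accident_list["state"] == my_state

# Passing that Series inside brackets keeps only the rows marked True.
my_accidents = accident_list[mask]

print(mask.tolist())
print(len(my_accidents))
```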

docs/src/groupby.md

Lines changed: 5 additions & 5 deletions
@@ -43,7 +43,7 @@ The result is much like `value_counts`, but we're allowed run to all kinds of st

```{code-cell}
:tags: [show-input]
-accident_list.groupby("latimes_make_and_model").total_fatalities.sum()
+accident_list.groupby("latimes_make_and_model")['total_fatalities'].sum()
```

Again our data has come back as an ugly Series. To reformat it as a pretty DataFrame use the `reset_index` method again.
@@ -53,18 +53,18 @@ Again our data has come back as an ugly Series. To reformat it as a pretty DataF
accident_list.groupby("latimes_make_and_model").size().reset_index()
```

-Now save that as a variable.
+You can clean up the `0` column name assigned by pandas with the `rename` method.

```{code-cell}
:tags: [show-input]
-accident_counts = accident_list.groupby("latimes_make_and_model").size().reset_index()
+accident_list.groupby("latimes_make_and_model").size().rename("accidents").reset_index()
```

-You can clean up the `0` column name assigned by pandas with the `rename` method. The `inplace` option, found on many pandas methods, will save the change to your variable automatically.
+Now save that as a variable.

```{code-cell}
:tags: [show-input]
-accident_counts.rename(columns={0: "accidents"}, inplace=True)
+accident_counts = accident_list.groupby("latimes_make_and_model").size().rename("accidents").reset_index()
```

The result is a DataFrame with the accident totals we'll want to merge with the FAA survey data to calculate rates.
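
The removed and added approaches in this file should produce the same table. A minimal sketch comparing them on invented data follows; the variable names `old_counts` and `new_counts` are hypothetical and exist only to hold the two results.

```python
import pandas as pd

# Hypothetical rows standing in for the tutorial's accident_list.
accident_list = pd.DataFrame(
    {"latimes_make_and_model": ["ROBINSON R44", "BELL 206", "ROBINSON R44"]}
)

# Removed approach: flatten first, which leaves a column labelled 0,
# then rename that column on the DataFrame in place.
old_counts = accident_list.groupby("latimes_make_and_model").size().reset_index()
old_counts.rename(columns={0: "accidents"}, inplace=True)

# Added approach: name the Series before flattening, all in one chain.
new_counts = (
    accident_list.groupby("latimes_make_and_model")
    .size()
    .rename("accidents")
    .reset_index()
)

print(list(old_counts.columns))
print(list(new_counts.columns))
```

Calling `rename` with a plain string on a Series sets the Series name, which `reset_index` then uses as the column label, so no separate in-place step is needed.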
