4 changes: 2 additions & 2 deletions _episodes/03-starting-with-data.md
@@ -544,12 +544,12 @@ is much larger than the wave heights classified as 'windsea'.
> 2. What happens when you group by two columns using the following syntax and
> then calculate mean values?
> - `grouped_data2 = waves_df.groupby(['Seastate', 'Quadrant'])`
-> - `grouped_data2.mean()`
+> - `grouped_data2.mean(numeric_only=True)`
> 3. Summarize Temperature values for swell and windsea states in your data.
>
>> ## Solution
>> 1. The most complete answer is `waves_df.groupby("Quadrant").count()["record_id"][["north", "west"]]` - note that we could use any column that has a value in every row - but given that `record_id` is our index for the dataset it makes sense to use that
->> 2. It groups by 2nd column _within_ the results of the 1st column, and then calculates the mean (n.b. depending on your version of python, you might need `grouped_data2.mean(numeric_only=True)`)
+>> 2. It groups by 2nd column _within_ the results of the 1st column, and then calculates the mean (n.b. older versions of python might need `grouped_data2.mean()` without the `numeric_only=True` parameter)
>> 3.
>>
>> ~~~
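For context on the `numeric_only=True` change in this hunk: in pandas 2.0 and later, `numeric_only` defaults to `False` for groupby aggregations such as `mean()`, so non-numeric columns are no longer dropped automatically. The sketch below illustrates the two-column grouping under that assumption. The small `waves_df` here is a hypothetical stand-in with invented values (including the `Name` column), not the lesson's dataset.

```python
import pandas as pd

# Hypothetical stand-in for waves_df; all values are invented for illustration.
waves_df = pd.DataFrame({
    "Seastate": ["swell", "swell", "windsea", "windsea"],
    "Quadrant": ["north", "north", "west", "west"],
    "Wave Height": [1.0, 3.0, 0.5, 1.5],
    "Name": ["a", "b", "c", "d"],  # non-numeric column, cannot be averaged
})

# Group by Seastate first, then by Quadrant within each Seastate
grouped_data2 = waves_df.groupby(["Seastate", "Quadrant"])

# numeric_only=True restricts the mean to numeric columns, so "Name" is left out
means = grouped_data2.mean(numeric_only=True)
print(means)
```

The result is indexed by a two-level `MultiIndex` (`Seastate`, `Quadrant`), which is what "groups by the 2nd column within the results of the 1st" looks like in practice.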
2 changes: 1 addition & 1 deletion _episodes/04-data-types-and-format.md
@@ -374,7 +374,7 @@ dates.apply(datetime.datetime.strftime, args=("%a",))
{: .language-python}

>## Watch out for tuples!
-> _Tuples_ are data structure similar to a list, but are _immutable_. They are created using parentheses, with items separated by commas:
+> _Tuples_ are a data structure similar to a list, but are _immutable_. They are created using parentheses, with items separated by commas:
> `my_tuple = (1, 2, 3)`
> However, putting parentheses around a single object does not make it a tuple! Creating a tuple of length 1 still needs a trailing comma.
> Test these: `type(("a"))` and `type(("a",))`.
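The two expressions this callout asks learners to test can be checked with a short sketch like the one below. The variable names are illustrative only and do not appear in the lesson.

```python
not_a_tuple = ("a")   # parentheses alone only group the expression
one_tuple = ("a",)    # the trailing comma is what makes a 1-tuple

print(type(not_a_tuple))
print(type(one_tuple))

# Tuples are immutable: item assignment raises a TypeError
try:
    one_tuple[0] = "b"
    mutated = True
except TypeError:
    mutated = False
print(mutated)
```

This is also why the `args=("%a",)` call in the surrounding hunk context needs its trailing comma: without it, `args` would receive a plain string rather than a tuple.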
4 changes: 2 additions & 2 deletions _episodes/06-merging-data.md
@@ -127,9 +127,9 @@ new_output = pd.read_csv('data/out.csv', keep_default_na=False, na_values=[""])
>> # group by buoy_id, and output some summary statistics
>> combined_data.groupby("buoy_id").describe()
>> # write to csv
->> combined_data.to_csv("combined_wave_data.csv", index=False)
+>> combined_data.to_csv("data/combined_wave_data.csv", index=False)
>> # read in the csv
->> cwd = pd.read_csv("combined_wave_data.csv", keep_default_na=False, na_values=[""])
+>> cwd = pd.read_csv("data/combined_wave_data.csv", keep_default_na=False, na_values=[""])
>> # check the results are the same
>> cwd.groupby("buoy_id").describe()
>> ~~~
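The write-then-read round trip in this hunk can be sketched in a self-contained way as follows. This is a minimal illustration, not the lesson's solution: the data values are invented, only the `buoy_id` column name and the `read_csv` arguments come from the diff, and a temporary directory stands in for the lesson's `data/` folder.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data; values are invented for illustration.
combined_data = pd.DataFrame({
    "buoy_id": [11, 11, 16, 16],
    "Wave Height": [1.2, 0.8, 2.5, 3.1],
    "Seastate": ["swell", "NA", "windsea", None],  # literal "NA" text and one missing value
})

before = combined_data.groupby("buoy_id")["Wave Height"].describe()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "combined_wave_data.csv")
    # index=False avoids writing the row index as an extra column
    combined_data.to_csv(path, index=False)
    # Treat only empty fields as missing, so the text "NA" survives as a string
    cwd = pd.read_csv(path, keep_default_na=False, na_values=[""])

after = cwd.groupby("buoy_id")["Wave Height"].describe()

# Raises an AssertionError if the summaries differ
pd.testing.assert_frame_equal(before, after)
print(cwd)
```

With `keep_default_na=False` and `na_values=[""]`, the string `"NA"` is read back unchanged while the empty field becomes a missing value, which is the behaviour the lesson relies on.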
22 changes: 13 additions & 9 deletions _episodes/07-pandas-matplotlib.md
@@ -108,8 +108,8 @@ import matplotlib.pyplot as plt
Now, let's read data and plot it!

~~~
-waves = pd.read_csv("data/waves.csv")
-my_plot = waves.plot("Tpeak", "Wave Height", kind="scatter")
+waves_df = pd.read_csv("data/waves.csv")
+my_plot = waves_df.plot("Tpeak", "Wave Height", kind="scatter")
plt.show() # not necessary in Jupyter Notebooks
~~~
{: .language-python}
@@ -229,7 +229,7 @@ provide, offering a consistent environment to make publication-quality visualiza
~~~
fig, ax1 = plt.subplots() # prepare a matplotlib figure

-waves.plot("Tpeak", "Wave Height", kind="scatter", ax=ax1)
+waves_df.plot("Tpeak", "Wave Height", kind="scatter", ax=ax1)

# Provide further adaptations with matplotlib:
ax1.set_xlabel("Tpeak (highest energy wave periodicity; seconds)")
@@ -271,6 +271,10 @@ plt.show() # not necessary in Jupyter Notebooks
What about plotting after joining DataFrames? Let's plot the water depths at each of the buoys

~~~
+# reload the buoys data just in case we don't have it loaded still
+buoys_df = pd.read_csv("data/buoy_data.csv")
+
+
# water depth in the buoys dataframe is currently a string (it's suffixed by "m") so we need to fix that
def fix_depth_string(i, depth):
    if type(depth) == str:
@@ -317,11 +321,11 @@ Note that the return type of `.unique` is a Numpy ndarray, even though the colum
> >
> > ~~~
> > fig, ax1 = plt.subplots()
-> > waves[waves["buoy_id"] == 16].plot("Tpeak", "Wave Height", kind="scatter", ax=ax1)
+> > waves_df[waves_df["buoy_id"] == 16].plot("Tpeak", "Wave Height", kind="scatter", ax=ax1)
> > ax1.set_xlabel("Highest energy wave period")
> > ax1.tick_params(labelsize=16, pad=8)
-> > ax1.set_xbound(0, waves[waves["buoy_id"] == 16].Tpeak.max()+1)
-> > ax1.set_ybound(0, waves[waves["buoy_id"] == 16]["Wave Height"].max()+1)
+> > ax1.set_xbound(0, waves_df[waves_df["buoy_id"] == 16].Tpeak.max()+1)
+> > ax1.set_ybound(0, waves_df[waves_df["buoy_id"] == 16]["Wave Height"].max()+1)
> > fig.suptitle('Scatter plot of wave height versus Tpeak for West Hebrides', fontsize=15)
> > ~~~
> > {: .language-python}
@@ -335,7 +339,7 @@ Note that the return type of `.unique` is a Numpy ndarray, even though the colum
> > ## Answers
> >
> > ~~~
-> > data = waves.groupby("buoy_id").max("Wave Height")
+> > data = waves_df.groupby("buoy_id").max("Wave Height")
> > x = data["Temperature"]
> > y = data["Wave Height"]
> > fig, plot = plt.subplots() # although we're not using the `fig` variable, subplots returns 2 objects
@@ -354,8 +358,8 @@ Note that the return type of `.unique` is a Numpy ndarray, even though the colum
> >
> > ~~~
> > fig, ax = plt.subplots()
-> > wh = waves[waves["buoy_id"] == 16]
-> > pb = waves[waves["buoy_id"] == 11]
+> > wh = waves_df[waves_df["buoy_id"] == 16]
+> > pb = waves_df[waves_df["buoy_id"] == 11]
> >
> > ax.scatter(wh["Tpeak"], wh["Wave Height"])
> > ax.scatter(pb["Tpeak"], pb["Wave Height"], marker="*")
2 changes: 1 addition & 1 deletion _episodes/08-geopandas.md
@@ -252,7 +252,7 @@ We can even display the Cairngorms data directly over the Scotland plot, which v

~~~
scotland_plot = scotland.explore()
-cairngorms.explore(map=scotland_plot, style_kwds={"fillColor":"lime"})
+cairngorms.explore(m=scotland_plot, style_kwds={"fillColor":"lime"})
~~~
{: .language-python}
