NOC-OI · colinsauze · Jun 24, 2025
diff --git a/_episodes/03-starting-with-data.md b/_episodes/03-starting-with-data.md
@@ -544,12 +544,12 @@ is much larger than the wave heights classified as 'windsea'.
 > 2. What happens when you group by two columns using the following syntax and
 >    then calculate mean values?
 >   - `grouped_data2 = waves_df.groupby(['Seastate', 'Quadrant'])`
->   - `grouped_data2.mean()`
+>   - `grouped_data2.mean(numeric_only=True)`
 > 3. Summarize Temperature values for swell and windsea states in your data.
 >
 >> ## Solution
 >> 1. The most complete answer is `waves_df.groupby("Quadrant").count()["record_id"][["north", "west"]]` - note that we could use any column that has a value in every row - but given that `record_id` is our index for the dataset it makes sense to use that
->> 2. It groups by 2nd column _within_ the results of the 1st column, and then calculates the mean (n.b. depending on your version of python, you might need `grouped_data2.mean(numeric_only=True)`)
+>> 2. It groups by the 2nd column _within_ the results of the 1st column, and then calculates the mean (n.b. older versions of pandas might need `grouped_data2.mean()` without the `numeric_only=True` parameter)
 >> 3.
 >>
 >> ~~~
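
Review note on the `numeric_only=True` change: a minimal sketch of what the parameter does, using a small made-up DataFrame (`toy_df` and its `Notes` column are hypothetical stand-ins, not the lesson's `waves_df`). As far as I understand, recent pandas releases no longer silently drop non-numeric columns in `groupby(...).mean()`, which is why the explicit parameter is the safer default for the exercise.

```python
import pandas as pd

# Hypothetical stand-in for waves_df: two grouping columns, one numeric
# column, and one text column that cannot be averaged.
toy_df = pd.DataFrame({
    "Seastate": ["swell", "swell", "windsea", "windsea"],
    "Quadrant": ["north", "north", "west", "west"],
    "Wave Height": [1.0, 3.0, 0.5, 1.5],
    "Notes": ["a", "b", "c", "d"],
})

grouped = toy_df.groupby(["Seastate", "Quadrant"])

# numeric_only=True restricts the mean to numeric columns, so the text
# column "Notes" is left out of the result instead of causing an error.
means = grouped.mean(numeric_only=True)
print(means)
```

The result is indexed by (`Seastate`, `Quadrant`) pairs and contains only the `Wave Height` column.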

diff --git a/_episodes/04-data-types-and-format.md b/_episodes/04-data-types-and-format.md
@@ -374,7 +374,7 @@ dates.apply(datetime.datetime.strftime, args=("%a",))
 {: .language-python}
 
 >## Watch out for tuples!
-> _Tuples_ are data structure similar to a list, but are _immutable_. They are created using parentheses, with items separated by commas: 
+> _Tuples_ are a data structure similar to a list, but are _immutable_. They are created using parentheses, with items separated by commas: 
 > `my_tuple = (1, 2, 3)`
 > However, putting parentheses around a single object does not make it a tuple! Creating a tuple of length 1 still needs a trailing comma.
 > Test these: `type(("a"))` and `type(("a",))`.
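
Review note: the two expressions in the callout can be checked with a short sketch like the one below (variable names are illustrative only). It also shows why `args=("%a",)` in the `apply` call above this callout needs its trailing comma.

```python
not_a_tuple = ("a")   # parentheses alone only group the expression
one_tuple = ("a",)    # the trailing comma makes a tuple of length 1

print(type(not_a_tuple))  # <class 'str'>
print(type(one_tuple))    # <class 'tuple'>

# Tuples are immutable: item assignment raises a TypeError.
my_tuple = (1, 2, 3)
try:
    my_tuple[0] = 99
except TypeError as err:
    print("cannot modify a tuple:", err)
```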

diff --git a/_episodes/06-merging-data.md b/_episodes/06-merging-data.md
@@ -127,9 +127,9 @@ new_output = pd.read_csv('data/out.csv', keep_default_na=False, na_values=[""])
 >> # group by buoy_id, and output some summary statistics
 >> combined_data.groupby("buoy_id").describe()
 >> # write to csv
->> combined_data.to_csv("combined_wave_data.csv", index=False)
+>> combined_data.to_csv("data/combined_wave_data.csv", index=False)
 >> # read in the csv
->> cwd = pd.read_csv("combined_wave_data.csv", keep_default_na=False, na_values=[""])
+>> cwd = pd.read_csv("data/combined_wave_data.csv", keep_default_na=False, na_values=[""])
 >> # check the results are the same
 >> cwd.groupby("buoy_id").describe()
 >> ~~~
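
Review note: the write-then-read check in this solution can be tried in isolation with a sketch like the following. The DataFrame contents are made up, and a temporary directory stands in for the lesson's `data/` folder so nothing is left behind.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for combined_data.
combined = pd.DataFrame({
    "buoy_id": [11, 11, 16, 16],
    "Wave Height": [1.5, 0.5, 2.5, 3.0],
})

with tempfile.TemporaryDirectory() as tmp:
    # Mirror the lesson layout: write into a data/ subdirectory.
    data_dir = os.path.join(tmp, "data")
    os.makedirs(data_dir, exist_ok=True)
    csv_path = os.path.join(data_dir, "combined_wave_data.csv")

    combined.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path, keep_default_na=False, na_values=[""])

# Summary statistics should match before and after the round trip;
# assert_frame_equal raises an AssertionError if they differ.
before = combined.groupby("buoy_id").describe()
after = reloaded.groupby("buoy_id").describe()
pd.testing.assert_frame_equal(before, after)
print("round trip preserved the summary statistics")
```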

diff --git a/_episodes/07-pandas-matplotlib.md b/_episodes/07-pandas-matplotlib.md
@@ -108,8 +108,8 @@ import matplotlib.pyplot as plt
 Now, let's read data and plot it!
 
 ~~~
-waves = pd.read_csv("data/waves.csv")
-my_plot = waves.plot("Tpeak", "Wave Height", kind="scatter")
+waves_df = pd.read_csv("data/waves.csv")
+my_plot = waves_df.plot("Tpeak", "Wave Height", kind="scatter")
 plt.show() # not necessary in Jupyter Notebooks
 ~~~
 {: .language-python}
@@ -229,7 +229,7 @@ provide, offering a consistent environment to make publication-quality visualiza
 ~~~
 fig, ax1 = plt.subplots() # prepare a matplotlib figure
 
-waves.plot("Tpeak", "Wave Height", kind="scatter", ax=ax1)
+waves_df.plot("Tpeak", "Wave Height", kind="scatter", ax=ax1)
 
 # Provide further adaptations with matplotlib:
 ax1.set_xlabel("Tpeak (highest energy wave periodicity; seconds)")
@@ -271,6 +271,10 @@ plt.show() # not necessary in Jupyter Notebooks
 What about plotting after joining DataFrames? Let's plot the water depths at each of the buoys
 
 ~~~
+# reload the buoys data just in case we don't have it loaded still
+buoys_df = pd.read_csv("data/buoy_data.csv")
+
+
 # water depth in the buoys dataframe is currently a string (it's suffixed by "m") so we need to fix that
 def fix_depth_string(i, depth):
     if type(depth) == str:
@@ -317,11 +321,11 @@ Note that the return type of `.unique` is a Numpy ndarray, even though the colum
 > >
 > > ~~~
 > > fig, ax1 = plt.subplots()
-> > waves[waves["buoy_id"] == 16].plot("Tpeak", "Wave Height", kind="scatter", ax=ax1)
+> > waves_df[waves_df["buoy_id"] == 16].plot("Tpeak", "Wave Height", kind="scatter", ax=ax1)
 > > ax1.set_xlabel("Highest energy wave period")
 > > ax1.tick_params(labelsize=16, pad=8)
-> > ax1.set_xbound(0, waves[waves["buoy_id"] == 16].Tpeak.max()+1)
-> > ax1.set_ybound(0, waves[waves["buoy_id"] == 16]["Wave Height"].max()+1)
+> > ax1.set_xbound(0, waves_df[waves_df["buoy_id"] == 16].Tpeak.max()+1)
+> > ax1.set_ybound(0, waves_df[waves_df["buoy_id"] == 16]["Wave Height"].max()+1)
 > > fig.suptitle('Scatter plot of wave height versus Tpeak for West Hebrides', fontsize=15)
 > > ~~~
 > > {: .language-python}
@@ -335,7 +339,7 @@ Note that the return type of `.unique` is a Numpy ndarray, even though the colum
 > > ## Answers
 > >
 > > ~~~
-> > data = waves.groupby("buoy_id").max("Wave Height")
+> > data = waves_df.groupby("buoy_id").max("Wave Height")
 > > x = data["Temperature"]
 > > y = data["Wave Height"]
 > > fig, plot = plt.subplots() # although we're not using the `fig` variable, subplots returns 2 objects
@@ -354,8 +358,8 @@ Note that the return type of `.unique` is a Numpy ndarray, even though the colum
 > >
 > > ~~~
 > > fig, ax = plt.subplots()
-> > wh = waves[waves["buoy_id"] == 16]
-> > pb = waves[waves["buoy_id"] == 11]
+> > wh = waves_df[waves_df["buoy_id"] == 16]
+> > pb = waves_df[waves_df["buoy_id"] == 11]
 > >
 > > ax.scatter(wh["Tpeak"], wh["Wave Height"])
 > > ax.scatter(pb["Tpeak"], pb["Wave Height"], marker="*")

diff --git a/_episodes/08-geopandas.md b/_episodes/08-geopandas.md
@@ -252,7 +252,7 @@ We can even display the Cairngorms data directly over the Scotland plot, which v
 
 ~~~
 scotland_plot = scotland.explore()
-cairngorms.explore(map=scotland_plot, style_kwds={"fillColor":"lime"})
+cairngorms.explore(m=scotland_plot, style_kwds={"fillColor":"lime"})
 ~~~
 {: .language-python}