update PIC16B#3
yx-ath committed Apr 15, 2021
1 parent a97446c commit 4160e34
Showing 1 changed file with 59 additions and 12 deletions.
71 changes: 59 additions & 12 deletions _posts/2021-04-13-HW-1.md
@@ -146,7 +146,7 @@ Now, import and visualize the countries dataset:
# this is the countries data
countries_url = "https://raw.githubusercontent.com/mysociety/gaze/master/data/fips-10-4-to-iso-country-codes.csv"
countries = pd.read_csv(countries_url)
# countries.head()
countries.head()
```
<div>
<style scoped>
@@ -212,7 +212,7 @@ Now, import and visualize the stations dataset:
# this is the stations data
stations_url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/noaa-ghcn/station-metadata.csv"
stations = pd.read_csv(stations_url)
# stations.head()
stations.head()
```
<div>
<style scoped>
@@ -600,7 +600,7 @@ query_climate_database(country = "India",
<td>...</td>
</tr>
<tr>
<th>7965</th>
<th>3147</th>
<td>DARJEELING</td>
<td>27.050</td>
<td>88.270</td>
@@ -610,7 +610,7 @@ query_climate_database(country = "India",
<td>5.10</td>
</tr>
<tr>
<th>7966</th>
<th>3148</th>
<td>DARJEELING</td>
<td>27.050</td>
<td>88.270</td>
@@ -620,7 +620,7 @@ query_climate_database(country = "India",
<td>6.90</td>
</tr>
<tr>
<th>7967</th>
<th>3149</th>
<td>DARJEELING</td>
<td>27.050</td>
<td>88.270</td>
@@ -630,7 +630,7 @@ query_climate_database(country = "India",
<td>8.10</td>
</tr>
<tr>
<th>7968</th>
<th>3150</th>
<td>DARJEELING</td>
<td>27.050</td>
<td>88.270</td>
@@ -640,7 +640,7 @@ query_climate_database(country = "India",
<td>5.60</td>
</tr>
<tr>
<th>7969</th>
<th>3151</th>
<td>DARJEELING</td>
<td>27.050</td>
<td>88.270</td>
@@ -651,10 +651,11 @@ query_climate_database(country = "India",
</tr>
</tbody>
</table>
<p>7970 rows × 7 columns</p>
<p>3152 rows × 7 columns</p>
</div>

Now, get a dataset almost identical to the example one. However, its dimensions are (7970, 7) instead of (3152, 7). I'm really confused here. Any comments and suggestions would help a lot. Thank you! :)
Nice! Now get exactly the desired dataset.
- Small remark: to rebuild the whole database from scratch, simply delete the existing database file first.
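The post does not say what caused the extra rows. One plausible cause, assumed here rather than confirmed, is that the tables were written with pandas' `DataFrame.to_sql(..., if_exists="append")` and the loading cell was run more than once, so stale rows stayed in the database. Since 7970 is not a whole multiple of 3152, a plain rerun cannot explain the exact numbers, so treat this only as a sketch of the failure mode. The toy table and the `count_rows` helper are hypothetical:

```python
import sqlite3

import pandas as pd

# Hypothetical toy rows standing in for the real temperatures table
temps = pd.DataFrame({
    "ID": ["A", "A", "B"],
    "Year": [1980, 1981, 1980],
    "Temp": [5.1, 6.9, 8.1],
})


def count_rows(conn, table):
    """Return the number of rows currently stored in `table` (hypothetical helper)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk

# Loading with if_exists="append" and then rerunning the same cell
# silently stores every row a second time.
temps.to_sql("temperatures", conn, if_exists="append", index=False)
temps.to_sql("temperatures", conn, if_exists="append", index=False)
appended_rows = count_rows(conn, "temperatures")

# if_exists="replace" drops the existing table before writing,
# so rerunning the load leaves exactly one copy of the data.
temps.to_sql("temperatures", conn, if_exists="replace", index=False)
temps.to_sql("temperatures", conn, if_exists="replace", index=False)
replaced_rows = count_rows(conn, "temperatures")

conn.close()
print(appended_rows, replaced_rows)
```

Deleting the database file, as the remark above suggests, has a similar effect to `if_exists="replace"`: the next load starts from an empty database.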

## §3. Geographical Scatter Function

@@ -664,8 +665,8 @@ Now, get an almost similar dataset as the example one. However, its dimension is
- `country`, a string giving the name of a country for which data should be returned.
- `year_begin` and `year_end`, two integers giving the earliest and latest years for which data should be returned.
- `month`, an integer giving the month of the year for which data should be returned.
- `min_obs`, the minimum required number of years of data for any given station. Only data for stations with at least min_obs years worth of data in the specified month should be plotted; the others should be filtered out. df.transform() plus filtering is a good way to achieve this task.
- `**kwargs`, additional keyword arguments passed to px.scatter_mapbox(). These can be used to control the colormap used, the mapbox style, etc.
- `min_obs`, the minimum required number of years of data for any given station. Only data for stations with at least `min_obs` years' worth of data in the specified month should be plotted; the others should be filtered out. `df.transform()` plus filtering is a good way to do this.
- `**kwargs`, additional keyword arguments passed to `px.scatter_mapbox()`. These can be used to control the colormap used, the mapbox style, etc.

- Output Contents:
- an interactive geographic scatterplot constructed using Plotly Express, as described in the `INTRO` part.
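A minimal sketch of the `transform()`-plus-filtering idea on a made-up table (the station names, the `Temp` column, and the threshold value are hypothetical):

```python
import pandas as pd

# Hypothetical toy data: one row per station per year, for a single month
df = pd.DataFrame({
    "NAME": ["STATION_A", "STATION_A", "STATION_A", "STATION_B"],
    "Year": [1980, 1981, 1982, 1980],
    "Temp": [5.1, 6.9, 8.1, 12.0],
})
min_obs = 2

# transform("count") returns one value per row (the size of that row's group),
# aligned with df's index, so it can be compared directly to build a row mask.
num_years = df.groupby("NAME")["Year"].transform("count")
filtered = df[num_years >= min_obs]

print(filtered)
```

By contrast, `groupby(...).count()` collapses each station to a single row, which no longer lines up with the rows of `df`; `transform` keeps the original shape, which is why its result works as a boolean mask.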
@@ -694,11 +695,57 @@ def coef(data_group):
Now, write the function that builds and returns the plot:

```python
import pandas as pd
import plotly.express as px

def temperature_coefficient_plot(country, year_begin, year_end, month, min_obs, **kwargs):
    """
    Plot a color-coded map of the stations in a given country, showing each
    station's estimated yearly temperature increase in a given month over a
    given range of years.

    Input
    --------
    country: str, name of the country whose stations should be plotted
    year_begin: int, earliest year to include
    year_end: int, latest year to include
    month: int, month of the year to include
    min_obs: int, minimum number of years of data a station needs to be plotted
    **kwargs: additional keyword arguments passed to px.scatter_mapbox()

    Output
    --------
    fig: the interactive temperature-coefficient map (a Plotly figure)
    """

    # 1. query the database for the requested country, years, and month
    df = query_climate_database(country, year_begin, year_end, month)

    # 2. keep only stations with at least min_obs years of data
    # count the years recorded for each station, aligned row by row with df
    num_years = df.groupby(["NAME"])["Year"].transform("count")
    # drop the rows of stations that fall below the threshold
    df = df[num_years >= min_obs]

    # 3. estimate the yearly temperature increase for each station
    yearly_temp_increase = df.groupby(["NAME", "Month"]).apply(coef).reset_index()
    # apply() returns an unnamed Series, so the new column is labeled 0; rename it
    yearly_temp_increase = yearly_temp_increase.rename(columns={0: "Yearly Temperature Increase"})

    # 4. attach each station's coordinates (LATITUDE & LONGITUDE)
    # one row per station, holding only its geographic information
    spots = df[["NAME", "LATITUDE", "LONGITUDE"]].drop_duplicates()
    # merge the coordinates onto the coefficients by station name
    final = pd.merge(yearly_temp_increase, spots, on=["NAME"])

    # 5. create the map
    fig = px.scatter_mapbox(final, hover_name="NAME",
                            lat="LATITUDE", lon="LONGITUDE",
                            color="Yearly Temperature Increase", **kwargs)
    return fig
```
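The body of the post's `coef` helper is not visible in this diff (only its hunk header is). Assuming it fits a straight line of temperature against year and returns the slope, which is an assumption rather than something the diff confirms, steps 3 and 4 can be tried on a tiny made-up table. `coef_sketch`, the toy stations, and the `Temp` column name are hypothetical:

```python
import numpy as np
import pandas as pd


def coef_sketch(data_group):
    """Hypothetical stand-in for coef(): least-squares slope of Temp on Year."""
    slope, _intercept = np.polyfit(data_group["Year"], data_group["Temp"], 1)
    return round(slope, 4)


# Hypothetical toy data: two stations, three Januaries each
df = pd.DataFrame({
    "NAME": ["A"] * 3 + ["B"] * 3,
    "LATITUDE": [27.05] * 3 + [10.0] * 3,
    "LONGITUDE": [88.27] * 3 + [77.0] * 3,
    "Year": [1980, 1981, 1982] * 2,
    "Month": [1] * 6,
    "Temp": [5.0, 5.5, 6.0, 8.0, 7.0, 6.0],
})

# Step 3: one slope per (station, month); the unnamed result column is 0
yearly = df.groupby(["NAME", "Month"])[["Year", "Temp"]].apply(coef_sketch).reset_index()
yearly = yearly.rename(columns={0: "Yearly Temperature Increase"})

# Step 4: one coordinate row per station, merged on the station name
spots = df[["NAME", "LATITUDE", "LONGITUDE"]].drop_duplicates()
final = pd.merge(yearly, spots, on=["NAME"])

print(final)
```

Selecting `[["Year", "Temp"]]` before `apply` passes only the columns the slope needs, which also avoids the warning recent pandas versions emit when `apply` operates on the grouping columns.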

### Step 2: Sample Output Check

For example, create a plot of estimated yearly increases in temperature during the month of January, in the interval 1980-2020, in India.

```python
color_map = px.colors.diverging.RdGy_r # choose a colormap
fig = temperature_coefficient_plot("India", 1980, 2020, 1,