This repository was archived by the owner on Jun 17, 2024. It is now read-only.

removing date from data-visualisation.md #97

Merged
merged 2 commits into from
Jun 3, 2024
80 changes: 0 additions & 80 deletions episodes/data-visualisation.md
@@ -33,86 +33,6 @@ Let’s look at the data:
df_long.head()
```

| | branch | address | city | zip code | ytd | year | month | circulation |
|-----|----------------|-------------------------|---------|----------|--------|------|---------|-------------|
| 0 | Albany Park | 5150 N. Kimball Ave. | Chicago | 60625.0 | 120059 | 2011 | january | 8427 |
| 1 | Altgeld | 13281 S. Corliss Ave. | Chicago | 60827.0 | 9611 | 2011 | january | 1258 |
| 2 | Archer Heights | 5055 S. Archer Ave. | Chicago | 60632.0 | 101951 | 2011 | january | 8104 |
| 3 | Austin | 5615 W. Race Ave. | Chicago | 60644.0 | 25527 | 2011 | january | 1755 |
| 4 | Austin-Irving | 6100 W. Irving Park Rd. | Chicago | 60634.0 | 165634 | 2011 | january | 12593 |

## Convert year and month to datetime

To plot this data over time, we first need to prepare it in two steps. First, we combine the `year` and `month` columns into a single [datetime](https://docs.python.org/3/library/datetime.html) column using the Pandas `to_datetime` function. Second, we set that date column as the index of the DataFrame. These two steps will set up our data for plotting.

``` python
df_long['date'] = pd.to_datetime(df_long['year'].astype(str) + '-' + df_long['month'], format='%Y-%B')
```

Let's unpack that code:

- `df_long['date']` - First, we create a new `date` column.
- `pd.to_datetime()` - Next, we pass the combined strings to `pd.to_datetime()`, which converts them to datetime values.
- `df_long['year'].astype(str)` - We use the `.astype(str)` method to make sure the year column is a string.
- `+ '-' + df_long['month'],` - We concatenate a `-` as a separator, followed by the month column.
- `format='%Y-%B'` - We pass the `format` parameter to tell Pandas to expect a 4-digit year (`%Y`), followed by a dash, followed by the month's full name (`%B`).
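
The steps above can be tried in isolation with a minimal sketch. The `toy` DataFrame and its values are made up for illustration and are not part of the lesson data, and the example assumes an English locale so that `%B` matches English month names.

```python
import pandas as pd

# Toy data (hypothetical values) mimicking the year and month columns.
toy = pd.DataFrame({
    "year": ["2011", "2011", "2012"],
    "month": ["january", "february", "december"],
})

# Build strings such as "2011-january", then parse them with an explicit format.
date_strings = toy["year"].astype(str) + "-" + toy["month"]
toy["date"] = pd.to_datetime(date_strings, format="%Y-%B")

# No day was given, so each parsed date falls on the first of the month.
print(toy["date"])
```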

If we take a look at the date column, we'll see that the day is automatically set to `01` because our input did not specify a day.

```python
df_long['date']
```
```output
0 2011-01-01
1 2011-01-01
2 2011-01-01
3 2011-01-01
4 2011-01-01
...
11551 2022-12-01
11552 2022-12-01
11553 2022-12-01
11554 2022-12-01
11555 2022-12-01
Name: date, Length: 11556, dtype: datetime64[ns]
```

``` python
df_long.info()
```

``` output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11556 entries, 0 to 11555
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 branch 11556 non-null object
1 address 7716 non-null object
2 city 7716 non-null object
3 zip code 7716 non-null float64
4 ytd 11556 non-null int64
5 year 11556 non-null object
6 month 11556 non-null object
7 circulation 11556 non-null int64
8 date 11556 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int64(2), object(5)
memory usage: 812.7+ KB
```

That worked! Now, we can make the datetime column the index of our DataFrame. In the Pandas episode we looked at the default numerical index that Pandas creates, but we can also use `.set_index()` to declare a specific column as the index of our DataFrame. Using a datetime index will make it easier for us to plot the DataFrame over time. The first parameter of `.set_index()` is the column name, and the `inplace=True` parameter modifies the DataFrame directly, so we do not need to assign the result to a new variable.


``` python
df_long.set_index('date', inplace=True)
```
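
As a rough illustration of what a datetime index allows, here is a small sketch on made-up data. The `toy` DataFrame, its values, and the names `indexed` and `rows_2011` are assumptions for this example only. It uses the non-inplace form of `.set_index()`, which returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Toy data (hypothetical values) with an already-parsed date column.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2011-01-01", "2011-02-01", "2012-01-01"]),
    "circulation": [10, 20, 30],
})

# Without inplace=True, set_index returns a new DataFrame; toy keeps its date column.
indexed = toy.set_index("date")

# With a datetime index, a year string passed to .loc selects all rows from that year.
rows_2011 = indexed.loc["2011"]
print(rows_2011)
```

Whether to use `inplace=True` or reassignment is largely a matter of style; the lesson's own code uses `inplace=True`.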

If we look at the data again, we will see that the index is now set to `date`.

``` python
df_long.head()
```

| | branch | address | city | zip code | ytd | year | month | circulation |
|------------|----------------|-------------------------|---------|----------|--------|------|---------|-------------|
| date | | | | | | | | |