
Conversation

@priyankatuteja
Collaborator

@priyankatuteja priyankatuteja commented Dec 7, 2020

forecast pm2.5 using multi variate time series data

Checklist

Please go through each entry in the checklist below and mark an 'X' if that condition has been met. Every entry should be marked with an 'X' for the Pull Request to be approved.

  • All imports are in the first cell? First block of imports are standard libraries, second block are 3rd party libraries, third block are all arcgis imports? Note that in some cases, for samples, it is a good idea to keep the imports next to where they are used, particularly for uncommonly used features that we want to highlight.
  • All GIS object instantiations are one of the following?
    • gis = GIS()
    • gis = GIS('https://www.arcgis.com', 'arcgis_python', 'P@ssword123')
    • gis = GIS(profile="your_online_profile")
    • gis = GIS('https://pythonapi.playground.esri.com/portal', 'arcgis_python', 'amazing_arcgis_123')
    • gis = GIS(profile="your_enterprise_portal")
  • If this notebook requires setup or teardown, did you add the appropriate code to ./misc/setup.py and/or ./misc/teardown.py?
  • If this notebook references any portal items that need to be staged on AGOL/Python API playground, did you coordinate with a Python API team member to stage the item the correct way with the api_data_owner user?
  • Code refactored & split out across multiple cells, useful comments?
  • Consistent voice/tense/narrative style? Thoroughly checked for typos?
  • All images used like <img src="base64str_here"> instead of <img src="https://some.url">? All map widgets contain a static image preview? (Call mapview_inst.take_screenshot() to do so)
  • All file paths are constructed in an OS-agnostic fashion with os.path.join()? (Instead of r"\foo\bar", os.path.join(os.path.sep, "foo", "bar"), etc.)
  • IF YOU WANT THIS SAMPLE TO BE DISPLAYED ON THE DEVELOPERS.ARCGIS.COM WEBSITE, ping @ DavidJVitale so he can add it to the list for the next deploy
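As a quick illustration of the OS-agnostic path item in the checklist above, a minimal sketch (the file and directory names here are hypothetical, not from the notebook):

```python
import os

# Build a path from parts instead of hard-coding separators;
# os.path.join inserts the correct separator for the current OS.
data_dir = os.path.join("data", "air_quality")
csv_path = os.path.join(data_dir, "hourly_88502_2017.csv")

# On Linux/macOS this yields "data/air_quality/hourly_88502_2017.csv";
# on Windows the same code produces a backslash-separated path.
print(csv_path)
```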

@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:25Z
----------------------------------------------------------------

The tools available in the ArcGIS API for Python geoanalytics module require an ArcGIS Enterprise licensed and configured with the linux based ArcGIS GeoAnalytics server.

Does this mean a GeoAnalytics Server site installed on Windows will not work? Perhaps a note to explicitly state that would help clarify.


priyankatuteja commented on 2020-12-14T09:43:06Z
----------------------------------------------------------------

Yes, that's correct! A GeoAnalytics Server site installed on Windows won't work, not because of any underlying bug on our side, but because the Spark operations that use the pyarrow library are not supported on Windows.

priyankatuteja commented on 2020-12-14T09:46:50Z
----------------------------------------------------------------

Mentioning that it won't work on Windows could confuse users by reading as a negative remark about Windows setups, but those who use this notebook for Spark operations should know that some operations are not supported on Windows. Do you suggest I mention it explicitly?

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:26Z
----------------------------------------------------------------

  • Using the power of distributed compute, we can find answers in large datasets much faster than other non-distributed systems.

I suggest:

...we can analyze large datasets...

  • Maybe a link to background text about what PM is?
  • ...forecast hourly PM2.5 based on historic time series data...

  • The most common air pollutants used in this analysis are PM2.5, PM10, wind speed, wind direction, and relative humidity.
  • Are wind speed, direction and relative humidity considered pollutants? Did the sentence mean to say?: The most common air pollutants used in this analysis are PM2.5 and PM10, and the impact that wind speed, wind direction, and relative humidity have on them.


priyankatuteja commented on 2020-12-14T09:56:51Z
----------------------------------------------------------------

Thanks, done!

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:27Z
----------------------------------------------------------------

  • Based on feedback from Enterprise Product Management, proper terminology would be
  • ...After connecting to your Enterprise organization...

priyankatuteja commented on 2020-12-14T09:57:18Z
----------------------------------------------------------------

Thanks, done!

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:27Z
----------------------------------------------------------------

Perhaps links to the API reference?


priyankatuteja commented on 2020-12-14T09:58:14Z
----------------------------------------------------------------

Good catch!

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:28Z
----------------------------------------------------------------

  • Do you want to expose this url? Perhaps changing it to something generic?:
  • https://geoanalytics_server.company.com:6443/arcgis/admin

priyankatuteja commented on 2020-12-14T10:02:40Z
----------------------------------------------------------------

I always remove the output of that cell... done!

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:29Z
----------------------------------------------------------------

Hyperlinks:


@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:29Z
----------------------------------------------------------------

The text entered for the big data file share name, Air_Quality_17_18_19, does not match any of the outputs because of a typo somewhere:

<Datastore title:"/bigDataFileShares/Air_Auality_17_18_19" type:"bigDataFileShare">


priyankatuteja commented on 2020-12-14T10:04:19Z
----------------------------------------------------------------

done

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:30Z
----------------------------------------------------------------

Not sure how important this is, but:

  • The typo is what was registered percolates through the rest of the notebook:
  • bigDataFileShares_Air_Auality_17_18_19

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:31Z
----------------------------------------------------------------

  • A hyperlink to the API reference for the type of object would help:
  • API Layer object -
  • Is it a Layer? A FeatureLayer? Some other type of layer?

priyankatuteja commented on 2020-12-14T10:23:39Z
----------------------------------------------------------------

removed....

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:31Z
----------------------------------------------------------------

  • A sentence telling the user what is happening here would help clarify instead of just searching for a feature layer. Why are users searching for this layer? What are they searching for?
  • Now we'll search for a feature layer depicting all the counties in the United States. We'll use the county polygons to....

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:35Z
----------------------------------------------------------------

  • Perhaps adding another line to give users brief overview of what the different colored symbols mean:
  • m1.legend = True

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:35Z
----------------------------------------------------------------

  • Changing the here hyperlink to this location: https://aqs.epa.gov/aqsweb/airdata/FileFormats.html#_hourly_data_files would be more precise
  • I think a brief explanation of the next 5 code cells below would help users understand what you are doing so that they might be able to reproduce this workflow on their own. A hyperlink to run_python_script would help:
  • Why are you defining a function right now? What is required in this function? How is it going to be used?
  • Explaining that the output of the save method creates an item in the gis would explain why users need to do a search a couple cells later.
  • Explaining that the output is an item with a tables attribute would also help. How are users going to know the type of the attributes of the method_item?

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:36Z
----------------------------------------------------------------

  • Remove the word Similarly

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:37Z
----------------------------------------------------------------

  • Typo createa should be creates

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:37Z
----------------------------------------------------------------

  • I have no experience with pyspark SQL functions, how to use them, or what pyspark really is, but after reviewing the introductory text twice and then looking through this code, it becomes clear what is happening. Still, from my perspective:
  • It's very difficult to follow what is happening in this code cell, or why users are padding and then dropping columns, without any information about what the concat, lit, or col functions are doing.
  • How do users know they need to do all this? Is this something data scientists would already know they need to do? Are we assuming this type of data frame manipulation is commonplace knowledge for someone who would find this sample?

priyankatuteja commented on 2020-12-14T11:01:46Z
----------------------------------------------------------------

Explanation for the code:

  # assumes these imports from earlier in the notebook:
  # from datetime import datetime as dt
  # from pyspark.sql import functions as F

  df = layers[0]  # convert the layer to a Spark DataFrame

  # keep only the columns needed for the analysis
  cols = ['Site Num', 'County Code', 'State Code', 'Date Local', 'Time Local', 'Parameter Name', 'Sample Measurement']
  df = df.select(cols)

  # the 4 lines below zero-pad each code so all rows of that field have the same
  # number of digits; this is a common way to build a unique station id
  df = df.withColumn('Site_Num', F.lpad(df['Site Num'], 4, '0'))
  df = df.withColumn('County_Code', F.lpad(df['County Code'], 3, '0'))
  df = df.withColumn('State_Code', F.lpad(df['State Code'], 2, '0'))
  df = df.withColumn('unique_id', F.concat(F.col('State_Code'), F.col('County_Code'), F.col('Site_Num')))

  # drop the source columns; they are no longer needed once we have the unique_id column
  df = df.drop('Site_Num', 'County_Code', 'State_Code', 'Site Num', 'County Code', 'State Code')

  # combine the date and time fields into a single datetime column
  df = df.withColumn('datetime', F.concat(F.col('Date Local'), F.lit(' '), F.col('Time Local')))
  df = df.drop('Time Local', 'Date Local')

  # filter down to a single station
  df = df.filter(df.unique_id == df.first().unique_id)

  # pivot the table so the variables used for prediction become columns
  df = df.groupby(df['datetime'], df['unique_id']).pivot('Parameter Name').avg('Sample Measurement')

  df.write.format('webgis').save('timeseries_data_17_18_19_1station' + str(dt.now().microsecond))

priyankatuteja commented on 2020-12-14T11:06:01Z
----------------------------------------------------------------

concat, lit, and col are basic operations in Spark; we need concat to create the time series data, specifically the datetime column.
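For readers without a Spark environment, the zero-pad-and-concatenate idea described above can be sketched in plain Python. The sample code values below are hypothetical, chosen only to show the shape of the transformation:

```python
# Zero-pad the state, county, and site codes to fixed widths, then
# concatenate them to form a unique station id. This mirrors the
# F.lpad / F.concat calls in the Spark code, using str.zfill instead.
state_code, county_code, site_num = "6", "37", "2"

unique_id = state_code.zfill(2) + county_code.zfill(3) + site_num.zfill(4)
print(unique_id)  # -> 060370002

# Likewise, the datetime column is just the date and time joined with a space.
date_local, time_local = "2017-01-01", "08:00"
datetime_str = date_local + " " + time_local
print(datetime_str)  # -> 2017-01-01 08:00
```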

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:38Z
----------------------------------------------------------------

  • I am assuming that the comments in the code will help those who understand this technology to follow along what they are doing and why they are defining the function this way, and what the output columns mean
  • I think text explaining the output table would be helpful
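As a rough illustration of what the pivot step produces, here is a plain-Python sketch. It is a simplified stand-in (it skips the averaging that the Spark `pivot(...).avg(...)` call performs), and the parameter names and values are hypothetical:

```python
from collections import defaultdict

# Hypothetical long-format rows: (datetime, parameter name, measurement).
rows = [
    ("2017-01-01 08:00", "PM2.5 - Local Conditions", 12.0),
    ("2017-01-01 08:00", "Relative Humidity", 40.0),
    ("2017-01-01 09:00", "PM2.5 - Local Conditions", 15.0),
]

# Pivot: one output row per datetime, one column per parameter name.
pivoted = defaultdict(dict)
for ts, param, value in rows:
    pivoted[ts][param] = value

print(pivoted["2017-01-01 08:00"]["Relative Humidity"])  # -> 40.0
```

After the pivot, each datetime is a single row whose columns are the measured variables, which is the wide format the forecasting step needs.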

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:39Z
----------------------------------------------------------------

Perhaps a sentence saying

  • See the Dashboard api reference for details.

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:40Z
----------------------------------------------------------------

  • @priyankatuteja - I think this notebook is super useful and comprehensive. Please note that I have very little experience with this technology, so I'm giving the perspective of someone who is at beginner level. I'll leave it to you and your team as to whether my suggestions are helpful given that for samples we assume some knowledge of the subject.
  • I didn't see this references section while reviewing the notebook.
  • I think making note of these References throughout the notebook would be helpful
  • perhaps adding a note in various sections:
  • See References section for more information.

Collaborator

@jyaistMap jyaistMap left a comment


Very thorough and easy to understand why this would be a really useful notebook. Some of the concepts within the notebook seem very advanced, so I'm not certain what level of explanations we would want to include.

@AtmaMani AtmaMani added the changes requested remove this label after addressing reviewer comments label Dec 14, 2020

priyankatuteja added 2 commits December 14, 2020 16:58
@priyankatuteja
Collaborator Author

@jyaistMap Thanks for the detailed review. I found your comments really useful; they gave me a sense of how a non-Spark user might feel when skimming through this notebook. Keeping your feedback in mind, I have added text and explanations.

@priyankatuteja priyankatuteja removed the changes requested remove this label after addressing reviewer comments label Dec 14, 2020
@review-notebook-app

AtmaMani commented on 2020-12-14T16:01:17Z
----------------------------------------------------------------

Good idea to show the use of sample_layer property


@AtmaMani AtmaMani merged commit 5011e11 into Esri:master Dec 14, 2020
