
Conversation

@priyankatuteja
Collaborator

@priyankatuteja priyankatuteja commented Dec 7, 2020

forecast pm2.5 using multi variate time series data

Checklist

Please go through each entry in the checklist below and mark an 'X' if that condition has been met. Every entry should be marked with an 'X' for the Pull Request to be approved.

  • All imports are in the first cell? First block of imports are standard libraries, second block are 3rd party libraries, third block are all arcgis imports? Note that in some cases, for samples, it is a good idea to keep the imports next to where they are used, particularly for uncommonly used features that we want to highlight.
  • All GIS object instantiations are one of the following?
    • gis = GIS()
    • gis = GIS('https://www.arcgis.com', 'arcgis_python', 'P@ssword123')
    • gis = GIS(profile="your_online_profile")
    • gis = GIS('https://pythonapi.playground.esri.com/portal', 'arcgis_python', 'amazing_arcgis_123')
    • gis = GIS(profile="your_enterprise_portal")
  • If this notebook requires setup or teardown, did you add the appropriate code to ./misc/setup.py and/or ./misc/teardown.py?
  • If this notebook references any portal items that need to be staged on AGOL/Python API playground, did you coordinate with a Python API team member to stage the item the correct way with the api_data_owner user?
  • Code refactored & split out across multiple cells, useful comments?
  • Consistent voice/tense/narrative style? Thoroughly checked for typos?
  • All images used like <img src="base64str_here"> instead of <img src="https://some.url">? All map widgets contain a static image preview? (Call mapview_inst.take_screenshot() to do so)
  • All file paths are constructed in an OS-agnostic fashion with os.path.join()? (Instead of r"\foo\bar", os.path.join(os.path.sep, "foo", "bar"), etc.)
  • IF YOU WANT THIS SAMPLE TO BE DISPLAYED ON THE DEVELOPERS.ARCGIS.COM WEBSITE, ping @ DavidJVitale so he can add it to the list for the next deploy
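As a quick illustration of the OS-agnostic path item in the checklist above, a minimal sketch (the file and directory names here are hypothetical, not from the notebook):

```python
import os

# Build a path from parts instead of hard-coding separators;
# os.path.join inserts the correct separator for the current OS.
data_dir = os.path.join("data", "air_quality")
csv_path = os.path.join(data_dir, "hourly_88502_2017.csv")

# On Linux/macOS this yields "data/air_quality/hourly_88502_2017.csv";
# on Windows the same code produces a backslash-separated path.
print(csv_path)
```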

@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:25Z
----------------------------------------------------------------

The tools available in the ArcGIS API for Python geoanalytics module require an ArcGIS Enterprise licensed and configured with the linux based ArcGIS GeoAnalytics server.

Does this mean a GeoAnalytics Server site installed on Windows will not work? Perhaps a note to explicitly state that would help clarify.


priyankatuteja commented on 2020-12-14T09:43:06Z
----------------------------------------------------------------

Yes, that's correct! A GeoAnalytics Server site installed on Windows won't work, not because of any underlying bug on our side, but because the Spark operations that use the pyarrow library are not supported on Windows.

priyankatuteja commented on 2020-12-14T09:46:50Z
----------------------------------------------------------------

Mentioning that it won't work on Windows could confuse users by reading as a negative remark about Windows setups, but those who use this notebook for Spark operations should know that some operations are not supported on Windows. Do you suggest I mention it explicitly?

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:26Z
----------------------------------------------------------------

  • Using the power of distributed compute, we can find answers in large datasets much faster than other non-distributed systems.

I suggest:

...we can analyze large datasets...

  • Maybe a link to background text about what PM is?
  • ...forecast hourly PM2.5 based on historic time series data...

  • The most common air pollutants used in this analysis are PM2.5, PM10, wind speed, wind direction, and relative humidity.
  • Are wind speed, direction and relative humidity considered pollutants? Did the sentence mean to say?: The most common air pollutants used in this analysis are PM2.5 and PM10, and the impact that wind speed, wind direction, and relative humidity have on them.


priyankatuteja commented on 2020-12-14T09:56:51Z
----------------------------------------------------------------

Thanks, done!

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:27Z
----------------------------------------------------------------

  • Based on feedback from Enterprise Product Management, proper terminology would be
  • ...After connecting to your Enterprise organization...

priyankatuteja commented on 2020-12-14T09:57:18Z
----------------------------------------------------------------

Thanks, done!

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:27Z
----------------------------------------------------------------

Perhaps links to the API reference?


priyankatuteja commented on 2020-12-14T09:58:14Z
----------------------------------------------------------------

Good catch!

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:28Z
----------------------------------------------------------------

  • Do you want to expose this url? Perhaps changing it to something generic?:
  • https://geoanalytics_server.company.com:6443/arcgis/admin

priyankatuteja commented on 2020-12-14T10:02:40Z
----------------------------------------------------------------

I always remove the output of that cell... done!

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:29Z
----------------------------------------------------------------

Hyperlinks:


@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:29Z
----------------------------------------------------------------

The text entered for the big data file share name, Air_Quality_17_18_19, does not match any of the outputs because of a typo somewhere:

<Datastore title:"/bigDataFileShares/Air_Auality_17_18_19" type:"bigDataFileShare">


priyankatuteja commented on 2020-12-14T10:04:19Z
----------------------------------------------------------------

done

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:30Z
----------------------------------------------------------------

Not sure how important this is, but:

  • The typo is what was registered percolates through the rest of the notebook:
  • bigDataFileShares_Air_Auality_17_18_19

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:31Z
----------------------------------------------------------------

  • A hyperlink to the API reference for the type of object would help:
  • API Layer object -
  • Is it a Layer? A FeatureLayer? Some other type of layer?

priyankatuteja commented on 2020-12-14T10:23:39Z
----------------------------------------------------------------

removed....

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:31Z
----------------------------------------------------------------

  • A sentence telling the user what is happening here would help clarify instead of just searching for a feature layer. Why are users searching for this layer? What are they searching for?
  • Now we'll search for a feature layer depicting all the counties in the United States. We'll use the county polygons to....

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:35Z
----------------------------------------------------------------

  • Perhaps adding another line to give users brief overview of what the different colored symbols mean:
  • m1.legend = True

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:35Z
----------------------------------------------------------------

  • Changing the here hyperlink to this location: https://aqs.epa.gov/aqsweb/airdata/FileFormats.html#_hourly_data_files would be more precise
  • I think a brief explanation of the next 5 code cells below would help users understand what you are doing so that they might be able to reproduce this workflow on their own. A hyperlink to run_python_script would help:
  • Why are you defining a function right now? What is required in this function? How is it going to be used?
  • Explaining that the output of the save method creates an item in the gis would explain why users need to do a search a couple cells later.
  • Explaining that the output is an item with a tables attribute would also help. How are users going to know the type of the attributes of the method_item?

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:36Z
----------------------------------------------------------------

  • Remove the word Similarly

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:37Z
----------------------------------------------------------------

  • Typo createa should be creates

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:37Z
----------------------------------------------------------------

  • I have no experience with pyspark SQL functions, how to use them, or what pyspark really is, but after reviewing the introductory text twice and then looking through this code, it becomes clear what is happening. Still, from my perspective:
  • It's very difficult to follow what is happening in this code cell, or why users are padding and then dropping columns, without any information about what the concat, lit, or col functions are doing.
  • How do users know they need to do all this? Is this something data scientists would already know they need to do? Are we assuming this type of data frame manipulation is commonplace knowledge for someone who would find this sample?

priyankatuteja commented on 2020-12-14T11:01:46Z
----------------------------------------------------------------

Explanation for the code:

  # assumes these imports from earlier in the notebook:
  # from datetime import datetime as dt
  # from pyspark.sql import functions as F

  df = layers[0]  # convert the layer to a Spark DataFrame

  # keep only the columns needed for the analysis
  cols = ['Site Num', 'County Code', 'State Code', 'Date Local', 'Time Local', 'Parameter Name', 'Sample Measurement']
  df = df.select(cols)

  # the 4 lines below zero-pad each code so all rows of that field have the same
  # number of digits; this is a common way to build a unique station id
  df = df.withColumn('Site_Num', F.lpad(df['Site Num'], 4, '0'))
  df = df.withColumn('County_Code', F.lpad(df['County Code'], 3, '0'))
  df = df.withColumn('State_Code', F.lpad(df['State Code'], 2, '0'))
  df = df.withColumn('unique_id', F.concat(F.col('State_Code'), F.col('County_Code'), F.col('Site_Num')))

  # drop the source columns; they are no longer needed once we have the unique_id column
  df = df.drop('Site_Num', 'County_Code', 'State_Code', 'Site Num', 'County Code', 'State Code')

  # combine the date and time fields into a single datetime column
  df = df.withColumn('datetime', F.concat(F.col('Date Local'), F.lit(' '), F.col('Time Local')))
  df = df.drop('Time Local', 'Date Local')

  # filter down to a single station
  df = df.filter(df.unique_id == df.first().unique_id)

  # pivot the table so the variables used for prediction become columns
  df = df.groupby(df['datetime'], df['unique_id']).pivot('Parameter Name').avg('Sample Measurement')

  df.write.format('webgis').save('timeseries_data_17_18_19_1station' + str(dt.now().microsecond))

priyankatuteja commented on 2020-12-14T11:06:01Z
----------------------------------------------------------------

concat, lit, and col are basic operations in Spark; we need concat to create the time series data, specifically the datetime column.
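For readers without a Spark environment, the zero-pad-and-concatenate idea described above can be sketched in plain Python. The sample code values below are hypothetical, chosen only to show the shape of the transformation:

```python
# Zero-pad the state, county, and site codes to fixed widths, then
# concatenate them to form a unique station id. This mirrors the
# F.lpad / F.concat calls in the Spark code, using str.zfill instead.
state_code, county_code, site_num = "6", "37", "2"

unique_id = state_code.zfill(2) + county_code.zfill(3) + site_num.zfill(4)
print(unique_id)  # -> 060370002

# Likewise, the datetime column is just the date and time joined with a space.
date_local, time_local = "2017-01-01", "08:00"
datetime_str = date_local + " " + time_local
print(datetime_str)  # -> 2017-01-01 08:00
```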

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:38Z
----------------------------------------------------------------

  • I am assuming that the comments in the code will help those who understand this technology to follow along what they are doing and why they are defining the function this way, and what the output columns mean
  • I think text explaining the output table would be helpful
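As a rough illustration of what the pivot step produces, here is a plain-Python sketch. It is a simplified stand-in (it skips the averaging that the Spark `pivot(...).avg(...)` call performs), and the parameter names and values are hypothetical:

```python
from collections import defaultdict

# Hypothetical long-format rows: (datetime, parameter name, measurement).
rows = [
    ("2017-01-01 08:00", "PM2.5 - Local Conditions", 12.0),
    ("2017-01-01 08:00", "Relative Humidity", 40.0),
    ("2017-01-01 09:00", "PM2.5 - Local Conditions", 15.0),
]

# Pivot: one output row per datetime, one column per parameter name.
pivoted = defaultdict(dict)
for ts, param, value in rows:
    pivoted[ts][param] = value

print(pivoted["2017-01-01 08:00"]["Relative Humidity"])  # -> 40.0
```

After the pivot, each datetime is a single row whose columns are the measured variables, which is the wide format the forecasting step needs.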

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:39Z
----------------------------------------------------------------

Perhaps a sentence saying

  • See the Dashboard api reference for details.

@review-notebook-app
review-notebook-app bot commented Dec 11, 2020

jyaistMap commented on 2020-12-11T18:48:40Z
----------------------------------------------------------------

  • @priyankatuteja - I think this notebook is super useful and comprehensive. Please note that I have very little experience with this technology, so I'm giving the perspective of someone who is at beginner level. I'll leave it to you and your team as to whether my suggestions are helpful given that for samples we assume some knowledge of the subject.
  • I didn't see this references section while reviewing the notebook.
  • I think making note of these References throughout the notebook would be helpful
  • perhaps adding a note in various sections:
  • See References section for more information.

Collaborator

@jyaistMap jyaistMap left a comment


Very thorough and easy to understand why this would be a really useful notebook. Some of the concepts within the notebook seem very advanced, so I'm not certain what level of explanations we would want to include.

@AtmaMani AtmaMani added the changes requested remove this label after addressing reviewer comments label Dec 14, 2020

priyankatuteja added 2 commits December 14, 2020 16:58
@priyankatuteja
Collaborator Author

@jyaistMap Thanks for the detailed review. I found your comments really useful; they gave me a sense of how a non-Spark user might feel when skimming through this notebook. Keeping your feedback in mind, I have added text and explanations.

@priyankatuteja priyankatuteja removed the changes requested remove this label after addressing reviewer comments label Dec 14, 2020
@review-notebook-app

AtmaMani commented on 2020-12-14T16:01:17Z
----------------------------------------------------------------

Good idea to show the use of sample_layer property


@AtmaMani AtmaMani merged commit 5011e11 into Esri:master Dec 14, 2020
