Forecast PM2.5 using multivariate time series data #853
Conversation
jyaistMap commented on 2020-12-11T18:48:25Z:
"The tools available in the ArcGIS API for Python geoanalytics module require an ArcGIS Enterprise licensed and configured with the linux based ArcGIS GeoAnalytics server."
Does this mean a GeoAnalytics Server site installed on Windows will not work? Perhaps a note to explicitly state that would help clarify.

priyankatuteja commented on 2020-12-14T09:43:06Z:
Yes, that's correct. A GeoAnalytics Server site installed on Windows won't work, not because of any underlying bug on our side, but because the Spark operations that use the pyarrow library are not supported on Windows.

priyankatuteja commented on 2020-12-14T09:46:50Z:
Saying outright that it won't work on Windows could confuse users, since it reads like a negative remark on Windows setups, but those who use this notebook for Spark operations should know that some operations are not supported on Windows. Do you suggest I mention it explicitly?
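If an explicit note is added, a minimal illustrative guard for the notebook's first cell could look like this (purely a sketch, not part of the notebook itself):

```python
# Hypothetical guard: warn Windows users up front that the pyarrow-backed
# Spark operations used later in this notebook are unsupported on Windows.
import platform
import warnings

if platform.system() == "Windows":
    warnings.warn(
        "Spark operations that rely on the pyarrow library are not supported "
        "on Windows; run this notebook against a Linux-based ArcGIS "
        "GeoAnalytics Server."
    )
```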
jyaistMap commented on 2020-12-11T18:48:26Z:
I suggest: "...we can analyze large datasets..."

priyankatuteja commented on 2020-12-14T09:56:51Z:
Thanks, done!
jyaistMap commented on 2020-12-11T18:48:27Z:
[comment text not captured]

priyankatuteja commented on 2020-12-14T09:57:18Z:
Thanks, done!
jyaistMap commented on 2020-12-11T18:48:27Z:
Perhaps links to the API reference?

priyankatuteja commented on 2020-12-14T09:58:14Z:
Good catch!
jyaistMap commented on 2020-12-11T18:48:28Z:
[comment text not captured]

priyankatuteja commented on 2020-12-14T10:02:40Z:
I always remove the output of the cell... done!
jyaistMap commented on 2020-12-11T18:48:29Z:
Hyperlinks: [list not captured]
jyaistMap commented on 2020-12-11T18:48:29Z:
The text entered for the big data file share name, Air_Quality_17_18_19, does not match any of the outputs because of a typo somewhere: <Datastore title:"/bigDataFileShares/Air_Auality_17_18_19" type:"bigDataFileShare">

priyankatuteja commented on 2020-12-14T10:04:19Z:
Done.
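To catch a mismatch like this early, the registered big data file shares can be listed right after registration; a hedged sketch (the connection profile name is assumed):

```python
# List the big data file shares registered with the GeoAnalytics server so
# the title can be compared against the name used later in the notebook.
from arcgis.gis import GIS
from arcgis.geoanalytics import get_datastores

gis = GIS(profile="your_enterprise_profile")  # assumed connection profile
bigdata_manager = get_datastores()
for datastore in bigdata_manager.search():
    # Expect a title like: /bigDataFileShares/Air_Quality_17_18_19
    print(datastore)
```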
jyaistMap commented on 2020-12-11T18:48:30Z:
Not sure how important this is, but: [remainder not captured]
jyaistMap commented on 2020-12-11T18:48:31Z:
[comment text not captured]

priyankatuteja commented on 2020-12-14T10:23:39Z:
Removed.
jyaistMap left eight additional inline comments on 2020-12-11 (18:48:31Z through 18:48:37Z) whose text was not captured.
jyaistMap commented on 2020-12-11T18:48:37Z:
[comment text not captured]

priyankatuteja commented on 2020-12-14T11:01:46Z:
Explanation for the code:

```python
df = layers[0]  # convert the layer to a Spark DataFrame

# create a list of the needed columns and subset the dataset to them
cols = ['Site Num', 'County Code', 'State Code', 'Date Local',
        'Time Local', 'Parameter Name', 'Sample Measurement']
df = df.select(cols)

# pad the values so every row of each code has the same number of digits;
# this is the common way to create a unique station id
df = df.withColumn('Site_Num', F.lpad(df['Site Num'], 4, '0'))
df = df.withColumn('County_Code', F.lpad(df['County Code'], 3, '0'))
df = df.withColumn('State_Code', F.lpad(df['State Code'], 2, '0'))
df = df.withColumn('unique_id',
                   F.concat(F.col('State_Code'), F.col('County_Code'), F.col('Site_Num')))

# drop the component columns; they are not needed once unique_id exists
df = df.drop('Site_Num', 'County_Code', 'State_Code',
             'Site Num', 'County Code', 'State Code')

# combine date and time into a single datetime column
df = df.withColumn('datetime', concat(col("Date Local"), lit(" "), col("Time Local")))
df = df.drop('Time Local', 'Date Local')

df = df.filter(df.unique_id == df.first().unique_id)  # filter to a single station

# pivot the table so the variables used for prediction become columns
df = df.groupby(df['datetime'], df['unique_id']).pivot("Parameter Name").avg("Sample Measurement")

df.write.format("webgis").save("timeseries_data_17_18_19_1station" + str(dt.now().microsecond))
```

priyankatuteja commented on 2020-12-14T11:06:01Z:
concat, lit, and col are basic operations in Spark; the concat is needed to create the time series data, specifically the datetime column.
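For readers unfamiliar with lpad, concat, lit, and col, here is a minimal self-contained sketch (with made-up values and a local Spark session) of how the unique station id and datetime column are built:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pm25-demo").getOrCreate()

demo = spark.createDataFrame(
    [("1", "85", "6", "2017-01-01", "00:00")],
    ["Site Num", "County Code", "State Code", "Date Local", "Time Local"],
)

demo = (
    demo
    # zero-pad each code to a fixed width
    .withColumn("Site_Num", F.lpad(F.col("Site Num"), 4, "0"))        # '0001'
    .withColumn("County_Code", F.lpad(F.col("County Code"), 3, "0"))  # '085'
    .withColumn("State_Code", F.lpad(F.col("State Code"), 2, "0"))    # '06'
    # concatenate the padded codes into one station id
    .withColumn("unique_id", F.concat("State_Code", "County_Code", "Site_Num"))
    # join date and time with a literal space to form the timestamp
    .withColumn("datetime", F.concat(F.col("Date Local"), F.lit(" "), F.col("Time Local")))
)
demo.select("unique_id", "datetime").show(truncate=False)  # 060850001 | 2017-01-01 00:00
```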
jyaistMap left two more inline comments on 2020-12-11T18:48:38Z whose text was not captured.

jyaistMap commented on 2020-12-11T18:48:39Z:
Perhaps a sentence saying [remainder not captured]

jyaistMap commented on 2020-12-11T18:48:40Z:
[comment text not captured]
jyaistMap left a review comment:
Very thorough, and it is easy to understand why this would be a really useful notebook. Some of the concepts within the notebook seem very advanced, so I'm not certain what level of explanation we would want to include.
@jyaistMap Thanks for the detailed review. I found your comments really useful; they gave me a sense of how a non-Spark user might feel when skimming through this notebook. Keeping your feedback in mind, I have added text and explanations.
AtmaMani commented on 2020-12-14T16:01:17Z:
Good idea to show the use of the sample_layer property.
Forecast PM2.5 using multivariate time series data
Checklist
Please go through each entry in the checklist below and mark an 'X' if that condition has been met. Every entry should be marked with an 'X' for the Pull Request to be approved.
- Imports are in the first cell? First block of imports are standard libraries, second block are 3rd party libraries, third block are all `arcgis` imports? Note that in some cases, for samples, it is a good idea to keep the imports next to where they are used, particularly for uncommonly used features that we want to highlight.
- `GIS` object instantiations are one of the following? (See the sketch after this list.)
  - `gis = GIS()`
  - `gis = GIS('https://www.arcgis.com', 'arcgis_python', 'P@ssword123')`
  - `gis = GIS(profile="your_online_profile")`
  - `gis = GIS('https://pythonapi.playground.esri.com/portal', 'arcgis_python', 'amazing_arcgis_123')`
  - `gis = GIS(profile="your_enterprise_portal")`
- Any required setup or teardown code is added to `./misc/setup.py` and/or `./misc/teardown.py`?
- All images are sourced locally, with `<img src="base64str_here">` instead of `<img src="https://some.url">`? All map widgets contain a static image preview? (Call `mapview_inst.take_screenshot()` to do so.)
- All file paths are constructed in an OS-agnostic way with `os.path.join()`? (Instead of `r"\foo\bar"`, use `os.path.join(os.path.sep, "foo", "bar")`, etc.)
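As referenced in the checklist above, a minimal notebook-cell sketch of an approved `GIS` instantiation plus a static map preview (the place name is illustrative):

```python
from arcgis.gis import GIS

gis = GIS()  # anonymous ArcGIS Online connection, one of the approved patterns

# In one notebook cell, display the map widget:
m = gis.map("Los Angeles")
m

# In a later cell, embed a static image preview of the widget:
m.take_screenshot()
```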