2 files changed: +7 -7 lines
-"""This is one place where machine learning with Spark could occur. A previously
-trained classification algorithm could perhaps be used to classify the new batch of
-incoming data and predict whether or not the features in there describe a malicious
-file. Additional ML engineering can be done to feed the algorithm with new data
-and also improve its accuracy, but that is out of the scope of this project.
+"""This is one place where data processing or machine learning with Spark could occur.
+A previously trained classification algorithm could perhaps be used to classify the
+new batch of incoming data and predict whether or not the features in there describe
+a malicious file. Additional ML engineering can be done to feed the algorithm with
+new data and also improve its accuracy, but that is out of the scope of this project.
 
 In this script, I am just selecting some columns that I think might be useful to
 display on a daily dashboard, and will not be doing any machine learning. The
@@ -37,4 +37,4 @@
 
 # sys.argv[2] is also the full S3 URI for the output destination folder that EMR will write to
 # this is going to be the `stage` folder on S3
-new_df.write.format("parquet").mode("overwrite").save(sys.argv[2])
+new_df.write.format("csv").mode("overwrite").save(sys.argv[2])

@@ -135,7 +135,7 @@ def _pause_redshift_cluster(cluster_identifier: str):
     cluster_state = redshift_hook.cluster_status(cluster_identifier=cluster_identifier)
 
     try:
-        if cluster_state == 'paused':
+        if cluster_state == "paused":
             return
 
         redshift_hook.get_conn().pause_cluster(ClusterIdentifier=cluster_identifier)
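
The change to `_pause_redshift_cluster` is only a quote-style fix, but the helper's logic is an idempotent pause: read the cluster state, return early if it is already paused, and otherwise request a pause. The sketch below illustrates that pattern so it can be exercised without AWS. It assumes a hook object exposing `cluster_status(cluster_identifier=...)` and `get_conn().pause_cluster(ClusterIdentifier=...)`, as the diff's code uses; `FakeRedshiftHook`, `FakeRedshiftClient`, and `pause_redshift_cluster` are hypothetical names, not part of Airflow or boto3. Unlike the original, the hook is passed in as a parameter for testability, and the error handling of the original `try` block is not shown in the diff, so it is omitted here.

```python
class FakeRedshiftClient:
    """Hypothetical stand-in for the client returned by get_conn()."""

    def __init__(self, cluster_states: dict):
        self.cluster_states = cluster_states
        self.pause_calls = []

    def pause_cluster(self, ClusterIdentifier: str) -> dict:
        # Record the call and simulate the cluster moving to "paused".
        self.pause_calls.append(ClusterIdentifier)
        self.cluster_states[ClusterIdentifier] = "paused"
        return {"Cluster": {"ClusterIdentifier": ClusterIdentifier}}


class FakeRedshiftHook:
    """Hypothetical stand-in exposing the two hook calls the diff relies on."""

    def __init__(self, cluster_states: dict):
        self.client = FakeRedshiftClient(cluster_states)

    def cluster_status(self, cluster_identifier: str) -> str:
        # Stub default for identifiers the fake does not know about.
        return self.client.cluster_states.get(cluster_identifier, "cluster_not_found")

    def get_conn(self) -> FakeRedshiftClient:
        return self.client


def pause_redshift_cluster(redshift_hook, cluster_identifier: str) -> bool:
    """Pause the cluster unless it is already paused.

    Returns True if a pause was requested, False if the call was a no-op.
    """
    cluster_state = redshift_hook.cluster_status(cluster_identifier=cluster_identifier)
    if cluster_state == "paused":
        # Already paused: skip the API call so re-running the task is harmless.
        return False
    redshift_hook.get_conn().pause_cluster(ClusterIdentifier=cluster_identifier)
    return True


if __name__ == "__main__":
    hook = FakeRedshiftHook({"my-cluster": "available"})
    print(pause_redshift_cluster(hook, "my-cluster"))  # True: pause requested
    print(pause_redshift_cluster(hook, "my-cluster"))  # False: already paused
    print(hook.get_conn().pause_calls)  # ['my-cluster']
```

Returning a boolean is an addition for testing; the helper in the diff simply returns early when the cluster is already paused.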