Given historical airline on-time performance data, suggest two-hop flights that minimize the chance of missing a connection.
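To make the goal concrete, here is a minimal sketch of the selection step only. It is not the project's implementation: the `TwoHopSketch` and `Leg` types, the `lateProbability` field, and the minimum-layover rule are all assumptions made for illustration, and the actual prediction method is described in the report. The sketch assumes each candidate leg already carries an estimated probability of arriving late, and picks the connecting pair whose first leg is least likely to be late.

```java
import java.util.List;

/** Illustrative only: hypothetical types, not classes from the org.neu packages. */
final class TwoHopSketch {

    /** One scheduled leg with an assumed, precomputed probability of arriving late. */
    static final class Leg {
        final String origin;
        final String destination;
        final int departMinute;       // minutes since midnight, same day
        final int arriveMinute;       // minutes since midnight, same day
        final double lateProbability; // assumed to come from a trained model

        Leg(String origin, String destination, int departMinute,
            int arriveMinute, double lateProbability) {
            this.origin = origin;
            this.destination = destination;
            this.departMinute = departMinute;
            this.arriveMinute = arriveMinute;
            this.lateProbability = lateProbability;
        }
    }

    /**
     * Returns {first, second} for the valid connection whose first leg has the
     * lowest late probability, or null when no pair connects source to target.
     */
    static Leg[] bestConnection(List<Leg> legs, String source, String target,
                                int minLayoverMinutes) {
        Leg[] best = null;
        for (Leg first : legs) {
            if (!first.origin.equals(source)) {
                continue;
            }
            for (Leg second : legs) {
                // A pair connects if it meets at the same airport, ends at the
                // target, and leaves at least the minimum layover in between.
                boolean connects = second.origin.equals(first.destination)
                        && second.destination.equals(target)
                        && second.departMinute - first.arriveMinute >= minLayoverMinutes;
                if (connects && (best == null
                        || first.lateProbability < best[0].lateProbability)) {
                    best = new Leg[] {first, second};
                }
            }
        }
        return best;
    }
}
```

Ranking by the first leg's late probability is a deliberate simplification; a fuller model would also account for how late the first leg is likely to be relative to the layover.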
- Driver: `org.neu.RoutePrediction`
  - Arguments: `<iterations> <queryFile> <input> <output> <report-loc> <training-year-length>`
- Job: `org.neu.job.RouteComputeJob`
  - Used with Experiment 1 mentioned in the report

Pseudo-distributed Hadoop configuration files:

- `conf/pseudo-distributed/core-site.xml`
- `conf/pseudo-distributed/hdfs-site.xml`
- `conf/pseudo-distributed/mapred-site.xml`
- `conf/pseudo-distributed/yarn-site.xml`

- Create your queries in the format "YYYY, MM, DD, SOURCE_AIRPORT, DESTINATION_AIRPORT" (e.g. 2001, 09, 11, DEN, DCA) and put them in `query/query.csv`.
- Put your input files (flight CSV data) in `input/all`.
- Set `HADOOP_HOME` and `HADOOP_VERSION` in `Makefile` to your Hadoop home and version.
- Go to `<project-root>`.
- Run `make`.
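
As an illustration of the query format described in the steps, the sketch below parses one line of `query/query.csv`. The `RouteQuery` class is hypothetical and is not part of the project's code; it assumes each line has exactly five comma-separated fields with optional surrounding spaces.

```java
import java.time.LocalDate;

/** Hypothetical holder for one query line; not a class from the project. */
final class RouteQuery {
    final LocalDate date;
    final String source;
    final String destination;

    RouteQuery(LocalDate date, String source, String destination) {
        this.date = date;
        this.source = source;
        this.destination = destination;
    }

    /** Parses a line of the form "YYYY, MM, DD, SOURCE_AIRPORT, DESTINATION_AIRPORT". */
    static RouteQuery parse(String line) {
        String[] fields = line.split(",");
        if (fields.length != 5) {
            throw new IllegalArgumentException(
                    "Expected 5 comma-separated fields: " + line);
        }
        int year = Integer.parseInt(fields[0].trim());
        int month = Integer.parseInt(fields[1].trim());
        int day = Integer.parseInt(fields[2].trim());
        // LocalDate.of rejects impossible dates such as month 13.
        return new RouteQuery(LocalDate.of(year, month, day),
                fields[3].trim(), fields[4].trim());
    }

    public static void main(String[] args) {
        RouteQuery query = parse("2001, 09, 11, DEN, DCA");
        System.out.println(query.date + " " + query.source + " -> " + query.destination);
    }
}
```

Validating each line this way before submitting a job can catch malformed queries early, instead of partway through a Hadoop run.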
Make sure the AWS CLI is installed and configured with your access key and secret, then run:
make setup-s3
make cloud AWS_REGION=us-east-1 AWS_BUCKET_NAME=mr-neighbor AWS_SUBNET_ID=subnet-51e4fd7a AWS_NUM_NODES=1 AWS_INSTANCE_TYPE=m1.medium INPUT_TYPE=books
- Score: `make get-output-row-count` (prints the output row count; use `hdfs dfs -get output output` to copy all output files to the local file system)
- R Markdown Report: `report/report.Rmd`
- HTML Report: `report/report.html`
- PDF Report: `report/report.pdf`
- Input Queries: `query/query.csv`
- Output Routes: `report/finalOutputRoutes.csv`