yjkogan/CS205-Final-Project
To run the project, use the following command line arguments:

    python [LAUNCHER] [MAPPER] [DATASET_TO_COMPARE_WITH] [DATASET_TO_READ_FROM]

By default, the script runs locally. With a properly configured .mrjob file, the flag -emr can be used to run the script on Amazon EMR. For example:

    python mr_launcher2.py matrix_mr.py s3://temp-205/large_dataset.csv large_dataset.csv -emr

To change the number of mapper instances, edit mr_launcher2.py. In the same file you can also set a limit on the number of times a job is launched on EMR.

A successful run creates the file scoreDict.txt and adds a line to the file perflog.csv. Lines in perflog.csv have the following format:

    [ROWS COMPUTED],[TOTAL RUNTIME],[AVGRUNTIME],[TIMESTAMP]

You can also run one of the scripts directly in the usual mrjob way, using the following form:

    python [MAPPER] [-r emr] [--jobconf mapred.reduce.tasks=n] [DATASET_TO_COMPARE_WITH] > [OUTPUT_FILE]

If you do this, make sure the data file you give the mappers is correctly formatted.

The files serial_example.py, n2_inside_mr.py and foremr_headts.py must all be run locally, and each needs some hard-coded values changed before it will run properly. They are run with the following command line arguments:

    python [MAPPER] [DATABASE]

e.g.

    python foremr_headts.py user2_export.csv

serial_example.py needs the path to the database hard-coded in the variable filename.

n2_inside_mr.py needs the path to the database hard-coded in the variable filename.

foremr_headts.py needs the path to the database of pre-computed teachers and the list of results from that computation. The output of n2_inside_mr.py can serve as this list, since it is already in the correct format. We generated our list by running n2_inside_mr.py on a subset of the database.
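The perflog.csv line format described above can be read back with a short script when comparing runs. The following is only a minimal sketch and is not part of the repository: the names PerfRecord, parse_perflog_line and read_perflog are hypothetical, and it assumes that the row count is an integer, that both runtimes are plain numbers, and that the timestamp contains no commas. The README does not specify units or the timestamp format, so the timestamp is kept as an uninterpreted string.

```python
import csv
from typing import List, NamedTuple


class PerfRecord(NamedTuple):
    """One line of perflog.csv (hypothetical helper type, not from the repo)."""
    rows_computed: int
    total_runtime: float
    avg_runtime: float
    timestamp: str  # format is not documented, so it is left as text


def parse_perflog_line(line: str) -> PerfRecord:
    """Parse '[ROWS COMPUTED],[TOTAL RUNTIME],[AVGRUNTIME],[TIMESTAMP]'."""
    fields = next(csv.reader([line]))
    if len(fields) != 4:
        raise ValueError("expected 4 comma-separated fields, got %d" % len(fields))
    rows, total, avg, stamp = (field.strip() for field in fields)
    # Assumption: the first three fields are numeric.
    return PerfRecord(int(rows), float(total), float(avg), stamp)


def read_perflog(path: str = "perflog.csv") -> List[PerfRecord]:
    """Read every non-empty line of the performance log."""
    with open(path, newline="") as handle:
        return [parse_perflog_line(line) for line in handle if line.strip()]
```

For example, read_perflog() returns one PerfRecord per logged run, so records from different mapper counts can be sorted or averaged by total_runtime. If the actual log uses a different field type or a timestamp that contains commas, the parsing step would need to be adjusted.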