
Python formatting, and gitignore additions. #326

Merged
merged 4 commits into from
Jul 18, 2019

@ruebot ruebot commented Jul 8, 2019

What does this Pull Request do?

Follow-on to 7a61f0e

  • Run black and isort on Python files.
  • Move Spark config to example file.
  • Update gitignore

How should this be tested?

I tested locally, and it was good to go. @ianmilligan1 if you want to test on your end, grab a small WARC (990/8471 is perfect!), then:

  1. Make sure you have your Python environment set up:
  • conda install pyspark
  • conda install tensorflow
  • conda install pyarrow
  2. Export your Python setup (for example):
  • export PYSPARK_PYTHON=/home/ruestn/anaconda3/bin/python
  • export PYSPARK_DRIVER_PYTHON=/home/ruestn/anaconda3/bin/python
  3. Build the branch locally.
  4. Pull down the models:
  • cd /tmp && wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz
  • tar -xzvf ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz
  • mkdir -p /PATH/TO/aut/src/main/python/tf/model/graph/ssd_mobilenet_v1_fpn_640x640/
  • cp /tmp/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03/frozen_inference_graph.pb /PATH/TO/aut/src/main/python/tf/model/graph/ssd_mobilenet_v1_fpn_640x640/
  • mkdir -p /PATH/TO/aut/src/main/python/tf/model/category/
  • cd /PATH/TO/aut/src/main/python/tf/model/category/
  • wget https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/data/mscoco_label_map.pbtxt
  5. Tweak Spark conf:
  • cp /PATH/TO/aut/src/main/python/tf/util/spark.conf.example /PATH/TO/aut/src/main/python/tf/util/spark.conf
    spark.sql.execution.arrow.enabled true
    spark.sql.execution.arrow.maxRecordsPerBatch 50
    spark.executor.memory 4G
    spark.cores.max 4
    spark.executor.cores 4
    spark.driver.memory 4G
    spark.task.cpus 2
  6. Start up Spark master/slave:
  • /PATH/TO/SPARK/sbin/start-master.sh
  • /PATH/TO/SPARK/sbin/start-slave.sh 127.0.1.1:7077
  7. Run the first step (for example):
  • python /PATH/TO/aut/src/main/python/tf/detect.py --web_archive "/home/nruest/tmp/auk/990/8471/warcs/*" --aut_jar /home/nruest/Projects/au/aut/target/aut-0.17.1-SNAPSHOT-fatjar.jar --spark /home/nruest/bin/spark-2.4.1-bin-hadoop2.7/bin --master spark://127.0.1.1:7077 --img_model ssd --filter_size 50 50 --output_path /home/nruest/Projects/au/sample-data/aut-image-tf-testing-03
  8. Run the second step (for example):
  • python /PATH/TO/aut/src/main/python/tf/extract_images.py --res_dir /home/nruest/Projects/au/sample-data/aut-image-tf-testing-03 --output_dir /home/nruest/Projects/au/sample-data/aut-image-tf-testing-image-output-03 --threshold 0.85
  9. Check out the directory you dumped the images to!
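
For anyone scripting these steps, here is a rough sketch of two pieces: reading the spark.conf settings listed above, and assembling the detect.py command from its parts. The helper names (`parse_spark_conf`, `tasks_per_executor`, `build_detect_command`) are hypothetical and not part of aut. The flags are only those that appear in the command above, and the paths are placeholders.

```python
import shlex

# Settings copied from the spark.conf listing in the steps above.
EXAMPLE_CONF = """\
spark.sql.execution.arrow.enabled true
spark.sql.execution.arrow.maxRecordsPerBatch 50
spark.executor.memory 4G
spark.cores.max 4
spark.executor.cores 4
spark.driver.memory 4G
spark.task.cpus 2
"""


def parse_spark_conf(text):
    """Parse whitespace-separated `key value` lines into a dict of strings."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        props[parts[0]] = parts[1].strip() if len(parts) == 2 else ""
    return props


def tasks_per_executor(props):
    """Concurrent task slots per executor: executor cores // CPUs per task."""
    cores = int(props["spark.executor.cores"])
    cpus = int(props.get("spark.task.cpus", 1))
    return cores // cpus


def build_detect_command(script, web_archive, aut_jar, spark_bin, master,
                         output_path, img_model="ssd", filter_size=(50, 50)):
    """Hypothetical helper: build the argv list for detect.py.

    Only flags seen in the command above are used; defaults mirror it.
    """
    return [
        "python", script,
        "--web_archive", web_archive,
        "--aut_jar", aut_jar,
        "--spark", spark_bin,
        "--master", master,
        "--img_model", img_model,
        "--filter_size", *(str(n) for n in filter_size),
        "--output_path", output_path,
    ]


props = parse_spark_conf(EXAMPLE_CONF)
print(tasks_per_executor(props))  # 4 executor cores // 2 CPUs per task

cmd = build_detect_command(
    script="/PATH/TO/aut/src/main/python/tf/detect.py",
    web_archive="/PATH/TO/warcs/*",
    aut_jar="/PATH/TO/aut/target/aut-0.17.1-SNAPSHOT-fatjar.jar",
    spark_bin="/PATH/TO/SPARK/bin",
    master="spark://127.0.1.1:7077",
    output_path="/PATH/TO/output",
)
# Shell-quoted so the WARC glob stays quoted, as in the command above.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` as-is. In that case the glob reaches detect.py unexpanded, which matches the quoted pattern in step 7.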

- Run black and isort on Python files.
- Move Spark config to example file.
- Update gitignore for 7a61f0e additions.
@ruebot ruebot requested a review from ianmilligan1 July 8, 2019 13:57

ruebot commented Jul 8, 2019

@ianmilligan1 I have all these steps saved locally, so we can use them for documentation when the time comes.


codecov-io commented Jul 17, 2019

Codecov Report

Merging #326 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #326   +/-   ##
=======================================
  Coverage   74.97%   74.97%           
=======================================
  Files          39       39           
  Lines        1123     1123           
  Branches      197      197           
=======================================
  Hits          842      842           
  Misses        215      215           
  Partials       66       66

Last update f35d54e...78ef407.


@ianmilligan1 ianmilligan1 left a comment

Woohoo. Very great stuff. Lots of politician faces in this sample web archive of Canadian political parties:

[Image: Screen Shot 2019-07-18 at 1 56 48 PM]

Apologies for the delay on this. After a few dozen Slack messages and some wrangling, this was successfully built in a conda virtual environment (this guide was useful, for future reference).

For documentation purposes: on macOS, the default URL for the Spark master was formatted as spark://Ians-MacBook-Pro-3.local:7077; 127.0.1.1:7077 didn't work on my end. Also, the Python version that ultimately worked was 3.7.1.
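
A small sketch of how that hostname-style URL could be guessed for the docs. It assumes the standalone master advertises the machine's hostname when no host is set explicitly, and that it listens on the default port 7077. `spark_master_url` is a hypothetical helper, and the URL printed in the master's log or web UI remains the authoritative value.

```python
import socket

DEFAULT_MASTER_PORT = 7077  # port used in the URLs quoted in this thread


def spark_master_url(host=None, port=DEFAULT_MASTER_PORT):
    """Build a spark:// URL, falling back to this machine's hostname.

    Assumption: the master was started without an explicit host, so it
    advertises whatever the OS reports as the hostname.
    """
    host = host or socket.gethostname()
    return f"spark://{host}:{port}"


# With the hostname reported above:
print(spark_master_url("Ians-MacBook-Pro-3.local"))
# With whatever the local machine reports:
print(spark_master_url())
```

The result is what would go in `--master` for detect.py and as the argument to start-slave.sh, in place of 127.0.1.1:7077.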


ruebot commented Jul 18, 2019

Oh, that's good to know about the Mac side of things.

@ianmilligan1 ianmilligan1 merged commit bd5ef14 into master Jul 18, 2019
@ianmilligan1 ianmilligan1 deleted the tf-follow-on branch July 18, 2019 18:09