How to evaluate the performance number of Bert-Large training #83
Comments
The BERT large SQuAD training log will have values like:
@zhixingheyi-tian Can you try our latest optimizations for TensorFlow BERT-Large by referring to the link here: https://www.intel.com/content/www/us/en/developer/articles/containers/cpu-reference-model-containers.html
I ran into some confusion when I followed the guide at https://github.com/IntelAI/models/tree/master/benchmarks/language_modeling/tensorflow/bert_large to run the training workload.
Running command:
Result:
I didn't see the throughput information, computed as (num_processed_examples - threshold_examples) / elapsed_time, in the training log the way I do for the inference workload. I also read the training script models/models/language_modeling/tensorflow/bert_large/training/fp32/run_squad.py and found no mention of "throughput". However, the inference script ./models/models/language_modeling/tensorflow/bert_large/inference/run_squad.py does contain code that computes throughput as (num_processed_examples - threshold_examples) / elapsed_time.
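For reference, the inference-side calculation mentioned above works roughly like the sketch below. This is a minimal illustration only; the names (`run_benchmark`, `WARMUP_EXAMPLES`, `process_example`) are my own and not the exact identifiers used in run_squad.py.

```python
import time

# Illustrative sketch of the throughput formula quoted above:
# throughput = (num_processed_examples - threshold_examples) / elapsed_time,
# where the first few "warm-up" examples are excluded from the measurement.

WARMUP_EXAMPLES = 100  # hypothetical warm-up threshold


def run_benchmark(process_example, examples):
    num_processed_examples = 0
    threshold_examples = 0
    start_time = time.time()

    for example in examples:
        process_example(example)
        num_processed_examples += 1

        # Restart the timer once the warm-up examples are done, so that
        # one-time startup costs do not skew the measurement.
        if num_processed_examples == WARMUP_EXAMPLES:
            threshold_examples = num_processed_examples
            start_time = time.time()

    elapsed_time = time.time() - start_time
    throughput = (num_processed_examples - threshold_examples) / elapsed_time
    print(f"Throughput: {throughput:.2f} examples/sec")
    return throughput
```

In principle the same formula could be applied to training by timing a fixed number of steps after a warm-up period and counting examples as steps times batch size, but that is my assumption rather than something the reference scripts do.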
So how should I evaluate the performance number of BERT-Large training, when there is neither "throughput" nor "elapsed time" in the training log or the training script?
@ashahba @dmsuehir
Thanks