Speed up remote file download cache checks in iree_tests #285

Open
ScottTodd opened this issue Jul 10, 2024 · 0 comments

ScottTodd commented Jul 10, 2024

The iree_tests/download_remote_files.py script is used to download files listed in test case JSON files like this one:

{
  "file_format": "test_cases_v0",
  "test_cases": [
    {
      "name": "splats",
      "runtime_flagfile": "splat_data_flags.txt",
      "remote_files": []
    },
    {
      "name": "real_weights",
      "runtime_flagfile": "real_weights_data_flags.txt",
      "remote_files": [
        "https://sharkpublic.blob.core.windows.net/sharkpublic/scotttodd/iree_tests/2024-03-12/resnet50/inference_input.0.bin",
        "https://sharkpublic.blob.core.windows.net/sharkpublic/scotttodd/iree_tests/2024-03-12/resnet50/inference_output.0.bin",
        "https://sharkpublic.blob.core.windows.net/sharkpublic/scotttodd/iree_tests/2024-03-12/resnet50/real_weights.irpa"
      ]
    }
  ]
}
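
As a rough sketch of how the `remote_files` entries in a file like the one above could be gathered before any cache checks run: this is not the actual `download_remote_files.py` code, and both the `test_cases.json` file name and the `collect_remote_files` helper are assumptions made for illustration.

```python
import json
from pathlib import Path


def collect_remote_files(root_dir):
    """Return (directory, url) pairs for every remote file listed under root_dir.

    Assumption: each test directory holds a file named test_cases.json that
    follows the "test_cases_v0" layout shown in this issue.
    """
    pairs = []
    for path in sorted(Path(root_dir).rglob("test_cases.json")):
        data = json.loads(path.read_text())
        for test_case in data.get("test_cases", []):
            # Test cases such as "splats" have an empty list and add nothing.
            for url in test_case.get("remote_files", []):
                pairs.append((path.parent, url))
    return pairs
```

Collecting the full list up front makes it easy to see how many files need a check (2-50 per test, per the notes below) before deciding how to schedule them.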

The script is used from CI like here:

# Download remote files.
- name: "Downloading remote files for real weight model tests"
  run: |
    source ${VENV_DIR}/bin/activate
    python3 iree_tests/download_remote_files.py --root-dir iree_special_models
    python3 iree_tests/download_remote_files.py --root-dir iree_tests/pytorch/models
    python3 iree_tests/download_remote_files.py --root-dir iree_tests/sharktank

Sample runs:

(mi300 machine seems to be slower than mi250 machine? 30s vs 2 minutes)

We should be able to optimize this cache checking and file downloading so it takes < 10 seconds, instead of 1-3 minutes.
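
For reference, the kind of per-file cache check being discussed can be pictured roughly as follows. This is a sketch, not the script's actual code: MD5 as the hash, and the helper names `local_md5` and `is_cached`, are assumptions, and the remote hash is passed in as a plain argument rather than queried from the storage service.

```python
import hashlib
from pathlib import Path


def local_md5(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large weight files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_cached(path, remote_md5_hex):
    """Return True if the local file exists and its hash matches the remote hash."""
    path = Path(path)
    return path.is_file() and local_md5(path) == remote_md5_hex
```

Each check costs one local hash calculation plus one remote query, which is why the number of files per test matters for the total time.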

Ideas:

  • Use a thread pool to parallelize the checks and downloads
  • Zip the remote files for each test that are always used together (e.g. .zip, .tar, or .tar.gz) so only one local hash calculation, remote hash query, and download request is needed instead of 2-50.
  • Switch more files from Azure to Hugging Face
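
The first idea could be sketched along these lines, using only the standard library. The `process_all` helper and its `process_one` callback are hypothetical names; `process_one` stands in for whatever per-file "check the cache, then download if needed" function the script uses.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_all(items, process_one, max_workers=8):
    """Run process_one(item) for every item on a thread pool.

    Returns a dict mapping each item to its result. If any call raises,
    the exception propagates to the caller. Items must be hashable.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so checks and downloads overlap.
        futures = {pool.submit(process_one, item): item for item in items}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Threads are a reasonable fit here because the work is mostly waiting on network requests and disk I/O. The `max_workers` value of 8 is an arbitrary starting point, not a measured optimum.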
@ScottTodd ScottTodd self-assigned this Jul 10, 2024
ScottTodd added a commit that referenced this issue Jul 22, 2024
…297)

Progress on #285

This is a simple improvement over serial processing, but it could still
be improved further.

Looks like this shaves ~10 seconds off runs in this repo:
* Before 45s:
https://github.com/nod-ai/SHARK-TestSuite/actions/runs/9996869262/job/27632145027#step:6:15
* After 35s:
https://github.com/nod-ai/SHARK-TestSuite/actions/runs/9997455742/job/27634060037?pr=297#step:6:15

I saw 2m+ runs in IREE, hopefully this helps there too. Should be able
to get the total time down to 10-20 seconds.