Fix espaloma model download race conditions #398

epretti · 2025-08-05T23:33:29Z

This is an attempt to fix race conditions that occur when multiple processes try to download espaloma models simultaneously (I believe this was the likely cause of the test failures I saw in #397). There are two problems. First, if one process creates ESPALOMA_MODEL_CACHE_PATH after another process has checked that it doesn't exist but before that process tries to create it, the second process fails. Second, if a process sees that the PyTorch model file exists, it tries to read it even if another process is still in the middle of downloading it, so it runs into an early EOF.
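A minimal sketch of the first race, assuming the cache directory is created with a check-then-create pattern (illustrative only, not the code from this PR). The simulate_other_process flag is a hypothetical hook that injects the problematic interleaving deterministically instead of relying on real process timing.

```python
import os
import tempfile


def racy_create_cache_dir(path: str, simulate_other_process: bool = False) -> None:
    """Check-then-create: the pattern that races between processes.

    simulate_other_process is a hypothetical test hook that creates the
    directory between the check and the create, mimicking a second process.
    """
    if not os.path.exists(path):        # check...
        if simulate_other_process:
            os.makedirs(path)           # ...another process wins the race here...
        os.makedirs(path)               # ...so this create raises FileExistsError


base = tempfile.mkdtemp()
racy_create_cache_dir(os.path.join(base, "quiet"))  # no competitor: succeeds
try:
    racy_create_cache_dir(os.path.join(base, "contended"), simulate_other_process=True)
except FileExistsError as exc:
    print("lost the race:", type(exc).__name__)
```

The losing process does nothing wrong by its own logic; its os.path.exists result is simply stale by the time it calls os.makedirs.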

Using makedirs(exist_ok=True) should fix the first problem, and downloading to a temporary location followed by what should be an atomic rename should fix the second. I suspect CI failures from these problems are intermittent, but I can reliably reproduce them locally, and verify the fixes, by deleting ~/.espaloma and running just the espaloma tests in parallel.
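A minimal Python sketch of both fixes, assuming the model arrives as a single byte string. It is not the code merged in this PR; ensure_cached_model and its fetch callable are hypothetical names, and fetch stands in for the real network request so the sketch runs offline.

```python
import os
import tempfile
from typing import Callable


def ensure_cached_model(cache_dir: str, filename: str, fetch: Callable[[], bytes]) -> str:
    """Return the path of a cached model file, fetching it on a cache miss.

    Hypothetical helper for illustration; fetch stands in for the real
    network download and must return the complete file contents.
    """
    # Fix 1: no FileExistsError if another process created the directory first.
    os.makedirs(cache_dir, exist_ok=True)

    final_path = os.path.join(cache_dir, filename)
    if os.path.exists(final_path):
        # Only fully written files ever appear at final_path (see os.replace).
        return final_path

    # Fix 2: write to a uniquely named temporary file in the SAME directory,
    # so the rename below stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, prefix=filename + ".", suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(fetch())
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach disk before publishing
        # Publish the finished file in one step; on POSIX this rename is atomic,
        # so readers see either no file or the complete file, never a partial one.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Remove the partial temporary file if the download or write failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

The remaining os.path.exists check is still check-then-act, but the race is now benign: at worst two processes both download the model, and the later os.replace overwrites the earlier, identical file. In real use, fetch could be something like lambda: urllib.request.urlopen(url).read(). On Windows, os.replace can raise PermissionError if another process has the destination open, so this sketch assumes POSIX semantics.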

mattwthompson · 2025-08-05T23:48:03Z

We should consider using an off-the-shelf tool, @mikemhenry liked https://pypi.org/project/pooch/ in the past.

We (OpenFF) are facing issues with rate limiting in CI when we hammer assets hosted on GitHub Releases.

codecov-commenter · 2025-08-05T23:53:22Z

Codecov Report

❌ Patch coverage is 63.63636% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.09%. Comparing base (06edade) to head (7d3e995).

Files with missing lines	Patch %	Lines
...penmmforcefields/generators/template_generators.py	63.63%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #398      +/-   ##
==========================================
+ Coverage   81.02%   81.09%   +0.06%     
==========================================
  Files           5        5              
  Lines         822      825       +3     
==========================================
+ Hits          666      669       +3     
  Misses        156      156

epretti · 2025-08-06T00:16:39Z

In an ideal world, this would be handled by espaloma itself. espaloma seems to have some capability for downloading models, but not for automatic caching, unless I missed it. Is this fix to the current solution OK for now, though?

mikemhenry · 2025-08-06T15:05:49Z

I agree that this should all be handled in espaloma. The capability to download models started in this package; later I added a really basic method to espaloma.

I like this fix since it makes the current approach more robust, and I can work upstream on an espaloma-native solution.

Fix espaloma model download race conditions

7d3e995

mikemhenry approved these changes Aug 6, 2025

mikemhenry merged commit e7c1f9e into openmm:main Aug 6, 2025
14 checks passed

mattwthompson mentioned this pull request Aug 6, 2025

Remove DummySystemGenerator #397

Merged

epretti deleted the fix-espaloma-race branch August 6, 2025 16:03

epretti mentioned this pull request Aug 6, 2025

Intermittent failures with espaloma #399

Closed
