Skip to content

Conversation

@jewettaijfc
Copy link
Contributor

@jewettaijfc jewettaijfc commented Jul 14, 2025

This PR addresses issue #2600

This PR:

  • Adds the replace_existing argument to Batch.load(), Batch.download(), and BatchData.load() functions from container.py. This allows these functions to skip over previously downloaded files.
  • For these batch functions, the default value of replace_existing is False.
  • A single warning is generated if it looks like any files have already been downloaded. It explains how to override the default behavior (using replace_existing==True).
  • I refrained from making any modifications which did not address issue 2600, except:
  • I followed Tyler's advice and moved the call to Batch.download() out of Batch.run() and into Batch.load().
  • BatchData.load_sim_data() (which downloads only one file) now invokes web.load() with replace_existing=True, instead of False. (This change was made to be consistent with the other single-file download functions. See below.)

Single-file download functions overwrite existing files

There are also at least 10 different functions in container.py, webapi.py, and s3utils.py which download only a single simulation file. After this PR, all of these single-file functions have the opposite behavior: They will overwrite a previously downloaded file. (Previously, some did and others did not.)

I think this should be the default behavior for functions that download single files: It is the responsibility of the user to decide whether or not to download the file.

Discussion

For batch simulations, I think the situation is different because it's unreasonable to ask users to specify all of the files they want to download manually. That's why I only make it possible to automatically skip previously downloaded batch simulation files.

How to test this code

This code is particularly difficult code to write a unit test for (since it requires mocking in a multi-threaded environment). Instead I tested it manually.

Test strategy

  • You cannot test this code in this branch because currently the web-API is not working. Ideally, you should rebase the changes from this PR onto a working release branch (v2.9.0rc1). (Unfortunately I had trouble rebasing this PR onto v2.9.0rc1 due to a large number of conflicts in unrelated files.)
  • Fortunately all the changes in this PR are in one file (container.py). So we just need to copy that file over to the v2.9.0rc1 version and run the tests. (See details below.)

Detailed test instructions:

git checkout jewettaijfc/issue2600  # checkout this branch
cp -f tidy3d/web/api/container.py /tmp/
git checkout v2.9.0rc1  # web.api works in this branch
cp -f /tmp/container.py tidy3d/web/api/  # (this effectively rebases on v2.9.0rc1)

Then follow the instructions in this file: test_pr2654.py.zip

@jewettaijfc jewettaijfc force-pushed the jewettaijfc/issue2648 branch 3 times, most recently from d07aa9c to 9e08bd1 Compare July 14, 2025 23:48
@jewettaijfc jewettaijfc force-pushed the jewettaijfc/issue2600 branch 2 times, most recently from f97393b to b33daf1 Compare July 15, 2025 01:25
@jewettaijfc jewettaijfc requested a review from tylerflex July 15, 2025 01:26
@yaugenst-flex yaugenst-flex requested review from yaugenst-flex and removed request for tylerflex July 15, 2025 06:40
@yaugenst-flex yaugenst-flex force-pushed the jewettaijfc/issue2648 branch from 9e08bd1 to 2d38bb2 Compare July 15, 2025 09:13
Base automatically changed from jewettaijfc/issue2648 to develop July 15, 2025 10:12
@jewettaijfc jewettaijfc force-pushed the jewettaijfc/issue2600 branch from 62f1aa3 to ac85a54 Compare July 15, 2025 20:51
@jewettaijfc jewettaijfc force-pushed the jewettaijfc/issue2600 branch 3 times, most recently from c0bd4f2 to ee0a307 Compare July 17, 2025 00:18
@jewettaijfc jewettaijfc marked this pull request as ready for review July 17, 2025 00:19
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing changes made in this pull request

@jewettaijfc jewettaijfc force-pushed the jewettaijfc/issue2600 branch from 96f32b2 to 8e87a89 Compare July 17, 2025 00:34
Copy link
Collaborator

@yaugenst-flex yaugenst-flex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jewettaijfc! Left some (mostly minor) comments.

@jewettaijfc jewettaijfc force-pushed the jewettaijfc/issue2600 branch from 6748c8d to 2ee3551 Compare July 17, 2025 18:05
@github-actions
Copy link
Contributor

github-actions bot commented Jul 17, 2025

Diff Coverage

Diff: origin/develop...HEAD, staged and unstaged changes

  • tidy3d/web/api/container.py (90.9%): Missing lines 469,940

Summary

  • Total: 22 lines
  • Missing: 2 lines
  • Coverage: 90%

tidy3d/web/api/container.py

  465         """
  466 
  467         batch_file = Batch._batch_path(path_dir=path_dir)
  468         batch = Batch.from_file(batch_file)
! 469         return batch.load(path_dir=path_dir, replace_existing=replace_existing)
  470 
  471 
  472 class Batch(WebContainer):
  473     """

  936             for task_name, job in self.jobs.items():
  937                 job_path_str = self._job_data_path(task_id=job.task_id, path_dir=path_dir)
  938                 if os.path.exists(job_path_str):
  939                     if replace_existing:
! 940                         log.info(f"File '{job_path_str}' already exists. Overwriting.")
  941                     else:
  942                         log.info(f"File '{job_path_str}' already exists. Skipping.")
  943                         continue
  944                 if "error" in job.status:

@jewettaijfc
Copy link
Contributor Author

I tested the latest version using test_pr2654.py.zip.

The new log.warning() message contains a link to the list of skipped files.

This PR is ready for review again.

@jewettaijfc jewettaijfc force-pushed the jewettaijfc/issue2600 branch from a349f23 to 46ec978 Compare July 18, 2025 17:21
Copy link
Collaborator

@yaugenst-flex yaugenst-flex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jewettaijfc! Minor comment about argument defaults but looks almost ready to go.

@jewettaijfc jewettaijfc force-pushed the jewettaijfc/issue2600 branch from 46ec978 to bc414fd Compare July 21, 2025 15:45
@jewettaijfc jewettaijfc force-pushed the jewettaijfc/issue2600 branch 2 times, most recently from 935d2d5 to fd159e0 Compare July 21, 2025 15:52
Copy link
Collaborator

@yaugenst-flex yaugenst-flex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jewettaijfc jewettaijfc added this pull request to the merge queue Jul 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Jul 21, 2025
@jewettaijfc jewettaijfc added this pull request to the merge queue Jul 21, 2025
@jewettaijfc jewettaijfc removed this pull request from the merge queue due to a manual request Jul 21, 2025
@jewettaijfc jewettaijfc added this pull request to the merge queue Jul 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Jul 21, 2025
@jewettaijfc jewettaijfc added this pull request to the merge queue Jul 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Jul 21, 2025
@jewettaijfc jewettaijfc enabled auto-merge July 21, 2025 20:59
@jewettaijfc jewettaijfc added this pull request to the merge queue Jul 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Jul 21, 2025
@jewettaijfc jewettaijfc added this pull request to the merge queue Jul 22, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Jul 22, 2025
@jewettaijfc jewettaijfc enabled auto-merge July 22, 2025 05:50
@jewettaijfc jewettaijfc added this pull request to the merge queue Jul 22, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Jul 22, 2025
@jewettaijfc jewettaijfc added this pull request to the merge queue Jul 22, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Jul 22, 2025
…atchData.load(). Moved Batch.download() out of Batch.run() and into Batch.load(). BatchData.load_sim_data() invokes web.load() with replace_existing=True. (Previously it was False.)
@yaugenst-flex yaugenst-flex force-pushed the jewettaijfc/issue2600 branch from 126cc20 to 9041b3d Compare July 22, 2025 09:51
@yaugenst-flex yaugenst-flex enabled auto-merge July 22, 2025 09:52
@yaugenst-flex yaugenst-flex added this pull request to the merge queue Jul 22, 2025
Merged via the queue into develop with commit fc6768b Jul 22, 2025
24 checks passed
@yaugenst-flex yaugenst-flex deleted the jewettaijfc/issue2600 branch July 22, 2025 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Downloading batch files should check to see if any data is already downloaded

3 participants