IDEA: Add multiple CPU support to driver script (Performance) #64

Open
truth-quark opened this issue Aug 2, 2024 · 2 comments
Labels
enhancement New feature or request


@truth-quark
Collaborator

Idea: if more performance is required, use the multiprocessing library to execute multiple conversion processes.

This is useful when many input files need to be converted.

Code changes required include:

  1. Splitting the input files into groups for farming out to different processes
  2. Implementing worker processes
  3. Synchronising reporting so that the output logs are not interleaved
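
A minimal sketch of how the three changes above could fit together, using the standard library `multiprocessing.Pool`. Everything named here is an assumption for illustration: `convert_file` is a hypothetical stand-in for the existing single-file conversion, and `split_into_groups`, `convert_group` and `convert_all` are made-up helper names, not part of the driver script. Workers return their log lines instead of printing them, so only the parent process writes output and logs from different files cannot interleave.

```python
import multiprocessing as mp


def convert_file(path):
    """Hypothetical stand-in for the real single-file conversion."""
    # Collect log lines rather than printing, so the parent controls output.
    log = [f"start {path}", f"done {path}"]
    return str(path), log


def split_into_groups(paths, n_groups):
    """Round-robin split of the input paths into n_groups lists."""
    return [paths[i::n_groups] for i in range(n_groups)]


def convert_group(group):
    """Worker: convert each file in one group, returning per-file logs."""
    return [convert_file(p) for p in group]


def convert_all(paths, ncpus=4):
    """Convert all paths using ncpus worker processes."""
    groups = [g for g in split_into_groups(list(paths), ncpus) if g]
    converted = []
    with mp.Pool(processes=ncpus) as pool:
        # Results arrive as each group finishes, in no particular order.
        for group_result in pool.imap_unordered(convert_group, groups):
            for path, log in group_result:
                # Only the parent prints, one file's log at a time.
                print("\n".join(log))
                converted.append(path)
    return converted


if __name__ == "__main__":
    # Hypothetical input names, for illustration only.
    convert_all([f"input_{i:02d}.dat" for i in range(12)])
```

Returning logs to the parent is only one way to handle point 3; a shared lock or a logging queue would also work, at the cost of more moving parts.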
@aidanheerdegen
Member

Hannah reported that converting a year-long run takes 15-20 minutes for "monthly files".

That does seem relatively sluggish. Parallelisation might be more effective, and perhaps more necessary, when there is a single variable per file (#89).

@truth-quark
Collaborator Author

truth-quark commented Sep 23, 2024

For reference, Hannah was converting:

  • 12 * 512 MB files
  • 12 * 1.7 GB files
  • Total ~26 GB converted in 15-20 minutes.

Adding multiprocessing should give good gains, as files can be converted independently. Assuming ncpus=4 and near-linear scaling, processing time could shrink to ~5 minutes.
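
The arithmetic behind that estimate can be checked with a few lines. This is a back-of-envelope sketch only: it takes 512 MB as 0.512 GB, and it assumes ideal linear scaling, which ignores disk I/O contention and process start-up cost.

```python
# File sizes from the list above, in GB (512 MB taken as 0.512 GB).
sizes_gb = [0.512] * 12 + [1.7] * 12
total_gb = sum(sizes_gb)

# Reported serial wall time range, in minutes.
serial_minutes = (15, 20)
ncpus = 4

# Ideal case: wall time divides evenly by the number of processes.
ideal_minutes = tuple(m / ncpus for m in serial_minutes)

print(f"total: {total_gb:.1f} GB")
print(f"ideal with {ncpus} cpus: {ideal_minutes[0]}-{ideal_minutes[1]} minutes")
```

Under those assumptions the range is 3.75 to 5 minutes, so ~5 minutes is the conservative end of the ideal case.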

@truth-quark truth-quark added the enhancement New feature or request label Sep 26, 2024