Use parallel processing to speed up obs processing #733

Merged


CoryMartin-NOAA
Contributor

Uses Python multiprocessing to generate the obs in parallel. With the current list of obs, processing time drops from 10+ minutes to ~5.5 minutes.
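A minimal sketch of the pattern, assuming each converter runs as its own OS process using the `<script> -c <config.json>` form seen in the run log later in this thread. The names `build_command`, `run_converter`, and `run_all` are hypothetical and are not taken from `run_bufr2ioda.py`.

```python
import multiprocessing as mp
import subprocess
import sys


def build_command(script, config):
    # Assumed calling convention, mirroring the log: "<script> -c <config.json>"
    return [sys.executable, script, "-c", config]


def run_converter(cmd):
    # Each pool task launches one converter as a separate process and waits for it
    return subprocess.run(cmd).returncode


def run_all(jobs, num_cores=None):
    # jobs: list of (script, config) pairs, one pool task per converter.
    # num_cores=None lets Pool choose its default worker count from the CPU count.
    cmds = [build_command(script, config) for script, config in jobs]
    with mp.Pool(num_cores) as pool:
        # map blocks until every task finishes and returns exit codes in input order
        return pool.map(run_converter, cmds)
```

For example, `run_all([("bufr2ioda_satwind_scat.py", "satwind_scat_2021080100.json")], num_cores=8)` would return one exit status per converter. Where workers are started with `spawn` or `forkserver`, the call must sit under an `if __name__ == "__main__":` guard.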


```python
# Check if the converter was successful
# if os.path.exists(yaml_output_file):
#     rm_p(yaml_output_file)

# run all bufr2ioda yamls in parallel
with mp.Pool(num_cores) as pool:
    ...  # remainder of this hunk is not shown
```
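The commented-out lines above were a per-converter success check. Because `pool.map` returns results in input order, one option is to have each task report its own exit status and inspect the list after the pool closes. This is a sketch only; `ConverterResult`, `run_and_capture`, and `failed` are hypothetical names, not part of `run_bufr2ioda.py`.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ConverterResult:
    # Hypothetical per-task record; not a type from run_bufr2ioda.py
    cmd: list
    returncode: int
    stderr: str


def run_and_capture(cmd):
    # Run one converter and keep its exit status and error output
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return ConverterResult(cmd, proc.returncode, proc.stderr)


def failed(results):
    # A nonzero exit status marks a converter that did not succeed
    return [r for r in results if r.returncode != 0]
```

Inside the pool this would read `results = pool.map(run_and_capture, cmds)`, followed by logging `r.cmd` and `r.stderr` for each entry in `failed(results)`.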
Contributor Author


Note: the easiest way to do this was to split the converters into two groups, Python scripts and YAML + executable runs, each with its own pool. If the two groups are roughly equal in size, that is probably okay, but should we combine them all into a single pool?
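One way to merge the two groups is to build a single task list and hand it to one pool, so workers freed by the faster group immediately pick up tasks from the other instead of idling until the slower group finishes. This sketch assumes both groups are already expressed as command lists, and the function names are hypothetical. It swaps in `multiprocessing.pool.ThreadPool`, which offers the same `map` interface as `mp.Pool`, because each task only launches an external process and waits on it.

```python
import subprocess
from multiprocessing.pool import ThreadPool


def run(cmd):
    # Launch one converter (Python script or executable) and wait for it
    return subprocess.run(cmd).returncode


def run_combined(python_cmds, exe_cmds, num_workers):
    # One task list for both groups: idle workers take whatever task is next,
    # regardless of which group it came from
    tasks = list(python_cmds) + list(exe_cmds)
    with ThreadPool(num_workers) as pool:
        # Exit codes come back in the same order as tasks
        return pool.map(run, tasks)
```

Threads are enough here because the converters themselves run in child processes, so each worker thread mostly blocks inside `subprocess.run`. Using `mp.Pool(num_workers)` instead would behave the same way at the cost of one extra Python process per worker.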

Contributor

RussTreadon-NOAA left a comment


I'm not familiar with Python multiprocessing. Is running in parallel from Python the most efficient approach?

I copied ush/ioda/bufr2ioda/run_bufr2ioda.py to a working copy of gdas-validation and ran gdasprepatmiodaobs. The log file indicates the jobs ran in parallel:

```
2023-11-16 00:54:20,329 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_gpsro_bufr_combined.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/gpsro_bufr_combined_2021080100.json
2023-11-16 00:54:20,329 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_satwind_scat.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/satwind_scat_2021080100.json
2023-11-16 00:54:20,330 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_adpsfc_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/adpsfc_prepbufr_2021080100.json
2023-11-16 00:54:20,330 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_adpupa_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/adpupa_prepbufr_2021080100.json
2023-11-16 00:54:20,330 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_conventional_prepbufr_ps.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/conventional_prepbufr_ps_2021080100.json
2023-11-16 00:54:20,330 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_sfcshp_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/sfcshp_prepbufr_2021080100.json
2023-11-16 00:54:20,330 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_satwind_amv_goes.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/satwind_amv_goes_2021080100.json
2023-11-16 00:54:20,331 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_acft_profiles_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/acft_profiles_prepbufr_2021080100.json
```
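All eight `Executing` lines carry timestamps between 00:54:20,329 and 00:54:20,331, which is what one would expect when a pool dispatches the converters together; note that these timestamps mark launch, not completion. A small sketch that measures this launch window from log lines, using a hypothetical `launch_spread` helper:

```python
import re
from datetime import datetime

# Matches the "YYYY-MM-DD HH:MM:SS,mmm - INFO" prefix of each logged line
TS = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - INFO")


def launch_spread(lines):
    """Return the time between the first and last logged converter launch."""
    stamps = [
        # %f accepts the 3-digit millisecond field and pads it to microseconds
        datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        for m in map(TS.search, lines)
        if m
    ]
    return max(stamps) - min(stamps)
```

A spread of a few milliseconds across all converters is consistent with parallel dispatch; a serial run would instead show each launch delayed by the previous converter's run time.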

The total run time for the parallel prepatmiodaobs job was 06:28 (mm:ss).

The previous serial job took 11:06.
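The arithmetic behind the reduction, as a quick check using the two wall times reported above:

```python
def mmss_to_seconds(text):
    """Convert an "mm:ss" wall-time string to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


serial = mmss_to_seconds("11:06")    # 666 s
parallel = mmss_to_seconds("06:28")  # 388 s
speedup = serial / parallel          # about 1.72x
saved = 1 - parallel / serial        # fraction of wall time saved
print(f"speedup {speedup:.2f}x, {saved:.1%} less wall time")
# prints: speedup 1.72x, 41.7% less wall time
```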

Nice reduction!

CoryMartin-NOAA
Contributor Author

@RussTreadon-NOAA I'm not sure it's the most efficient, but I think this will work fine provided we don't need to run on multiple nodes. This was the fastest way to speed everything up. Next, we may wish to combine the pools so that the bufr2ioda.x runs execute concurrently with the Python ones, but that can be done in a subsequent PR.
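Because `mp.Pool` workers are processes on the same machine, the usable pool size is capped by the cores of a single node. A hedged sketch of choosing `num_cores` from the CPUs this process is actually allowed to run on; `available_cores` is a hypothetical helper, and whether a batch scheduler restricts the CPU affinity mask depends on the site configuration.

```python
import os


def available_cores():
    """Number of CPUs this process may use on the current node."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: counts only CPUs in this process's affinity mask,
        # which a batch scheduler may have narrowed
        return len(os.sched_getaffinity(0))
    # Elsewhere fall back to the machine's CPU count (None becomes 1)
    return os.cpu_count() or 1
```

The result would then be passed as `mp.Pool(available_cores())`. Spreading converters across several nodes would need a different mechanism, such as separate batch tasks or MPI, rather than a larger pool.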

CoryMartin-NOAA merged commit 9fc4d73 into feature/gdas-validation on Nov 16, 2023
5 checks passed
CoryMartin-NOAA deleted the feature/gdas-validation-parallel branch on November 16, 2023 13:34