Use of multithreading to improve areas where parallel processing could have major benefits to response times #699

@BobTorgerson

Description

We would like to explore the use of multithreading in areas where parallel processing could substantially improve response times across the API. This adds complexity, so we should review it carefully as we implement it. An example of how this could be used is shown here for the zonal statistics:

data-api/zonal_stats.py

Lines 224 to 264 in 9d6cd62

    # Use threading with a small number of workers for parallel
    # processing of the zonal statistics
    workers = min(4, max(1, cpu_count() // 2))
    # Only use parallelization if we have enough combinations
    # to justify overhead of starting them up
    use_parallel = MULTIPROCESSING and workers > 1 and len(dimension_combinations) > 20
    if use_parallel:
        with ThreadPoolExecutor(max_workers=workers) as threads:
            results = list(
                threads.map(
                    lambda combo: (
                        combo,
                        calculate_zonal_stats(
                            da_i.sel(combo),
                            rasterized_polygon_array,
                            x_dim,
                            y_dim,
                            compute_full_stats,
                        ),
                    ),
                    dimension_combinations,
                )
            )
    else:
        results = [
            (
                combo,
                calculate_zonal_stats(
                    da_i.sel(combo),
                    rasterized_polygon_array,
                    x_dim,
                    y_dim,
                    compute_full_stats,
                ),
            )
            for combo in dimension_combinations
        ]
    return results
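
If this pattern gets reused in other parts of the API, one option is to pull the "thread only when it pays off" decision into a small helper so each call site does not repeat both branches. The sketch below is only an illustration of that idea: `map_maybe_parallel` and its parameters (`min_items`, `max_workers`, `enabled`) are hypothetical names, not existing code in data-api, and the defaults simply mirror the thresholds in the zonal statistics snippet above.

```python
from concurrent.futures import ThreadPoolExecutor
from os import cpu_count


def map_maybe_parallel(func, items, min_items=20, max_workers=4, enabled=True):
    """Apply func to every item, using a thread pool only for larger inputs.

    Hypothetical helper: returns a list of (item, func(item)) pairs in
    input order, matching the shape of `results` in the snippet above.
    """
    items = list(items)
    # cpu_count() may return None, so fall back to a single CPU
    workers = min(max_workers, max(1, (cpu_count() or 1) // 2))
    if enabled and workers > 1 and len(items) > min_items:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Executor.map yields results in the same order as the inputs
            return list(zip(items, pool.map(func, items)))
    # Sequential fallback for small inputs or when threading is disabled
    return [(item, func(item)) for item in items]
```

With something like this in place, the zonal statistics call could in principle reduce to a single `map_maybe_parallel(...)` call with a function that wraps `calculate_zonal_stats`. One thing to keep in mind when evaluating candidates: in CPython, threads only speed up CPU-bound work when the underlying code releases the GIL (as many NumPy operations do), so it is probably worth timing each case before enabling it.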
