Conversation
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Reviewer's Guide

Adds support for OPERA L4 zenith tropospheric delay products as a new tropospheric correction method in MintPy, including CLI wiring, configuration knobs, ASF-backed on-demand download with spatial/vertical subsetting, and end-to-end integration into the smallbaselineApp workflow.

Sequence diagram for OPERA tropospheric correction workflow

```mermaid
sequenceDiagram
    actor User
    participant SmallbaselineApp as smallbaselineApp
    participant TropoOperaCLI as tropo_opera_cli
    participant TropoOperaCore as tropo_opera_core
    participant ASF as ASF_Search_Service
    participant OPERAStore as Local_OPERA_dir
    participant DiffCLI as diff_cli
    User->>SmallbaselineApp: configure mintpy.troposphericDelay.method=opera
    SmallbaselineApp->>SmallbaselineApp: get_timeseries_filename(method=opera)
    SmallbaselineApp-->>User: timeseries_OPERA.h5 path
    User->>TropoOperaCLI: tropo_opera.py -f dis_file -g geom_file --dir operaDir
    TropoOperaCLI->>TropoOperaCLI: cmd_line_parse()
    TropoOperaCLI->>TropoOperaCore: run_tropo_opera(inps)
    TropoOperaCore->>TropoOperaCore: read_inps2date_time(inps)
    TropoOperaCore->>TropoOperaCore: get_opera_date_time_list(supported_dates, utc_sec)
    TropoOperaCore->>OPERAStore: get_opera_file_status(opera_date_list, opera_hour_list, operaDir)
    OPERAStore-->>TropoOperaCore: matched_files, missing_date_hour_list
    alt missing OPERA products
        TropoOperaCore->>ASF: dload_opera_files(missing_date_hour_list, geom_file, dem_range)
        ASF-->>OPERAStore: OPERA_L4_TROPO-ZENITH subset files
        TropoOperaCore->>OPERAStore: get_opera_file_status(...)
        OPERAStore-->>TropoOperaCore: updated matched_files
    end
    TropoOperaCore->>TropoOperaCore: calculate_delay_timeseries(tropo_file, dis_file, geom_file, operaDir)
    loop for each acquisition date
        TropoOperaCore->>OPERAStore: _get_best_local_opera_file(model_date, model_hour)
        OPERAStore-->>TropoOperaCore: opera_file
        TropoOperaCore->>TropoOperaCore: calc_zenith_delay_from_opera_file(opera_file, geom_file)
        TropoOperaCore->>TropoOperaCore: _project_zenith_to_los(zenith_delay, inc_angle)
        TropoOperaCore->>TropoOperaCore: writefile.write_hdf5_block(timeseries dataset)
    end
    TropoOperaCore->>DiffCLI: diff.py dis_file tropo_file -o cor_dis_file --force
    DiffCLI-->>TropoOperaCore: corrected displacement file
    TropoOperaCore-->>User: cor_dis_file path
```
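The `_project_zenith_to_los` step in the diagram is commonly a division by the cosine of the local incidence angle; a minimal sketch of that standard approximation (the actual MintPy helper may differ in detail):

```python
import numpy as np

def project_zenith_to_los(zenith_delay, inc_angle_deg):
    """Map zenith tropospheric delay to slant (LOS) delay via 1/cos(incidence).

    A standard mapping-function-free approximation: the slant path through
    the troposphere is longer than the zenith path by ~1/cos(theta).
    """
    return zenith_delay / np.cos(np.deg2rad(inc_angle_deg))
```

For example, at a 60-degree incidence angle the LOS delay is twice the zenith delay.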
Flowchart for downloading and subsetting OPERA products from ASF

```mermaid
flowchart TD
    A["Start dload_opera_files(missing_date_hour_list, opera_dir, geom_file, dem_range)"] --> B{missing_date_hour_list empty?}
    B -->|yes| Z["Return (nothing to download)"]
    B -->|no| C["Import asf_search and create ASFSession with _create_asf_session"]
    C --> D["Check geom_file and fsspec availability for subsetting"]
    D --> E["Build set of required model time tokens (YYYYMMDDTHHMMSSZ)"]
    E --> F["Group tokens by model date"]
    F --> G["For each model date: build ASFSearchOptions\ncollections=OPERA_TROPO_COLLECTIONS, date window"]
    G --> H["Run ASF search(opts)"]
    H --> I["For each product: extract text, parse model and production tokens"]
    I --> J["Select best product per token (latest production time)"]
    J --> K{Token has matching product?}
    K -->|no| L["Log no match for token"]
    K -->|yes| M{"Can subset remotely?\n(geom_file and fsspec available)"}
    M -->|yes| N["Call dload_opera_file_subset(product, opera_dir, geom_file, dem_range, session)"]
    N --> O{Subset download valid?}
    O -->|yes| Q["Keep subsetted OPERA file in opera_dir"]
    O -->|no| P["Fallback to full download"]
    M -->|no| P["Use ASFSearchResults.download(path=opera_dir, session)"]
    P --> Q
    Q --> R["Update progress bar and counters"]
    R --> S{More tokens?}
    S -->|yes| G
    S -->|no| T["Print summary of downloaded tokens and files"]
    T --> Z
```
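The token-building and grouping steps in the flowchart (E and F) can be sketched as follows; `model_time_token` is a hypothetical stand-in, since the review only references the helper by its call site:

```python
def model_time_token(date_str, hour):
    """Build a YYYYMMDDTHHMMSSZ token from a 'YYYYMMDD' date and an integer hour.

    Illustrative only: the real helper in the PR may format tokens differently.
    """
    return f"{date_str}T{int(hour):02d}0000Z"

def group_tokens_by_date(date_hour_list):
    """Deduplicate the required tokens and group them by YYYYMMDD prefix,
    mirroring the flowchart's 'group tokens by model date' step."""
    tokens = sorted({model_time_token(d, h) for d, h in date_hour_list})
    date_to_tokens = {}
    for token in tokens:
        date_to_tokens.setdefault(token[:8], []).append(token)
    return date_to_tokens
```

Grouping by date lets the downloader issue one ASF search per calendar day instead of one per acquisition.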
File-Level Changes
Hey - I've found 4 issues, and left some high level feedback:
- The current implementation of `calc_zenith_delay_from_opera_file` builds `delay_2d` with shape `(npixels, ny, nx)` and then calls `map_coordinates` over it, which will be extremely memory-heavy for typical image sizes (millions of pixels) and likely not scalable; consider switching to a direct 3D interpolation (e.g., `RegularGridInterpolator` on (height, lat, lon) evaluated at (DEM, lat, lon) points) or restructuring the computation to avoid allocating arrays proportional to `npixels * ny * nx`.
- The docstring of `read_opera_total_delay_cube` mentions cropping the height dimension "plus a one-level buffer on each side", but `get_opera_height_crop_indices` does not add any buffer and just brackets [dem_min, dem_max]; it would be good to either add the buffer or update the docstring to match the actual behavior to avoid confusion.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The current implementation of `calc_zenith_delay_from_opera_file` builds `delay_2d` with shape `(npixels, ny, nx)` and then calls `map_coordinates` over it, which will be extremely memory-heavy for typical image sizes (millions of pixels) and likely not scalable; consider switching to a direct 3D interpolation (e.g., `RegularGridInterpolator` on (height, lat, lon) evaluated at (DEM, lat, lon) points) or restructuring the computation to avoid allocating arrays proportional to `npixels * ny * nx`.
- The docstring of `read_opera_total_delay_cube` mentions cropping the height dimension "plus a one-level buffer on each side", but `get_opera_height_crop_indices` does not add any buffer and just brackets [dem_min, dem_max]; it would be good to either add the buffer or update the docstring to match the actual behavior to avoid confusion.
## Individual Comments
### Comment 1
<location path="src/mintpy/tropo_opera.py" line_range="313-317" />
<code_context>
+ # Vectorise: gather the two bracketing slabs for all pixels
+ # data_lo[p, iy, ix] = data[idx[p], iy, ix] → shape (npixels, ny, nx)
+ npixels = len(dem_flat)
+ data_lo = data[idx] # shape (npixels, ny, nx)
+ data_hi = data[idx + 1] # shape (npixels, ny, nx)
+ # Weighted blend: shape (npixels, ny, nx)
+ w3 = w[:, np.newaxis, np.newaxis]
+ delay_2d = data_lo * (1.0 - w3) + data_hi * w3 # (npixels, ny, nx)
+
+ # --- Step 2: Cubic interpolation in lat/lon per pixel -------------------
</code_context>
<issue_to_address>
**issue (performance):** Vertical interpolation allocates O(npixels * ny * nx) arrays and will not scale for typical scene sizes.
In `calc_zenith_delay_from_opera_file`, this sequence
```python
data_lo = data[idx] # (npixels, ny, nx)
data_hi = data[idx + 1] # (npixels, ny, nx)
w3 = w[:, None, None]
delay_2d = data_lo * (1.0 - w3) + data_hi * w3
```
materializes arrays of size `npixels * ny * nx`, which for typical MintPy scenes (e.g. ~10^6 pixels and ~10^4 OPERA grid points) is prohibitively large in both memory and compute.
Please refactor to avoid this scaling, e.g. by:
- Using a single 3D interpolator over `(z_axis, y_axis, x_axis)` evaluated at `(DEM, lat2d, lon2d)`, or
- Processing the interpolation in bounded chunks over the output pixels.
As written, this is likely to exhaust memory or severely thrash on realistic datasets.
</issue_to_address>
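One way the "bounded chunks" alternative could look (a sketch, not the PR's implementation; linear 3D interpolation via SciPy's `RegularGridInterpolator`, evaluated in fixed-size batches of output pixels):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def interp_delay_chunked(z_axis, y_axis, x_axis, data, dem, lat2d, lon2d,
                         chunk=200_000):
    """Evaluate a (height, lat, lon) delay cube at per-pixel (DEM, lat, lon)
    points in bounded chunks, so peak memory scales with `chunk` instead of
    npixels * ny * nx."""
    rgi = RegularGridInterpolator((z_axis, y_axis, x_axis), data,
                                  bounds_error=False, fill_value=np.nan)
    pts = np.column_stack([dem.ravel(), lat2d.ravel(), lon2d.ravel()])
    out = np.empty(pts.shape[0], dtype=np.float64)
    for i0 in range(0, pts.shape[0], chunk):
        out[i0:i0 + chunk] = rgi(pts[i0:i0 + chunk])
    return out.reshape(dem.shape)
```

Since trilinear interpolation reproduces any function that is linear in each coordinate, a cube of the form `delay = z + lat + lon` makes a quick correctness check.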
### Comment 2
<location path="src/mintpy/tropo_opera.py" line_range="317-326" />
<code_context>
+ delay_2d = data_lo * (1.0 - w3) + data_hi * w3 # (npixels, ny, nx)
</code_context>
<issue_to_address>
**issue (performance):** Cubic interpolation step builds a huge 3D array and calls `map_coordinates` over it, which is also not scalable.
`delay_2d` has shape `(npixels, ny, nx)` and is then passed to `map_coordinates` with
```python
coords = np.array([
np.arange(npixels, dtype=np.float64),
iy_frac,
ix_frac,
])
```
This scales memory as `npixels * ny * nx`, which is likely to be prohibitive for typical scenes.
To keep a two-step scheme but make it scalable, you could:
- First do the vertical interpolation once per OPERA time slice to get a 2D field on the OPERA grid.
- Then interpolate that 2D field from `(y_axis, x_axis)` to `(lat2d, lon2d)` for the MintPy grid (e.g., via `RegularGridInterpolator` or `map_coordinates` on a 2D array).
This removes `npixels` as a separate axis in the interpolated arrays and avoids the large 3D allocation.
</issue_to_address>
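A compact sketch of that two-step order (lateral interpolation per height level, then linear vertical blending onto DEM heights); array names are illustrative, and cubic lateral interpolation is kept per the comment:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def delay_per_level_then_vertical(data, iy_frac, ix_frac, z_axis, dem_flat):
    """data: (nz, ny, nx) delay cube on the OPERA grid.
    iy_frac/ix_frac: fractional OPERA-grid coordinates of each output pixel.
    Returns per-pixel zenith delay, shape (npixels,).
    Peak memory is O(nz * npixels) rather than O(npixels * ny * nx)."""
    nz = data.shape[0]
    npix = dem_flat.size
    # Step 1: cubic lateral interpolation, once per height level
    levels = np.empty((nz, npix), dtype=np.float64)
    for k in range(nz):
        levels[k] = map_coordinates(data[k], [iy_frac, ix_frac],
                                    order=3, mode='nearest')
    # Step 2: linear vertical interpolation onto DEM heights
    idx = np.clip(np.searchsorted(z_axis, dem_flat) - 1, 0, nz - 2)
    w = (dem_flat - z_axis[idx]) / (z_axis[idx + 1] - z_axis[idx])
    cols = np.arange(npix)
    return levels[idx, cols] * (1.0 - w) + levels[idx + 1, cols] * w
```

Because spline interpolation passes exactly through the sample points, evaluating at integer grid coordinates and grid-level heights should return the original cube values.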
### Comment 3
<location path="src/mintpy/cli/tropo_opera.py" line_range="15-24" />
<code_context>
+from mintpy.utils.arg_utils import create_argument_parser
+
+############################################################################
+REFERENCE = """references:
+ in prerp to be added.
+"""
+
+DIR_DEMO = """--dir ./OPERA
+ To be added
+ ...
+"""
+
+EXAMPLE = """example:
+ tropo_opera.py -f timeseries.h5 -g inputs/geometryRadar.h5 --dir ./OPERA
+ tropo_opera.py -f geo/geo_timeseries.h5 -g geo/geo_geometryRadar.h5 --dir ./OPERA
+"""
+
</code_context>
<issue_to_address>
**nitpick (typo):** Help/epilog strings contain several typos and placeholders that should be cleaned up for user-facing CLI.
Please fix the spelling in the user-facing strings (e.g., `"in prerp"`, `"Tropopsheric"`, `"trospheric"`) and replace or remove placeholder texts like `"To be added"` so the CLI help appears complete and professional.
Suggested implementation:
```python
############################################################################
REFERENCE = """references:
OPERA tropospheric correction products for InSAR time series:
- Bekaert, D. et al., "OPERA L2 tropospheric corrections for InSAR time series" (documentation and product specification).
"""
```
Beyond the `REFERENCE` string shown above, the following additional changes are needed to fully implement your comment:
1. Update the `DIR_DEMO` string to remove the `"To be added"` placeholder and replace it with a concise explanation of what `--dir` should point to (e.g., "Path to the directory containing OPERA tropospheric correction products.").
2. Update the `EXAMPLE` string so that it contains polished, final examples without placeholders (ensure that the example commands are valid and consistent with the current CLI).
3. Search the entire file for typos like `"Tropopsheric"` and `"trospheric"` (and similar) and correct them to `"Tropospheric"` where they appear in user-facing help/epilog text.
4. Remove or rephrase any remaining "to be added" or similar placeholder text in help/epilog strings so the CLI help appears complete and professional.
</issue_to_address>
### Comment 4
<location path="src/mintpy/tropo_opera.py" line_range="211" />
<code_context>
+ return lat2d, lon2d, dem
+
+
+def calc_zenith_delay_from_opera_file(opera_file, geom_file, pad_cells=3):
+ """Calculate 2D zenith tropospheric delay map intersected with DEM.
+
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying `calc_zenith_delay_from_opera_file` and `dload_opera_files` by extracting focused helpers and using straightforward interpolation logic instead of custom broadcasting tricks and long in-body design narratives.
You can simplify `calc_zenith_delay_from_opera_file` and `dload_opera_files` quite a bit without losing functionality, mainly by:
* removing custom/broadcast-heavy interpolation logic and commentary
* factoring out single‑responsibility helpers
### 1. Simplify `calc_zenith_delay_from_opera_file`
Right now the function:
* contains a long design narrative
* uses non‑obvious broadcasting and `map_coordinates` tricks
* mixes: cube reading, axis normalization, vertical interpolation, lateral interpolation, and DEM masking
You can keep the same feature set with much simpler code by:
1. Extracting helpers:
* `prepare_opera_grid(cube)`
* `interpolate_delay_3d(z_axis, y_axis, x_axis, data, dem, lat2d, lon2d)`
2. Using a single `RegularGridInterpolator` in 3D.
If cubic lateral interpolation is not strictly required, a single 3D linear interpolation gives a big complexity win. One possible refactor:
```python
def prepare_opera_grid(cube):
z_axis = np.asarray(cube['height'], dtype=np.float64)
y_axis = np.asarray(cube['latitude'], dtype=np.float64)
x_axis = np.asarray(cube['longitude'], dtype=np.float64)
data = np.asarray(cube['total_delay'], dtype=np.float64) # (nz, ny, nx)
# ensure ascending axes
if z_axis.size > 1 and z_axis[1] < z_axis[0]:
z_axis = z_axis[::-1]
data = data[::-1, :, :]
if y_axis.size > 1 and y_axis[1] < y_axis[0]:
y_axis = y_axis[::-1]
data = data[:, ::-1, :]
if x_axis.size > 1 and x_axis[1] < x_axis[0]:
x_axis = x_axis[::-1]
data = data[:, :, ::-1]
return z_axis, y_axis, x_axis, data
```
Then replace the custom vertical + lateral interpolation with one clean call:
```python
from scipy.interpolate import RegularGridInterpolator
def interpolate_delay_3d(z_axis, y_axis, x_axis, data, lat2d, lon2d, dem):
rgi = RegularGridInterpolator(
(z_axis, y_axis, x_axis),
data,
bounds_error=False,
fill_value=np.nan,
# method='linear', # explicit if desired
)
pts = np.column_stack([
dem.ravel().astype(np.float64),
lat2d.ravel().astype(np.float64),
lon2d.ravel().astype(np.float64),
])
ztd_flat = rgi(pts)
ztd = ztd_flat.reshape(dem.shape).astype(np.float32)
# mask invalid DEM
ztd[~np.isfinite(dem)] = np.nan
return ztd
```
`calc_zenith_delay_from_opera_file` becomes mostly orchestration:
```python
def calc_zenith_delay_from_opera_file(opera_file, geom_file, pad_cells=3):
"""Calculate 2D zenith tropospheric delay map intersected with DEM.
Interpolates the 3D OPERA delay cube (height, lat, lon) onto the
3D point cloud defined by (DEM height, pixel lat, pixel lon),
using RegularGridInterpolator.
"""
lat2d, lon2d, dem = get_geom_lat_lon_dem(geom_file)
valid_dem = dem[np.isfinite(dem)]
dem_range = None
if valid_dem.size > 0:
dem_range = (float(np.nanmin(valid_dem)), float(np.nanmax(valid_dem)))
cube = read_opera_total_delay_cube(
opera_file, geom_file, dem_range=dem_range, pad_cells=pad_cells
)
z_axis, y_axis, x_axis, data = prepare_opera_grid(cube)
ztd = interpolate_delay_3d(z_axis, y_axis, x_axis, data, lat2d, lon2d, dem)
return ztd, cube
```
Benefits:
* no advanced multi‑axis indexing (`data[idx]` producing `(npixels, ny, nx)`)
* no reliance on `map_coordinates` with synthetic axes
* the docstring can stay short; the implementation is self‑documenting
If preserving exactly “linear vertical, cubic lateral” is critical:
* keep a short docstring note
* but still extract helpers:
* `prepare_opera_grid`
* `vertical_interp_to_dem(z_axis, data, dem)` that returns a single 3D array `(height_on_dem, ny, nx)` or a cleaner shape like `(ny, nx, npixels)` using explicit `for` / `np.interp` over flattened `(ny*nx)` rather than advanced indexing
* `interpolate_grid_to_pixels(field, y_axis, x_axis, lat2d, lon2d)` using a single `RegularGridInterpolator` in 2D
The key is to remove the long “design debate” comment block and the custom broadcasting patterns, even if you keep the two‑step scheme.
### 2. Strip internal design commentary
Inside `calc_zenith_delay_from_opera_file` there is a multi‑paragraph comment explaining different strategies (“Most efficient”, “Better”, “Best practical approach”, etc.).
You can significantly reduce cognitive load by:
* Moving any rationale you want to preserve into the docstring as 2–3 sentences.
* Deleting the in‑body narrative, keeping only short comments right above non‑obvious lines.
For example, replace:
```python
# --- Step 1: Linear interpolation in the vertical -----------------------
# At each OPERA (lat, lon) grid node, interpolate ...
# [multiple paragraphs of alternative designs and complexity discussion]
```
with:
```python
# Linear interpolation of delay along height onto DEM elevations
```
This makes it much easier for future readers to see “what it does now” instead of re‑reading past design trade‑offs.
### 3. Split `dload_opera_files` into smaller helpers
`dload_opera_files` currently mixes:
* missing token grouping
* ASF query construction
* product selection (latest production per model time)
* deciding subset vs full download
* progress printing
You can keep the exact behavior but extract helpers to isolate responsibilities. For example:
```python
def build_missing_tokens(missing_date_hour_list):
missing_tokens = sorted({_model_time_token(d, h) for d, h in missing_date_hour_list})
date_to_tokens = {}
for token in missing_tokens:
date_to_tokens.setdefault(token[:8], []).append(token)
return missing_tokens, date_to_tokens
```
```python
def search_day_products(asf, session, date_str):
start_str = f'{date_str[0:4]}-{date_str[4:6]}-{date_str[6:8]}T00:00:00Z'
end_str = f'{date_str[0:4]}-{date_str[4:6]}-{date_str[6:8]}T23:59:59Z'
opts = asf.ASFSearchOptions(
maxResults=250,
collections=OPERA_TROPO_COLLECTIONS,
collectionAlias=False,
start=start_str,
end=end_str,
)
if session is not None:
opts.session = session
return asf.search(opts=opts)
```
```python
def select_best_product_per_token(results, day_tokens):
token_best_product = {}
token_best_prod_time = {}
token_match_count = {token: 0 for token in day_tokens}
for product in results:
text = _asf_product_text(product)
if 'OPERA_L4_TROPO-ZENITH_' not in text:
continue
model_token, prod_token = _parse_opera_time_tokens(text)
if model_token not in token_match_count:
continue
token_match_count[model_token] += 1
new_prod_time = prod_token or ''
old_prod_time = token_best_prod_time.get(model_token, '')
if model_token not in token_best_product or new_prod_time > old_prod_time:
token_best_product[model_token] = product
token_best_prod_time[model_token] = new_prod_time
return token_best_product, token_match_count
```
Then `dload_opera_files` becomes:
```python
missing_tokens, date_to_tokens = build_missing_tokens(missing_date_hour_list)
...
for date_str in sorted(date_to_tokens.keys()):
day_tokens = date_to_tokens[date_str]
results = search_day_products(asf, session, date_str)
token_best_product, token_match_count = select_best_product_per_token(results, day_tokens)
# rest: subset/full download + progress updates
```
This keeps the workflow and user‑visible behavior identical, but:
* shortens `dload_opera_files`
* makes testing and debugging each part much easier (`select_best_product_per_token` can be unit‑tested with synthetic `results`)
* reduces nesting and the number of variables you need to keep in mind at once
---
These changes preserve the new OPERA‑based functionality while substantially reducing complexity in the hot spots another reviewer flagged.
</issue_to_address>
…ypos)
- Remove extra blank line after imports to satisfy isort
- Rewrite interpolation: cubic-lateral per height level then linear-vertical to reduce memory from O(npixels*ny*nx) to O(nz*npixels)
- Fix docstring in read_opera_total_delay_cube (one-level buffer -> bracketing)
- Fix CLI typos: Tropopsheric, trospheric, prerp, placeholder descriptions
- Add _ensure_ascending helper for axis/data orientation
- Remove redundant parentheses around f-strings (pyupgrade --py36-plus)
- Add asf_search to requirements.txt (needed for OPERA download)
fsspec[http] includes aiohttp, enabling byte-range HTTP downloads for efficient spatial/vertical subsetting of OPERA ZTD files.
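As a rough illustration of the access pattern this enables (a local file stands in for the remote OPERA ZTD product, since a real run would use an HTTPS URL and authenticated session; the dataset name `total_delay` matches the review's code context but the file layout here is invented):

```python
import os
import tempfile

import fsspec
import h5py
import numpy as np

# Stand-in for a remote OPERA ZTD file; in practice the path would be an
# HTTPS URL and fsspec[http]/aiohttp would issue byte-range requests.
tmp = os.path.join(tempfile.mkdtemp(), 'demo_ztd.h5')
with h5py.File(tmp, 'w') as f:
    f['total_delay'] = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

# h5py accepts the fsspec file-like object directly, so only the byte
# ranges covering the requested (height, lat, lon) crop are read.
with fsspec.open(tmp, mode='rb') as fobj:
    with h5py.File(fobj, 'r') as ds:
        subset = ds['total_delay'][0, 1:3, 0:2]
```

Over HTTP the same slicing avoids downloading the full cube, which is where the disk and time savings come from.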
@yunjunz this should bring the PR back in sync with dev, and also it addresses all the concerns of the bots. |
Thanks for checking @sssangha. Issues for the test are addressed as well.
Description of proposed changes
This is an update to the PR that was open before: #1473.
Additionally:
This brings down resources drastically in terms of disk usage and download time. The example here is for Mexico City with 6 NISAR acquisitions.
Comparing the old full 3D method with the new 2D/1D combo (changed to simpler interpolation) also shows nearly identical results.
Summary by Sourcery
Add support for tropospheric delay correction using OPERA zenith delay products and wire it into the MintPy workflow and CLI.
New Features:
Enhancements: