Results from LLM-based security review

Hey @remi-braun, hope this finds you well. Just passing along two findings from a security scan I ran on eoreader using a popular LLM.

Below are the LLM-generated descriptions of the vulnerabilities.

----

> 1. XXE in STAC metadata XML parsing (stac_product.py:294)
> 
> ```python
> def _read_mtd_xml_stac(self, mtd_url, **kwargs) -> (etree._Element, dict):
>     ...
>     mtd_str = self.read_href(mtd_url, clients=self.clients)
>     root = etree.fromstring(mtd_str)
> ```
> `mtd_url` is a metadata `href` taken from a STAC Item that the user fetched over the network. The bytes returned by `read_href` are therefore attacker-influenced.
> 
> ## Reach
> 
> Triggered through the documented entry point:
> 
> 1. User calls `Reader().open(url)` where `url` points to a STAC catalog or item (`reader.py:605-620`).
> 2. The JSON is parsed into a `pystac.Item`. Asset `href`s become `mtd_url`.
> 3. `read_href(mtd_url)` fetches the metadata XML.
> 4. `etree.fromstring(mtd_str)` parses with entity resolution and network fetches enabled.
> 
> This affects every STAC product variant that inherits from `StacProduct`: `s2_e84`, `s2_mpc`, `hls`, `s1_rtc_asf`, `s1_rtc_mpc`, and any future STAC subclass.
> 
> ## Impact
> 
> A hostile STAC catalog (or a hostile URL pasted by the user) can ship XML that:
> 
> - Reads local files via `<!ENTITY xxe SYSTEM "file:///etc/passwd">` and exposes them through the parsed product attributes.
> - SSRFs internal services via entity URLs (cloud metadata endpoints, RFC1918 hosts).
> - Exhausts memory through entity expansion (billion laughs).
> 
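
For reference, here is a minimal sketch of one possible mitigation for finding 1, using only the standard library: refuse any metadata document that declares a DOCTYPE before it reaches `etree.fromstring`. It assumes the STAC metadata XML never legitimately needs a DTD, which I have not verified for every product type. `reject_doctype` and `ForbiddenDTDError` are hypothetical names, not part of eoreader.

```python
from xml.parsers import expat


class ForbiddenDTDError(ValueError):
    """Raised when untrusted XML declares a DOCTYPE (hypothetical name)."""


def reject_doctype(xml_data):
    """Return xml_data unchanged, or raise if it declares a DOCTYPE.

    Entities can only be defined inside a DTD, so rejecting the DOCTYPE
    rules out external entities and entity-expansion payloads alike.
    Malformed XML raises expat.ExpatError instead.
    """

    def _on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort as soon as the declaration starts, before any entity is defined.
        raise ForbiddenDTDError(f"DOCTYPE {name!r} is not allowed in metadata XML")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = _on_doctype
    parser.Parse(xml_data, True)  # True marks this as the final chunk
    return xml_data
```

The call site would then look something like `etree.fromstring(reject_doctype(mtd_str))`. A more direct alternative is to pass an explicit `etree.XMLParser(resolve_entities=False, no_network=True)` to `etree.fromstring`, so the behavior no longer depends on the defaults of the installed lxml version.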

-----

> 2. Security: eval() reachable on unvalidated input via public compute_index API (bands/indices.py:151)
> 
> `eoreader/bands/indices.py:151` calls `eval(index)(bands)` where `index` is a function parameter. The function `compute_index` is publicly exported (`eoreader/bands/__init__.py:98, 114`). A caller that passes an attacker-controlled string to `compute_index` gets remote code execution.
> 
> ## Vulnerable code
> 
> ```python
> def compute_index(index: str, bands: dict, **kwargs) -> xr.DataArray:
>     ...
>     if hasattr(spyndex.indices, index):
>         ...
>     elif index in EOREADER_DERIVATIVES:
>         ...
>     else:
>         index_arr = eval(index)(bands)
> ```
> 
> The `else` branch is reached when the input string is not in the spyndex catalog and not in `EOREADER_DERIVATIVES`. There is no allowlist gating the `eval` call inside `compute_index` itself.
> 
> ## Reach
> 
> The internal call path through `Product.load(...)` is safe: `product.py:1178` filters band names through `is_index(band)` (which checks `str(index) in get_all_index_names()`) before they reach `compute_index`.
> 
> The risk is the public function. `compute_index` is exported in `eoreader.bands.__all__` and documented as taking an index name string. A downstream application that passes user-controlled strings (config file, web request, CLI arg, notebook input) without re-implementing `is_index` first gets RCE. A payload such as `__import__('os').system('...')` fails the `hasattr` and dict-membership checks and falls into `eval`.
> 
> ## Impact
> 
> Arbitrary code execution in the host process whenever an unfiltered string reaches `compute_index`. The function's docstring does not warn that it evaluates strings, so a developer reading the API docs has no signal that pre-validation is required.
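
A possible direction for finding 2, sketched under the assumption that whatever currently reaches the `else` branch is a plain function that can be registered by name: replace the `eval` with an explicit lookup table, so unknown strings raise instead of being evaluated. `INDEX_FUNCTIONS`, `ndvi_like` and `resolve_index` are hypothetical names for illustration, not eoreader's actual API.

```python
from typing import Callable, Dict


def ndvi_like(bands: dict) -> float:
    """Hypothetical stand-in for one of the module's own index functions."""
    return (bands["NIR"] - bands["RED"]) / (bands["NIR"] + bands["RED"])


# Explicit allowlist: only names registered here can ever be called.
INDEX_FUNCTIONS: Dict[str, Callable[[dict], float]] = {
    "ndvi_like": ndvi_like,
}


def resolve_index(index: str) -> Callable[[dict], float]:
    """Map an index name to its function without evaluating the string."""
    try:
        return INDEX_FUNCTIONS[index]
    except KeyError:
        # The input is only ever used as a dict key, so it is never executed.
        raise ValueError(f"Unknown index: {index!r}") from None
```

With this shape, `resolve_index("ndvi_like")(bands)` would take the place of `eval(index)(bands)`, and a payload such as `__import__('os').system('...')` ends in a `ValueError` rather than running.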
