Results from LLM-based security review #306

@brendancol

Hey @remi-braun, hope this finds you well. Just passing along two findings from a security scan I ran on eoreader using a popular LLM.

Below are the LLM-generated descriptions of the vulnerabilities.


  1. XXE in STAC metadata XML parsing (stac_product.py:294)
def _read_mtd_xml_stac(self, mtd_url, **kwargs) -> (etree._Element, dict):
    ...
    mtd_str = self.read_href(mtd_url, clients=self.clients)
    root = etree.fromstring(mtd_str)

mtd_url is a metadata href taken from a STAC Item that the user fetched over the network. The bytes returned by read_href are therefore attacker-influenced.

Reach

Triggered through the documented entry point:

  1. User calls Reader().open(url) where url points to a STAC catalog or item (reader.py:605-620).
  2. The JSON is parsed into a pystac.Item. Asset hrefs become mtd_url.
  3. read_href(mtd_url) fetches the metadata XML.
  4. etree.fromstring(mtd_str) parses the response with lxml's default parser; no explicitly hardened parser (entity resolution and network access disabled) is passed.
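
One way to close step 4 is to hand lxml an explicitly configured parser rather than relying on its defaults. The sketch below is a minimal illustration and not eoreader's actual fix: `parse_untrusted_xml` is a hypothetical helper name, while the keyword arguments are standard `lxml.etree.XMLParser` options. Defaults may differ between lxml releases, so every relevant option is set explicitly.

```python
from lxml import etree


# Hypothetical helper name; eoreader does not necessarily define anything like this.
def parse_untrusted_xml(data: bytes) -> etree._Element:
    """Parse XML from an untrusted source with a locked-down lxml parser."""
    parser = etree.XMLParser(
        resolve_entities=False,  # leave entity references unexpanded
        no_network=True,         # forbid network access when looking up related files
        load_dtd=False,          # do not load external DTDs
        dtd_validation=False,    # do not validate against a DTD
        huge_tree=False,         # keep libxml2's built-in size limits
    )
    return etree.fromstring(data, parser=parser)


if __name__ == "__main__":
    hostile = b'<!DOCTYPE r [<!ENTITY x "expanded">]><r>&x;</r>'
    root = parse_untrusted_xml(hostile)
    # The serialized element still contains the raw reference, not its value.
    print(etree.tostring(root))
```

In `_read_mtd_xml_stac`, the equivalent change would be to pass such a parser as the second argument of `etree.fromstring`.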

This affects every STAC product variant that inherits from StacProduct: s2_e84, s2_mpc, hls, s1_rtc_asf, s1_rtc_mpc, and any future STAC subclass.

Impact

A hostile STAC catalog (or a hostile URL pasted by the user) can ship XML that:

  • Reads local files via <!ENTITY xxe SYSTEM "file:///etc/passwd"> and exposes them through the parsed product attributes.
  • SSRFs internal services via entity URLs (cloud metadata endpoints, RFC1918 hosts).
  • Exhausts memory through entity expansion (billion laughs).
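
All three attacks above depend on entity declarations, which can only appear inside a DOCTYPE. A possible defense-in-depth step, sketched here under the assumption that legitimate product metadata never needs a DOCTYPE, is to reject any document that declares one before it reaches the real parser. `reject_dtd` and `DtdForbidden` are hypothetical names; the scan uses only the standard-library expat bindings.

```python
import xml.parsers.expat


class DtdForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def reject_dtd(data: bytes) -> None:
    """Scan XML with expat and raise DtdForbidden at the first DOCTYPE.

    Malformed XML still raises xml.parsers.expat.ExpatError.
    """
    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called as soon as the DOCTYPE starts, before any entity is declared.
        raise DtdForbidden(f"DOCTYPE {name!r} is not allowed in metadata XML")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)  # True marks this as the final chunk


if __name__ == "__main__":
    reject_dtd(b"<r><a>1</a></r>")  # plain XML passes silently
    try:
        reject_dtd(b'<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>&x;</r>')
    except DtdForbidden as exc:
        print("rejected:", exc)
```

This check costs a second pass over the document, so it suits metadata files better than large payloads.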

  2. Security: eval() reachable on unvalidated input via public compute_index API (bands/indices.py:151)

eoreader/bands/indices.py:151 calls eval(index)(bands) where index is a function parameter. The function compute_index is publicly exported (eoreader/bands/__init__.py:98, 114). A caller that passes an attacker-controlled string to compute_index hands the attacker arbitrary code execution in the caller's own process.

Vulnerable code

def compute_index(index: str, bands: dict, **kwargs) -> xr.DataArray:
    ...
    if hasattr(spyndex.indices, index):
        ...
    elif index in EOREADER_DERIVATIVES:
        ...
    else:
        index_arr = eval(index)(bands)

The else branch is reached when the input string is not in the spyndex catalog and not in EOREADER_DERIVATIVES. There is no allowlist gating the eval call inside compute_index itself.
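
A common way to gate a branch like this is to replace the eval call with a lookup in an explicit registry of callables. The sketch below shows the pattern only: `CUSTOM_INDICES`, `_ratio`, and `resolve_custom_index` are hypothetical names, and the single registered function is a placeholder rather than a real eoreader index.

```python
from typing import Callable, Dict


# Placeholder index function, for illustration only.
def _ratio(bands: dict) -> float:
    return bands["NIR"] / bands["RED"]


# Hypothetical registry standing in for whatever callables the eval branch
# was meant to reach.
CUSTOM_INDICES: Dict[str, Callable[[dict], float]] = {"SIMPLE_RATIO": _ratio}


def resolve_custom_index(index: str) -> Callable[[dict], float]:
    """Look up an index function by name instead of evaluating the string."""
    try:
        return CUSTOM_INDICES[index]
    except KeyError:
        raise ValueError(f"Unknown index name: {index!r}") from None


if __name__ == "__main__":
    print(resolve_custom_index("SIMPLE_RATIO")({"NIR": 4.0, "RED": 2.0}))
    try:
        resolve_custom_index("__import__('os').system('...')")
    except ValueError as exc:
        print("rejected:", exc)
```

With a registry, an unknown string can only ever produce an error; it is never interpreted as code.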

Reach

The internal call path through Product.load(...) is safe: product.py:1178 filters band names through is_index(band) (which checks str(index) in get_all_index_names()) before they reach compute_index.

The risk is the public function. compute_index is exported in eoreader.bands.__all__ and documented as taking an index name string. A downstream application that passes user-controlled strings (config file, web request, CLI arg, notebook input) without re-implementing is_index first gets RCE. A payload such as __import__('os').system('...') fails the hasattr and dict-membership checks and falls into eval.
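
The fall-through can be traced without running any payload. The sketch below mirrors the three-way branch of compute_index using stand-ins: `FAKE_SPYNDEX_INDICES` and `FAKE_DERIVATIVES` are hypothetical substitutes for spyndex.indices and EOREADER_DERIVATIVES, with made-up contents, and `branch_taken` only reports which branch a string would hit.

```python
from types import SimpleNamespace

# Stand-ins for spyndex.indices and EOREADER_DERIVATIVES. The entries are
# illustrative; the real objects hold many more names.
FAKE_SPYNDEX_INDICES = SimpleNamespace(NDVI=object())
FAKE_DERIVATIVES = {"SLOPE": object()}


def branch_taken(index: str) -> str:
    """Report which compute_index branch a given string would reach."""
    if hasattr(FAKE_SPYNDEX_INDICES, index):
        return "spyndex"
    elif index in FAKE_DERIVATIVES:
        return "derivative"
    else:
        return "eval"  # the real function would call eval(index)(bands) here


if __name__ == "__main__":
    for candidate in ("NDVI", "SLOPE", "__import__('os').system('...')"):
        print(f"{candidate!r} -> {branch_taken(candidate)}")
```

Any string that is neither an attribute of the first object nor a key of the second lands in the eval branch, which is why the payload above reaches it.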

Impact

Arbitrary code execution in the host process whenever an unfiltered string reaches compute_index. The function's docstring does not warn that it evaluates strings, so a developer reading the API docs has no signal that pre-validation is required.
