Skip to content

Runtime network loading of FIELDS_JSON_URL and mixed type enum #400

@schwehr

Description

@schwehr

@volaya @duckontheweb @lossyrob and #264

Introduced here pystac/summaries.py#L40, FIELDS_JSON_URL causes the default path to go load the field definitions from a remote file. I am not a fan of having an additional code path (there was already the validation schema fetches) outside of testing:

  • That file is not versioned with pystac
  • It incurs a runtime cost of fetching 24089 bytes across a network. For some folks this might be an issue.
  • It's another code path that I have to patch out for hermetic testing
  • It blocks using using this code when offline or sandboxed
  • If something is wrong with that file, it's not obvious how to approach fixing it
  • What if the file moves?

I would much rather have a script to rebuild a python file that contains just the resulting table that is only the parts used here.

Currently it's this:

FIELDS_JSON_URL = ( 
   "https://cdn.jsdelivr.net/npm/@radiantearth/stac-fields/fields-normalized.json"
)

@lru_cache(maxsize=None)
def _get_fields_json(url: str) -> Dict[str, Any]:
    return pystac.StacIO.default().read_json(url)

# ...

class Summarizer:
    # ...
    def __init__(self, fields: Optional[str] = None):
        fieldspath = fields or FIELDS_JSON_URL
        try:
            jsonfields = _get_fields_json(fieldspath)
        except:
            if fields is None:
                raise Exception(
                    "Could not read fields definition file at "
                    f"{fields} or it is invalid.\n"
                    "Try using a local fields definition file."
                )
            else:
                raise
        self._set_field_definitions(jsonfields)

    def _set_field_definitions(self, fields: Dict[str, Any]) -> None:
        self.summaryfields: Dict[str, SummaryStrategy] = {}
        for name, desc in fields["metadata"].items():
            if isinstance(desc, dict):
                strategy_value = desc.get("summary", True)
                try:
                    strategy: SummaryStrategy = SummaryStrategy(strategy_value)
                except ValueError:
                    strategy = SummaryStrategy.DEFAULT
                if strategy != SummaryStrategy.DONT_SUMMARIZE:
                    self.summaryfields[name] = strategy
            else:
                self.summaryfields[name] = SummaryStrategy.DEFAULT

Additionally, having an enum of mixed types is likely to bite us down the road. An importer should translate all of those strategies into single character strings. Especially with True vrs 3 other strings that eval to True. It's not obvious here what the default even is.

class SummaryStrategy(Enum):
    ARRAY = "v"
    RANGE = "r"
    SCHEMA = "s"
    DONT_SUMMARIZE = False
    DEFAULT = True

Also, the above uses DONT for don't. That's hard on some users. Please use DO_NOT_SUMMARIZE or something else more readable. I generally suggest avoiding contractions in code even when it is just in comments.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions