Ingest observations from NNJA and convert to DART observation sequence format #2

@hkershaw-brown

Ingest observations from NNJA (via Brightband) and convert to DART observation sequence format

Summary
This project will ingest observation data from the Brightband nnja-ai API (the AI-ready NOAA-NASA Joint Archive, NNJA) and convert it into the DART observation sequence (obs_seq) format. The goal is to enable direct assimilation of NNJA observations in DART-based workflows, bridging modern observational archives and existing data assimilation systems.

Motivation

  • The nnja-ai dataset provides a modern, well-structured, cloud-native observational archive (in Parquet / tabular format) for a wide range of sensors (satellites, radiosondes, surface stations, etc.).
  • DART requires observations in its obs_seq structure (with associated metadata, error specifications, and observation types) to perform assimilation.
  • By building a conversion pipeline, we unlock the potential of NNJA observations for assimilation experiments, operational workflows, and hybrid AI–DA systems.
  • This also helps users avoid manual, ad-hoc conversions and ensures consistency, traceability, and robustness in data handling.

Goals

  • Develop a conversion tool that queries or ingests NNJA data from the Brightband nnja-ai API.
  • Map NNJA variables, sensor identifiers, timestamps, locations, and metadata to DART observation definitions (obs_def).
  • Generate valid DART obs_seq files from the ingested data.
  • Validate the output by running small examples through the DART obs_sequence_tool.
  • Provide documentation and example notebooks demonstrating conversion workflows.
  • (Optional) Automate periodic ingestion / updates so new NNJA observations can be converted on demand.

Approach / Methodology

  • Familiarize yourself with the nnja-ai API / SDK
    • Use the Brightband nnja-ai SDK or API to query or download observations in a programmatic way.
    • Explore the data schemas, partitioning (e.g. date, sensor type), and how to filter for desired subsets.
  • Define mapping between NNJA observation schema and DART observation definitions
    • Determine how NNJA field names (e.g. sensor, variable, quality flags, geolocation) map to DART’s obs_type, obs_error, obs_kind, etc.
    • Handle sensor-specific nuances (e.g. satellite radiances vs in-situ data).
  • Build conversion routines
    • Read NNJA data into a notebook.
    • Apply filters, quality control, and coordinate/time transformations (if needed).
    • Create DART-compatible data structures and metadata.
    • Write out obs_seq files.
  • Testing & validation
    • Use small subsets of NNJA data to test conversions.
    • Run the DART obs_sequence_tool to confirm that DART can read the resulting observation sequences.
    • Compare statistics (observation count, error distributions) before and after conversion.
  • Documentation and automation
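
The mapping and conversion steps above could start from something like the following minimal sketch. It is not based on the actual NNJA schema: the column names (`air_temperature`, `eastward_wind`, `qc_flag`, `lat`, `lon`), the error values, and the `DartObs` container are all hypothetical placeholders. The DART observation type names are existing radiosonde types, used here only as examples. The sketch stores the error as a variance, because obs_seq files record observation error variance rather than standard deviation.

```python
from dataclasses import dataclass

# Hypothetical NNJA column -> (DART observation type, assumed error std dev).
# Column names and error values are placeholders, not the real NNJA schema.
COLUMN_TO_DART = {
    "air_temperature": ("RADIOSONDE_TEMPERATURE", 1.0),
    "eastward_wind": ("RADIOSONDE_U_WIND_COMPONENT", 2.0),
}


@dataclass
class DartObs:
    """Illustrative container for the fields an obs_seq entry needs."""
    obs_type: str
    value: float
    error_variance: float
    lat: float
    lon: float


def convert_rows(rows, max_qc=0):
    """Turn tabular rows (dicts) into DartObs records.

    Rows whose (hypothetical) qc_flag exceeds max_qc are skipped, and
    missing values (None) produce no observation.
    """
    out = []
    for row in rows:
        if row["qc_flag"] > max_qc:
            continue
        for column, (obs_type, err_sd) in COLUMN_TO_DART.items():
            value = row.get(column)
            if value is None:
                continue
            out.append(
                DartObs(obs_type, float(value), err_sd ** 2, row["lat"], row["lon"])
            )
    return out


rows = [
    {"air_temperature": 280.5, "eastward_wind": None,
     "qc_flag": 0, "lat": 40.0, "lon": -105.0},
    {"air_temperature": 275.0, "eastward_wind": 3.2,
     "qc_flag": 2, "lat": 41.0, "lon": -104.0},
]
observations = convert_rows(rows)
```

Keeping the mapping in a single table makes it easier to review against the DART obs_def files and to extend sensor by sensor. Counting the records in `observations` against the filtered input rows gives a first before-and-after check.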

Skills Needed or to Be Gained

  • Python programming (file I/O, data processing)
  • Experience with data handling libraries (Pandas, PyArrow, Dask, xarray)
  • Familiarity with Parquet, columnar data formats, and large-volume data reading
  • Understanding of DART observation sequence format, obs_def, obs_seq conventions
  • Some knowledge of remote sensing / satellite observation metadata if working with radiance data
  • Comfort with time coordinate systems, geospatial transforms, and quality flags

Possible Challenges & Open Questions

  • Some observations may lack full metadata (e.g. sensor angles, calibration) needed by DART.
  • Time zone, time reference, or timestamp precision mismatches between NNJA and DART.
  • Ensuring that coordinate systems align (e.g. lat/lon grids, altitude levels).
  • Performance issues when converting large volumes of data (memory, I/O).
  • Consistency with DART observation error and quality control expectations.
  • Handling edge cases — missing data, sensor blacklisting, quality flags, or observation duplicates.
  • Maintaining compatibility as the nnja-ai schema evolves or updates (versioning).
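
For the time and coordinate questions above, a small sketch of the kind of helper the converter might need. It assumes that NNJA timestamps are (or can be treated as) UTC, and that the target is DART's Gregorian calendar convention of whole days plus seconds since 1601-01-01, with longitudes in the range 0 to 360 degrees and angles stored in radians in the obs_seq file. The function names are illustrative, and these conventions should be checked against the DART documentation before use.

```python
import math
from datetime import datetime, timezone

# DART's Gregorian calendar counts days and seconds from this epoch.
DART_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def to_dart_time(ts):
    """Return (days, seconds) since 1601-01-01 00:00:00 UTC.

    Naive datetimes are assumed to be UTC. Sub-second precision is dropped.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    delta = ts.astimezone(timezone.utc) - DART_EPOCH
    return delta.days, delta.seconds


def to_dart_lonlat(lon_deg, lat_deg):
    """Wrap longitude into [0, 360) degrees and convert both angles to radians."""
    if not -90.0 <= lat_deg <= 90.0:
        raise ValueError(f"latitude out of range: {lat_deg}")
    return math.radians(lon_deg % 360.0), math.radians(lat_deg)


days, seconds = to_dart_time(datetime(2024, 1, 1, 12, 0, 0))
lon_rad, lat_rad = to_dart_lonlat(-105.0, 40.0)
```

Raising on out-of-range latitudes, rather than clipping them, surfaces bad geolocation early so that it does not end up silently in an obs_seq file.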
