Open
Labels: Observations, Python, enhancement
Description
Ingest observations from NNJA (via Brightband) and convert to DART observation sequence format
Summary
This project will ingest observation data from the Brightband nnja-ai API (the AI-ready NOAA-NASA Joint Archive, NNJA) and convert it into the DART observation sequence (obs_seq) format. The goal is to enable direct assimilation of NNJA observations in DART-based workflows, bridging modern observational archives and existing data assimilation systems.
Motivation
- The nnja-ai dataset provides a modern, well-structured, cloud-native observational archive (in Parquet / tabular format) for a wide range of sensors (satellites, radiosondes, surface stations, etc.).
- DART requires observations in its `obs_seq` structure (with associated metadata, error specifications, and observation types) to perform assimilation.
- By building a conversion pipeline, we unlock the potential of NNJA observations for assimilation experiments, operational workflows, and hybrid AI–DA systems.
- This also helps users avoid manual, ad-hoc conversions and ensures consistency, traceability, and robustness in data handling.
Goals
- Develop a conversion tool that queries or ingests NNJA data from the Brightband nnja-ai API.
- Map NNJA variables, sensor identifiers, timestamps, locations, and metadata to DART observation definitions (`obs_def`).
- Generate valid DART `obs_seq` files from the ingested data.
- Validate output by testing small examples using the DART `obs_sequence_tool`.
- Provide documentation and example notebooks demonstrating conversion workflows.
- (Optional) Automate periodic ingestion / updates so new NNJA observations can be converted on demand.
Approach / Methodology
- Familiarize yourself with the nnja-ai API / SDK
  - Use the Brightband `nnja-ai` SDK or API to query or download observations programmatically.
  - Explore the data schemas, partitioning (e.g. date, sensor type), and how to filter for desired subsets.
- Define mapping between NNJA observation schema and DART observation definitions
  - Determine how NNJA field names (e.g. sensor, variable, quality flags, geolocation) map to DART’s `obs_type`, `obs_error`, `obs_kind`, etc.
  - Handle sensor-specific nuances (e.g. satellite radiances vs in-situ data).
- Build conversion routines
  - Read NNJA data into a notebook.
  - Apply filters, quality control, and coordinate/time transformations (if needed).
  - Create DART-compatible data structures and metadata.
  - Write out `obs_seq` files.
- Testing & validation
  - Use small subsets of NNJA data to test conversions.
  - Run the DART `obs_sequence_tool` to confirm that DART can read the resulting observation sequences.
  - Compare statistics (observation count, error distributions) before and after conversion.
- Documentation and automation
  - Create a Sphinx Gallery example that showcases the nnja-to-dart converter in the pyDARTdiags gallery of notebooks.
  - Optionally wrap the converter into a CLI or automated pipeline for ongoing usage.
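
To make the mapping and validation steps above more concrete, here is a minimal sketch of what a conversion routine might look like. It is not the nnja-ai SDK or the pyDARTdiags API: the input column names (`temperature_k`, `u_wind_ms`, `qc_flag`, `lat`, `lon`, `pressure_pa`), the QC convention, and the error values are all hypothetical placeholders, and plain dictionaries stand in for rows loaded from NNJA. The DART type names are existing radiosonde observation types, used here only for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical NNJA-style column -> (DART observation type, assumed error std dev).
# The column names and error values are placeholders, not the real nnja-ai schema.
COLUMN_TO_DART = {
    "temperature_k": ("RADIOSONDE_TEMPERATURE", 1.0),
    "u_wind_ms": ("RADIOSONDE_U_WIND_COMPONENT", 2.0),
}


@dataclass
class DartObs:
    obs_type: str
    value: float
    error_variance: float  # obs_seq files store the error variance, not the std dev
    lat_deg: float
    lon_deg: float
    pressure_pa: float


def convert_rows(rows, max_qc=0):
    """Turn row dicts into DartObs records, skipping flagged or missing values."""
    converted = []
    for row in rows:
        # Hypothetical QC convention: 0 means good, larger values mean suspect.
        if row.get("qc_flag", 0) > max_qc:
            continue
        for column, (obs_type, error_sd) in COLUMN_TO_DART.items():
            value = row.get(column)
            if value is None:
                continue
            converted.append(
                DartObs(obs_type, float(value), error_sd ** 2,
                        row["lat"], row["lon"], row["pressure_pa"])
            )
    return converted


def count_by_type(observations):
    """Observation counts per DART type, for before/after comparisons."""
    return Counter(ob.obs_type for ob in observations)


# Three made-up rows: one clean, one rejected by QC, one with a missing wind value.
rows = [
    {"lat": 40.0, "lon": -105.0, "pressure_pa": 85000.0, "qc_flag": 0,
     "temperature_k": 280.5, "u_wind_ms": 3.2},
    {"lat": 41.0, "lon": -104.0, "pressure_pa": 50000.0, "qc_flag": 2,
     "temperature_k": 255.0, "u_wind_ms": None},
    {"lat": 42.0, "lon": -103.0, "pressure_pa": 70000.0, "qc_flag": 0,
     "temperature_k": 270.1, "u_wind_ms": None},
]
observations = convert_rows(rows)
print(count_by_type(observations))
```

Keeping the mapping in a single table makes it easy to review and to extend per sensor, and the per-type counts give a cheap first check that nothing was silently dropped during conversion.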
**Skills Needed or to Be Gained**
- Python programming (file I/O, data processing)
- Experience with data handling libraries (Pandas, PyArrow, Dask, xarray)
- Familiarity with Parquet, columnar data formats, and large-volume data reading
- Understanding of DART observation sequence format, `obs_def`, `obs_seq` conventions
- Some knowledge of remote sensing / satellite observation metadata if working with radiance data
- Comfort with time coordinate systems, geospatial transforms, and quality flags
Possible Challenges & Open Questions
- Some observations may lack full metadata (e.g. sensor angles, calibration) needed by DART.
- Time zone, time reference, or timestamp precision mismatches between NNJA and DART.
- Ensuring that coordinate systems align (e.g. lat/lon grids, altitude levels).
- Performance issues when converting large volumes of data (memory, I/O).
- Consistency with DART observation error and quality control expectations.
- Handling edge cases: missing data, sensor blacklisting, quality flags, or observation duplicates.
- Maintaining compatibility as the nnja-ai schema evolves or updates (versioning).
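
As an illustration of the time and coordinate questions above, the sketch below converts a timestamp and a location into the conventions DART uses in `obs_seq` files: time as days and seconds since 1601-01-01 00:00:00 (Gregorian calendar), longitude in the range 0 to 360 degrees, and horizontal location stored in radians. The function names are hypothetical, and treating timezone-naive timestamps as UTC is an assumption that would need to be checked against the NNJA data.

```python
import math
from datetime import datetime, timezone

# DART's Gregorian calendar counts days and seconds from this epoch.
DART_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def to_dart_time(timestamp):
    """Return (days, seconds) since 1601-01-01 00:00:00 UTC.

    Sub-second precision is dropped, because the result is whole seconds.
    """
    if timestamp.tzinfo is None:
        # Assumption: naive timestamps are already in UTC.
        timestamp = timestamp.replace(tzinfo=timezone.utc)
    delta = timestamp.astimezone(timezone.utc) - DART_EPOCH
    return delta.days, delta.seconds


def to_dart_location(lat_deg, lon_deg):
    """Return (lon, lat) in radians, with longitude wrapped into [0, 360) degrees first."""
    lon_wrapped = lon_deg % 360.0  # e.g. -90 degrees becomes 270 degrees
    return math.radians(lon_wrapped), math.radians(lat_deg)


days, seconds = to_dart_time(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
lon_rad, lat_rad = to_dart_location(40.0, -105.0)
print(days, seconds, lon_rad, lat_rad)
```

Centralizing these conversions in two small functions makes it easier to test them against known values before any large conversion run.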
References
- Brightband nnja-ai project on GitHub: the API / SDK for NNJA observations
- Brightband NNJA-AI example notebook
- pyDARTdiags documentation
- DART observation sequence and obs_def documentation