GageDataFormat

Neptune-Meister edited this page Sep 30, 2024 · 6 revisions

Gage Format - netCDF

USGS Gage Data

Each netCDF file holds a single time slice of discharge measurements for multiple USGS stations, along with per-station metadata. Discharge values are stored as floats, with missing values denoted by NaN. Each measurement has a quality score from 0 to 100, stored as a short integer with a scale factor of 0.01. Each station's record also includes a query time stored as a Unix timestamp, using the proleptic Gregorian calendar. The structure and format are as follows:

Dimensions:

  • stationIdInd: An unlimited dimension, representing the number of USGS stations included in the data.
  • stationIdStrLen: A fixed length of 15 characters for the station identifiers.
  • timeStrLen: A fixed length of 19 characters for the time strings formatted as "YYYY-MM-DD_HH:mm UTC".

Variables:

  • stationId [dim: stationIdInd, stationIdStrLen] (char): A 2D character array with dimensions (stationIdInd, stationIdStrLen). This stores the station identifier for each USGS station.
    • long_name: "USGS station identifier of length 15"
    • units: "-"
  • time [dim: stationIdInd, timeStrLen] (char): A 2D character array with dimensions (stationIdInd, timeStrLen). It stores the time associated with the data for each station, formatted as "YYYY-MM-DD_HH:mm UTC".
    • long_name: "YYYY-MM-DD_HH:mm UTC"
    • units: "UTC"
  • discharge [dim: stationIdInd] (float): A 1D array with dimension stationIdInd. It represents the discharge at each station in cubic meters per second.
    • long_name: "Discharge in cubic meters per second"
    • units: "m^3/s"
    • _FillValue: NaNf, indicating that missing or undefined values will be filled with "NaN".
  • discharge_quality [dim: stationIdInd] (short): A 1D array with dimension stationIdInd. This variable represents the quality of the discharge data on a scale of 0 to 100, scaled by a factor of 0.01.
    • long_name: "Discharge quality 0 to 100 to be scaled by 100"
    • units: "-"
    • multfactor: "0.01"
  • queryTime [dim: stationIdInd] (int): A 1D array with dimension stationIdInd. It represents the query time in Unix timestamp format (seconds since January 1, 1970).
    • units: "seconds since 1970-01-01 00:00:00 local TZ"
    • calendar: "proleptic_gregorian"

Global Attributes:

  • fileUpdateTimeUTC: The time the NetCDF file was last updated in UTC
  • sliceCenterTimeUTC: The center time of the data slice in UTC
  • sliceTimeResolutionMinutes: The temporal resolution of the data slice, typically 15 minutes.
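
The per-station variables described above can be decoded along the following lines. This is a minimal sketch using only the Python standard library: it assumes the arrays have already been read from the file by some netCDF reader, that the char array arrives as rows of single-byte elements, and that station identifiers may be padded with spaces or NUL bytes on either side (the padding convention is not stated on this page). The function names and the sample values are hypothetical.

```python
import math


def decode_char_row(row):
    """Join one row of a char array (single-byte elements) into a str."""
    text = b"".join(row).decode("ascii", errors="replace")
    # Assumption: padding may be spaces or NULs, on either side.
    return text.strip("\x00 ")


def decode_time_slice(station_ids, discharge, quality_raw, multfactor=0.01):
    """Return (station, discharge_m3s_or_None, scaled_quality) per station."""
    records = []
    for row, value, raw in zip(station_ids, discharge, quality_raw):
        # _FillValue is NaN, so NaN marks a missing discharge.
        discharge_m3s = None if math.isnan(value) else value
        # discharge_quality is a short in 0..100; multfactor maps it to 0..1.
        records.append((decode_char_row(row), discharge_m3s, raw * multfactor))
    return records


# Hypothetical sample data shaped like (stationIdInd=2, stationIdStrLen=15).
sample_ids = [
    [b" "] * 7 + [bytes([c]) for c in b"01234567"],
    [bytes([c]) for c in b"07654321"] + [b"\x00"] * 7,
]
sample_discharge = [12.5, float("nan")]
sample_quality = [100, 0]

records = decode_time_slice(sample_ids, sample_discharge, sample_quality)
```

Representing a missing discharge as `None` is a design choice of this sketch; callers that prefer to keep NaN can skip that step.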

USACE Gage Data

Each netCDF file holds a time slice of discharge measurements for multiple USACE stations. Discharge values (in cubic meters per second) are recorded for each station, with missing values represented by NaN. Each measurement has a quality score from 0 to 100, scaled by 0.01. The query time, stored as a Unix timestamp, records when the data was queried. The file is set up to hold 15-minute discharge time slices. The structure and format, based on the file header, are as follows:

Dimensions:

  • stationIdInd: An unlimited dimension, currently with 0 entries. This indicates that no data is present yet for any stations.
  • stationIdStrLen: A fixed length of 15 characters for the station identifiers.
  • timeStrLen: A fixed length of 19 characters for time strings in the format "YYYY-MM-DD_HH:mm UTC".

Variables:

  • stationId [dim: stationIdInd, stationIdStrLen] (char): A 2D character array with dimensions (stationIdInd, stationIdStrLen). This holds the 15-character identifier for each station.
    • long_name: "USGS station identifier of length 15"
    • units: "-"
  • time [dim: stationIdInd, timeStrLen] (char): A 2D character array with dimensions (stationIdInd, timeStrLen). This records the timestamp for each station in the format "YYYY-MM-DD_HH:mm UTC".
    • long_name: "YYYY-MM-DD_HH:mm UTC"
    • units: "UTC"
  • discharge [dim: stationIdInd] (float): A 1D array with dimension stationIdInd. This variable contains the discharge measurements (in cubic meters per second) for each station.
    • long_name: "Discharge in cubic meters per second"
    • units: "m^3/s"
    • _FillValue: NaNf, meaning missing or undefined values will be filled with "NaN".
  • discharge_quality [dim: stationIdInd] (short): A 1D array with dimension stationIdInd. It represents the quality of discharge data, rated on a scale of 0 to 100, and scaled by 0.01.
    • long_name: "Discharge quality 0 to 100 to be scaled by 100"
    • units: "-"
    • multfactor: "0.01"
  • queryTime [dim: stationIdInd] (int): A 1D array with dimension stationIdInd. This records the query time in Unix timestamp format (seconds since January 1, 1970), using the proleptic Gregorian calendar.
    • units: "seconds since 1970-01-01 00:00:00 local TZ"
    • calendar: "proleptic_gregorian"

Global Attributes:

  • fileUpdateTimeUTC: The time the NetCDF file was last updated in UTC
  • sliceCenterTimeUTC: The center time of the data slice in UTC
  • sliceTimeResolutionMinutes: The time resolution of the data slice, which is typically 15 minutes.
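
The time-related fields can be interpreted roughly as follows. This sketch rests on two assumptions that this page does not confirm: that queryTime holds standard Unix seconds (counted from the UTC epoch, even though the units attribute says "local TZ"), and that a slice covers half of sliceTimeResolutionMinutes on each side of sliceCenterTimeUTC. The function names and sample values are hypothetical, and the center time is passed in as a datetime rather than parsed from the attribute string.

```python
from datetime import datetime, timedelta, timezone


def query_time_to_utc(seconds):
    """Interpret queryTime as standard Unix seconds (assumption) in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def slice_window(center, resolution_minutes=15):
    """Return (start, end) of a slice centered on `center`.

    Assumption: the slice spans half the resolution on each side of the
    center time.
    """
    half = timedelta(minutes=resolution_minutes) / 2
    return center - half, center + half


# Hypothetical values for illustration only.
center = datetime(2024, 9, 30, 12, 0, tzinfo=timezone.utc)
start, end = slice_window(center)
epoch = query_time_to_utc(0)
```

If the producing system really wrote local-time seconds, the conversion would need the producer's UTC offset, which is not recorded in the header described here.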

RFC Gage Data

The netCDF data files contain hydrological time series data, focusing on observed and forecasted discharges at a single station over a time span. Below is a detailed breakdown of their structure, based on the file header:

Dimensions:

  • stationIdStrLen: The length of the station identifier string, fixed at 5 characters.
  • timeStrLen: The length of the time string for timestamps, fixed at 19 characters.
  • forecastInd: An unlimited dimension representing the number of forecast and observation points.
  • nseries: As presently implemented, a dimension of size 1, indicating that the file contains data for a single station.

Variables:

  • stationId [dim: stationIdStrLen] (char): A character array with 5 characters representing the station ID.
    • long_name: "RFC station identifier of length 5"
    • units: "-"
  • issueTimeUTC [dim: nseries, timeStrLen] (char): A string of length 19 representing the issue time of the data, formatted as "YYYY-MM-DD_HH:mm UTC".
    • long_name: "YYYY-MM-DD_HH:mm UTC"
    • units: "UTC"
  • discharges [dim: nseries, forecastInd] (float): A 2D array with dimensions (nseries, forecastInd) representing observed and forecasted discharges (in cubic meters per second).
    • long_name: Explanation of discharge values, which includes 48 hours of observed data before the issue time (T0) and up to 10 days (240 hours) of forecasted data after T0. The total data typically covers 12 days.
    • units: "m^3/s"
  • synthetic_values [dim: nseries, forecastInd] (byte): A 2D array with dimensions (nseries, forecastInd) indicating whether a discharge value is synthetic (1) or original (0).
    • long_name: "Whether the discharge value is synthetic or original"
    • units: "-"
  • totalCounts [dim: nseries] (short): A scalar representing the total number of observation and forecast values for this station.
    • long_name: "Total count of all observation and forecast values"
    • units: "-"
  • observedCounts [dim: nseries] (short): A scalar representing the total number of observed values before T0.
    • long_name: "Total observed values before T0"
    • units: "-"
  • forecastCounts [dim: nseries] (short): A scalar representing the total number of forecast values including and after T0.
    • long_name: "Total forecasted values including and after T0"
    • units: "-"
  • timeSteps [dim: nseries] (int): A scalar indicating the temporal resolution of forecast values in seconds.
    • long_name: "Frequency/temporal resolution of forecast values"
    • units: "seconds"
  • discharge_qualities [dim: nseries] (short): A scalar representing the quality of the discharge data on a scale from 0 to 100, scaled by a factor of 0.01.
    • long_name: "Discharge quality 0 to 100"
    • units: "-"
    • multfactor: "0.01"
  • queryTime [dim: nseries] (int64): A scalar representing the time of the data query in Unix time (seconds since 1970-01-01 00:00:00 in the local timezone).
    • units: "seconds since 1970-01-01 00:00:00 local TZ"

Global Attributes:

  • fileUpdateTimeUTC: Time when the NetCDF file was last updated in UTC.
  • sliceStartTimeUTC: Start time of the time slice for the data in UTC.
  • sliceTimeResolutionMinutes: Time resolution of the data in minutes (60 minutes).
  • missingValue: Value used to represent missing data (-999.99).
  • newest_forecast: Indicates whether the data is the newest forecast available.
  • NWM_version_number: Version of the National Water Model (NWM) used to generate the data (version 2.1).
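
The relationship between discharges, observedCounts, forecastCounts, timeSteps, and missingValue can be sketched as below. The sketch assumes that observed values come first in the discharges array, followed by forecast values starting at T0, and that consecutive values are timeSteps seconds apart; this page implies that ordering but does not state it outright. The function name and the sample numbers are hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone

MISSING_VALUE = -999.99  # taken from the missingValue global attribute


def split_series(discharges, observed_count, forecast_count, issue_time, time_step_s):
    """Split one series into (observed, forecast) lists of (time, value).

    Assumption: index `observed_count` corresponds to the issue time T0.
    Missing values are returned as None.
    """
    step = timedelta(seconds=time_step_s)
    points = []
    for i, value in enumerate(discharges[: observed_count + forecast_count]):
        offset = i - observed_count  # negative before T0, zero at T0
        # Tolerant comparison, since values are stored as 32-bit floats.
        missing = math.isclose(value, MISSING_VALUE, abs_tol=1e-3)
        points.append((issue_time + offset * step, None if missing else value))
    return points[:observed_count], points[observed_count:]


# Hypothetical series: 2 observed values, then 3 forecast values, hourly.
t0 = datetime(2024, 9, 30, 0, 0, tzinfo=timezone.utc)
observed, forecast = split_series(
    [10.0, 11.0, -999.99, 12.0, 13.0],
    observed_count=2,
    forecast_count=3,
    issue_time=t0,
    time_step_s=3600,
)
```

In a full file, observedCounts + forecastCounts would be expected to match totalCounts; checking that before splitting is a cheap sanity test.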

Canadian Gage Data

The netCDF files contain time slices from WSC (Water Survey of Canada) stations, capturing discharge measurements at 15-minute intervals.

Dimensions:

  • stationIdStrLen: Fixed length of 15 characters for the station identifiers.
  • stationIdInd: An unlimited dimension containing the number of stations.
  • timeStrLen: Fixed length of 19 characters for time strings formatted as "YYYY-MM-DD_HH:mm UTC".

Variables:

  • stationId [dim: stationIdInd, stationIdStrLen] (char): A 2D character array with dimensions (stationIdInd, stationIdStrLen). This stores the WSC station identifiers, padded to a length of 15 characters.
    • long_name: "WSC station id padded to length 15"
    • units: "-"
  • time [dim: stationIdInd, timeStrLen] (char): A 2D character array with dimensions (stationIdInd, timeStrLen). It records the timestamp of each discharge measurement in the format "YYYY-MM-DD_HH:mm UTC".
    • long_name: "YYYY-MM-DD_HH:mm UTC"
    • units: "UTC"
  • discharge [dim: stationIdInd] (float): A 1D array with dimension stationIdInd. This variable contains the discharge measurements (in cubic meters per second) for each station.
    • long_name: "Discharge in cubic meters per second"
    • units: "m^3/s"
  • discharge_quality [dim: stationIdInd] (short): A 1D array with dimension stationIdInd. It represents the quality of the discharge data, rated on a scale of 0 to 100, and scaled by 0.01.
    • long_name: "Discharge quality 0 to 100 to be scaled by 100"
    • units: "-"
    • multfactor: "0.01"
  • queryTime [dim: stationIdInd] (int): A 1D array with dimension stationIdInd. It records the query time in Unix timestamp format (seconds since January 1, 1970), using the local time zone.
    • units: "seconds since 1970-01-01 00:00:00 local TZ"

Global Attributes:

  • fileUpdateTimeUTC: The time the NetCDF file was last updated in UTC
  • sliceCenterTimeUTC: The center time of the data slice in UTC
  • sliceTimeResolutionMinutes: The time resolution of the data slice, which is 15 minutes.