Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decScale is ignore and is doesn't appear to be available #88

Closed
rmendels opened this issue Jun 6, 2019 · 13 comments
Closed

decScale is ignore and is doesn't appear to be available #88

rmendels opened this issue Jun 6, 2019 · 13 comments

Comments

@rmendels
Copy link

rmendels commented Jun 6, 2019

Look at the attached file. If I use 'wgrib -V' and look at a single record info, it shows say:

rec 1460:7671255:date 1977123118 PRES kpds5=1 kpds6=102 kpds7=0 levels=(0,0) grid=220 MSL anl:
  PRES=Pressure [Pa]
  timerange 0 P1 0 P2 0 TimeU 1  nx 63 ny 63 GDS grid 5 num_in_ave 0 missing 0
  center 58 subcenter 0 process 58 Table 1 scan: WE:SN winds(N/S)
  polar stereo: Lat1 -19.115000 Long1 -125.000000 Orient -80.000000
     north pole (63 x 63) Dx 381000 Dy 381000 scan 64 mode 0
  min/max data 97781.2 105112  num bits 11  BDS_Ref 977.812  DecScale -2 BinScale -4

DecScale -2 means to divide the data by 100. If you read using xarray and cfgrib:

grib_file = xr.open_dataset('g1977', engine='cfgrib')
grib_pres = grib_file.pres[1459, :, :]
grib_data = grib_pres.values
grib_data
array([[101387.5 , 101556.25, 101743.75, ..., 101418.75, 101450.  ,
        101481.25],
       [101356.25, 101506.25, 101637.5 , ..., 101387.5 , 101431.25,
        101475.  ],
       [101375.  , 101468.75, 101556.25, ..., 101362.5 , 101425.  ,
        101475.  ],
       ...,
       [101531.25, 101425.  , 101350.  , ..., 101400.  , 101412.5 ,
        101393.75],
       [101556.25, 101456.25, 101400.  , ..., 101475.  , 101512.5 ,
        101537.5 ],
       [101518.75, 101443.75, 101412.5 , ..., 101575.  , 101637.5 ,
        101662.5 ]], dtype=float32)

The data is not scaled as it should be, nor does the DecScale attribute appear to be available in any of the information read in.

g1977.zip

@alexamici
Copy link
Contributor

@rmendels given that the file contains the pressure at mean sea level and it declares unit=Pa cfgrib representation of the data looks correct to me.

I must admit I never noticed the decimalScaleFactor key until now but I would assume ecCodes applies it when needed, @shahramn can you confirm?.

We don't support changing unit in cfgrib, so it is up to the user to rescale and update the unit attribute.

@shahramn
Copy link
Collaborator

shahramn commented Jun 7, 2019 via email

@rmendels
Copy link
Author

rmendels commented Jun 7, 2019

@shahramn @alexamici Please look at my example grib flle, and read it in by the code I give. The data are not scaled. I could live with this if the value of decimalScaleFactor was available somehow in the returned structure, but it is not. This from the NCEP reference (ftp://ftp.cpc.ncep.noaa.gov/wd51we/ams2010_sc/formats.pdf)

DecScale 0 BinScale 0 = the decimal scaling factors are 100 and 20.
Normally the field values are (i * 2BinScale + BDS_Ref) * 10(­DecScale)

What is causing my problem is FNMOC in their infinite wisdom for the same dataset have the file like what I sent, and then scaled and with DecScale = 0. So I get inconsistent reads of the data, sometimes in the range of 950 - 1050, sometimes in the range 95000 - 105000, but DecScale is set correctly one each case. As I said, if the values of DecScale and BinScale were included in the attributes returned that would help

@alexamici
Copy link
Contributor

alexamici commented Jun 7, 2019

@rmendels I did download and read the data you provided, and as I said it appears to be correct:

>>> ds = xr.open_dataset('g1977', engine='cfgrib')                                                                                                
>>> ds.pres.units                                                                                                                                        
'Pa'
>>> np.nanmean(ds.pres)
101265.98

The unit of measure is reported as Pa and the values are in the 100,000 range as it is expected for mean sea level pressure.

It is entirely possible that we represent incorrectly the other dataset you mention, but this one looks correct.

Can you please run the instructions above on a GRIB file with DecScale=0 and report the result?

Thanks!

@rmendels
Copy link
Author

rmendels commented Jun 7, 2019

@alexamici You are misunderstanding my comments. Based on how you folks seem to want to read in the data, the data are being read in correctly in both cases But for me to properly use that data, I need the values of DecScale and BinScale, since you are not applying those in the read-in. Maybe I am missing something, but in what is returned I can not see any way to access those values. Is that making sense? You are returning incomplete information about the attributes of the data.

@alexamici
Copy link
Contributor

alexamici commented Jun 7, 2019

@rmendels I'll try to guess your intent as you are not declaring it explicitly.

I guess that you want the pressure values in hPa not in Pa as they are expressed in the GRIB file (wgrib and ecCodes agree the expected unit of measure is Pa). That's ok, you need to parse the unit attribute in the pres variable and if it is not what you expect you need to apply the change in unit of measure by yourself. Just remember to update the unit attribute as well to keep the representation of the data consistent. The reason I maintain that the information provided by cfgrib is indeed complete, at least in this case, is that you don't need the DecScale key because the same information is accessible in the unit attribute.

Hope this helps.

@rmendels
Copy link
Author

rmendels commented Jun 7, 2019

@alexamici no. In the two files the units are the same. The two files differ in the value of DecScale. I have no problem with cfgrib reading in the data as it is now doing so that it is doing it consistently. What I am asking for then is somewhere in what is return in the file or data access includes the value of DecScale and BinScale, which as far as I can tell the present structure does not include. I am not certain I know how to make it clearer. If there is a way as the code works now for me to access the value of DecScale please tell me how to do so, or could you please add that somewhere to you structure. Attached is the next year of data. The units are the same, the values are .01 of the 1977 value, in 1977 DecScale is -2, in 1978 it is 0. Without DecScale it is hard for me to write general code that will work on all the files.

g1978.zip

@alexamici
Copy link
Contributor

Ok, I finally understand the issue. The g1978 file you attached above has either incorrect metadata or its metadata are interpreted incorrectly by ecCodes, as it declares to have values in Pa but clearly contains hPa:

>>> ds = xr.open_dataset('g1978', engine='cfgrib')
>>> ds.pres.units
'Pa'
>>> np.nanmean(ds.pres)
1011.89197

I'm not familiar enough with the GRIB on-disk data structure to know if it is a ecCodes issue or the file is broken, maybe @shahramn can comment.

This is different from the g1977 file where data and metadata are consistent.

So at the moment I'll do:

  1. if this is a bug in ecCodes I'll move the issue there
  2. in any event we need a generic way to access all GRIB keys Get additional attributes from grib file #89 so you can fix files with broken metadata

@shahramn
Copy link
Collaborator

shahramn commented Jun 10, 2019 via email

@alexamici
Copy link
Contributor

@shahramn I agree g1977 looks good. The original poster has shared a second file g1978 here that instead appears inconsistent (not sure if it is the file or the ecCodes interpretation, but I suspect is the first).

@rmendels
Copy link
Author

@shahramn @alexamici Folks, look at the Value of DecScale in the two files. If eecodes does not return it, then use wgrib. Also, as I posted above from the NCEP guide to Grib:

DecScale 0 BinScale 0 = the decimal scaling factors are 100 and 20.
Normally the field values are (i * 2BinScale + BDS_Ref) * 10(­DecScale)

I am anything but a Grib expert, and NCEP may not even be following the standard, but if you combine this with the stated values each file ultimately produces the same numbers. It strikes me that this is similar to scale and offset in Netcdf. Again, a very simple fix is to provide the complete metatdata in the structure, that is the values of DecScale and BinScale. BTW these files are from the Navy's FNMOC.

@alexamici
Copy link
Contributor

alexamici commented Jul 23, 2019

@rmendels in branch stable/0.9.7.x I added an experimental API to read arbitrary GRIB key, please follow the discussion in issue #89

In your case you can use:

>>> ds = xr.open_dataset('g1978', engine='cfgrib', backend_kwargs={'read_keys':['decimalScaleFactor', 'binaryScaleFactor']})                     
>>> ds.pres                                                                                                                                       
<xarray.DataArray 'pres' (time: 1460, y: 63, x: 63)>
[5794740 values with dtype=float32]
Coordinates:
  * time        (time) datetime64[ns] 1978-01-01 ... 1978-12-31T18:00:00
    step        timedelta64[ns] ...
    meanSea     int64 ...
    latitude    (y, x) float64 ...
    longitude   (y, x) float64 ...
    valid_time  (time) datetime64[ns] ...
Dimensions without coordinates: y, x
Attributes:
    GRIB_paramId:                    54
    GRIB_shortName:                  pres
    ...
    GRIB_decimalScaleFactor:         0
    GRIB_binaryScaleFactor:          -4
    long_name:                       Pressure
    units:                           Pa

@alexamici
Copy link
Contributor

@rmendels if you don't mind I'll close this issue as duplicate of #89 that is more specific. Thanks again for reporting it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants