Allow writing pre-masked and pre-scaled DataSet to Store #11207

@PSquaredX

Description

Is your feature request related to a problem?

I am attempting to reduce the size of Datasets stored on disk and sent over the network while maintaining data accuracy. One compelling way to do this is integer storage, since the data source is natively in int16 format. However, the consumers of these Datasets expect data scaled into physical units, which requires converting to a float64 data type and multiplying by a float scaling factor.

On the surface, this seems like a great fit for the scale_factor encoding parameter. The main issue I have is loss of accuracy. In order to use the scale_factor encoding parameter, the data manipulation involved looks like:

  1. Convert data from int16 to float64, multiply by the scale factor, and add offset if needed.
  2. Write to file using one of the to_X methods (e.g. to_netcdf()).
    • The write-file operation applies the reverse scaling and casts to the target datatype, so int16 data is stored.
  3. Consumer then loads data, using mask_and_scale=True (default).
    • When the data is loaded, it is cast to a float type, multiplied by scale_factor, and add_offset is added to it.

If one could store scale-encoded data directly, steps 1 and 2 above would be unnecessary. Reducing the number of conversions/calculations on the source data would avoid unnecessary error from casting and float arithmetic.
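The round trip in steps 1 and 2 can be sketched in plain numpy (a minimal illustration, not xarray's actual implementation; the scale factor and values are made up). Note that the reverse step needs an explicit round before the cast, because a quotient that lands fractionally below an integer would otherwise truncate downward — exactly the kind of off-by-one casting error this request is about:

```python
import numpy as np

scale_factor = 0.01  # illustrative encoding parameter
raw = np.array([-1234, 0, 1, 30000], dtype=np.int16)  # native int16 source

# Step 1: producer casts to float64 and applies the scale factor.
physical = raw.astype(np.float64) * scale_factor

# Step 2 (write path): reverse the scaling and cast back to int16.
# Without np.round, a quotient like 299.99999999 would truncate to 299;
# rounding first recovers the original integers despite float error.
restored = np.round(physical / scale_factor).astype(np.int16)

assert np.array_equal(restored, raw)
```

Storing the pre-encoded int16 values directly would skip both float conversions entirely, so this rounding concern would never arise.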

For reference, I am using the netCDF4 format with h5netcdf engine, but I'm not sure if the engine or the file format matters with regard to scale encoding.

Describe the solution you'd like

Add a mask_and_scale argument to the to_X write methods (e.g. to_netcdf()), so a producer can provide pre-scale-encoded data along with the scale encoding parameters, and the load operation can still apply the scaling if mask_and_scale=True. Essentially, if this proposed mask_and_scale argument were True, the library would assume the user has already pre-scaled and pre-masked their data whenever encoding has been set. Perhaps a basic check that the dtype matches the target encoding could serve as a sanity check.

In concept, this seems very similar to the mask_and_scale argument to load_dataset(), but in the opposite direction.
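A sketch of what the proposed write path might look like (the mask_and_scale argument on to_netcdf() does not exist today; this is purely illustrative pseudocode of the request, and the variable names are made up):

```python
import numpy as np
import xarray as xr

# Producer already holds pre-encoded int16 data:
# physical value = stored_int * scale_factor
ds = xr.Dataset({"temp": ("x", np.array([100, 200, 300], dtype=np.int16))})
ds["temp"].encoding = {"dtype": "int16", "scale_factor": 0.01}

# Proposed: tell the writer the data is already masked/scaled, so it
# stores the int16 values as-is instead of reverse-scaling them first.
ds.to_netcdf("out.nc", mask_and_scale=True)  # hypothetical argument

# The consumer path is unchanged: with mask_and_scale=True (the default),
# open_dataset applies scale_factor on load and yields float64 values.
loaded = xr.open_dataset("out.nc")
```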

Describe alternatives you've considered

  1. Using compression to reduce datastore size
    • This is somewhat effective but has runtime consequences on both the load and write sides of the operation. I will likely consider this in the near term.
  2. Storing the dataset directly as int16 data, and forcing the consumer to scale the data themselves.
    • I'd rather not push this onto the consumer's code, but it might be workable.
  3. Doing some math to "force" the rounding/casting outcome in the end to match the source data
    • Seems error-prone and cumbersome, but may work in some situations.

Additional context

No response
