Update FloatFormatter with parameters for the computer representation
#521
Labels
feature request
Request for a new feature
Problem Description
As a user, I want to make sure the min/max values in the reverse transform can be represented by the machine.
Expected behavior
Add the following parameter to FloatFormatter:
computer_type: Default 'Float'. Accepts: 'Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64', 'Float'.
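For reference, the machine limits implied by each accepted value follow directly from its bit width and signedness. The sketch below is only an illustration of that mapping: `INTEGER_BITS` and `machine_limits` are hypothetical names, not part of the existing FloatFormatter API, and the argument is called `computer_representation` purely for the example.

```python
# Hypothetical helper for illustration; the issue does not say how limits are looked up.
INTEGER_BITS = {
    'Int8': 8, 'Int16': 16, 'Int32': 32, 'Int64': 64,
    'UInt8': 8, 'UInt16': 16, 'UInt32': 32, 'UInt64': 64,
}


def machine_limits(computer_representation):
    """Return (min, max) for an integer representation, or None for 'Float'."""
    if computer_representation == 'Float':
        return None  # floats are neither clipped nor rounded
    if computer_representation not in INTEGER_BITS:
        raise ValueError(f'Unknown representation: {computer_representation!r}')

    bits = INTEGER_BITS[computer_representation]
    if computer_representation.startswith('UInt'):
        # Unsigned: all bits encode magnitude.
        return 0, 2 ** bits - 1

    # Signed (two's complement): one bit is used for the sign.
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
```

With this mapping, 'Int8' covers -128 to 127 and 'UInt8' covers 0 to 255.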
Functionality Changes:
fit: Store the original dtype of the column.
transform: Convert everything to Float for machine learning purposes.
reverse_transform: Cast back to the original dtype. As an extra measure: clip the values to the min and max machine limits for the given computer_representation, and round back to whole numbers if needed. (This is a no-op for Float.) Note: if learn_min_max_bounds=True, then use the learned values instead.
Note: The dtype may be different than the computer representation. For example, pandas might have read in a column as Int64 by default but the user might be telling us it's supposed to be UInt8. Always defer to the parameter, not the dtype.
Errors
During fit or transform: Throw an error if the data is out of bounds according to the computer representation. Note: It does not matter what the actual pandas dtype is, only what the computer representation parameter is.
Additional context
See #518 as an example for where this fails today.
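The out-of-bounds check and the reverse_transform clipping requested above could look roughly like the following. This is a sketch under stated assumptions, not the proposed implementation: `validate_bounds` and `clip_and_round` are hypothetical names, limits come from NumPy's `np.iinfo`, and the sketch skips both the final cast back to the original dtype and the learn_min_max_bounds=True case.

```python
import numpy as np


def _limits(computer_representation):
    """Look up integer limits by name, e.g. 'UInt8' -> np.iinfo of uint8."""
    return np.iinfo(np.dtype(computer_representation.lower()))


def validate_bounds(values, computer_representation):
    """Hypothetical fit/transform check: raise if data exceeds the machine limits."""
    if computer_representation == 'Float':
        return
    info = _limits(computer_representation)
    data = np.asarray(values, dtype=float)
    data = data[~np.isnan(data)]  # missing values are ignored by the check
    if data.size and (data.min() < info.min or data.max() > info.max):
        raise ValueError(
            f'Data is out of bounds for {computer_representation} '
            f'[{info.min}, {info.max}].'
        )


def clip_and_round(values, computer_representation):
    """Hypothetical reverse_transform step: round, then clip to the machine limits."""
    data = np.asarray(values, dtype=float)
    if computer_representation == 'Float':
        return data  # no-op for floats
    info = _limits(computer_representation)
    return np.clip(np.round(data), info.min, info.max)
```

Both helpers take the representation name rather than the pandas dtype, which mirrors the request to always defer to the parameter.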