-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ntuple] add Real32Quant column type #16390
base: master
Are you sure you want to change the base?
Conversation
cde7f5a
to
ee5a38d
Compare
eadb32d
to
3c1d6f6
Compare
Test Results 13 files 13 suites 3d 3h 43m 58s ⏱️ Results for commit 64700ac. ♻️ This comment has been updated with latest results. |
ae7a8ee
to
5519b91
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! In principle looks good to me.
void SetTruncated(std::size_t nBits); | ||
/// Sets this field to use a quantized integer representation using `nBits` per value. | ||
/// This call promises that this field will only contain values contained in `[minValue, maxValue]` inclusive. | ||
/// If a value outside this range is assigned to this field, the behavior is undefined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we define it? I.e. test and throw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep the specification as vague as possible to give us room for implementing it in the most performant way possible. Maybe this is not necessary, but I think we should have a discussion about it before settling for anything other than UB - we can always specify more in the future, but not go back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's fine for the specification. But in our current implementation, should we / will we throw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 6526106
ee0dc36
to
3a9c90c
Compare
This sounds very similar to Double32 ... We will need to explain clearly the differences and advantages .... |
3a9c90c
to
6526106
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! I'll leave the question where to throw in the Unpack()
call stack up to you.
6526106
to
b92c6a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend holding off on merging until we have documented/understood the differences (if any) between this implementation and the related Double32_t implementation (capability and resulting onfile precision)
@pcanal The implementation is, logically speaking, exactly the same (see
We can discuss in more details, but in my opinion those are the two main points of difference. The implementation itself is trivial and it's akin to As a last divergence point, |
Isn't it the already same in the in
Thank you for clarifying :)
(If not already done) this should be called out in the doc. |
Yes, I had misread the function the first time, you are right. But anyway performance is not really our concern as regards our motivation for Real32Quant :)
Our idea is, at least for now, to say that the behavior in case of out-of-range is undefined, because we don't want to preclude ourselves to change our mind or our implementation in the future. At the moment we throw an exception but we might decide something different in the future, e.g. for performance reasons. Edit: I was wrong, we're currently not specifying the behavior in the specifications.md but only in the code. Should we also explicitly say the behavior is undefined in the specs? I'm not sure it makes sense because the specs in principle only refer to the binary format; should they also say how a writer should behave when receiving wrong input from the user? @jblomer thoughts about this? |
b92c6a4
to
34e53ca
Compare
I think that the specification is not the ideal place for that sort of error behavior documentation. I'd suggest a brief section in the architecture.md that summarizes the low-precision float options in RNTuple and highlights the differences and similarities to the already existing |
34e53ca
to
9cbc228
Compare
@pcanal is there still anything blocking this PR? |
9cbc228
to
a2ad9a3
Compare
[ntuple] make tests pass
uncomment commented-out code
a2ad9a3
to
d6c9c0f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
This Pull request:
adds the
Real32Quant
column type to RNTuple. This column type stores floating point values on disk as integers with a user-defined precision (from 3 to 32 bits) and a user-defined value range. This allows to reduce the storage space required to save floats with a well-defined range with more precision than a simple truncation.The conversion is defined as (pseudocode):
This change requires adding metadata to the on-disk information, more specifically in the Field Description (see specifications.md for more details).
Checklist: