[ntuple] add Real32Quant column type #16390

silverweed · 2024-09-09T13:48:09Z

This pull request adds the Real32Quant column type to RNTuple. This column type stores floating-point values on disk as integers with a user-defined precision (from 3 to 32 bits) and a user-defined value range. For floats with a well-defined range, this reduces the required storage space while preserving more precision than simple truncation.

The conversion is defined as (pseudocode):

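    def quantize(value, min, max, n_precision_bits):
        # Largest integer representable with n_precision_bits bits
        quantized_max = (1 << n_precision_bits) - 1
        # Number of quantization steps per unit of the value range
        scale = quantized_max / (max - min)
        # Round to the nearest step (value is assumed to lie in [min, max])
        quantized = round((value - min) * scale)
        return quantized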
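For illustration, the conversion above and its inverse can be sketched in C++ as follows. This is a minimal sketch, not the code in RColumnElement.hxx: the names `QuantizeSketch` and `UnquantizeSketch` are hypothetical, and out-of-range input is only guarded by an `assert` here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not ROOT's API): maps a value in [min, max]
// to an integer in [0, 2^nBits - 1], rounding to the nearest step.
inline std::uint32_t QuantizeSketch(double value, double min, double max, std::size_t nBits)
{
   assert(nBits >= 1 && nBits <= 32);
   assert(min < max && value >= min && value <= max);
   // Use a 64-bit shift so that nBits == 32 does not overflow.
   const double quantizedMax = static_cast<double>((std::uint64_t{1} << nBits) - 1);
   const double scale = quantizedMax / (max - min);
   return static_cast<std::uint32_t>(std::llround((value - min) * scale));
}

// Hypothetical inverse: maps the stored integer back into [min, max].
inline double UnquantizeSketch(std::uint32_t quantized, double min, double max, std::size_t nBits)
{
   const double quantizedMax = static_cast<double>((std::uint64_t{1} << nBits) - 1);
   const double step = (max - min) / quantizedMax;
   return min + static_cast<double>(quantized) * step;
}
```

Because the quantizer rounds rather than floors, a round trip through these two functions should reproduce the input to within half a quantization step, i.e. `(max - min) / (2^nBits - 1) / 2`, up to floating-point rounding.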

This change requires adding metadata to the on-disk information, more specifically in the Field Description (see specifications.md for more details).

Checklist:

- tested changes locally
- updated the docs (if necessary)

github-actions · 2024-09-10T14:49:54Z

Test Results

13 files 13 suites 3d 3h 43m 58s ⏱️
2 697 tests 2 697 ✅ 0 💤 0 ❌
32 953 runs 32 953 ✅ 0 💤 0 ❌

Results for commit 64700ac.

♻️ This comment has been updated with latest results.

jblomer

Very nice! In principle looks good to me.

tree/ntuple/v7/doc/specifications.md

jblomer · 2024-09-14T14:04:27Z

tree/ntuple/v7/inc/ROOT/RField/RFieldFundamental.hxx

   void SetTruncated(std::size_t nBits);
+   /// Sets this field to use a quantized integer representation using `nBits` per value.
+   /// This call promises that this field will only contain values contained in `[minValue, maxValue]` inclusive.
+   /// If a value outside this range is assigned to this field, the behavior is undefined.


Should we define it? I.e. test and throw?

silverweed · 2024-09-16

I wanted to keep the specification as vague as possible to give us room to implement it in the most performant way possible. Maybe this is not necessary, but I think we should have a discussion about it before settling on anything other than UB: we can always specify more in the future, but we cannot go back.

jblomer · 2024-09-16

I think that's fine for the specification. But in our current implementation, should we / will we throw?

silverweed · 2024-09-18

Added in 6526106

tree/ntuple/v7/src/RColumnElement.hxx

pcanal · 2024-09-16T13:24:35Z

> This column type stores floating point values on disk as integers with a user-defined precision (from 3 to 32 bits) and a user-defined value range.

This sounds very similar to Double32_t. We will need to explain the differences and advantages clearly.

jblomer

LGTM, thanks! I'll leave the question where to throw in the Unpack() call stack up to you.

pcanal

I recommend holding off on merging until we have documented and understood the differences (if any) between this implementation and the related Double32_t implementation (capabilities and resulting on-file precision).

silverweed · 2024-09-19T06:52:32Z

@pcanal The implementation is, logically speaking, exactly the same (see TBufferFile.cxx:Read/WriteDouble32).
The main difference is in the design: RNTuple's quantization is

- not bound to the type of the variable (you can quantize any float or double, not just a Double32_t);
- not statically chosen (you can set the value range and the bit width at runtime rather than deciding once and for all via the variable comments).

We can discuss this in more detail, but in my opinion those are the two main points of difference. The implementation itself is trivial and akin to Read/WriteFastArrayDouble32 (but slightly more performant in principle, as it doesn't have to check the min/max/scale factor for each element: they are all the same within a call to Pack/Unpack).

As a last point of divergence, Double32_t silently clamps values that fall out of range, while Real32Quant will throw in that situation.
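The two out-of-range behaviors described above can be contrasted with a small self-contained sketch. The enum `EOutOfRangePolicy` and the function `QuantizeWithPolicy` are hypothetical names introduced only for this illustration; neither TBufferFile nor RNTuple exposes such a switch, and NaN handling is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <stdexcept>

// Hypothetical policy switch, for illustration only.
enum class EOutOfRangePolicy { kClamp, kThrow };

// Quantizes `value` to nBits bits over [minValue, maxValue].
// kClamp mimics the "silently clamp" behavior attributed to Double32_t above;
// kThrow mimics the current Real32Quant behavior of raising an error.
inline std::uint32_t QuantizeWithPolicy(double value, double minValue, double maxValue,
                                        unsigned nBits, EOutOfRangePolicy policy)
{
   assert(nBits >= 1 && nBits <= 32 && minValue < maxValue);
   if (value < minValue || value > maxValue) {
      if (policy == EOutOfRangePolicy::kThrow)
         throw std::out_of_range("value outside [minValue, maxValue]");
      // Clamp: pull the value back to the nearest range boundary.
      value = std::max(minValue, std::min(value, maxValue));
   }
   const double quantizedMax = static_cast<double>((std::uint64_t{1} << nBits) - 1);
   const double scale = quantizedMax / (maxValue - minValue);
   return static_cast<std::uint32_t>(std::llround((value - minValue) * scale));
}
```

With clamping, an out-of-range input is stored as the nearest boundary value and the loss is invisible to the reader of the file; with throwing, the writer is told immediately that the promised range was violated.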

pcanal · 2024-09-19T18:17:19Z

> (but slightly more performant in principle, as it doesn't have to check the min/max/scale factor for each element - they are all the same within a call to Pack/Unpack).

Isn't it already the same in Read/WriteFastArrayDouble32? (Furthermore, for reading, the common case is to call TBufferFile::ReadWithFactor/ReadWithNbits.)

> The implementation is, logically speaking, exactly the same

Thank you for clarifying :)

> As a last divergence point, Double32_t silently clamps the values that fall out of range, while Real32Quant will throw in that situation.

(If not already done) this should be called out in the doc.

silverweed · 2024-09-20T07:10:18Z

> Isn't it already the same in Read/WriteFastArrayDouble32?

Yes, I had misread the function the first time; you are right. In any case, performance is not really the motivation for Real32Quant :)

> (If not already done) this should be called out in the doc.

Our idea is, at least for now, to say that the behavior for out-of-range values is undefined, because we don't want to prevent ourselves from changing our mind or our implementation in the future. At the moment we throw an exception, but we might decide on something different later, e.g. for performance reasons.
~~You can see the exact wording in this very commit (in the specifications.md file), so if you have some specific suggestion feel free to comment on it.~~

Edit: I was wrong, we're currently not specifying the behavior in specifications.md but only in the code. Should we also explicitly say that the behavior is undefined in the specs? I'm not sure it makes sense, because the specs in principle only refer to the binary format; should they also say how a writer should behave when receiving wrong input from the user? @jblomer thoughts about this?

jblomer · 2024-09-20T13:03:54Z

> Edit: I was wrong, we're currently not specifying the behavior in the specifications.md but only in the code. Should we also explicitly say the behavior is undefined in the specs? I'm not sure it makes sense because the specs in principle only refer to the binary format; should they also say how a writer should behave when receiving wrong input from the user? @jblomer thoughts about this?

I think that the specification is not the ideal place for that sort of error behavior documentation. I'd suggest a brief section in the architecture.md that summarizes the low-precision float options in RNTuple and highlights the differences and similarities to the already existing Double32_t and Float16_t.

jblomer · 2024-09-23T21:03:49Z

@pcanal is there still anything blocking this PR?

pcanal

Thanks.

silverweed added the in:RNTuple label Sep 9, 2024

silverweed requested review from hahnjo, dpiparo, vepadulano and enirolf September 9, 2024 13:48

silverweed self-assigned this Sep 9, 2024

silverweed requested a review from jblomer as a code owner September 9, 2024 13:48

silverweed force-pushed the ntuple_quantfloat_2 branch 3 times, most recently from cde7f5a to ee5a38d Compare September 9, 2024 13:52

silverweed marked this pull request as draft September 10, 2024 07:58

silverweed force-pushed the ntuple_quantfloat_2 branch 3 times, most recently from eadb32d to 3c1d6f6 Compare September 10, 2024 12:43

silverweed force-pushed the ntuple_quantfloat_2 branch 2 times, most recently from ae7a8ee to 5519b91 Compare September 11, 2024 06:40

silverweed marked this pull request as ready for review September 11, 2024 07:34

hahnjo added this to the 6.34.00 milestone Sep 11, 2024

jblomer reviewed Sep 14, 2024

silverweed force-pushed the ntuple_quantfloat_2 branch 2 times, most recently from ee0dc36 to 3a9c90c Compare September 16, 2024 08:44

silverweed force-pushed the ntuple_quantfloat_2 branch from 3a9c90c to 6526106 Compare September 18, 2024 08:08

jblomer approved these changes Sep 18, 2024

silverweed force-pushed the ntuple_quantfloat_2 branch from 6526106 to b92c6a4 Compare September 18, 2024 11:32

pcanal requested changes Sep 18, 2024

silverweed force-pushed the ntuple_quantfloat_2 branch from b92c6a4 to 34e53ca Compare September 20, 2024 11:19

silverweed force-pushed the ntuple_quantfloat_2 branch from 34e53ca to 9cbc228 Compare September 23, 2024 09:35

silverweed requested a review from pcanal September 23, 2024 12:18

pcanal reviewed Sep 23, 2024

tree/ntuple/v7/doc/architecture.md (outdated, resolved)

pcanal reviewed Sep 23, 2024

tree/ntuple/v7/doc/architecture.md (outdated, resolved)

pcanal reviewed Sep 23, 2024

tree/ntuple/v7/src/RColumnElement.hxx (outdated, resolved)

silverweed requested a review from pcanal September 24, 2024 08:30

silverweed force-pushed the ntuple_quantfloat_2 branch from 9cbc228 to a2ad9a3 Compare September 24, 2024 13:59

silverweed added 11 commits September 24, 2024 16:00

[ntuple] add Real32Quant column type

c14659e

[ntuple] make tests pass

[ntuple] add more tests for Real32Quant, set min bits to 1

2e8f47b

[ntuple] move ValueRange from Field desc to Column desc

488cc12

[ntuple] add column ValueRange logic

2423ab5

uncomment commented-out code

[ntuple] fix specification table

7a6d557

[ntuple] in QuantizeReal, round instead of flooring

9e6b1df

[ntuple] fix ValueRange being always non-null on disk

de71160

[ntuple] add exhaustive tests for Real32Quant

4b6d742

[ntuple] improve specification

095babf

[ntuple] make Real32Quant throws on out of range in Pack/Unpack

984bc4a

[ntuple] add paragraph to architecture on low-prec floats

d6c9c0f

silverweed force-pushed the ntuple_quantfloat_2 branch from a2ad9a3 to d6c9c0f Compare September 24, 2024 14:00

pcanal reviewed Sep 24, 2024

tree/ntuple/v7/src/RColumnElement.hxx (outdated, resolved)

[ntuple] simpler unquantization formula

64700ac

pcanal approved these changes Sep 24, 2024
