-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce the Decimal type as an arbitrary-precision fixed-point numeric value #3902
Conversation
Here is an idea for how to encode in strings while preserving sort order. We write any number as 0.xyz... * 10^exp, where x != 0 and also the last decimal is not 0. It holds that if two numbers have different exponents, the one with the smaller exponent will always be smaller. If they have the same exponent, the one with the smallest decimals in lexicographic order is smaller. The first part of the string is an encoding of the exponent. The first part of the exponent encoding is the length of the exponent using repeated characters ("base 1"). Say we use a range of X+1 characters c[0] through c[X]. X could be 10 or e.g. 64 or 254. Then we can follow with an encoding of the decimals xyz.. I haven't included anything about negative numbers. To support that we can preamble everything with a character for the sign (e.g. 0 negative 1 for positive), and then for negative numbers we reverse the rest of the encoding, i.e. we "flip" the characters c[0] through c[X]. For datum.Next and datum.Prev, note that we have HasPrev() that tells us if we can use Prev. Maybe we can add a similar HasNext() and avoid using Next. Conceptually there is definitely no next or previous value.. There is a Next value that we can generate which is not a valid encoding, but it is lexicographically smaller than all encodings of larger numbers - and is the same as for strings - we append a 0. If that is actually what Next needs, then we can provide that (and we can provide a similar Prev as well if there is a large 'Z' character we don't use in encodings). Comments from the review on Reviewable.io |
Reviewed 26 of 30 files at r1. roachpb/data.go, line 433 [r1] (raw file): sql/driver/wire.go, line 99 [r1] (raw file): sql/parser/scan_test.go, line 395 [r1] (raw file): sql/parser/type_check_test.go, line 90 [r1] (raw file): sql/pgwire/types.go, line 359 [r1] (raw file): Comments from the review on Reviewable.io |
b35f97a
to
79181b8
Compare
Review status: 22 of 30 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed. sql/driver/wire.go, line 99 [r1] (raw file): sql/parser/scan_test.go, line 395 [r1] (raw file): sql/parser/type_check_test.go, line 90 [r1] (raw file): sql/pgwire/types.go, line 359 [r1] (raw file): Comments from the review on Reviewable.io |
Reviewed 4 of 30 files at r1, 4 of 4 files at r2. sql/parser/builtins.go, line 705 [r2] (raw file): sql/parser/builtins.go, line 1006 [r2] (raw file): sql/parser/builtins.go, line 1073 [r2] (raw file): sql/parser/eval.go, line 419 [r2] (raw file): sql/parser/eval.go, line 897 [r2] (raw file): Comments from the review on Reviewable.io |
79181b8
to
fff75f3
Compare
Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks failed. sql/parser/builtins.go, line 705 [r2] (raw file): sql/parser/builtins.go, line 1006 [r2] (raw file): sql/parser/builtins.go, line 1073 [r2] (raw file): sql/parser/eval.go, line 419 [r2] (raw file): sql/parser/eval.go, line 897 [r2] (raw file): Comments from the review on Reviewable.io |
Thanks @RaduBerinde, I just created a decimal encoding scheme based roughly on this idea. Using the same assumption you pointed out above with all numbers defined as 0.xyz... * 10^exp, where x != 0 and also the last decimal is not 0. The scheme starts the encoding with one of 7 different starting tags (2 more for infinity and 2 more for NaN, once proper support is added).
After the tag, the absolute value of the exponent (if it's not zero) is encoded using an unsigned Varint. Next, the digit string is encoded in a big-endian base-16 byte string. Finally, a null terminator is appended to the end. In total, the scheme will encode a key that looks like One thing to note with the current commit is that I needed direct access to Review status: 18 of 37 files reviewed at latest revision, 6 unresolved discussions. roachpb/data.go, line 433 [r1] (raw file): Comments from the review on Reviewable.io |
Also, in case anyone is wondering, performance isn't great, but it isn't horrendous either. Encoding a _Encoding_
_Decoding_
Review status: 18 of 37 files reviewed at latest revision, 6 unresolved discussions. Comments from the review on Reviewable.io |
Reviewed 17 of 19 files at r3. GLOCKFILE, line 22 [r3] (raw file): keys/printer_test.go, line 110 [r3] (raw file): roachpb/data.go, line 433 [r1] (raw file): util/encoding/decimal.go, line 28 [r3] (raw file): Comments from the review on Reviewable.io |
4a1fd6e
to
69a8ac8
Compare
Review status: 33 of 37 files reviewed at latest revision, 4 unresolved discussions. GLOCKFILE, line 22 [r3] (raw file): keys/printer_test.go, line 110 [r3] (raw file): roachpb/data.go, line 433 [r1] (raw file): It turns out here that accurately estimating how big the decimal encoding will be is pretty expensive (on the order of 500ns to call util/encoding/decimal.go, line 28 [r3] (raw file): Comments from the review on Reviewable.io |
Reviewed 2 of 4 files at r4. keys/printer_test.go, line 113 [r4] (raw file): roachpb/data.go, line 433 [r1] (raw file): Comments from the review on Reviewable.io |
I haven't looked at this PR (yet) or the decimal encoding, but wanted to On Sunday, January 17, 2016, Tamir Duberstein notifications@github.com
|
69a8ac8
to
12c2661
Compare
Review status: 34 of 37 files reviewed at latest revision, 2 unresolved discussions. keys/printer_test.go, line 113 [r4] (raw file): roachpb/data.go, line 433 [r1] (raw file): Comments from the review on Reviewable.io |
Regarding the encoding you described (with the decimalNegValPosExp, decimalNegValZeroExp etc tags) - I thought we wanted the encoded string values to sort exactly like the floating point values would, no? (i.e. Review status: 34 of 37 files reviewed at latest revision, 1 unresolved discussion. Comments from the review on Reviewable.io |
I'm not quite sure I understand the question Radu. The prefix tags will ensure that the encoded bytes do sort in the same way as their corresponding decimal values, as is tested in
Review status: 34 of 37 files reviewed at latest revision, 1 unresolved discussion. Comments from the review on Reviewable.io |
Ah, I'm a moron, I thought we actually encode the string "decimalNegValPosExp". Found the const definitions, makes sense. Don't mind me ^_^ Comments from the review on Reviewable.io |
12c2661
to
89769a9
Compare
I was just able to improve encoding performance by about 30% by employing a copy-on-write scheme for the decimal's internal
Using pprof, it looks like about half the decimal encoding time is being spent in the Review status: 34 of 37 files reviewed at latest revision, 1 unresolved discussion. Comments from the review on Reviewable.io |
Lots of progress here. Generally looks good. Perhaps I missed it, but are there tests using Review status: 34 of 37 files reviewed at latest revision, 6 unresolved discussions. roachpb/data.go, line 433 [r1] (raw file): sql/parser/builtins.go, line 705 [r6] (raw file): sql/parser/datum.go, line 331 [r6] (raw file): sql/values_test.go, line 150 [r1] (raw file): util/encoding/decimal.go, line 53 [r6] (raw file): The downside of using the same encoding as floats is that Comments from the review on Reviewable.io |
Postgres does support implicit conversion from decimal/numeric to double (float8) for the trig functions. But the return value will be double (float8). The doc is correct as far as it goes - it doesn't make any statement about what won't be accepted. Integer values would also be accepted, again due to implicit conversion. |
Postgres also has a second sqrt function that accepts a numeric/decimal and returns a numeric/decimal. Actually, there are a bunch of math functions in PG that have two separate internal functions, one for float8 and one for numeric. The doc marks them as "dp or numeric": Also note that PG has a div(numeric, numeric) function to return the integer quotient (as a numeric). |
3f49032
to
664845e
Compare
Review status: 32 of 37 files reviewed at latest revision, 6 unresolved discussions. roachpb/data.go, line 433 [r1] (raw file): Using a similar strategy as sql/values_test.go, line 150 [r1] (raw file): Comments from the review on Reviewable.io |
I think there are some math functions like Review status: 31 of 37 files reviewed at latest revision, 4 unresolved discussions. Comments from the review on Reviewable.io |
664845e
to
8282bbd
Compare
Review status: 35 of 37 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed. sql/parser/builtins.go, line 705 [r6] (raw file): sql/parser/datum.go, line 331 [r6] (raw file): util/encoding/decimal.go, line 53 [r6] (raw file):
In reference to pretty printing float or decimal values if the tags are merged, it seems to me the easiest solution would be to decode all values as decimals and pretty print from there. Now that the encodings are shared this shouldn't be a problem (except for infinities and NaNs, which decimal does not yet support). util/encoding/decimal.go, line 29 [r7] (raw file): util/encoding/decimal.go, line 43 [r7] (raw file): util/encoding/decimal.go, line 90 [r7] (raw file): util/encoding/decimal.go, line 276 [r7] (raw file): Comments from the review on Reviewable.io |
LGTM Review status: 28 of 38 files reviewed at latest revision, 4 unresolved discussions. keys/printer.go, line 226 [r8] (raw file): util/encoding/decimal.go, line 53 [r6] (raw file): util/encoding/decimal.go, line 276 [r7] (raw file): util/encoding/decimal.go, line 100 [r8] (raw file): Comments from the review on Reviewable.io |
Review status: 28 of 38 files reviewed at latest revision, 11 unresolved discussions, some commit checks failed. util/encoding/decimal.go, line 75 [r6] (raw file): util/encoding/decimal.go, line 58 [r8] (raw file): util/encoding/decimal.go, line 67 [r8] (raw file): util/encoding/decimal.go, line 94 [r8] (raw file): util/encoding/decimal.go, line 114 [r8] (raw file): util/encoding/decimal.go, line 130 [r8] (raw file): util/encoding/decimal.go, line 141 [r8] (raw file): Comments from the review on Reviewable.io |
Review status: 28 of 38 files reviewed at latest revision, 11 unresolved discussions, some commit checks failed. util/encoding/decimal.go, line 141 [r8] (raw file): Comments from the review on Reviewable.io |
52597e2
to
c660695
Compare
Review status: 23 of 38 files reviewed at latest revision, 11 unresolved discussions. keys/printer.go, line 226 [r8] (raw file): util/encoding/decimal.go, line 53 [r6] (raw file): util/encoding/decimal.go, line 75 [r6] (raw file): util/encoding/decimal.go, line 276 [r7] (raw file): util/encoding/decimal.go, line 58 [r8] (raw file): util/encoding/decimal.go, line 67 [r8] (raw file): util/encoding/decimal.go, line 94 [r8] (raw file): util/encoding/decimal.go, line 100 [r8] (raw file): util/encoding/decimal.go, line 114 [r8] (raw file): util/encoding/decimal.go, line 130 [r8] (raw file): Comments from the review on Reviewable.io |
Review status: 23 of 38 files reviewed at latest revision, 6 unresolved discussions. util/encoding/decimal.go, line 53 [r6] (raw file): util/encoding/encoding.go, line 678 [r9] (raw file): Comments from the review on Reviewable.io |
Reviewed 8 of 10 files at r8, 14 of 14 files at r9. Comments from the review on Reviewable.io |
Review status: all files reviewed at latest revision, 5 unresolved discussions. sql/parser/datum.go, line 331 [r6] (raw file): We could append an This key obtained by such methods is no longer made out of parsable values, but that's OK (it only needs to be parsed by the pretty printer, which knows to not freak out). util/encoding/decimal.go, line 114 [r8] (raw file): Comments from the review on Reviewable.io |
c660695
to
8347ca9
Compare
Review status: all files reviewed at latest revision, 5 unresolved discussions. sql/parser/datum.go, line 331 [r6] (raw file): util/encoding/decimal.go, line 53 [r6] (raw file): util/encoding/decimal.go, line 114 [r8] (raw file): util/encoding/encoding.go, line 678 [r9] (raw file): Comments from the review on Reviewable.io |
LGTM Review status: 30 of 38 files reviewed at latest revision, 6 unresolved discussions. util/encoding/decimal.go, line 87 [r10] (raw file): util/encoding/decimal.go, line 105 [r10] (raw file): Comments from the review on Reviewable.io |
8347ca9
to
1ec44bf
Compare
Currently blocked on #3992 to add Review status: 28 of 38 files reviewed at latest revision, 6 unresolved discussions, some commit checks failed. util/encoding/decimal.go, line 87 [r10] (raw file): util/encoding/decimal.go, line 105 [r10] (raw file): Comments from the review on Reviewable.io |
1ec44bf
to
9f0c449
Compare
Review status: 25 of 38 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. util/encoding/decimal.go, line 105 [r10] (raw file): Comments from the review on Reviewable.io |
Review status: 25 of 38 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. util/encoding/decimal.go, line 105 [r10] (raw file): Comments from the review on Reviewable.io |
Reviewed 8 of 8 files at r10, 7 of 7 files at r11. util/encoding/decimal.go, line 87 [r11] (raw file): Comments from the review on Reviewable.io |
The encoding is based on sqlite4's key encoding: http://sqlite.org/src4/doc/trunk/www/key_encoding.wiki
9f0c449
to
702b845
Compare
Introduce the Decimal type as an arbitrary-precision fixed-point numeric value
See #1618.
This is still very much a work in progress.
roachpb.Value
encoding. I currently encode the DDecimal datum as its string representation, but this doesn't allow for the proper sorting and definitely isn't the correct way to handle it. I'd love to hear input from people more familiar with this layer (@tschottdorf or @petermattis?).datum.Next
anddatum.Prev
) for a DDecimal, given that decimals have "arbitrary" (in practice 2^31 digits to the right of the decimal) precision.Variance
andStddev
aggregation functions. This requires adecimal.Decimal.Sqrt
method/function, either here or ideally in the shopspring/decimal library.NaN
semantics to the shopspring/decimal library to support a few edge cases with operators and builtin functions. If that's not possible, I may also be able to add the concept to the DDecimal struct.It's worth noting that this PR doesn't make decimals particularly useful yet, because we still default NumVals to be parsed as DFloats, so we lose any precision gain we would have gotten from DDecimal. Postgres on the other hand defaults all decimal constants or integers that would overflow an int64 as Numerics (decimals), but is intelligent enough with implicit type conversions to make this seamless. Currently if we attempt to change this default in CockroachDB, it wrecks havoc on Sqllogictest. This will need to be reassessed later, and wont be possible until we improve float -> decimal and decimal -> float implicit type coercion.