Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce the Decimal type as an arbitrary-precision fixed-point numeric value #3902

Merged
merged 2 commits into from
Jan 28, 2016

Conversation

nvanbenschoten
Copy link
Member

See #1618.

This is still very much a work in progress.

  • One area that I know needs work is the roachpb.Value encoding. I currently encode the DDecimal datum as its string representation, but this doesn't allow for the proper sorting and definitely isn't the correct way to handle it. I'd love to hear input from people more familiar with this layer (@tschottdorf or @petermattis?).
  • Along this line, I'm not sure how we can determine the next or previous logical decimal (datum.Next and datum.Prev) for a DDecimal, given that decimals have "arbitrary" (in practice 2^31 digits to the right of the decimal) precision.
  • I'm currently waiting on this PR so I can clean up some of the Modulus code.
  • I need to add support for the Decimal Variance and Stddev aggregation functions. This requires a decimal.Decimal.Sqrt method/function, either here or ideally in the shopspring/decimal library.
  • I'm going to need to introduce NaN semantics to the shopspring/decimal library to support a few edge cases with operators and builtin functions. If that's not possible, I may also be able to add the concept to the DDecimal struct.
  • I'm not sure how decimals should interact with Infinity and -Infinity. What does it mean for an arbitrary-precision value to be infinite? This may mean adding support for infinity into the shopspring/decimal library, or adding it to the DDecimal struct.

It's worth noting that this PR doesn't make decimals particularly useful yet, because we still default NumVals to be parsed as DFloats, so we lose any precision gain we would have gotten from DDecimal. Postgres on the other hand defaults all decimal constants or integers that would overflow an int64 as Numerics (decimals), but is intelligent enough with implicit type conversions to make this seamless. Currently if we attempt to change this default in CockroachDB, it wrecks havoc on Sqllogictest. This will need to be reassessed later, and wont be possible until we improve float -> decimal and decimal -> float implicit type coercion.

Review on Reviewable

@RaduBerinde
Copy link
Member

Here is an idea for how to encode in strings while preserving sort order.

We write any number as 0.xyz... * 10^exp, where x != 0 and also the last decimal is not 0. It holds that if two numbers have different exponents, the one with the smaller exponent will always be smaller. If they have the same exponent, the one with the smallest decimals in lexicographic order is smaller.

The first part of the string is an encoding of the exponent. The first part of the exponent encoding is the length of the exponent using repeated characters ("base 1").

Say we use a range of X+1 characters c[0] through c[X]. X could be 10 or e.g. 64 or 254.
If exp < X, the encoding is simply c[exp]
If X <= exp < X^2, the encoding is c[X] c[exp/X] c[exp%X]
If X^2 <= exp < X^3, the encoding is c[X] c[X] c[exp/X/X] c[exp/X%X] c[exp%X]
and so on. In other words, we encode exp in base X and if the encoding has n digits, we preamble with (n-1) c[X]s. If X is 10 and c[10]='A', here are some sample exponent encodings:
1 -> 1
9 -> 9
15 -> A15
99 -> A99
345 -> AA345
1134 -> AAA1134

Then we can follow with an encoding of the decimals xyz..

I haven't included anything about negative numbers. To support that we can preamble everything with a character for the sign (e.g. 0 negative 1 for positive), and then for negative numbers we reverse the rest of the encoding, i.e. we "flip" the characters c[0] through c[X].
If we need to encode -Inf and +Inf, we can use 4 characters instead of 2 for the "sign", e.g. -Inf is '0', negative numbers start with '1', positive numbers start with '2', +Inf is '3').

For datum.Next and datum.Prev, note that we have HasPrev() that tells us if we can use Prev. Maybe we can add a similar HasNext() and avoid using Next. Conceptually there is definitely no next or previous value..

There is a Next value that we can generate which is not a valid encoding, but it is lexicographically smaller than all encodings of larger numbers - and is the same as for strings - we append a 0. If that is actually what Next needs, then we can provide that (and we can provide a similar Prev as well if there is a large 'Z' character we don't use in encodings).


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Jan 16, 2016

Reviewed 26 of 30 files at r1.
Review status: 26 of 30 files reviewed at latest revision, 5 unresolved discussions.


roachpb/data.go, line 433 [r1] (raw file):
what's the point of slicing v.RawBytes like this?


sql/driver/wire.go, line 99 [r1] (raw file):
lib/pq?


sql/parser/scan_test.go, line 395 [r1] (raw file):
should this test case have moved elsewhere in this file?


sql/parser/type_check_test.go, line 90 [r1] (raw file):
should this move to the passing part of these tests?


sql/pgwire/types.go, line 359 [r1] (raw file):
s/decimal/numeric/


Comments from the review on Reviewable.io

@nvanbenschoten
Copy link
Member Author

Review status: 22 of 30 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


sql/driver/wire.go, line 99 [r1] (raw file):
Ah yeah, I get that wrong every other time. Done.


sql/parser/scan_test.go, line 395 [r1] (raw file):
Done.


sql/parser/type_check_test.go, line 90 [r1] (raw file):
Done.


sql/pgwire/types.go, line 359 [r1] (raw file):
Done.


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Jan 16, 2016

Reviewed 4 of 30 files at r1, 4 of 4 files at r2.
Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


sql/parser/builtins.go, line 705 [r2] (raw file):
nit: i think floatOrDecimal is more obvious


sql/parser/builtins.go, line 1006 [r2] (raw file):
s/powImpl/powImpls/


sql/parser/builtins.go, line 1073 [r2] (raw file):
nil, not dnull


sql/parser/eval.go, line 419 [r2] (raw file):
I think this is OK for now, but we're going to need a more general type coercion mechanism soon.


sql/parser/eval.go, line 897 [r2] (raw file):
throughout: nil, not DNull


Comments from the review on Reviewable.io

@nvanbenschoten
Copy link
Member Author

Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


sql/parser/builtins.go, line 705 [r2] (raw file):
Agreed.


sql/parser/builtins.go, line 1006 [r2] (raw file):
Done.


sql/parser/builtins.go, line 1073 [r2] (raw file):
Done.


sql/parser/eval.go, line 419 [r2] (raw file):
I completely agree 👍


sql/parser/eval.go, line 897 [r2] (raw file):
Changed wherever an error was being returned in CastExpr.Eval.


Comments from the review on Reviewable.io

@nvanbenschoten
Copy link
Member Author

Thanks @RaduBerinde, I just created a decimal encoding scheme based roughly on this idea. Using the same assumption you pointed out above with all numbers defined as 0.xyz... * 10^exp, where x != 0 and also the last decimal is not 0. The scheme starts the encoding with one of 7 different starting tags (2 more for infinity and 2 more for NaN, once proper support is added).

decimalNegValPosExp
decimalNegValZeroExp (no exp encoded)
decimalNegValNegExp 
decimalZero          (no exp or val encoded)
decimalPosValNegExp
decimalPosValZeroExp (no exp encoded)
decimalPosValPosExp 

After the tag, the absolute value of the exponent (if it's not zero) is encoded using an unsigned Varint. Next, the digit string is encoded in a big-endian base-16 byte string. Finally, a null terminator is appended to the end.

In total, the scheme will encode a key that looks like <tag><uvarint exponent><big-endian base-16 digit byte string>.

One thing to note with the current commit is that I needed direct access to decimal.Decimal's backing big.Int, so I'm temporarily pointing the decimal dependency to a cockroachdb fork. Hopefully this change will make it upstream so that we can use the official library again.


Review status: 18 of 37 files reviewed at latest revision, 6 unresolved discussions.


roachpb/data.go, line 433 [r1] (raw file):
There was none, I just hadn't looked too deeply at this yet. PTAL


Comments from the review on Reviewable.io

@nvanbenschoten
Copy link
Member Author

Also, in case anyone is wondering, performance isn't great, but it isn't horrendous either. Encoding a decimal is currently about twice as slow as encoding a float, but decoding speed is roughly the same.

_Encoding_

BenchmarkEncodeDecimal-4             2000000           649 ns/op
BenchmarkEncodeUint32-4             100000000           14.1 ns/op
BenchmarkEncodeUint64-4             100000000           15.9 ns/op
BenchmarkEncodeVarint-4             50000000            23.8 ns/op
BenchmarkEncodeUvarint-4            100000000           21.2 ns/op
BenchmarkEncodeBytes-4              30000000            52.2 ns/op
BenchmarkEncodeBytesDecreasing-4    20000000            75.9 ns/op
BenchmarkEncodeString-4             30000000            55.5 ns/op
BenchmarkEncodeStringDescending-4   20000000            81.1 ns/op
BenchmarkEncodeFloat-4               5000000           289 ns/op

_Decoding_

BenchmarkDecodeDecimal-4             5000000           285 ns/op
BenchmarkDecodeUint32-4             100000000           17.4 ns/op
BenchmarkDecodeUint64-4             100000000           23.7 ns/op
BenchmarkDecodeVarint-4             50000000            29.9 ns/op
BenchmarkDecodeUvarint-4            50000000            26.6 ns/op
BenchmarkDecodeBytes-4              20000000           107 ns/op
BenchmarkDecodeBytesDecreasing-4    20000000           125 ns/op
BenchmarkDecodeString-4             10000000           204 ns/op
BenchmarkDecodeStringDecreasing-4   10000000           233 ns/op
BenchmarkDecodeFloat-4               5000000           247 ns/op

Review status: 18 of 37 files reviewed at latest revision, 6 unresolved discussions.


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Jan 17, 2016

Reviewed 17 of 19 files at r3.
Review status: 35 of 37 files reviewed at latest revision, 4 unresolved discussions.


GLOCKFILE, line 22 [r3] (raw file):
does this work?


keys/printer_test.go, line 110 [r3] (raw file):
no test for the new decimal printer you added?


roachpb/data.go, line 433 [r1] (raw file):
ah, this isn't the same though. I think here you want to v.SetBytes(encoding.EncodeDecimalAschending(make([]byte, headerSize, headerSize + <estimated length of encoding> + <length of checksum>), d)). The current version likely allocates more than it must.


util/encoding/decimal.go, line 28 [r3] (raw file):
typo


Comments from the review on Reviewable.io

@nvanbenschoten
Copy link
Member Author

Review status: 33 of 37 files reviewed at latest revision, 4 unresolved discussions.


GLOCKFILE, line 22 [r3] (raw file):
It seems to. I wouldn't rely on it permanently though.


keys/printer_test.go, line 110 [r3] (raw file):
Added.


roachpb/data.go, line 433 [r1] (raw file):
Oh ok, I understand what you were originally asking. I had just copied that from above, but you're right, the len was already headerSize so there was no need to slice it.

It turns out here that accurately estimating how big the decimal encoding will be is pretty expensive (on the order of 500ns to call decimal.String). It seems to me the best option is to quickly overestimate and chop off the extra, allowing the extra allocation in extreme situations. I'm currently estimating that the decimal encoding will be no more than 15 bytes (20 bytes with the header), which if you look at decimal_test.go, seems like a reasonable overestimate. I believe this same reasoning was used for SetTime as well.


util/encoding/decimal.go, line 28 [r3] (raw file):
Fixed.


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Jan 17, 2016

Reviewed 2 of 4 files at r4.
Review status: 35 of 37 files reviewed at latest revision, 2 unresolved discussions.


keys/printer_test.go, line 113 [r4] (raw file):
this doesn't look right; whether it's encoded ascendingly or descendingly, the key should print the same.


roachpb/data.go, line 433 [r1] (raw file):
SGTM. Some comments or constants might be helpful in documenting the magic numbers 15 and 11 here and above.


Comments from the review on Reviewable.io

@petermattis
Copy link
Collaborator

I haven't looked at this PR (yet) or the decimal encoding, but wanted to
point out that the float encoding could work for decimals. You would need
different encoding and decoding routines for decimals, but the encoded
values could be identical.

On Sunday, January 17, 2016, Tamir Duberstein notifications@github.com
wrote:

Reviewed 2 of 4 files at r4.
Review status: 35 of 37 files reviewed at latest revision, 2 unresolved

discussions.

keys/printer_test.go, line 113 [r4]
https://reviewable.io:443/reviews/cockroachdb/cockroach/3902#-K8EZB6g3N8OSpd5-gxl:-K8EZB6g3N8OSpd5-gxm:-1419220177
(raw file

"/Table/42/-12.34"},
):
this doesn't look right; whether it's encoded ascendingly or descendingly,

the key should print the same.

roachpb/data.go, line 433 [r1]
https://reviewable.io:443/reviews/cockroachdb/cockroach/3902#-K88AnQhVUpzEXjIWHzP:-K8EZYm0fnLdptwcTvWI:786018897
(raw file

v.RawBytes = encoding.EncodeDecimalAscending(v.RawBytes[:headerSize], d)
):
SGTM. Some comments or constants might be helpful in documenting the magic

numbers 15 and 11 here and above.

Comments from the review on Reviewable.io
https://reviewable.io:443/reviews/cockroachdb/cockroach/3902


Reply to this email directly or view it on GitHub
#3902 (comment)
.

@nvanbenschoten
Copy link
Member Author

Review status: 34 of 37 files reviewed at latest revision, 2 unresolved discussions.


keys/printer_test.go, line 113 [r4] (raw file):
You're right about that, but if you look at lines 88 or 76, you'll see that this test has no notion of encoding direction. It might be something to take a look at in the future.


roachpb/data.go, line 433 [r1] (raw file):
Done.


Comments from the review on Reviewable.io

@RaduBerinde
Copy link
Member

Regarding the encoding you described (with the decimalNegValPosExp, decimalNegValZeroExp etc tags) - I thought we wanted the encoded string values to sort exactly like the floating point values would, no? (i.e. encoding(x) < encoding(y) <=> x < y).


Review status: 34 of 37 files reviewed at latest revision, 1 unresolved discussion.


Comments from the review on Reviewable.io

@nvanbenschoten
Copy link
Member Author

I'm not quite sure I understand the question Radu. The prefix tags will ensure that the encoded bytes do sort in the same way as their corresponding decimal values, as is tested in TestEncodeDecimal. Here's an example that shows that.

decimalNegValPosExp:  -100 = -.1*10^3
decimalNegValZeroExp: -.1 = -.1*10^0
decimalNegValNegExp:  -.0001 = -.1*10^-3
decimalZero:          0
decimalPosValNegExp:  .0001 = .1*10^-3
decimalPosValZeroExp: .1 = .1*10^0
decimalPosValPosExp:  100 = .1*10^3

Review status: 34 of 37 files reviewed at latest revision, 1 unresolved discussion.


Comments from the review on Reviewable.io

@RaduBerinde
Copy link
Member

Ah, I'm a moron, I thought we actually encode the string "decimalNegValPosExp". Found the const definitions, makes sense. Don't mind me ^_^


Comments from the review on Reviewable.io

@nvanbenschoten
Copy link
Member Author

I was just able to improve encoding performance by about 30% by employing a copy-on-write scheme for the decimal's internal big.Int, so that it doesn't need to be copied in the cases where we aren't modifying it (either flipping its sign or dividing off trailing zeros). Here's the new breakdown of encoding performance.

BenchmarkEncodeDecimal-4             3000000           462 ns/op
BenchmarkEncodeUint32-4             100000000           14.1 ns/op
BenchmarkEncodeUint64-4             100000000           16.3 ns/op
BenchmarkEncodeVarint-4             100000000           23.5 ns/op
BenchmarkEncodeUvarint-4            100000000           21.2 ns/op
BenchmarkEncodeBytes-4              30000000            51.7 ns/op
BenchmarkEncodeBytesDecreasing-4    20000000            76.9 ns/op
BenchmarkEncodeString-4             30000000            57.9 ns/op
BenchmarkEncodeStringDescending-4   20000000            80.9 ns/op
BenchmarkEncodeFloat-4               5000000           289 ns/op

Using pprof, it looks like about half the decimal encoding time is being spent in the big.Int.String routine. So far I haven't been able to find a way to avoid this, but I'm actively trying to find a way eliminate the call.


Review status: 34 of 37 files reviewed at latest revision, 1 unresolved discussion.


Comments from the review on Reviewable.io

@petermattis
Copy link
Collaborator

Lots of progress here. Generally looks good. Perhaps I missed it, but are there tests using DECIMAL column types for indexes?


Review status: 34 of 37 files reviewed at latest revision, 6 unresolved discussions.


roachpb/data.go, line 433 [r1] (raw file):
For storage in a Value there is no requirement to use the encoding package as we do not compare the encoded values. Using EncodeTimeAscending was a convenience. If there is a faster/smaller encoding for Decimal here we should use it.


sql/parser/builtins.go, line 705 [r6] (raw file):
Do the trigonometric functions need to support decimal? Seems there is a guaranteed precision loss in the conversion to and from float. And it looks like postgres only supports floats: All trigonometric functions take arguments and return values of type double precision.


sql/parser/datum.go, line 331 [r6] (raw file):
I'm not sure about the decimal library we're using, but assuming that the type has a concept of precision (places after the decimal point), I think you just have to increase the precision by 1 (i.e. add a trailing 0).


sql/values_test.go, line 150 [r1] (raw file):
s/type/type./g


util/encoding/decimal.go, line 53 [r6] (raw file):
I'll reiterate my earlier comment that I think you can use the same encoding format that floats use. Encoding of floats computes the mantissa and exponent for the float where the mantissa is a variable length base-100 number. See floatMandE. This is accomplished by formatting the float to a string and then converting the base-10 mantissa to base-100. I think it should be straightforward to write a similar decimalMandEfunction.

The downside of using the same encoding as floats is that PeekType will not be able to determine if you have a float or decimal. This would affect pretty printing, but I think that can be worked around by adding an encoding.PrettyPrintValue() function that pretty-prints an encoding value, decoding for most types, but directly pretty-printing encoded decimal and float.


Comments from the review on Reviewable.io

@JackKrupansky
Copy link

Postgres does support implicit conversion from decimal/numeric to double (float8) for the trig functions. But the return value will be double (float8).

The doc is correct as far as it goes - it doesn't make any statement about what won't be accepted. Integer values would also be accepted, again due to implicit conversion.

@JackKrupansky
Copy link

Postgres also has a second sqrt function that accepts a numeric/decimal and returns a numeric/decimal.

Actually, there are a bunch of math functions in PG that have two separate internal functions, one for float8 and one for numeric. The doc marks them as "dp or numeric":
http://www.postgresql.org/docs/9.5/static/functions-math.html

Also note that PG has a div(numeric, numeric) function to return the integer quotient (as a numeric).

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/decimal branch 3 times, most recently from 3f49032 to 664845e Compare January 19, 2016 19:12
@nvanbenschoten
Copy link
Member Author

Review status: 32 of 37 files reviewed at latest revision, 6 unresolved discussions.


roachpb/data.go, line 433 [r1] (raw file):
Based on a quick benchmark, it looks like encoding the decimal as a string using decimal.String, and decoding using decimal.NewFromString is actually about 15% slower than using the encoding package. Neither of these functions are particularly fast, which is why the encoding package avoids using them, and works with the internal big.Int instead.

Using a similar strategy as EncodeTimeAscending, without the need for tags or normalizing exponents to ensure ordering, I was able to reduce this encoding/decoding time by about 75%. This strategy encodes the exponent using a Varint, and follows this with the Gob encoding of the big.Int. PTAL


sql/values_test.go, line 150 [r1] (raw file):
Done.


Comments from the review on Reviewable.io

@petermattis
Copy link
Collaborator

I think there are some math functions like sqrt and exp and ln for which it makes sense to have decimal and float versions. But I'm struggling to see a reason for using trig functions with decimal types. The decimal type is primarily used for monetary values. Are the economic/financial equations that use trig functions on money values?


Review status: 31 of 37 files reviewed at latest revision, 4 unresolved discussions.


Comments from the review on Reviewable.io

@nvanbenschoten
Copy link
Member Author

Review status: 35 of 37 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


sql/parser/builtins.go, line 705 [r6] (raw file):
Yeah I agree, the trigonometric functions don't need decimal implementations. Removed.


sql/parser/datum.go, line 331 [r6] (raw file):
Precision isn't expressed in the numeric encoding (all trailing zeros are stripped off) as per http://sqlite.org/src4/doc/trunk/www/key_encoding.wiki. That said, @andreimatei and I had discussed how an 0x0 byte could be appended to the end of the encoded value as the decimal encoding that would sort "next". He had mentioned that this would require some adjustment to how the method is used.


util/encoding/decimal.go, line 53 [r6] (raw file):
I've added the http://sqlite.org/src4/doc/trunk/www/key_encoding.wiki version of decimal encoding to the current commit as well, and was able to share a lot of code with float. I've left the other encoding for now so that we can compare, but I do agree that unifying the encoding is desirable and almost certainly the way to proceed, and that unifying the tags is also a good idea. The only reservation I have is that because the mantissa is now base-100 encoded instead of base-256 encoded, decoding performance has gone down significantly now that we need to perform string parsing. The benchmarks are included below.

BenchmarkEncodeDecimal-4     3000000           443 ns/op
BenchmarkDecodeDecimal-4     3000000           495 ns/op
BenchmarkEncodeDecimalOld-4  3000000           456 ns/op
BenchmarkDecodeDecimalOld-4  5000000           270 ns/op

In reference to pretty printing float or decimal values if the tags are merged, it seems to me the easiest solution would be to decode all values as decimals and pretty print from there. Now that the encodings are shared this shouldn't be a problem (except for infinities and NaNs, which decimal does not yet support).


util/encoding/decimal.go, line 29 [r7] (raw file):
Done.


util/encoding/decimal.go, line 43 [r7] (raw file):
Done.


util/encoding/decimal.go, line 90 [r7] (raw file):
I'm not always copying the big.Int, only if it is negative or I need to normalize it when there are zero's at the end of it's value that should be moved to the exponent. I'm currently looking at ways to push the normalization into the decimal library so I can avoid needing to normalize during encoding.


util/encoding/decimal.go, line 276 [r7] (raw file):
Done.


Comments from the review on Reviewable.io

@petermattis
Copy link
Collaborator

LGTM


Review status: 28 of 38 files reviewed at latest revision, 4 unresolved discussions.


keys/printer.go, line 226 [r8] (raw file):
Handling infinity and NaN would be easier if we moved decodeKeyPrint to encoding.PrettyPrint. Then you could inspect whether the tag byte indicates an infinity or NaN.


util/encoding/decimal.go, line 53 [r6] (raw file):
There isn't a strong reason for base-100 encoding the mantissa for floats. Doing so is a holdover from the sqlite encoding. It would probably be a performance win to encode the mantissa for floats as base-256 as well.


util/encoding/decimal.go, line 276 [r7] (raw file):
Let's just get rid of this function now. It is only obfuscating at this point.


util/encoding/decimal.go, line 100 [r8] (raw file):
s/combintation/combination/g


Comments from the review on Reviewable.io

@andreimatei
Copy link
Contributor

Review status: 28 of 38 files reviewed at latest revision, 11 unresolved discussions, some commit checks failed.


util/encoding/decimal.go, line 75 [r6] (raw file):
s/it's/its


util/encoding/decimal.go, line 58 [r8] (raw file):
is this still needed?


util/encoding/decimal.go, line 67 [r8] (raw file):
line numbers in comments are as good as random :)
If it was just meant as a review helper, you can add it as a Reviewable comment I guess


util/encoding/decimal.go, line 94 [r8] (raw file):
this function deserves a comment


util/encoding/decimal.go, line 114 [r8] (raw file):
is this correct? I don't really understand how b is used in this function; it's not an empty buffer, is it?


util/encoding/decimal.go, line 130 [r8] (raw file):
explain why?


util/encoding/decimal.go, line 141 [r8] (raw file):
so we represent the digits in base 100 but encode them as 2n+1? I thought the point of base 100 was to make them human readable?


Comments from the review on Reviewable.io

@petermattis
Copy link
Collaborator

Review status: 28 of 38 files reviewed at latest revision, 11 unresolved discussions, some commit checks failed.


util/encoding/decimal.go, line 141 [r8] (raw file):
Yep. Human readability was not a factor in the choice of base-100, AFAIK. This is legacy stuff from using the sqlite encoding format. We certainly don't have to maintain it as we're not at all compatible with the sqlite encodings anymore.


Comments from the review on Reviewable.io

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/decimal branch 2 times, most recently from 52597e2 to c660695 Compare January 25, 2016 21:28
@nvanbenschoten
Copy link
Member Author

Review status: 23 of 38 files reviewed at latest revision, 11 unresolved discussions.


keys/printer.go, line 226 [r8] (raw file):
I moved most of the logic here for printing a single key to encoding.PrettyPrintValue, as you suggested before. PTAL


util/encoding/decimal.go, line 53 [r6] (raw file):
It sounds to me like keeping the encoding the same for both float and decimal, while moving both to a base-256 encoded mantissa is the way to go. It should probably be in a separate PR however, so I think I'll try to push this one through with the sqllite encoding and come back to this after.


util/encoding/decimal.go, line 75 [r6] (raw file):
Done.


util/encoding/decimal.go, line 276 [r7] (raw file):
Done.


util/encoding/decimal.go, line 58 [r8] (raw file):
No, I just hadn't taken it out yet.


util/encoding/decimal.go, line 67 [r8] (raw file):
Good catch 👍


util/encoding/decimal.go, line 94 [r8] (raw file):
Done.


util/encoding/decimal.go, line 100 [r8] (raw file):
Done.


util/encoding/decimal.go, line 114 [r8] (raw file):
b is used as a scratch space in this function if it has adequate capacity, before it is populated in encodeMandE.


util/encoding/decimal.go, line 130 [r8] (raw file):
This is needed to support the base-10 to base-100 slice conversion.


Comments from the review on Reviewable.io

@petermattis
Copy link
Collaborator

Review status: 23 of 38 files reviewed at latest revision, 6 unresolved discussions.


util/encoding/decimal.go, line 53 [r6] (raw file):
I'd prefer the switch to base-256 mantissa happen in a separate PR. Maybe just add a TODO somewhere in this file.


util/encoding/encoding.go, line 678 [r9] (raw file):
Any reason for this to only pretty-print a single value vs consuming/pretty-printing all of the values in b? Yes, you'd have to define the separator here, but I don't think that is a problem.


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Jan 25, 2016

Reviewed 8 of 10 files at r8, 14 of 14 files at r9.
Review status: all files reviewed at latest revision, 4 unresolved discussions.


Comments from the review on Reviewable.io

@andreimatei
Copy link
Contributor

Review status: all files reviewed at latest revision, 5 unresolved discussions.


sql/parser/datum.go, line 331 [r6] (raw file):
We can't have Next() panic; it's used by the expression analyzer, right?

We could append an 0x0 byte to produce a key that sorts next. We could also use Key.PrefixEnd() to produce such a key, by taking advantage of the fact that we end the decimals with an otherwise unused character - 0x0 - so a decimal encoding cannot be the prefix of another. But these wouldn't be valid Decimal encodings, so it's not something this function can do.
I think the suggestion was to create a HasNext(), similar to HasPrev(), and maybe an NextKey() which returns Key. Look for where Next() is called to see what works best.

This key obtained by such methods is no longer made out of parsable values, but that's OK (it only needs to be parsed by the pretty printer, which knows to not freak out).


util/encoding/decimal.go, line 114 [r8] (raw file):
so how about you make this function take an empty buffer (the caller would do b[len(b) : cap(b)] and call it scratch?


Comments from the review on Reviewable.io

@nvanbenschoten
Copy link
Member Author

Review status: all files reviewed at latest revision, 5 unresolved discussions.


sql/parser/datum.go, line 331 [r6] (raw file):
I added a HasNext method to Datum in #3965, which should make this work correctly now.


util/encoding/decimal.go, line 53 [r6] (raw file):
Done.


util/encoding/decimal.go, line 114 [r8] (raw file):
Done, calling it tmp for consistency.


util/encoding/encoding.go, line 678 [r9] (raw file):
I thought that would be the correct separation of concerns, but I'm fine changing it. Done.


Comments from the review on Reviewable.io

@andreimatei
Copy link
Contributor

LGTM


Review status: 30 of 38 files reviewed at latest revision, 6 unresolved discussions.


util/encoding/decimal.go, line 87 [r10] (raw file):
nit: make tmp the last argument and add a comment about it


util/encoding/decimal.go, line 105 [r10] (raw file):
does this line do anything now that tmp is empty?


Comments from the review on Reviewable.io

@nvanbenschoten
Copy link
Member Author

Currently blocked on #3992 to add DECIMAL index unit tests. After that, this branch should be clear to merge.


Review status: 28 of 38 files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


util/encoding/decimal.go, line 87 [r10] (raw file):
Done.


util/encoding/decimal.go, line 105 [r10] (raw file):
tmp's length will be empty, but it may have capacity that can be sliced.


Comments from the review on Reviewable.io

@andreimatei
Copy link
Contributor

Review status: 25 of 38 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


util/encoding/decimal.go, line 105 [r10] (raw file):
right, but what's the point of slicing it?


Comments from the review on Reviewable.io

@nvanbenschoten
Copy link
Member Author

Review status: 25 of 38 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


util/encoding/decimal.go, line 105 [r10] (raw file):
Slicing an empty slice that has adequate capacity to support the slice will result in the length growing.


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Jan 27, 2016

Reviewed 8 of 8 files at r10, 7 of 7 files at r11.
Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


util/encoding/decimal.go, line 87 [r11] (raw file):
something something share code with float. perhaps a TODO?


Comments from the review on Reviewable.io

nvanbenschoten added a commit that referenced this pull request Jan 28, 2016
Introduce the Decimal type as an arbitrary-precision fixed-point numeric value
@nvanbenschoten nvanbenschoten merged commit b5d9bb9 into master Jan 28, 2016
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/decimal branch January 28, 2016 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants