[SPARK-29587][SQL] Support SQL Standard type real as float(4) numeric as decimal #26537

yaooqinn · 2019-11-15T03:50:48Z

What changes were proposed in this pull request?

The types decimal and numeric are equivalent. Both types are part of the SQL standard.

The real type is 4 bytes, variable-precision, and inexact, with 6 decimal digits of precision. It is the same as our float and is part of the SQL standard.
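To make the "4 bytes, 6 decimal digits" claim concrete, here is a minimal sketch using only Python's standard `struct` module. It is an illustration of IEEE 754 single precision, not Spark code; the helper name `to_float32` is hypothetical.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit IEEE 754 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# A single-precision value is stored in 4 bytes.
storage_size = len(struct.pack("<f", 1.0))

# A literal with 6 significant decimal digits survives the round trip
# when it is printed back with 6 significant digits.
six_digits = format(to_float32(0.123456), ".6g")

# A literal with 9 significant digits is beyond what 32 bits can hold,
# so printing it back with 9 digits does not reproduce the input.
nine_digits = format(to_float32(0.123456789), ".9g")

print(storage_size, six_digits, nine_digits)
```

This is why a 32-bit type such as Java's `float` is a natural fit for a type documented as having 6 decimal digits of precision.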

Why are the changes needed?

Improve SQL standard support.
Other databases:
https://www.postgresql.org/docs/9.3/datatype-numeric.html
https://prestodb.io/docs/current/language/types.html#floating-point
http://www.sqlservertutorial.net/sql-server-basics/sql-server-data-types/
MySQL treats REAL as a synonym for DOUBLE PRECISION (a nonstandard variation), unless the REAL_AS_FLOAT SQL mode is enabled.
In MySQL, NUMERIC is implemented as DECIMAL, so the following remarks about DECIMAL apply equally to NUMERIC.

Does this PR introduce any user-facing change?

The types real and numeric become valid; they are aliases for float and decimal.

How was this patch tested?

Added unit tests.

yaooqinn · 2019-11-15T03:52:03Z

cc @cloud-fan @maropu @HyukjinKwon, thanks for reviewing in advance.

maropu · 2019-11-15T03:56:34Z

I'm not sure that this is useful for users. Have you checked the previous discussion? #21766

yaooqinn · 2019-11-15T05:13:30Z

That PR seems to have been raised before this ticket, https://issues.apache.org/jira/browse/SPARK-27764. Now that we are enhancing ANSI support and PostgreSQL feature parity, we may have enough reason to support these types.

SparkQA · 2019-11-15T05:26:43Z

Test build #113835 has finished for PR 26537 at commit b9a25e8.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-11-15T07:21:39Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

@@ -2154,17 +2154,17 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
      case ("smallint" | "short", Nil) => ShortType
      case ("int" | "integer", Nil) => IntegerType
      case ("bigint" | "long", Nil) => LongType
-      case ("float", Nil) => FloatType
+      case ("float" | "real", Nil) => FloatType


I took a look at the SQL standard; float and real are different:

FLOAT specifies the data type approximate numeric, with binary precision equal to or greater than the value of the specified <precision>. The maximum value of <precision> is implementation-defined. <precision> shall not be greater than this value. REAL specifies the data type approximate numeric, with implementation-defined precision.

We can say that Spark now supports the real type and that its precision is the same as Java's float. But this needs to be proposed explicitly, e.g. by sending an email to the dev list.
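The alias approach in the hunk above can be sketched outside Spark as a plain lookup from type name to type. This is only an illustration: `resolve_type` and the string labels are hypothetical, it is not the `AstBuilder` implementation, and the `(10,0)` default for an unparameterized decimal is an assumption of this sketch.

```python
from typing import Sequence

# Several spellings map to one underlying type; "real" is simply
# another key pointing at the same value as "float".
_SIMPLE_TYPES = {
    "smallint": "ShortType", "short": "ShortType",
    "int": "IntegerType", "integer": "IntegerType",
    "bigint": "LongType", "long": "LongType",
    "float": "FloatType", "real": "FloatType",
}

# "numeric" is treated as another spelling of "decimal".
_DECIMAL_NAMES = {"decimal", "numeric"}


def resolve_type(name: str, params: Sequence[int] = ()) -> str:
    """Resolve a SQL type name (case-insensitive) to a type label."""
    key = name.lower()
    if key in _SIMPLE_TYPES:
        if params:
            raise ValueError(f"{name} does not take parameters")
        return _SIMPLE_TYPES[key]
    if key in _DECIMAL_NAMES:
        if len(params) == 0:
            return "DecimalType(10,0)"  # assumed default for this sketch
        if len(params) == 1:
            return f"DecimalType({params[0]},0)"
        if len(params) == 2:
            return f"DecimalType({params[0]},{params[1]})"
        raise ValueError(f"{name} takes at most two parameters")
    raise ValueError(f"unsupported type: {name}")


print(resolve_type("REAL"), resolve_type("numeric", (5, 2)))
```

Because an alias resolves to the same type object as the original name, nothing downstream of the parser needs to know which spelling the user wrote.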

It seems numeric and decimal are not exactly the same...

I want to gather more information about these before sending an email to discuss.

postgresql
https://www.postgresql.org/docs/9.3/datatype-numeric.html

| Name | Storage Size | Description | Range |
| --- | --- | --- | --- |
| decimal | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
| numeric | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
| real | 4 bytes | variable-precision, inexact | 6 decimal digits precision |
| double precision | 8 bytes | variable-precision, inexact | 15 decimal digits precision |

The types decimal and numeric are equivalent. Both types are part of the SQL standard.

mimer_sql_engine
https://download.mimer.com/pub/developer/docs/html_100/Mimer_SQL_Engine_DocSet/Syntax_Rules4.html#wp1228955

| Data Type | Abbreviation | Description | Range |
| --- | --- | --- | --- |
| DECIMAL(p,s) | DEC(p,s) | Exact numerical, precision p, scale s. | 1 <= p <= 45; 0 <= s <= p |
| NUMERIC(p,s) | N/A | Exact numerical, precision p, scale s. (Same as DECIMAL). | 1 <= p <= 45; 0 <= s <= p |
| FLOAT(p) | N/A | Approximate numerical, mantissa precision p. | 1 <= p <= 45. Zero or absolute value 10^-999 to 10^+999 |
| REAL | N/A | Approximate numerical, mantissa precision 7. | Zero or absolute value 10^-38 to 10^+38. Corresponds to a single precision float. |
| FLOAT | N/A | Approximate numerical, mantissa precision 16. | Zero or absolute value 10^-308 to 10^+308. Corresponds to a double precision float. |
| DOUBLE PRECISION | N/A | Approximate numerical, mantissa precision 16. | Zero or absolute value 10^-308 to 10^+308 |

Note: In Mimer SQL the NUMERIC data type is exactly equivalent to DECIMAL.

prestosql
https://prestosql.io/docs/current/language/types.html#floating-point

REAL

A real is a 32-bit inexact, variable-precision implementing the IEEE Standard 754 for Binary Floating-Point Arithmetic.

DECIMAL

A fixed precision decimal number. Precision up to 38 digits is supported but performance is best up to 18 digits.

No documentation found about numeric type

SQL Server

https://docs.microsoft.com/en-us/openspecs/standards_support/MS-STDSUPLP/17a32be7-10b3-4025-bea4-133a66b4c689

Decimal

SQL Server 2008 R2 and SQL Server 2012 vary as follows:

Transact-SQL partially supports this data type. The xs:decimal type represents arbitrary precision decimal numbers. Transact-SQL does not support variable precision decimals. Minimally conforming XML processors are required to support decimal numbers with a minimum of totalDigits=18. Transact-SQL supports totalDigits=38, but limits the fractional digits to 10. All xs:decimal-instanced values are represented internally on the server by the SQL type numeric (38, 10).

Values of this type need to comply with the format of the SQL numeric type. This type internally represents the support of numbers up to a total of 38 digits, with 10 of those digit positions reserved for fractional precision.

Float

SQL Server 2008 R2 and SQL Server 2012 vary as follows:

Transact-SQL partially supports this data type. Values of this type need to comply with the format of the SQL real type.

In this case it's not very useful to look at other databases. Many things are "implementation-defined".

Personally I'm fine to treat real as float in Spark, just need an official proposal.

For numeric/decimal, I don't have a strong preference.

The notes below from the SQL standard may help us decide whether or not to support real and numeric:

An SQL-implementation is permitted to regard certain <exact numeric type>s as equivalent, if they have the same precision, scale, and radix, as permitted by the Syntax Rules of Subclause 6.1, "<data type>". When two or more <exact numeric type>s are equivalent, the SQL-implementation chooses one of these equivalent <exact numeric type>s as the normal form representing that equivalence class of <exact numeric type>s. The normal form determines the name of the exact numeric type in the numeric type descriptor.

Similarly, an SQL-implementation is permitted to regard certain <approximate numeric type>s as equivalent, as permitted by the Syntax Rules of Subclause 6.1, "<data type>", in which case the SQL-implementation chooses a normal form to represent each equivalence class of <approximate numeric type> and the normal form determines the name of the approximate numeric type.

For the exact numeric types DECIMAL and NUMERIC:
a) The maximum value of <precision> is implementation-defined. <precision> shall not be greater than this value.
b) The maximum value of <scale> is implementation-defined. <scale> shall not be greater than this maximum value.

NUMERIC specifies the data type exact numeric, with the decimal precision and scale specified by the <precision> and <scale>.

DECIMAL specifies the data type exact numeric, with the decimal scale specified by the <scale> and the implementation-defined decimal precision equal to or greater than the value of the specified <precision>.

I don't know why the 26th rule (NUMERIC) and the 27th rule (DECIMAL) have different definitions, but IIUC, under the constraint of the 25th rule they are equivalent: both have implementation-defined precision and scale, and the user-specified values cannot be greater than these.

So it seems the key difference between DECIMAL and NUMERIC is:

implementation-defined decimal precision equal to or greater than the value of the specified precision.

DECIMAL can have at least the specified precision (which means it can have more), whereas NUMERIC should have exactly the specified precision.

Spark's decimal satisfies both, so I think NUMERIC as a synonym of DECIMAL makes sense.
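The distinction drawn above can be written down as two predicates. This is a minimal sketch of the reading given in this thread, not text from the standard or code from Spark; all function names are hypothetical, and `exact_precision_engine` models an engine that keeps exactly the precision the user asked for, which is what the comment above says Spark's decimal does.

```python
def satisfies_numeric(requested: int, actual: int) -> bool:
    """NUMERIC: the stored precision must equal the requested precision."""
    return actual == requested


def satisfies_decimal(requested: int, actual: int) -> bool:
    """DECIMAL: the stored precision may be equal to or greater than requested."""
    return actual >= requested


def exact_precision_engine(requested: int) -> int:
    """Hypothetical engine that stores exactly the requested precision."""
    return requested


def widening_engine(requested: int) -> int:
    """Hypothetical engine that rounds precision up to a multiple of 9."""
    return -(-requested // 9) * 9


for engine in (exact_precision_engine, widening_engine):
    actual = engine(10)
    print(engine.__name__, satisfies_numeric(10, actual), satisfies_decimal(10, actual))
```

An engine that stores exactly the requested precision passes both checks, which is the argument for letting one type serve under both names. An engine that widens precision would be acceptable for DECIMAL only.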

yaooqinn · 2019-11-16T14:09:52Z

retest this please

yaooqinn · 2019-11-16T14:34:50Z

CHAR is equivalent to CHARACTER. DEC is equivalent to DECIMAL. INT is equivalent to INTEGER.
VARCHAR is equivalent to CHARACTER VARYING. ...

According to the SQL standard, do we need DEC, which is short for DECIMAL?

SparkQA · 2019-11-16T17:52:15Z

Test build #113932 has finished for PR 26537 at commit b9a25e8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-11-18T08:05:26Z

DEC is equivalent to DECIMAL. INT is equivalent to INTEGER.

These two we can implement

yaooqinn · 2019-11-18T10:03:23Z

DEC is equivalent to DECIMAL. INT is equivalent to INTEGER.

These two we can implement

Int/Integer is already supported. I'll open a separate PR to support DEC and leave the current one for further discussion.

SparkQA · 2019-11-25T07:24:12Z

Test build #114371 has finished for PR 26537 at commit e3857e8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-11-25T08:05:02Z

Test build #114375 has finished for PR 26537 at commit 14265f0.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon

Likewise, for FLOAT vs REAL, real seems to have fewer restrictions than float. @yaooqinn, can you send an email to the dev list to ask? I think it looks fair enough.

yaooqinn · 2019-12-05T11:16:32Z

thanks, @HyukjinKwon for your suggestion. I have drafted a proposal here.

http://apache-spark-developers-list.1001551.n3.nabble.com/PROPOSAL-Support-ANSI-type-real-numeric-as-synonyms-for-float-decimal-td28475.html

yaooqinn · 2019-12-09T15:57:10Z

https://issues.apache.org/jira/browse/HIVE-16764 Hive has supported numeric since 3.0.

SparkQA · 2019-12-10T06:10:42Z

Test build #115063 has finished for PR 26537 at commit a1f21fc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yaooqinn · 2019-12-10T06:16:57Z

The dev mailing list seems to have remained silent about this. I just found that Hive has also had this feature since its 3.0.0 release. How about we go ahead with this? cc: @maropu @cloud-fan @HyukjinKwon

cloud-fan · 2019-12-10T12:42:07Z

sql/core/src/test/resources/sql-tests/inputs/show-create-table.sql

@@ -59,3 +59,8 @@ TBLPROPERTIES ('a' = '1');

 SHOW CREATE TABLE tbl;
 DROP TABLE tbl;
+
+-- new real/numeric type


What we need to prove is that the alias works. I think this test is good enough (no need to test it in cast), but we should update the comment. These are not new types, just aliases.

cloud-fan · 2019-12-10T12:43:44Z

According to the SQL standard, I think we should have named our float type as REAL, and decimal type as NUMERIC. It's too late to change it now, I'm fine to treat REAL and NUMERIC as aliases.

SparkQA · 2019-12-10T17:00:56Z

Test build #115104 has finished for PR 26537 at commit d5d1f8e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-12-10T17:57:26Z

thanks, merging to master!

gengliangwang · 2019-12-16T22:15:46Z

Does this PR introduce any user-facing change?
no

This PR does introduce user-facing changes. We should mention it in the PR description.

maropu · 2019-12-17T00:00:19Z

Is this a behaviour change issue? I think just adding a type alias does not change the existing behaviours?

gengliangwang · 2019-12-17T00:07:44Z

Technically, the new alias is a user-facing change. The behavior change is that the types real and numeric become valid for end-users now.
Maybe I am overthinking. It is just a suggestion here.

maropu · 2019-12-17T00:27:25Z

Ah, ok. thanks.

HyukjinKwon · 2019-12-17T00:55:03Z

Yeah, it's "any user-facing change" not a "behaviour change" :-) ..

[SPARK-29587][SQL] Support SQL Standard type real as float(4) numeric…

b9a25e8

… as decimal

dongjoon-hyun added the SQL label Nov 15, 2019

cloud-fan reviewed Nov 15, 2019

yaooqinn mentioned this pull request Nov 18, 2019

[SPARK-29941][SQL] Add ansi type aliases for char and decimal #26574

Closed

yaooqinn added 4 commits November 25, 2019 10:30

Merge branch 'master' into SPARK-29587

ca25319

regen golden file

e3857e8

Merge branch 'master' into SPARK-29587

c6aaccc

regen golden file

14265f0

HyukjinKwon reviewed Dec 5, 2019

yaooqinn added 2 commits December 10, 2019 10:05

Merge branch 'master' into SPARK-29587

f7bd4b9

regen gf

a1f21fc

cloud-fan reviewed Dec 10, 2019

rm tests in cast

d5d1f8e

cloud-fan closed this in 8f0eb7d Dec 10, 2019
