[SPARK-29587][SQL] Support SQL Standard type real as float(4) numeric as decimal #26537

Closed

yaooqinn
Member

@yaooqinn yaooqinn commented Nov 15, 2019

What changes were proposed in this pull request?

The types decimal and numeric are equivalent. Both types are part of the SQL standard.

The real type is 4 bytes, variable-precision and inexact, with 6 decimal digits of precision, the same as our float. It is part of the SQL standard.

Why are the changes needed?

Improve SQL standard support.
Other databases:
https://www.postgresql.org/docs/9.3/datatype-numeric.html
https://prestodb.io/docs/current/language/types.html#floating-point
http://www.sqlservertutorial.net/sql-server-basics/sql-server-data-types/
MySQL treats REAL as a synonym for DOUBLE PRECISION (a nonstandard variation), unless the REAL_AS_FLOAT SQL mode is enabled.
In MySQL, NUMERIC is implemented as DECIMAL, so the following remarks about DECIMAL apply equally to NUMERIC.

Does this PR introduce any user-facing change?

The types real and numeric become valid; they are aliases for float and decimal.

How was this patch tested?

Added unit tests.

@yaooqinn
Member Author

cc @cloud-fan @maropu @HyukjinKwon, thanks for reviewing in advance.

@maropu
Member

maropu commented Nov 15, 2019

I'm not sure that this is useful for users. Have you checked the previous discussion? #21766

@yaooqinn
Member Author

That PR seems to have been raised before this ticket, https://issues.apache.org/jira/browse/SPARK-27764. Now that we are enhancing ANSI support and PostgreSQL feature parity, we may have enough reason to support these types.

@SparkQA

SparkQA commented Nov 15, 2019

Test build #113835 has finished for PR 26537 at commit b9a25e8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -2154,17 +2154,17 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
       case ("smallint" | "short", Nil) => ShortType
       case ("int" | "integer", Nil) => IntegerType
       case ("bigint" | "long", Nil) => LongType
-      case ("float", Nil) => FloatType
+      case ("float" | "real", Nil) => FloatType
Contributor

I took a look at the SQL standard; FLOAT and REAL are different:

FLOAT specifies the data type approximate numeric, with binary precision equal to or greater than the
value of the specified <precision>. The maximum value of <precision> is implementation-defined.
<precision> shall not be greater than this value.

REAL specifies the data type approximate numeric, with implementation-defined precision.

We can say that Spark now supports the real type, and that its precision is the same as Java's float. But this needs to be proposed explicitly, e.g. by sending an email to the dev list.
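
A minimal sketch of how such an alias can be resolved, assuming a simplified stand-in for the pattern match in the diff above. `TypeAliasSketch`, `SimpleType`, and `resolve` are hypothetical names, not Spark's API; Spark's `AstBuilder` works on parsed type contexts rather than raw strings.

```scala
// Hypothetical, self-contained model of primitive type-name resolution.
// None of these names come from Spark; they only mirror the shape of the diff.
object TypeAliasSketch {
  sealed trait SimpleType
  case object ShortType extends SimpleType
  case object IntegerType extends SimpleType
  case object LongType extends SimpleType
  case object FloatType extends SimpleType

  // Names are matched case-insensitively; unknown names yield None.
  def resolve(name: String): Option[SimpleType] =
    name.toLowerCase(java.util.Locale.ROOT) match {
      case "smallint" | "short" => Some(ShortType)
      case "int" | "integer"    => Some(IntegerType)
      case "bigint" | "long"    => Some(LongType)
      case "float" | "real"     => Some(FloatType) // "real" is only an alias
      case _                    => None
    }
}
```

In this sketch `real` and `float` resolve to the same object, so nothing downstream of name resolution can tell which spelling was used.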

Member

@maropu maropu Nov 15, 2019

It seems numeric and decimal are not exactly the same...

Member Author

I want to gather more information about these before sending an email to discuss.

  1. PostgreSQL
    https://www.postgresql.org/docs/9.3/datatype-numeric.html

| Name | Storage Size | Description | Range |
| --- | --- | --- | --- |
| decimal | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
| numeric | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
| real | 4 bytes | variable-precision, inexact | 6 decimal digits precision |
| double precision | 8 bytes | variable-precision, inexact | 15 decimal digits precision |

The types decimal and numeric are equivalent. Both types are part of the SQL standard.
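
The 4-byte size quoted for real in the table above matches a JVM `Float`. As a rough illustration of what single precision means in practice, here is a small sketch; the object and value names are illustrative only, and it says nothing about how PostgreSQL or Spark store values internally.

```scala
// Illustrates single-precision limits on the JVM; names are illustrative only.
object FloatPrecisionSketch {
  // A Java/Scala Float occupies 4 bytes (32 bits).
  val floatBytes: Int = java.lang.Float.BYTES

  // 2^24 + 1 is not representable with a 24-bit significand,
  // so it rounds to 2^24 when converted to a Float.
  val roundsAway: Boolean = 16777217.toFloat == 16777216.toFloat

  // The same two integers stay distinct as Doubles (53-bit significand).
  val doubleKeepsIt: Boolean = 16777217.toDouble != 16777216.toDouble
}
```

The two integers differ only in the eighth significant digit, which is beyond what a 4-byte float can keep apart.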

Member Author

  1. mimer_sql_engine
    https://download.mimer.com/pub/developer/docs/html_100/Mimer_SQL_Engine_DocSet/Syntax_Rules4.html#wp1228955

| Data Type | Abbreviation | Description | Range |
| --- | --- | --- | --- |
| DECIMAL(p,s) | DEC(p,s) | Exact numerical, precision p, scale s. | 1 <= p <= 45, 0 <= s <= p |
| NUMERIC(p,s) | N/A | Exact numerical, precision p, scale s. (Same as DECIMAL). | 1 <= p <= 45, 0 <= s <= p |
| FLOAT(p) | N/A | Approximate numerical, mantissa precision p. | 1 <= p <= 45. Zero or absolute value 10^-999 to 10^+999 |
| REAL | N/A | Approximate numerical, mantissa precision 7. | Zero or absolute value 10^-38 to 10^+38. Corresponds to a single precision float. |
| FLOAT | N/A | Approximate numerical, mantissa precision 16. | Zero or absolute value 10^-308 to 10^+308. Corresponds to a double precision float. |
| DOUBLE PRECISION | N/A | Approximate numerical, mantissa precision 16. | Zero or absolute value 10^-308 to 10^+308 |

Note: In Mimer SQL the NUMERIC data type is exactly equivalent to DECIMAL.

Member Author

  1. prestosql
    https://prestosql.io/docs/current/language/types.html#floating-point

REAL

A real is a 32-bit inexact, variable-precision implementing the IEEE Standard 754 for Binary Floating-Point Arithmetic.

DECIMAL

A fixed precision decimal number. Precision up to 38 digits is supported but performance is best up to 18 digits.

No documentation found about a numeric type.

Member Author

  1. SQL Server

https://docs.microsoft.com/en-us/openspecs/standards_support/MS-STDSUPLP/17a32be7-10b3-4025-bea4-133a66b4c689

Decimal

SQL Server 2008 R2 and SQL Server 2012 vary as follows:

Transact-SQL partially supports this data type. The xs:decimal type represents arbitrary precision
decimal numbers. Transact-SQL does not support variable precision decimals. Minimally conforming
XML processors are required to support decimal numbers with a minimum of totalDigits=18. Transact-SQL supports totalDigits=38, but limits the fractional digits to 10. All xs:decimal-instanced values are
represented internally on the server by the SQL type numeric (38, 10).

Values of this type need to comply with the format of the SQL numeric type. This type internally
represents the support of numbers up to a total of 38 digits, with 10 of those digit positions reserved
for fractional precision.

Float

SQL Server 2008 R2 and SQL Server 2012 vary as follows:

Transact-SQL partially supports this data type. Values of this type need to comply with the format of
the SQL real type

Contributor

In this case it's not very useful to look at other databases. Many things are "implementation-defined".

Personally I'm fine with treating real as float in Spark; it just needs an official proposal.

For numeric/decimal, I don't have a strong preference.

Member Author

@yaooqinn yaooqinn Nov 16, 2019

The notes below from the SQL standard may help us decide whether or not to include real or numeric:

An SQL-implementation is permitted to regard certain <exact numeric type>s as equivalent, if they have the
same precision, scale, and radix, as permitted by the Syntax Rules of Subclause 6.1, “<data type>”. When two
or more <exact numeric type>s are equivalent, the SQL-implementation chooses one of these equivalent <exact numeric type>s as the normal form representing that equivalence class of <exact numeric type>s. The normal
form determines the name of the exact numeric type in the numeric type descriptor.
Similarly, an SQL-implementation is permitted to regard certain <approximate numeric type>s as equivalent,
as permitted by the Syntax Rules of Subclause 6.1, “<data type>”, in which case the SQL-implementation
chooses a normal form to represent each equivalence class of <approximate numeric type> and the normal
form determines the name of the approximate numeric type.

Member Author

@yaooqinn yaooqinn Nov 16, 2019

  25. For the exact numeric types DECIMAL and NUMERIC:
    a) The maximum value of precision is implementation-defined. precision shall not be greater than this value.
    b) The maximum value of scale is implementation-defined. scale shall not be greater than this maximum value.
  26. NUMERIC specifies the data type exact numeric, with the decimal precision and scale specified by the precision and scale.
  27. DECIMAL specifies the data type exact numeric, with the decimal scale specified by the scale and the implementation-defined decimal precision equal to or greater than the value of the specified precision.

I don't know why rule 26 (NUMERIC) and rule 27 (DECIMAL) have different definitions, but IIUC, under the constraint of rule 25 they are equivalent: both have an implementation-defined maximum precision and scale, and the user-specified values cannot be greater than these.

Member

So it seems the key difference between DECIMAL and NUMERIC is:

implementation-defined decimal precision equal to or greater than the value of the specified precision.

DECIMAL can have at least the specified precision (which means it can have more), whereas NUMERIC should have exactly the specified precision.

Spark's decimal satisfies both, so I think NUMERIC as a synonym of DECIMAL makes sense.
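
The argument above can be written down as two predicates. This is a minimal sketch, assuming (as the comment states) that the implementation stores exactly the requested precision; `ExactNumericSketch` and its methods are hypothetical and only model the two rules quoted from the standard.

```scala
// Hypothetical model of the two precision rules quoted from the SQL standard.
object ExactNumericSketch {
  // NUMERIC(p, s): the implementation's precision must equal the requested one.
  def satisfiesNumeric(requested: Int, actual: Int): Boolean = actual == requested

  // DECIMAL(p, s): the implementation's precision may be equal or greater.
  def satisfiesDecimal(requested: Int, actual: Int): Boolean = actual >= requested

  // An implementation that always stores exactly the requested precision
  // meets both rules, which is why one type can serve under both names.
  def exactImplementation(requested: Int): Int = requested
}
```

An implementation that widened the precision, for example from 10 to 12, would still pass `satisfiesDecimal` but fail `satisfiesNumeric`; that is the only case in which the two names would have to differ.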

@yaooqinn
Member Author

retest this please

@yaooqinn
Member Author

yaooqinn commented Nov 16, 2019

CHAR is equivalent to CHARACTER. DEC is equivalent to DECIMAL. INT is equivalent to INTEGER.
VARCHAR is equivalent to CHARACTER VARYING. ...

According to the SQL standard, do we need DEC, which is short for DECIMAL?

@SparkQA

SparkQA commented Nov 16, 2019

Test build #113932 has finished for PR 26537 at commit b9a25e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

DEC is equivalent to DECIMAL. INT is equivalent to INTEGER.

These two we can implement

@yaooqinn
Member Author

DEC is equivalent to DECIMAL. INT is equivalent to INTEGER.

These two we can implement

Int/Integer is already supported. I'll make a separate PR to support DEC and leave the current one for further discussion.

@SparkQA

SparkQA commented Nov 25, 2019

Test build #114371 has finished for PR 26537 at commit e3857e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 25, 2019

Test build #114375 has finished for PR 26537 at commit 14265f0.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@HyukjinKwon HyukjinKwon left a comment

Likewise, for FLOAT vs REAL, it seems REAL has fewer restrictions compared to FLOAT. @yaooqinn can you send an email to the dev list to ask? I think it looks fair enough.

@yaooqinn
Member Author

yaooqinn commented Dec 5, 2019

@yaooqinn
Member Author

yaooqinn commented Dec 9, 2019

https://issues.apache.org/jira/browse/HIVE-16764 Hive has supported numeric since 3.0.

@SparkQA

SparkQA commented Dec 10, 2019

Test build #115063 has finished for PR 26537 at commit a1f21fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

yaooqinn commented Dec 10, 2019

The dev mailing list seems to remain silent about this. I just found that Hive has also included this feature since its 3.0.0 release. How about we make this possible? cc: @maropu @cloud-fan @HyukjinKwon

@@ -59,3 +59,8 @@ TBLPROPERTIES ('a' = '1');

SHOW CREATE TABLE tbl;
DROP TABLE tbl;

-- new real/numeric type
Contributor

What we need to prove is that the alias works. I think this test is good enough (no need to test it in cast), but we should update the comment. These are not new types, just aliases.

Member Author

OK

@cloud-fan
Contributor

According to the SQL standard, I think we should have named our float type REAL and our decimal type NUMERIC. It's too late to change that now, so I'm fine with treating REAL and NUMERIC as aliases.

@SparkQA

SparkQA commented Dec 10, 2019

Test build #115104 has finished for PR 26537 at commit d5d1f8e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 8f0eb7d Dec 10, 2019
@gengliangwang
Member

Does this PR introduce any user-facing change?
no

This PR does introduce user-facing changes. We should mention it in the PR description.

@maropu
Member

maropu commented Dec 17, 2019

Is this a behaviour change issue? I think just adding a type alias does not change the existing behaviours?

@gengliangwang
Member

Technically, the new alias is a user-facing change. The behavior change is that the types real and numeric become valid for end users now.
Maybe I am overthinking. It is just a suggestion here.

@maropu
Member

maropu commented Dec 17, 2019

Ah, ok. thanks.

@HyukjinKwon
Member

Yeah, it's "any user-facing change", not a "behaviour change" :-)
