[SPARK-29587][SQL] Support SQL Standard type real as float(4) numeric as decimal #26537
cc @cloud-fan @maropu @HyukjinKwon, thanks for reviewing in advance.
I'm not sure that this is useful for users. Have you checked the previous discussion? #21766
That PR seems to have been raised before this ticket: https://issues.apache.org/jira/browse/SPARK-27764. Now that we are enhancing ANSI support and PostgreSQL feature parity, we may have enough reason to support these types.
Test build #113835 has finished for PR 26537 at commit
|
@@ -2154,17 +2154,17 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
       case ("smallint" | "short", Nil) => ShortType
       case ("int" | "integer", Nil) => IntegerType
       case ("bigint" | "long", Nil) => LongType
-      case ("float", Nil) => FloatType
+      case ("float" | "real", Nil) => FloatType
I took a look at the SQL standard; FLOAT and REAL are defined differently:
FLOAT specifies the data type approximate numeric, with binary precision equal to or greater than the
value of the specified <precision>. The maximum value of <precision> is implementation-defined.
<precision> shall not be greater than this value.
REAL specifies the data type approximate numeric, with implementation-defined precision.
We can say that Spark now supports the REAL type, with the same precision as a Java float. But this needs to be proposed explicitly, e.g. by sending an email to the dev list.
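To make the alias idea concrete, here is a minimal sketch of name-based type resolution in the style of the diff above. The `DataType` hierarchy and the `TypeAliases.resolve` helper are hypothetical stand-ins written for this illustration, not Spark's actual classes or the real `AstBuilder` code; the only point is that `real` shares a match arm with `float`, so both names produce the same type.

```scala
// Hypothetical, simplified stand-ins for a SQL type hierarchy (not Spark's classes).
sealed trait DataType
case object ShortType extends DataType
case object IntegerType extends DataType
case object LongType extends DataType
case object FloatType extends DataType

object TypeAliases {
  // Resolve a primitive type name case-insensitively.
  // Names listed in the same arm are aliases for one type.
  def resolve(name: String): Option[DataType] =
    name.toLowerCase(java.util.Locale.ROOT) match {
      case "smallint" | "short" => Some(ShortType)
      case "int" | "integer"    => Some(IntegerType)
      case "bigint" | "long"    => Some(LongType)
      case "float" | "real"     => Some(FloatType) // REAL treated as an alias of FLOAT
      case _                    => None            // unknown names are rejected
    }
}
```

With this shape, adding an alias is a one-token change to an existing arm, which is why the change itself is small even though the semantics need an explicit proposal.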
It seems numeric and decimal are not exactly the same...
I want to gather more information about these before sending an email to discuss.
Name | Storage Size | Description | Range |
---|---|---|---|
decimal | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
numeric | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point |
real | 4 bytes | variable-precision, inexact | 6 decimal digits precision |
double precision | 8 bytes | variable-precision, inexact | 15 decimal digits precision |
The types decimal and numeric are equivalent. Both types are part of the SQL standard.
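The "inexact" versus "exact" distinction in the table can be checked with a small, self-contained sketch. It assumes nothing beyond the JVM's IEEE 754 `Float` and `Double` and Scala's `BigDecimal`; the object name `PrecisionDemo` is made up for this example.

```scala
object PrecisionDemo {
  // 2^24 + 1 is the smallest positive integer that a 4-byte IEEE 754 float
  // cannot represent, so the conversion rounds it to 2^24.
  val asFloat: Float = 16777217.toFloat

  // An 8-byte double has a 53-bit significand and holds the value exactly.
  val asDouble: Double = 16777217.toDouble

  // Binary floating point cannot represent 0.1 or 0.2 exactly,
  // so this comparison is false for doubles.
  val doubleSumIsExact: Boolean = (0.1 + 0.2) == 0.3

  // A decimal (exact) type keeps the sum exact.
  val exactSum: BigDecimal = BigDecimal("0.1") + BigDecimal("0.2")
}
```

This is only meant to show why a 4-byte real is described as having roughly 6 to 7 significant decimal digits, while decimal and numeric are described as exact.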
- mimer_sql_engine
https://download.mimer.com/pub/developer/docs/html_100/Mimer_SQL_Engine_DocSet/Syntax_Rules4.html#wp1228955
Data Type | Abbreviation | Description | Range |
---|---|---|---|
DECIMAL(p,s) | DEC(p,s) | Exact numerical, precision p, scale s. | 1 <= p <= 45, 0 <= s <= p |
NUMERIC(p,s) | N/A | Exact numerical, precision p, scale s. (Same as DECIMAL). | 1 <= p <= 45, 0 <= s <= p |
FLOAT(p) | N/A | Approximate numerical, mantissa precision p. | 1 <= p <= 45. Zero or absolute value 10^-999 to 10^+999 |
REAL | N/A | Approximate numerical, mantissa precision 7. | Zero or absolute value 10^-38 to 10^+38. Corresponds to a single precision float. |
FLOAT | N/A | Approximate numerical, mantissa precision 16. | Zero or absolute value 10^-308 to 10^+308. Corresponds to a double precision float. |
DOUBLE PRECISION | N/A | Approximate numerical, mantissa precision 16. | Zero or absolute value 10^-308 to 10^+308 |
Note: In Mimer SQL the NUMERIC data type is exactly equivalent to DECIMAL.
REAL
A real is a 32-bit inexact, variable-precision implementing the IEEE Standard 754 for Binary Floating-Point Arithmetic.
DECIMAL
A fixed precision decimal number. Precision up to 38 digits is supported but performance is best up to 18 digits.
No documentation found about numeric type
- SQL Server
Decimal
SQL Server 2008 R2 and SQL Server 2012 vary as follows:
Transact-SQL partially supports this data type. The xs:decimal type represents arbitrary precision
decimal numbers. Transact-SQL does not support variable precision decimals. Minimally conforming
XML processors are required to support decimal numbers with a minimum of totalDigits=18. Transact-SQL supports totalDigits=38, but limits the fractional digits to 10. All xs:decimal-instanced values are
represented internally on the server by the SQL type numeric (38, 10).
Values of this type need to comply with the format of the SQL numeric type. This type internally
represents the support of numbers up to a total of 38 digits, with 10 of those digit positions reserved
for fractional precision.
Float
SQL Server 2008 R2 and SQL Server 2012 vary as follows:
Transact-SQL partially supports this data type. Values of this type need to comply with the format of
the SQL real type
In this case it's not very useful to look at other databases. A lot of this is "implementation-defined".
Personally I'm fine to treat real as float in Spark, just need an official proposal.
For numeric/decimal, I don't have a strong preference.
The notes below from the SQL standard might help us decide whether or not to include real or numeric:
An SQL-implementation is permitted to regard certain <exact numeric type>s as equivalent, if they have the
same precision, scale, and radix, as permitted by the Syntax Rules of Subclause 6.1, “<data type>”. When two
or more <exact numeric type>s are equivalent, the SQL-implementation chooses one of these equivalent <exact numeric type>s as the normal form representing that equivalence class of <exact numeric type>s. The normal
form determines the name of the exact numeric type in the numeric type descriptor.
Similarly, an SQL-implementation is permitted to regard certain <approximate numeric type>s as equivalent,
as permitted by the Syntax Rules of Subclause 6.1, “<data type>”, in which case the SQL-implementation
chooses a normal form to represent each equivalence class of <approximate numeric type> and the normal
form determines the name of the approximate numeric type.
25) For the exact numeric types DECIMAL and NUMERIC:
a) The maximum value of <precision> is implementation-defined. <precision> shall not be greater than this value.
b) The maximum value of <scale> is implementation-defined. <scale> shall not be greater than this maximum value.
26) NUMERIC specifies the data type exact numeric, with the decimal precision and scale specified by the <precision> and <scale>.
27) DECIMAL specifies the data type exact numeric, with the decimal scale specified by the <scale> and the implementation-defined decimal precision equal to or greater than the value of the specified <precision>.
I don't know why rule 26 (NUMERIC) and rule 27 (DECIMAL) have different definitions, but IIUC, under the constraint of rule 25 they are equivalent: both have an implementation-defined maximum precision and scale, and the user-specified values cannot be greater than these.
So it seems the key difference between DECIMAL and NUMERIC is this clause:
implementation-defined decimal precision equal to or greater than the value of the specified precision.
DECIMAL must have at least the specified precision (which means it can have more), whereas NUMERIC must have exactly the specified precision.
Spark's decimal satisfies both, so I think NUMERIC as a synonym of DECIMAL makes sense.
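The argument above can be written down as two predicates. This is only a sketch of the reasoning in this comment: the object and method names are hypothetical, and the assumption that an engine stores exactly the declared precision is taken from the statement that Spark's decimal satisfies both rules, not from any code in this PR.

```scala
object PrecisionRules {
  // NUMERIC(p, s): the implementation must use exactly the declared precision.
  def satisfiesNumeric(declared: Int, actual: Int): Boolean =
    actual == declared

  // DECIMAL(p, s): the implementation may use the declared precision or more.
  def satisfiesDecimal(declared: Int, actual: Int): Boolean =
    actual >= declared

  // An engine that keeps exactly the declared precision meets both rules,
  // which is why a single type can back both names.
  def validForBoth(declared: Int, actual: Int): Boolean =
    satisfiesNumeric(declared, actual) && satisfiesDecimal(declared, actual)
}
```

For example, a declared precision of 10 stored as 10 passes both checks, while storing it as 12 would still be a valid DECIMAL but not a valid NUMERIC.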
retest this please |
According to the SQL standard, do we need DEC, which is short for DECIMAL?
Test build #113932 has finished for PR 26537 at commit
|
These two we can implement |
Int/Integer is already supported, I'll make a separate pr to support |
Test build #114371 has finished for PR 26537 at commit
|
Test build #114375 has finished for PR 26537 at commit
|
Likewise, for FLOAT vs REAL, it seems REAL has fewer restrictions than FLOAT. @yaooqinn can you send an email to the dev list to ask? I think it looks fair enough.
thanks, @HyukjinKwon for your suggestion. I have drafted a proposal here. |
https://issues.apache.org/jira/browse/HIVE-16764 Hive has supported numeric since 3.0
Test build #115063 has finished for PR 26537 at commit
|
The dev mailing list seems to remain silent about this. I just found that Hive has also included this feature since its 3.0.0 release. How about we make this possible? cc: @maropu @cloud-fan @HyukjinKwon
@@ -59,3 +59,8 @@ TBLPROPERTIES ('a' = '1');

SHOW CREATE TABLE tbl;
DROP TABLE tbl;

+-- new real/numeric type
What we need to prove is that the alias works. I think this test is good enough (no need to test it in cast), but we should update the comment. These are not new types, just aliases.
OK
According to the SQL standard, I think we should have named our float type as REAL, and decimal type as NUMERIC. It's too late to change it now, I'm fine to treat REAL and NUMERIC as aliases. |
Test build #115104 has finished for PR 26537 at commit
|
thanks, merging to master! |
This PR does introduce user-facing changes. We should mention it in the PR description.
Is this a behaviour change issue? I think just adding a type alias does not change the existing behaviours.
Technically, the new alias is a user-facing change. The behavior change is that the type
Ah, ok. Thanks.
Yeah, it's "any user-facing change", not a "behaviour change" :-)
What changes were proposed in this pull request?
The types decimal and numeric are equivalent. Both types are part of the SQL standard.
The real type is 4 bytes, variable-precision, inexact, with 6 decimal digits of precision, the same as our float, and is part of the SQL standard.
Why are the changes needed?
Improve SQL standard support.
Other databases:
https://www.postgresql.org/docs/9.3/datatype-numeric.html
https://prestodb.io/docs/current/language/types.html#floating-point
http://www.sqlservertutorial.net/sql-server-basics/sql-server-data-types/
MySQL treats REAL as a synonym for DOUBLE PRECISION (a nonstandard variation), unless the REAL_AS_FLOAT SQL mode is enabled.
In MySQL, NUMERIC is implemented as DECIMAL, so the following remarks about DECIMAL apply equally to NUMERIC.
Does this PR introduce any user-facing change?
The types real and numeric become valid; they are aliases for float and decimal.
How was this patch tested?
Added unit tests.