[SPARK-29587][SQL] Support SQL Standard type real as float(4) numeric as decimal #26537

Closed
wants to merge 8 commits into from
@@ -2179,17 +2179,18 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
case ("smallint" | "short", Nil) => ShortType
case ("int" | "integer", Nil) => IntegerType
case ("bigint" | "long", Nil) => LongType
- case ("float", Nil) => FloatType
+ case ("float" | "real", Nil) => FloatType
Contributor

I took a look at the SQL standard: FLOAT and REAL are different.

FLOAT specifies the data type approximate numeric, with binary precision equal to or greater than the value of the specified <precision>. The maximum value of <precision> is implementation-defined. <precision> shall not be greater than this value.

REAL specifies the data type approximate numeric, with implementation-defined precision.

We can say that Spark now supports the REAL type, with the same precision as a Java float. But this needs to be proposed explicitly, e.g. by sending an email to the dev list.
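As a concrete illustration of that precision: with this change REAL resolves to FloatType, so it carries the 24-bit binary significand of a Java float, whereas the standard's FLOAT(p) only promises at least p bits. The sketch below is illustrative only; `BinaryPrecision` and `floatTypeFor` are hypothetical names, and this PR does not add FLOAT(p). The p <= 24 / p <= 53 split mirrors the one PostgreSQL documents for float(p).

```scala
// Illustrative sketch: significand widths of JVM float/double, plus a
// hypothetical FLOAT(p) resolver (not part of this PR or of Spark).
object BinaryPrecision {
  // IEEE 754 significand widths, counting the implicit leading bit.
  val FloatBits: Int = 24  // java.lang.Float, backing Spark's FloatType
  val DoubleBits: Int = 53 // java.lang.Double, backing Spark's DoubleType

  // 2^24 + 1 needs 25 significand bits, so float arithmetic rounds it
  // back to 2^24 and the increment is lost.
  def floatLosesIncrement: Boolean = {
    val f: Float = 16777216f
    (f + 1f) == f
  }

  // The same addition is exact with a 53-bit significand.
  def doubleKeepsIncrement: Boolean = {
    val d: Double = 16777216d
    (d + 1d) != d
  }

  // Hypothetical: resolve FLOAT(p) to the smallest type whose binary
  // precision is >= p, as the standard's FLOAT rule allows.
  def floatTypeFor(p: Int): String = {
    require(p >= 1, s"precision must be positive, got $p")
    if (p <= FloatBits) "FLOAT"
    else if (p <= DoubleBits) "DOUBLE"
    else throw new IllegalArgumentException(s"FLOAT($p) exceeds $DoubleBits bits")
  }
}
```

Choosing the smallest type that meets the requested bit count is one way to honour "equal to or greater than"; any mapping that provides at least p bits would also conform.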

Member

@maropu, Nov 15, 2019

It seems numeric and decimal are not exactly the same...

Member Author

I'd like to gather more information about these types before sending an email to discuss them.

  1. postgresql
    https://www.postgresql.org/docs/9.3/datatype-numeric.html
Name | Storage Size | Description | Range
decimal | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
numeric | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
real | 4 bytes | variable-precision, inexact | 6 decimal digits precision
double precision | 8 bytes | variable-precision, inexact | 15 decimal digits precision

The types decimal and numeric are equivalent. Both types are part of the SQL standard.
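The 6 and 15 decimal-digit figures in the PostgreSQL table are consistent with the significand widths (24 bits for real, 53 for double precision) under the formula C uses for FLT_DIG and DBL_DIG, floor((p - 1) * log10(2)). A quick check of that arithmetic in Scala (the `DecimalDigits` name is illustrative):

```scala
// Decimal digits that survive a decimal -> binary -> decimal round trip for a
// binary format with `bits` significand bits: floor((bits - 1) * log10(2)).
object DecimalDigits {
  def guaranteed(bits: Int): Int =
    math.floor((bits - 1) * math.log10(2.0)).toInt
}

// DecimalDigits.guaranteed(24) == 6   (real / Java float)
// DecimalDigits.guaranteed(53) == 15  (double precision / Java double)
```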

Member Author

  1. mimer_sql_engine
    https://download.mimer.com/pub/developer/docs/html_100/Mimer_SQL_Engine_DocSet/Syntax_Rules4.html#wp1228955
Data Type | Abbreviation | Description | Range
DECIMAL(p,s) | DEC(p,s) | Exact numerical, precision p, scale s. | 1 <= p <= 45, 0 <= s <= p
NUMERIC(p,s) | N/A | Exact numerical, precision p, scale s. (Same as DECIMAL.) | 1 <= p <= 45, 0 <= s <= p
FLOAT(p) | N/A | Approximate numerical, mantissa precision p. | 1 <= p <= 45; zero or absolute value 10^-999 to 10^+999
REAL | N/A | Approximate numerical, mantissa precision 7. | Zero or absolute value 10^-38 to 10^+38. Corresponds to a single precision float.
FLOAT | N/A | Approximate numerical, mantissa precision 16. | Zero or absolute value 10^-308 to 10^+308. Corresponds to a double precision float.
DOUBLE PRECISION | N/A | Approximate numerical, mantissa precision 16. | Zero or absolute value 10^-308 to 10^+308

Note: In Mimer SQL the NUMERIC data type is exactly equivalent to DECIMAL.

Member Author

  1. prestosql
    https://prestosql.io/docs/current/language/types.html#floating-point

REAL

A real is a 32-bit inexact, variable-precision implementing the IEEE Standard 754 for Binary Floating-Point Arithmetic.

DECIMAL

A fixed precision decimal number. Precision up to 38 digits is supported but performance is best up to 18 digits.

No documentation found about a NUMERIC type.

Member Author

  1. SQL Server

https://docs.microsoft.com/en-us/openspecs/standards_support/MS-STDSUPLP/17a32be7-10b3-4025-bea4-133a66b4c689

Decimal

SQL Server 2008 R2 and SQL Server 2012 vary as follows:

Transact-SQL partially supports this data type. The xs:decimal type represents arbitrary precision
decimal numbers. Transact-SQL does not support variable precision decimals. Minimally conforming
XML processors are required to support decimal numbers with a minimum of totalDigits=18. Transact-SQL supports totalDigits=38, but limits the fractional digits to 10. All xs:decimal-instanced values are
represented internally on the server by the SQL type numeric (38, 10).

Values of this type need to comply with the format of the SQL numeric type. This type internally
represents the support of numbers up to a total of 38 digits, with 10 of those digit positions reserved
for fractional precision.

Float

SQL Server 2008 R2 and SQL Server 2012 vary as follows:

Transact-SQL partially supports this data type. Values of this type need to comply with the format of
the SQL real type

Contributor

In this case it's not very useful to look at other databases; a lot of this is "implementation-defined".

Personally, I'm fine with treating REAL as FLOAT in Spark; it just needs an official proposal.

For numeric/decimal, I don't have a strong preference.

Member Author

@yaooqinn, Nov 16, 2019

The following notes from the SQL standard may help us decide whether to add REAL and NUMERIC:

An SQL-implementation is permitted to regard certain <exact numeric type>s as equivalent, if they have the same precision, scale, and radix, as permitted by the Syntax Rules of Subclause 6.1, “<data type>”. When two or more <exact numeric type>s are equivalent, the SQL-implementation chooses one of these equivalent <exact numeric type>s as the normal form representing that equivalence class of <exact numeric type>. The normal form determines the name of the exact numeric type in the numeric type descriptor.
Similarly, an SQL-implementation is permitted to regard certain <approximate numeric type>s as equivalent, as permitted by the Syntax Rules of Subclause 6.1, “<data type>”, in which case the SQL-implementation chooses a normal form to represent each equivalence class of <approximate numeric type> and the normal form determines the name of the approximate numeric type.
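Read this way, the PR puts REAL in the same equivalence class as FLOAT and NUMERIC in the same class as DECIMAL, with Spark's existing names as the normal forms; the golden output for the new test is consistent with that, since SHOW CREATE TABLE prints FLOAT and DECIMAL(...) for columns declared as REAL and NUMERIC. A hypothetical sketch of such a normal-form table (Spark itself does this implicitly, by resolving the aliases to a DataType in AstBuilder):

```scala
import java.util.Locale

// Hypothetical equivalence classes for the aliases this PR touches; the first
// name in each class is treated as its normal form.
object TypeNormalForm {
  private val classes: Seq[Seq[String]] = Seq(
    Seq("FLOAT", "REAL"),
    Seq("DECIMAL", "DEC", "NUMERIC")
  )

  // alias -> normal form, e.g. "NUMERIC" -> "DECIMAL"
  private val normalForms: Map[String, String] =
    classes.flatMap(names => names.map(name => name -> names.head)).toMap

  // Upper-cases the name and returns its normal form; names without
  // aliases are returned upper-cased and otherwise unchanged.
  def normalize(name: String): String = {
    val upper = name.toUpperCase(Locale.ROOT)
    normalForms.getOrElse(upper, upper)
  }
}
```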

Member Author

@yaooqinn, Nov 16, 2019

  25. For the exact numeric types DECIMAL and NUMERIC:
    a) The maximum value of <precision> is implementation-defined. <precision> shall not be greater than this value.
    b) The maximum value of <scale> is implementation-defined. <scale> shall not be greater than this maximum value.
  26. NUMERIC specifies the data type exact numeric, with the decimal precision and scale specified by the <precision> and <scale>.
  27. DECIMAL specifies the data type exact numeric, with the decimal scale specified by the <scale> and the implementation-defined decimal precision equal to or greater than the value of the specified <precision>.

I don't know why rule 26 (NUMERIC) and rule 27 (DECIMAL) have different definitions, but IIUC, given the constraint in rule 25, they are equivalent: both have an implementation-defined maximum precision and scale, and user-specified values cannot exceed them.

Member

So it seems the key difference between DECIMAL and NUMERIC is:

implementation-defined decimal precision equal to or greater than the value of the specified precision.

DECIMAL can have at least the specified precision (which means it can have more), whereas NUMERIC must have exactly the specified precision.

Spark's decimal type satisfies both, so I think treating NUMERIC as a synonym of DECIMAL makes sense.
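To make "exactly p digits, s of them fractional" concrete: a value fits NUMERIC(p, s) if, after rounding to s fractional digits, it has at most p digits in total. A hypothetical fit check using java.math.BigDecimal; the HALF_UP rounding mode and the `ExactNumeric` name are assumptions for illustration, not Spark's implementation:

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object ExactNumeric {
  // Hypothetical check: does `value` fit NUMERIC(precision, scale)?
  // The value is rounded to `scale` fractional digits (HALF_UP, an assumption),
  // then its total digit count must not exceed `precision`.
  def fits(value: JBigDecimal, precision: Int, scale: Int): Boolean = {
    require(precision >= 1 && scale >= 0 && scale <= precision,
      s"invalid NUMERIC($precision, $scale)")
    val rounded = value.setScale(scale, RoundingMode.HALF_UP)
    // precision() counts the digits of the unscaled value, e.g. 3 for 12.3.
    rounded.precision() <= precision
  }
}

// With the NUMERIC(10,1) column from the new test:
// ExactNumeric.fits(new JBigDecimal("123456789.04"), 10, 1)  // true  (123456789.0)
// ExactNumeric.fits(new JBigDecimal("999999999.96"), 10, 1)  // false (rounds to 1000000000.0, 11 digits)
```

Note that rounding can push a value over the limit, which is why the check happens after setScale rather than before.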

case ("double", Nil) => DoubleType
case ("date", Nil) => DateType
case ("timestamp", Nil) => TimestampType
case ("string", Nil) => StringType
case ("character" | "char", length :: Nil) => CharType(length.getText.toInt)
case ("varchar", length :: Nil) => VarcharType(length.getText.toInt)
case ("binary", Nil) => BinaryType
- case ("decimal" | "dec", Nil) => DecimalType.USER_DEFAULT
- case ("decimal" | "dec", precision :: Nil) => DecimalType(precision.getText.toInt, 0)
- case ("decimal" | "dec", precision :: scale :: Nil) =>
+ case ("decimal" | "dec" | "numeric", Nil) => DecimalType.USER_DEFAULT
+ case ("decimal" | "dec" | "numeric", precision :: Nil) =>
+ DecimalType(precision.getText.toInt, 0)
+ case ("decimal" | "dec" | "numeric", precision :: scale :: Nil) =>
DecimalType(precision.getText.toInt, scale.getText.toInt)
case ("interval", Nil) => CalendarIntervalType
case (dt, params) =>
@@ -59,3 +59,8 @@ TBLPROPERTIES ('a' = '1');

SHOW CREATE TABLE tbl;
DROP TABLE tbl;

-- float alias real and decimal alias numeric
CREATE TABLE tbl (a REAL, b NUMERIC, c NUMERIC(10), d NUMERIC(10,1)) USING parquet;
SHOW CREATE TABLE tbl;
DROP TABLE tbl;
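The new test declares one column per alias form. The following Spark-free sketch mirrors the parser mapping from the diff, using hypothetical stand-in types (`SimpleType`, `FloatT`, `DecimalT`, `SimpleParser` are illustrative, not Spark's API) and the (10, 0) default that DecimalType.USER_DEFAULT has in Spark, which matches the DECIMAL(10,0) shown for column `b` in the golden output:

```scala
import java.util.Locale

// Stand-ins for Spark's FloatType and DecimalType (hypothetical, for illustration).
sealed trait SimpleType
case object FloatT extends SimpleType
final case class DecimalT(precision: Int, scale: Int) extends SimpleType

object SimpleParser {
  // Spark's DecimalType.USER_DEFAULT is DECIMAL(10,0).
  val UserDefault: DecimalT = DecimalT(10, 0)

  // Same shape as the AstBuilder match: (type name, numeric parameters).
  def toType(name: String, params: List[Int]): SimpleType =
    (name.toLowerCase(Locale.ROOT), params) match {
      case ("float" | "real", Nil) => FloatT
      case ("decimal" | "dec" | "numeric", Nil) => UserDefault
      case ("decimal" | "dec" | "numeric", precision :: Nil) => DecimalT(precision, 0)
      case ("decimal" | "dec" | "numeric", precision :: scale :: Nil) =>
        DecimalT(precision, scale)
      case (other, ps) =>
        throw new IllegalArgumentException(s"unsupported type: $other with params $ps")
    }
}

// Columns from the new test: a REAL, b NUMERIC, c NUMERIC(10), d NUMERIC(10,1)
// SimpleParser.toType("REAL", Nil)            == FloatT
// SimpleParser.toType("NUMERIC", Nil)         == DecimalT(10, 0)
// SimpleParser.toType("NUMERIC", List(10))    == DecimalT(10, 0)
// SimpleParser.toType("NUMERIC", List(10, 1)) == DecimalT(10, 1)
```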
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
- -- Number of queries: 24
+ -- Number of queries: 27


-- !query 0
@@ -220,3 +220,28 @@ DROP TABLE tbl
struct<>
-- !query 23 output



-- !query 24
CREATE TABLE tbl (a REAL, b NUMERIC, c NUMERIC(10), d NUMERIC(10,1)) USING parquet
-- !query 24 schema
struct<>
-- !query 24 output



-- !query 25
SHOW CREATE TABLE tbl
-- !query 25 schema
struct<createtab_stmt:string>
-- !query 25 output
CREATE TABLE `tbl` (`a` FLOAT, `b` DECIMAL(10,0), `c` DECIMAL(10,0), `d` DECIMAL(10,1))
USING parquet


-- !query 26
DROP TABLE tbl
-- !query 26 schema
struct<>
-- !query 26 output