[SPARK-53940][SQL] Function version() should return full spark version instead of short version #52641
Conversation
I think we should keep the legacy behavior. We should add one parameter (e.g., `isExtended`, default `false`) to show the extra version info. |
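For illustration, a minimal spark-shell sketch of what such an optional parameter could look like; the `version(true)` form and the sample outputs are hypothetical, not an existing Spark API (assumes a `SparkSession` named `spark`):

```scala
// Hypothetical sketch of the proposed optional parameter; version(true)
// is NOT an existing Spark API, and the sample outputs are illustrative.
spark.sql("SELECT version()").show(false)        // legacy: e.g. 4.0.1 <git revision>
// spark.sql("SELECT version(true)").show(false) // extended: e.g. 4.0.1-foo1.2.3 <git revision>
```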
Following @dongjoon-hyun's #26209 (comment), I changed `VERSION` to `VERSION_SHORT`. I'd still leave the decision to @dongjoon-hyun. |
@beliefer I don't see the necessity of introducing a legacy config, because the result is no different for official Apache releases. |
- As mentioned above, we cannot make a breaking change like this, @pan3793 .
- In addition, yes, this was intentionally designed that way.
@dongjoon-hyun, if we only count official Apache Spark releases, the result does not change from the user's perspective, so I suppose it breaks nothing? And I wonder why this was designed to return a short version? In daily Spark support cases, I have seen many times that users claim to be using Spark x.y.z but are actually on a vendor-modified version (which does carry a vendor suffix in the full version); this is definitely misleading info! |
No. Please see the rationale in spark/common/utils/src/main/scala/org/apache/spark/util/VersionUtils.scala Lines 41 to 52 in 8499a62
In addition, if we only consider official Spark releases, why do we need this PR? |
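For context, a simplified sketch in the spirit of that `VersionUtils` short-version extraction (not the verbatim Spark source): the regex keeps only the leading `x.y.z` prefix and drops any suffix.

```scala
// Simplified sketch in the spirit of org.apache.spark.util.VersionUtils;
// not the verbatim source. Keeps only the leading x.y.z prefix.
object ShortVersionSketch {
  private val shortVersionRegex = """^(\d+\.\d+\.\d+)(.*)$""".r

  def shortVersion(sparkVersion: String): String = sparkVersion match {
    case shortVersionRegex(version, _) => version
    case _ => throw new IllegalArgumentException(
      s"Could not find major/minor/maintenance numbers in '$sparkVersion'")
  }
}

// ShortVersionSketch.shortVersion("4.1.0-preview2") == "4.1.0"
// ShortVersionSketch.shortVersion("4.0.1-foo1.2.3") == "4.0.1"
```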
@dongjoon-hyun I understand sometimes having
My point is, we should use the official Spark releases to judge whether it's a breaking change or not; in the meantime, we should make the code friendly for vendor distributions. I also checked CDH Hive,
|
I wonder if it would be better to just have another expression like |
@HyukjinKwon Having another function looks like overkill to me. @beliefer @dongjoon-hyun @HyukjinKwon I tested two additional popular RDBMSs, MariaDB and PostgreSQL; both support this. If there are concerns about breaking changes for vendor Spark distributions, how about introducing an internal config, e.g., |
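As a rough illustration of that escape hatch, a sketch in Spark's `SQLConf` builder style; the config name below is made up for illustration, since the actual proposed name is elided above:

```scala
// Sketch of an internal legacy flag in SQLConf's builder style; the config
// name is hypothetical, and this would live inside SQLConf in real code.
val LEGACY_VERSION_STRING =
  buildConf("spark.sql.legacy.hypotheticalShortVersionString")
    .internal()
    .doc("When true, version() returns the short x.y.z version; " +
      "when false, it returns the full Spark version string.")
    .booleanConf
    .createWithDefault(false)
```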
Adding a legacy configuration appears contradictory under the hypothesis of non-breaking changes. Or, if this change benefits vendor-specific Spark distributions, they can maintain it themselves by modifying version() or simply adding a new UDF; it seems inappropriate for the community to make any assumptions on their behalf. Or, if it's for Spark code contributors hunting bugs, the version with or without a suffix can still be vague. Instead, they should refer to the git hash. |
@yaooqinn I hold the same view. I prefer not to have a config, but if someone wants one, I'm also fine with an internal config.
I don't fully agree with this. It's a fact that many vendors have their own Spark distributions; if possible, we should make the code vendor-friendly as long as it doesn't introduce additional complexity. Consider some common practices for vendors: a vendor usually has its own support channel, version suffix, and docs site. In history,
I agree the git hash is more accurate, but the full version is still valuable in many cases. For example, if there is a well-known bug in |
We do not need a legacy config. It is just an optional parameter.
|
@beliefer, this actually introduces user-facing changes and complicates things, which is exactly what I want to avoid whenever possible. To summarize, my current proposed candidates are:
|
What changes were proposed in this pull request?
As title.
Why are the changes needed?
For example, when executing `select version()` on `spark-4.1.0-preview2-bin-hadoop3`, it returns `4.1.0`, which is misleading. Another case is that a vendor version usually has a suffix, e.g., `Foo Spark 4.0.1-foo1.2.3`; in this case, having `version()` return only the short version `4.0.1` is misleading.
As a reference, in CDH Hive, `version()` returns the full version. MariaDB's `version()` result contains vendor info, and PostgreSQL's `version()` result also contains vendor info.
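Concretely, assuming `version()` keeps appending the git revision after the version string, the difference could be observed as follows (illustrative output, not captured from a real run):

```scala
// Illustrative spark-shell check; assumes a SparkSession named `spark` and
// that version() appends the git revision. Outputs below are illustrative.
spark.sql("SELECT version()").show(false)
// before, on spark-4.1.0-preview2-bin-hadoop3: 4.1.0 <git revision>
// after:                                       4.1.0-preview2 <git revision>
```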
's result also contains vendor infoDoes this PR introduce any user-facing change?
No for official Apache releases, because all official release versions are the same as their short versions. But users would see the full Spark version for preview releases when executing `SELECT version()`.
How was this patch tested?
Existing UTs are adjusted accordingly.
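For flavor, the kind of expectation such a UT adjustment implies (a hypothetical sketch, not the actual test diff), assuming `org.apache.spark.SPARK_VERSION` carries the full version string:

```scala
// Hypothetical sketch of the expectation implied by this change, not the
// actual test diff: the result should carry the full version string.
assert(spark.sql("SELECT version()").head().getString(0)
  .startsWith(org.apache.spark.SPARK_VERSION))
```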
Was this patch authored or co-authored using generative AI tooling?
No.