[Gluten-4585][VL] Support spark.sql.files.ignoreMissingFiles=true #4725

Merged 5 commits on Feb 21, 2024

zhli1142015 commented Feb 20, 2024

What changes were proposed in this pull request?

Fixes: #4585
Upstream PR: facebookincubator/velox#8615, facebookincubator/velox#8662

How was this patch tested?

Unit tests.
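
For readers unfamiliar with the setting: Spark documents spark.sql.files.ignoreMissingFiles as letting a job continue when a file it planned to read has gone missing, returning whatever was read. The following is a minimal sketch of that behaviour using only the JDK. It is not the Gluten or Velox implementation, and `IgnoreMissingFilesSketch` and `readAll` are hypothetical names introduced for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, NoSuchFileException, Path}
import scala.util.Try

// Hypothetical model of the flag's semantics; not code from this PR.
object IgnoreMissingFilesSketch {

  /** Reads every path as UTF-8 text. A path that no longer exists is skipped
    * when ignoreMissingFiles is true; otherwise the exception propagates. */
  def readAll(paths: Seq[Path], ignoreMissingFiles: Boolean): Seq[String] =
    paths.flatMap { path =>
      try Some(new String(Files.readAllBytes(path), StandardCharsets.UTF_8))
      catch {
        // Only swallow the "file is gone" case, and only when the flag is set.
        case _: NoSuchFileException if ignoreMissingFiles => None
      }
    }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("ignore-missing")
    val kept = Files.write(dir.resolve("kept.txt"), "a".getBytes(StandardCharsets.UTF_8))
    val deleted = Files.write(dir.resolve("deleted.txt"), "b".getBytes(StandardCharsets.UTF_8))
    Files.delete(deleted) // simulate a file removed after the scan was planned

    // With the flag on, only the surviving file contributes data.
    println(readAll(Seq(kept, deleted), ignoreMissingFiles = true))
    // With the flag off, the missing file makes the whole read fail.
    println(Try(readAll(Seq(kept, deleted), ignoreMissingFiles = false)).isFailure)
  }
}
```

The sketch catches only the missing-file exception, so other read errors still surface regardless of the flag.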

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Run Gluten Clickhouse CI

@zhli1142015
Contributor Author

@rui-mo, @PHILO-HE, @JkSelf, could you help review this PR? Thanks.

PHILO-HE left a comment

Looks good! Just one small comment on the test. Thanks!

)
sources <- Seq("", format)
} {
if (BackendTestUtils.isVeloxBackendLoaded()) {
Contributor

It would be better to just include this test in VeloxTestSettings, but exclude it in ClickHouseTestSettings. Thanks!

Contributor Author

Updated, thanks.

@@ -367,7 +367,7 @@ class VeloxTestSettings extends BackendTestSettings {
.excludeByPrefix("SPARK-22790")
// plan is different cause metric is different, rewrite
.excludeByPrefix("SPARK-25237")
-// ignoreMissingFiles mode, wait to fix
+// ignoreMissingFiles mode, rewrite
Contributor

Could we describe why we need to rewrite these tests? Thanks.

Contributor Author

Updated, thanks

Run Gluten Clickhouse CI

@@ -319,6 +319,7 @@ class ClickHouseTestSettings extends BackendTestSettings {
enableSuite[GlutenFileBasedDataSourceSuite]
.exclude("SPARK-23072 Write and read back unicode column names - csv")
.excludeByPrefix("Enabling/disabling ignoreMissingFiles using")
+.excludeByPrefix("Gluten - Enabling/disabling ignoreMissingFiles using")
Contributor

Nit: use .excludeGlutenTest("Enabling/disabling ignoreMissingFiles using ...") with the full test name.

Contributor

Or use excludeGlutenTestsByPrefix.
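
To illustrate the difference between the options discussed in this thread, here is a minimal sketch of name-based and prefix-based exclusion. It assumes, based on the settings line quoted above, that rewritten tests carry a "Gluten - " name prefix. `TestExclusionSketch`, `SuiteSettings`, and `shouldRun` are hypothetical stand-ins, not Gluten's actual BackendTestSettings code, and `someFormat` is a placeholder for the elided part of the test name.

```scala
// Hypothetical, simplified model of per-suite test exclusion.
object TestExclusionSketch {
  // Assumed prefix of Gluten-rewritten test names.
  val GlutenPrefix = "Gluten - "

  final case class SuiteSettings(
      exactNames: Set[String] = Set.empty,
      prefixes: Set[String] = Set.empty) {

    // Exclude tests whose full name matches exactly.
    def exclude(names: String*): SuiteSettings =
      copy(exactNames = exactNames ++ names)

    // Exclude every test whose name starts with one of the prefixes.
    def excludeByPrefix(ps: String*): SuiteSettings =
      copy(prefixes = prefixes ++ ps)

    // Same as exclude, but the Gluten prefix is added for the caller.
    def excludeGlutenTest(names: String*): SuiteSettings =
      exclude(names.map(GlutenPrefix + _): _*)

    // Same as excludeByPrefix, but the Gluten prefix is added for the caller.
    def excludeGlutenTestsByPrefix(ps: String*): SuiteSettings =
      excludeByPrefix(ps.map(GlutenPrefix + _): _*)

    def shouldRun(testName: String): Boolean =
      !exactNames.contains(testName) && !prefixes.exists(p => testName.startsWith(p))
  }

  def main(args: Array[String]): Unit = {
    val settings = SuiteSettings()
      .excludeGlutenTestsByPrefix("Enabling/disabling ignoreMissingFiles using")
    // "someFormat" is a placeholder suffix, not a real test name.
    println(settings.shouldRun("Gluten - Enabling/disabling ignoreMissingFiles using someFormat"))
    println(settings.shouldRun("Enabling/disabling ignoreMissingFiles using someFormat"))
  }
}
```

In this model the Gluten-specific helpers only save the caller from spelling out the prefix, which is the readability gain the nit is after.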

Run Gluten Clickhouse CI

PHILO-HE previously approved these changes Feb 21, 2024

PHILO-HE left a comment

Looks good! Thanks!

Run Gluten Clickhouse CI

PHILO-HE left a comment

Thanks!

@PHILO-HE PHILO-HE merged commit 996ff4c into apache:main Feb 21, 2024
19 checks passed
@GlutenPerfBot

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

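| query | log/native_4725_time.csv | log/native_master_02_20_2024_c3614f866_time.csv | difference | percentage |
|-------|--------------------------|--------------------------------------------------|------------|------------|
| q1 | 32.42 | 34.24 | 1.821 | 105.62% |
| q2 | 25.68 | 24.42 | -1.261 | 95.09% |
| q3 | 37.60 | 38.78 | 1.184 | 103.15% |
| q4 | 38.67 | 38.03 | -0.641 | 98.34% |
| q5 | 71.58 | 70.83 | -0.756 | 98.94% |
| q6 | 7.06 | 7.23 | 0.171 | 102.42% |
| q7 | 84.77 | 82.68 | -2.093 | 97.53% |
| q8 | 86.31 | 85.38 | -0.933 | 98.92% |
| q9 | 124.71 | 126.15 | 1.446 | 101.16% |
| q10 | 42.63 | 43.25 | 0.623 | 101.46% |
| q11 | 20.08 | 20.79 | 0.706 | 103.52% |
| q12 | 27.58 | 26.48 | -1.104 | 96.00% |
| q13 | 47.59 | 44.88 | -2.706 | 94.31% |
| q14 | 16.02 | 18.88 | 2.859 | 117.85% |
| q15 | 28.02 | 29.18 | 1.162 | 104.15% |
| q16 | 14.19 | 15.42 | 1.229 | 108.66% |
| q17 | 102.84 | 102.55 | -0.287 | 99.72% |
| q18 | 149.40 | 149.13 | -0.271 | 99.82% |
| q19 | 12.76 | 12.57 | -0.184 | 98.55% |
| q20 | 29.24 | 26.34 | -2.904 | 90.07% |
| q21 | 223.03 | 225.83 | 2.800 | 101.26% |
| q22 | 13.86 | 13.68 | -0.186 | 98.66% |
| total | 1236.05 | 1236.72 | 0.675 | 100.05% |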
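
Judging from the rows in the report above, the difference column appears to be the baseline time minus the PR time, and the percentage column the baseline time divided by the PR time. This is inferred from the numbers, not from any documentation of the bot. The sketch below recomputes both under that assumption. `PerfReportSketch` and `Row` are hypothetical names, and small mismatches in the last digit are expected because the report's inputs are already rounded.

```scala
// Hypothetical helper; the formulas are assumptions inferred from the report.
object PerfReportSketch {

  /** One report row: time measured with the PR applied and on the baseline build. */
  final case class Row(query: String, candidate: Double, baseline: Double) {
    // Assumed definition: positive means the PR run was faster than the baseline.
    def difference: Double = baseline - candidate
    // Assumed definition: baseline time as a percentage of the PR time.
    def percentage: Double = baseline / candidate * 100.0
  }

  def main(args: Array[String]): Unit = {
    // Values copied from the q1 and q20 rows of the report.
    val rows = Seq(Row("q1", 32.42, 34.24), Row("q20", 29.24, 26.34))
    rows.foreach { r =>
      println(f"${r.query}%-4s difference=${r.difference}%.3f percentage=${r.percentage}%.2f%%")
    }
  }
}
```

Under this reading, a percentage above 100% means the PR build ran the query faster than the baseline.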

@GlutenPerfBot

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

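| query | log/native_master_02_22_2024_time.csv | log/native_master_02_20_2024_c3614f866_time.csv | difference | percentage |
|-------|----------------------------------------|--------------------------------------------------|------------|------------|
| q1 | 32.51 | 34.24 | 1.732 | 105.33% |
| q2 | 24.28 | 24.42 | 0.145 | 100.60% |
| q3 | 38.12 | 38.78 | 0.660 | 101.73% |
| q4 | 38.44 | 38.03 | -0.403 | 98.95% |
| q5 | 71.06 | 70.83 | -0.229 | 99.68% |
| q6 | 5.36 | 7.23 | 1.876 | 135.00% |
| q7 | 85.47 | 82.68 | -2.789 | 96.74% |
| q8 | 88.36 | 85.38 | -2.984 | 96.62% |
| q9 | 123.63 | 126.15 | 2.519 | 102.04% |
| q10 | 44.24 | 43.25 | -0.994 | 97.75% |
| q11 | 21.09 | 20.79 | -0.306 | 98.55% |
| q12 | 26.06 | 26.48 | 0.411 | 101.58% |
| q13 | 46.65 | 44.88 | -1.766 | 96.21% |
| q14 | 15.31 | 18.88 | 3.576 | 123.36% |
| q15 | 31.22 | 29.18 | -2.042 | 93.46% |
| q16 | 13.80 | 15.42 | 1.616 | 111.70% |
| q17 | 104.44 | 102.55 | -1.883 | 98.20% |
| q18 | 149.48 | 149.13 | -0.357 | 99.76% |
| q19 | 13.72 | 12.57 | -1.149 | 91.63% |
| q20 | 28.41 | 26.34 | -2.069 | 92.72% |
| q21 | 226.51 | 225.83 | -0.686 | 99.70% |
| q22 | 13.62 | 13.68 | 0.061 | 100.45% |
| total | 1241.78 | 1236.72 | -5.060 | 99.59% |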

WangGuangxin added a commit to WangGuangxin/gluten that referenced this pull request May 14, 2024
Development

Successfully merging this pull request may close these issues.

[VL] support spark.sql.files.ignoreMissingFiles=true
4 participants