-
Notifications
You must be signed in to change notification settings - Fork 914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TASK][MEDIUM] Spark engine query results support reading from HDFS #5377
Labels
Comments
@cxzl25 this task is missing "implementation steps" before adding into https://github.com/orgs/apache/projects/296 |
Hi , I am interested in this task, can you assign it to me ? I expect to achieve it in a month Thanks. |
directly call |
3 tasks
turboFei
added a commit
that referenced
this issue
Dec 21, 2023
# 🔍 Description ## Issue References 🔗 #5591 (comment) ## Describe Your Solution 🔧 Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. ## Types of changes 🔖 - [ ] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) ## Test Plan 🧪 #### Behavior Without This Pull Request ⚰️ #### Behavior With This Pull Request 🎉 #### Related Unit Tests --- # Checklists ## 📝 Author Self Checklist - [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project - [x] I have performed a self-review - [x] I have commented my code, particularly in hard-to-understand areas - [ ] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] New and existing unit tests pass locally with my changes - [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html) ## 📝 Committer Pre-Merge Checklist - [ ] Pull request title is okay. - [ ] No license issues. - [ ] Milestone correctly set? - [ ] Test coverage is ok - [ ] Assignees are selected. - [ ] Minimum number of approvals - [ ] No changes are requested **Be nice. Be informative.** Closes #5895 from lsm1/branch-kyuubi-5377-followup. Closes #5377 4219d28 [Fei Wang] nit 31d4fc1 [senmiaoliu] use zlib when SPARK version less than 3.2 Lead-authored-by: senmiaoliu <senmiaoliu@trip.com> Co-authored-by: Fei Wang <fwang12@ebay.com> Signed-off-by: Fei Wang <fwang12@ebay.com>
4 tasks
turboFei
added a commit
to turboFei/kyuubi
that referenced
this issue
Jan 17, 2024
turboFei
added a commit
that referenced
this issue
Feb 4, 2024
# 🔍 Description ## Issue References 🔗 We shall always try to clean session result path as `kyuubi.operation.result.saveToFile.enabled` can be enabled during runtime. This pull request fixes #5377 ## Describe Your Solution 🔧 Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. ## Types of changes 🔖 - [ ] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) ## Test Plan 🧪 #### Behavior Without This Pull Request ⚰️ #### Behavior With This Pull Request 🎉 #### Related Unit Tests --- # Checklist 📝 - [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html) **Be nice. Be informative.** Closes #5986 from turboFei/OPERATION_RESULT_SAVE_TO_FILE_runtime. Closes #5377 5dfdf96 [Fei Wang] comments 895caec [Fei Wang] [KYUUBI #5377][FOLLOWUP] Always try to cleanup session result path Authored-by: Fei Wang <fwang12@ebay.com> Signed-off-by: Fei Wang <fwang12@ebay.com>
turboFei
added a commit
that referenced
this issue
Feb 4, 2024
…sult max rows # 🔍 Description Followup #5591 Support to get existing limit from more plan and regard the result max rows. ## Issue References 🔗 This pull request fixes # ## Describe Your Solution 🔧 Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. ## Types of changes 🔖 - [ ] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) ## Test Plan 🧪 #### Behavior Without This Pull Request ⚰️ #### Behavior With This Pull Request 🎉 #### Related Unit Tests --- # Checklist 📝 - [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html) **Be nice. Be informative.** Closes #5963 from turboFei/incremental_save. Closes #5377 223d510 [Fei Wang] use optimized plan ecefc2a [Fei Wang] use spark plan 57091e5 [Fei Wang] minor 2096144 [Fei Wang] for logical plan 0f734ee [Fei Wang] ut fdc1155 [Fei Wang] save f8e405a [Fei Wang] math.min Authored-by: Fei Wang <fwang12@ebay.com> Signed-off-by: Fei Wang <fwang12@ebay.com>
zhaohehuhu
pushed a commit
to zhaohehuhu/incubator-kyuubi
that referenced
this issue
Feb 5, 2024
# 🔍 Description ## Issue References 🔗 We shall always try to clean session result path as `kyuubi.operation.result.saveToFile.enabled` can be enabled during runtime. This pull request fixes apache#5377 ## Describe Your Solution 🔧 Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. ## Types of changes 🔖 - [ ] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) ## Test Plan 🧪 #### Behavior Without This Pull Request ⚰️ #### Behavior With This Pull Request 🎉 #### Related Unit Tests --- # Checklist 📝 - [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html) **Be nice. Be informative.** Closes apache#5986 from turboFei/OPERATION_RESULT_SAVE_TO_FILE_runtime. Closes apache#5377 5dfdf96 [Fei Wang] comments 895caec [Fei Wang] [KYUUBI apache#5377][FOLLOWUP] Always try to cleanup session result path Authored-by: Fei Wang <fwang12@ebay.com> Signed-off-by: Fei Wang <fwang12@ebay.com>
zhaohehuhu
pushed a commit
to zhaohehuhu/incubator-kyuubi
that referenced
this issue
Feb 5, 2024
…ard result max rows # 🔍 Description Followup apache#5591 Support to get existing limit from more plan and regard the result max rows. ## Issue References 🔗 This pull request fixes # ## Describe Your Solution 🔧 Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. ## Types of changes 🔖 - [ ] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) ## Test Plan 🧪 #### Behavior Without This Pull Request ⚰️ #### Behavior With This Pull Request 🎉 #### Related Unit Tests --- # Checklist 📝 - [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html) **Be nice. Be informative.** Closes apache#5963 from turboFei/incremental_save. Closes apache#5377 223d510 [Fei Wang] use optimized plan ecefc2a [Fei Wang] use spark plan 57091e5 [Fei Wang] minor 2096144 [Fei Wang] for logical plan 0f734ee [Fei Wang] ut fdc1155 [Fei Wang] save f8e405a [Fei Wang] math.min Authored-by: Fei Wang <fwang12@ebay.com> Signed-off-by: Fei Wang <fwang12@ebay.com>
zhaohehuhu
pushed a commit
to zhaohehuhu/incubator-kyuubi
that referenced
this issue
Mar 21, 2024
### _Why are the changes needed?_ close apache#5377 ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request ### _Was this patch authored or co-authored using generative AI tooling?_ NO Closes apache#5591 from lsm1/branch-kyuubi-5377. Closes apache#5377 9d1a18c [senmiaoliu] ignore empty file 3c70a1e [LSM] fix doc 73d3c3a [senmiaoliu] fix style and add some comment 80e1f0d [senmiaoliu] Close orc fetchOrcStatement and remove result save file when ExecuteStatement close 42634a1 [senmiaoliu] fix style 979125d [senmiaoliu] fix style 1dc07a5 [senmiaoliu] spark engine save into hdfs file Lead-authored-by: senmiaoliu <senmiaoliu@trip.com> Co-authored-by: LSM <senmiaoliu@trip.com> Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
zhaohehuhu
pushed a commit
to zhaohehuhu/incubator-kyuubi
that referenced
this issue
Mar 21, 2024
# 🔍 Description ## Issue References 🔗 apache#5591 (comment) ## Describe Your Solution 🔧 Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. ## Types of changes 🔖 - [ ] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) ## Test Plan 🧪 #### Behavior Without This Pull Request ⚰️ #### Behavior With This Pull Request 🎉 #### Related Unit Tests --- # Checklists ## 📝 Author Self Checklist - [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project - [x] I have performed a self-review - [x] I have commented my code, particularly in hard-to-understand areas - [ ] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] New and existing unit tests pass locally with my changes - [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html) ## 📝 Committer Pre-Merge Checklist - [ ] Pull request title is okay. - [ ] No license issues. - [ ] Milestone correctly set? - [ ] Test coverage is ok - [ ] Assignees are selected. - [ ] Minimum number of approvals - [ ] No changes are requested **Be nice. Be informative.** Closes apache#5895 from lsm1/branch-kyuubi-5377-followup. Closes apache#5377 4219d28 [Fei Wang] nit 31d4fc1 [senmiaoliu] use zlib when SPARK version less than 3.2 Lead-authored-by: senmiaoliu <senmiaoliu@trip.com> Co-authored-by: Fei Wang <fwang12@ebay.com> Signed-off-by: Fei Wang <fwang12@ebay.com>
zhaohehuhu
pushed a commit
to zhaohehuhu/incubator-kyuubi
that referenced
this issue
Mar 21, 2024
# 🔍 Description ## Issue References 🔗 We shall always try to clean session result path as `kyuubi.operation.result.saveToFile.enabled` can be enabled during runtime. This pull request fixes apache#5377 ## Describe Your Solution 🔧 Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. ## Types of changes 🔖 - [ ] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) ## Test Plan 🧪 #### Behavior Without This Pull Request ⚰️ #### Behavior With This Pull Request 🎉 #### Related Unit Tests --- # Checklist 📝 - [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html) **Be nice. Be informative.** Closes apache#5986 from turboFei/OPERATION_RESULT_SAVE_TO_FILE_runtime. Closes apache#5377 5dfdf96 [Fei Wang] comments 895caec [Fei Wang] [KYUUBI apache#5377][FOLLOWUP] Always try to cleanup session result path Authored-by: Fei Wang <fwang12@ebay.com> Signed-off-by: Fei Wang <fwang12@ebay.com>
zhaohehuhu
pushed a commit
to zhaohehuhu/incubator-kyuubi
that referenced
this issue
Mar 21, 2024
…ard result max rows # 🔍 Description Followup apache#5591 Support to get existing limit from more plan and regard the result max rows. ## Issue References 🔗 This pull request fixes # ## Describe Your Solution 🔧 Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. ## Types of changes 🔖 - [ ] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) ## Test Plan 🧪 #### Behavior Without This Pull Request ⚰️ #### Behavior With This Pull Request 🎉 #### Related Unit Tests --- # Checklist 📝 - [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html) **Be nice. Be informative.** Closes apache#5963 from turboFei/incremental_save. Closes apache#5377 223d510 [Fei Wang] use optimized plan ecefc2a [Fei Wang] use spark plan 57091e5 [Fei Wang] minor 2096144 [Fei Wang] for logical plan 0f734ee [Fei Wang] ut fdc1155 [Fei Wang] save f8e405a [Fei Wang] math.min Authored-by: Fei Wang <fwang12@ebay.com> Signed-off-by: Fei Wang <fwang12@ebay.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Code of Conduct
Search before creating
Mentor
Skill requirements
Background and Goals
The client's SQL query may cause the Spark engine to fail because the query data result is too large, and the driver to fail due to OOM.
Although the number of query results can be limited by configuring
kyuubi.operation.result.max.rows
, if the amount of data in one row is too large, it will still cause OOM.If it can support writing query results to HDFS or other storage systems, when the client needs to obtain the results, the engine will obtain the results from HDFS, which can avoid the OOM problem.
Implementation steps
org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils#getSizePerRow
df.write
api to write the query results of the execution plan to HDFSAdditional context
Original reporter is @cxzl25
The text was updated successfully, but these errors were encountered: