[KYUUBI #6402]: engine.share.level=GROUP enable for a list of hadoop … #6779

Open
wants to merge 1 commit into
base: master
from

Conversation

Madhukar525722
Contributor

@Madhukar525722 Madhukar525722 commented Oct 23, 2024

…groups

🔍 Description

Issue References 🔗

This pull request fixes #6402

Describe Your Solution 🔧

Currently, a GROUP-level engine is always launched for the user's first group. This change lets the user specify which group the engine should be launched for.

The implementation flow is:

  • Take the preferred groups as input.
    • If they are not defined, take the default route and pick the first group from the user's group list.
    • If they are defined:
      • Iterate over the list and look for the first match.
      • If no match is found, return the first group from the user's group list.
  1. This way, there is a single engine per group, which can be reused.
  2. The engine is launched as the first user who requests it, to handle scenarios where the Hadoop groups are not YARN users.
  3. For the security concerns raised by point 2, fine-grained access control can be applied at the session level using Apache Ranger.
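
The selection flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: the object name `PreferredGroupSketch`, the method `selectGroup`, and its parameters are hypothetical, and it assumes that "first match" means the first preferred group the user actually belongs to.

```scala
// Hypothetical helper illustrating the described flow; not the PR's actual code.
object PreferredGroupSketch {

  /**
   * @param userGroups      groups the user belongs to, in provider order (must be non-empty)
   * @param preferredGroups groups requested by the user, most preferred first; may be empty
   * @return the group to launch the GROUP-level engine for
   */
  def selectGroup(userGroups: Seq[String], preferredGroups: Seq[String]): String = {
    require(userGroups.nonEmpty, "user must belong to at least one group")
    // Assumption: pick the first preferred group the user is a member of;
    // otherwise fall back to the head of the user's own group list.
    preferredGroups
      .find(group => userGroups.contains(group))
      .getOrElse(userGroups.head)
  }
}
```

With the placeholder inputs `Seq("group_a", "group_b")` as the user's groups, a preferred list of `Seq("group_b")` selects `group_b`, while an empty or non-matching preferred list selects `group_a`.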

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Results🧪

  1. When PREFERRED_GROUPS were not defined, it used the first group from the list
    Spark application name: kyuubi_GROUP_SPARK_SQL_Internet_Users_default_d2826b5b-3f1e-48a0-b42f-d248da914b7c
    application ID: application_1728291907264_43966
    User: madlnu

  2. When valid PREFERRED_GROUPS were defined
    Spark application name: kyuubi_GROUP_SPARK_SQL_kyuubi_test_b_default_be9a16a8-be38-4ab6-bee9-1934f8556f18
    application ID: application_1728291907264_43968
    User: madlnu

  3. When no PREFERRED_GROUPS were matching, it used the first group from the list
    Spark application name: kyuubi_GROUP_SPARK_SQL_Internet_Users_default_d2826b5b-3f1e-48a0-b42f-d248da914b7c
    application ID: application_1728291907264_43966
    User: madlnu

  4. When any other user X tries to access the existing GROUP engine, it uses the same engine
    Spark application name: kyuubi_GROUP_SPARK_SQL_Internet_Users_default_d2826b5b-3f1e-48a0-b42f-d248da914b7c
    application ID: application_1728291907264_43966
    User: madlnu


Checklist 📝

Be nice. Be informative.

@Madhukar525722
Contributor Author

Madhukar525722 commented Oct 26, 2024

Hi @pan3793 @bowenliang123 @turboFei, please review the implementation. Thanks.

@codecov-commenter

codecov-commenter commented Oct 26, 2024

Codecov Report

Attention: Patch coverage is 0% with 22 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (d3520dd) to head (2c6f270).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rg/apache/kyuubi/session/HadoopGroupProvider.scala | 0.00% | 13 Missing ⚠️ |
| ...in/scala/org/apache/kyuubi/config/KyuubiConf.scala | 0.00% | 6 Missing ⚠️ |
| ...ain/scala/org/apache/kyuubi/engine/EngineRef.scala | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@          Coverage Diff           @@
##           master   #6779   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         687     687           
  Lines       42439   42459   +20     
  Branches     5793    5799    +6     
======================================
- Misses      42439   42459   +20     


@bowenliang123
Contributor

To support a preferred group name in the session conf, I would suggest the following:

  1. Add a session conf key in KyuubiReservedKeys rather than adding a server-side config for a global preferred group (you are not even using its value anyway).
  2. Add a server config for choosing the selection policy, e.g. 'head', 'prefered_session_conf'.

I am also worried that using the session conf will interfere with choosing the engine reference, since it is easy to change at runtime. Is there a better approach that uses more stable connection variables?
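
One way to picture the two-part suggestion above is a server-side policy that decides whether the session conf is consulted at all. The sketch below is only an illustration under stated assumptions: `GroupSelectPolicySketch`, `SelectPolicy`, `parse`, `select`, and the key `example.session.preferred.group` are all hypothetical names, not existing Kyuubi APIs or configuration keys, and falling back to the head group when the preferred group does not match is an assumed behavior.

```scala
import java.util.Locale

// Hypothetical sketch of a server-side selection policy; none of these names exist in Kyuubi.
object GroupSelectPolicySketch {

  sealed trait SelectPolicy
  case object Head extends SelectPolicy                 // always use the user's first group
  case object PreferredSessionConf extends SelectPolicy // honor a per-session preferred group

  // Placeholder session conf key, invented for this example.
  val PreferredGroupKey: String = "example.session.preferred.group"

  // Policy names are taken verbatim from the review comment.
  def parse(value: String): SelectPolicy =
    value.trim.toLowerCase(Locale.ROOT) match {
      case "head" => Head
      case "prefered_session_conf" => PreferredSessionConf
      case other => throw new IllegalArgumentException(s"Unknown select policy: $other")
    }

  def select(
      policy: SelectPolicy,
      userGroups: Seq[String],
      sessionConf: Map[String, String]): Option[String] =
    policy match {
      case Head => userGroups.headOption
      case PreferredSessionConf =>
        // Assumption: ignore a preferred group the user is not a member of.
        sessionConf.get(PreferredGroupKey)
          .filter(group => userGroups.contains(group))
          .orElse(userGroups.headOption)
    }
}
```

Keeping the policy on the server side means an administrator can leave it at 'head' and preserve today's behavior regardless of what a session sets.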

@pan3793
Member

pan3793 commented Oct 28, 2024

@Madhukar525722 thanks for taking care of this feature. I previously left a comment suggesting a configuration "kyuubi.session.preferGroup", but I have a slightly different idea now.

I would now suggest having a "kyuubi.session.preferredGroups" (Seq[String]). When it is present, we select the most preferred group from the whole group list; otherwise, we take the head instead of failing fast. For the implementation, we can use a custom Comparator to achieve that.

> I am also worried that using the session conf will interfere with choosing the engine reference, since it is easy to change at runtime. Is there a better approach that uses more stable connection variables?

That does not seem much different from existing properties like kyuubi.session.user?
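
The custom Comparator idea from this comment could be expressed in Scala with an `Ordering`. The following is a sketch under assumptions, not the reviewed implementation: `PreferredGroupOrdering`, `ordering`, and `primaryGroup` are hypothetical names, and groups missing from the preferred list are assumed to rank last while keeping their original order.

```scala
// Hypothetical sketch of the comparator-based selection; not the PR's actual code.
object PreferredGroupOrdering {

  /** Ranks a group by its position in the preferred list; unlisted groups rank last. */
  def ordering(preferredGroups: Seq[String]): Ordering[String] = {
    // distinct keeps the first occurrence, so a duplicated entry keeps its best rank.
    val rank: Map[String, Int] = preferredGroups.distinct.zipWithIndex.toMap
    Ordering.by[String, Int](group => rank.getOrElse(group, Int.MaxValue))
  }

  /**
   * Picks the most preferred group of the user. Seq.sorted is stable, so when no
   * preferred group matches (all ranks tie), the head of userGroups is returned.
   */
  def primaryGroup(userGroups: Seq[String], preferredGroups: Seq[String]): Option[String] =
    userGroups.sorted(ordering(preferredGroups)).headOption
}
```

Because the ordering only decides among the groups the user actually has, a preferred list with no overlap degrades to the current behavior of taking the head rather than failing.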

@Madhukar525722
Contributor Author

Madhukar525722 commented Oct 28, 2024

Hi @pan3793, I have taken up your suggestion of using kyuubi.session.preferredGroups. Please review.

@Madhukar525722
Contributor Author

Hi @pan3793, please review the change. The integration tests failed due to a timeout; they were green in the previous run.

@pan3793
Member

pan3793 commented Nov 12, 2024

@Madhukar525722 sorry for the late reply. I'm quite busy these days, but I will take a look soon.

Development

Successfully merging this pull request may close these issues.

[Bug] engine.share.level=GROUP takes only first AD Group if the user is part of multiple AD Groups