[TASK][MEDIUM] Enhancing JVM monitoring in Kyuubi Spark Engine using JVMQuake #5402

Closed
2 of 3 tasks
iodone opened this issue Oct 11, 2023 · 4 comments

@iodone
Contributor

iodone commented Oct 11, 2023

Code of Conduct

Search before creating

  • I have searched in the task list and found no similar tasks.

Mentor

  • I have sufficient knowledge and experience of this task, and I volunteer to be the mentor of this task to guide contributors to complete the task.

Skill requirements

  • Be familiar with the integration process of Spark and Kyuubi engine plugins.
  • Understand the principles of collecting Spark JVM metrics.

Background and Goals

When memory management in the Spark engine gets out of control, we typically rely on jvmkill as a remedy: it kills the process and generates a heap dump for later analysis. However, even with jvmkill protection, we can still run into problems when the JVM is running low on memory, such as repeated Full GCs during which the JVM is paused and does no useful work. Because the JVM never exhausts 100% of its resources in this state, jvmkill is not triggered.

Introducing JVMQuake provides more granular monitoring of GC behavior, which enables early detection of memory management issues and lets the engine fail fast.
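
As a rough illustration of the kind of rule this enables: Netflix's jvmquake, as far as I understand it, keeps a running "GC debt" that grows with time spent in GC and shrinks with time spent running application code, and acts once the debt crosses a threshold. The sketch below models only that accounting in plain Scala. `GcDebt`, `runWeight`, and the threshold are hypothetical names chosen for illustration, not jvmquake or Kyuubi APIs.

```scala
// Hypothetical model of a "GC debt" counter; not part of jvmquake or Kyuubi.
final case class GcDebt(debtMs: Long = 0L) {

  /** Adds time spent in GC and pays debt back with time spent running.
    * `runWeight` scales how quickly running time repays the debt.
    * The debt never drops below zero.
    */
  def update(gcMs: Long, runMs: Long, runWeight: Double = 1.0): GcDebt = {
    val next = debtMs + gcMs - (runMs * runWeight).toLong
    GcDebt(math.max(0L, next))
  }

  /** True once the accumulated debt is above the configured threshold. */
  def exceeds(thresholdMs: Long): Boolean = debtMs > thresholdMs
}
```

With equal weights, a JVM that spends 900 ms of every second in GC accrues 800 ms of debt per second, so a hypothetical 30 s threshold would trip after roughly 38 seconds, while a healthy JVM keeps the debt at zero.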

Implementation steps

  1. Start JVMQuake for the driver and the executors through Spark plugins.
  2. Collect GC metrics using JVMQuake.
  3. Set the rules for killing processes and specify the path for saving heap dumps.
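
Steps 2 and 3 could be approximated from inside the JVM with the standard management beans. This is a minimal sketch under stated assumptions, not the intended implementation: jvmquake itself is a native agent, and here JMX is swapped in for it. `GcProbe` and `GcSampler` are hypothetical names. Note also that `getCollectionTime` reports approximate elapsed collection time, which is not the same as stop-the-world pause time for concurrent collectors.

```scala
import java.lang.management.ManagementFactory
import com.sun.management.HotSpotDiagnosticMXBean

// Hypothetical helper built on standard JMX beans; not a jvmquake API.
object GcProbe {

  /** Cumulative GC time in ms summed over all collectors in this JVM. */
  def totalGcTimeMs(): Long = {
    val it = ManagementFactory.getGarbageCollectorMXBeans.iterator()
    var total = 0L
    while (it.hasNext) {
      val t = it.next().getCollectionTime // -1 when the collector does not report it
      if (t > 0) total += t
    }
    total
  }

  /** Writes a heap dump of live objects to `path` (HotSpot JVMs only). */
  def dumpHeap(path: String): Unit = {
    val bean = ManagementFactory.getPlatformMXBean(classOf[HotSpotDiagnosticMXBean])
    bean.dumpHeap(path, true)
  }
}

/** Reports the GC time accumulated between successive calls to `deltaMs()`. */
final class GcSampler {
  private var last = GcProbe.totalGcTimeMs()

  def deltaMs(): Long = {
    val now = GcProbe.totalGcTimeMs()
    val delta = now - last
    last = now
    delta
  }
}
```

`dumpHeap` fails if the target file already exists, and recent JDKs may require the file name to end in `.hprof`, so the configured dump path would need a unique name per process.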

Additional context

Custom Spark Plugin example:

package example

import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, SparkPlugin}

// Entry point; enabled by listing the class in the spark.plugins configuration.
class CustomExecSparkPlugin extends SparkPlugin {

  // Runs inside the driver JVM.
  override def driverPlugin(): DriverPlugin = {
    new DriverPlugin() {
      override def shutdown(): Unit = {
        // custom code
      }
    }
  }

  // Runs inside each executor JVM.
  override def executorPlugin(): ExecutorPlugin = {
    new ExecutorPlugin() {
      override def shutdown(): Unit = {
        // custom code
      }
    }
  }
}
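
One way the plugin hooks above might drive the monitoring is to start a background task in the plugin's `init` method and stop it in `shutdown`. The sketch below shows only that lifecycle piece using the JDK scheduler, so it runs without Spark on the classpath. `PeriodicMonitor` is a hypothetical helper, and the `check` callback stands in for whatever GC rule is eventually chosen.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
import scala.util.control.NonFatal

// Hypothetical helper: runs `check` at a fixed interval on a daemon thread.
final class PeriodicMonitor(intervalMs: Long)(check: () => Unit) {

  private val scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "gc-monitor")
      t.setDaemon(true) // never keep the driver or executor JVM alive
      t
    }
  })

  /** Would be called from the plugin's init method. */
  def start(): Unit = {
    val task = new Runnable {
      override def run(): Unit =
        try check()
        catch {
          // An uncaught exception would cancel all future runs, so swallow it here.
          case NonFatal(_) => ()
        }
    }
    scheduler.scheduleAtFixedRate(task, intervalMs, intervalMs, TimeUnit.MILLISECONDS)
  }

  /** Would be called from the plugin's shutdown method. */
  def stop(): Unit = scheduler.shutdownNow()
}
```

A daemon thread is used so that a stuck monitor cannot block JVM exit, which matters here because the whole point is to let an unhealthy process die quickly.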


@yoock

yoock commented Oct 11, 2023

Can you assign it to me?

@iodone
Contributor Author

iodone commented Oct 11, 2023

Can you give a specific completion date for this task?

@yoock

yoock commented Oct 11, 2023

November 15th

@yikf
Contributor

yikf commented Nov 14, 2023

Is there any progress?
