
[Feature]: Allow setting "activeDeadlineSeconds" on spark-dependency Pod #2202

Open

Description

Requirement

As a Jaeger operator,
I want to be able to limit the execution time of the jaeger-spark-dependencies Job,
so that I can ensure the Job is not running forever and blocking/wasting resources.

Problem

The spark-dependencies Spark jobs (the actual Spark jobs running inside the JVM) often run into OutOfMemory errors.
The actual problem is that the container does not fail (exit) even though the Spark job has already failed.

To solve this issue for good, I created jaegertracing/spark-dependencies#131 in the spark-dependencies repo. However, that repository no longer seems to be maintained (?), so it would be an improvement to at least be able to limit the execution time of the Pod using Kubernetes mechanisms. This is currently not possible for the user, since the CronJob is managed by the Jaeger Operator.

Proposal

Set activeDeadlineSeconds on the Pod spec to limit the execution time. If the specified amount of time runs out before the job finishes, the Pod is terminated, and the Job controller creates a new Pod.
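For illustration, a minimal sketch of what the Pod template of the operator-generated CronJob could look like with this field set (the resource name, schedule, deadline value, and image are assumptions for the example, not the operator's actual output):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-jaeger-spark-dependencies   # illustrative name
spec:
  schedule: "55 23 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          # Terminate the Pod if it is still running after 8 hours;
          # the Job controller then creates a replacement Pod
          # (up to its backoffLimit retries).
          activeDeadlineSeconds: 28800
          restartPolicy: Never
          containers:
            - name: my-jaeger-spark-dependencies
              image: jaegertracing/spark-dependencies
```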

Ideally, this should be configurable within jaeger.spec.storage.dependencies. A high default value (8h or 1d) would also be acceptable, though it would be a breaking change for (genuinely) long-running Spark jobs.
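A minimal sketch of the proposed API, assuming a new activeDeadlineSeconds field under dependencies (this field is hypothetical and does not exist in the operator today; enabled and schedule are meant to resemble the existing spec):

```yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: my-jaeger
spec:
  storage:
    dependencies:
      enabled: true
      schedule: "55 23 * * *"
      # Hypothetical new field: passed through to the Pod spec of the
      # generated CronJob to cap the Job's execution time (here: 8 hours).
      activeDeadlineSeconds: 28800
```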

This does not solve the problem entirely, but would at least be a mitigation.

Open questions

Is jaegertracing/spark-dependencies still maintained?

-> If yes: it would be better to fix the Job itself via jaegertracing/spark-dependencies#131.

-> If no: I could open a PR to address this if the proposal sounds good to you.

