This repository was archived by the owner on Jan 6, 2023. It is now read-only.

Feature request: Do not try to recover output checkpoints for plugins that don't use it #896

Open
@sundbry

Many output plugins do not use checkpoint state. It would be nice if the system skipped reading and writing these checkpoint files when they are only ever going to be empty. This should give us a couple of benefits:

  1. Anti-fragility in resuming jobs (especially when the output/structure of the job changes but the input stays the same)
  2. Reduced S3 load and cost

To keep the interface flexible, I imagine we could extend the plugin protocol so that the system can check at runtime for various plugin features, such as whether output checkpointing is supported.
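
A minimal sketch of what such a runtime capability check might look like. Everything here is hypothetical: `CheckpointCapabilities`, `output-checkpointing?`, `uses-output-checkpoint?`, and this `recover-output` are illustrative names, not part of Onyx's existing plugin protocols, and `read-fn` merely stands in for the storage read (for example the S3 fetch). Plugins that do not implement the protocol are assumed to be stateful, so existing plugins would keep their current behaviour.

```clojure
;; Hypothetical capability protocol; NOT an existing Onyx protocol.
(defprotocol CheckpointCapabilities
  (output-checkpointing? [plugin]
    "Returns true if the plugin stores state in output checkpoints."))

;; Plugins that do not implement the protocol are treated as stateful,
;; so they are still checkpointed and recovered as before.
(defn uses-output-checkpoint? [plugin]
  (if (satisfies? CheckpointCapabilities plugin)
    (boolean (output-checkpointing? plugin))
    true))

;; Hypothetical recovery step: read-fn is a zero-argument function that
;; performs the storage read. It is only called when the plugin needs it.
(defn recover-output [plugin read-fn]
  (when (uses-output-checkpoint? plugin)
    (read-fn)))

;; An output plugin that opts out of output checkpointing.
(defrecord StatelessOutput []
  CheckpointCapabilities
  (output-checkpointing? [_] false))

;; An output plugin that predates the protocol and says nothing.
(defrecord LegacyOutput [])

;; Fake storage read that counts how often it is invoked.
(def reads (atom 0))
(defn fake-read []
  (swap! reads inc)
  :checkpoint-bytes)

(recover-output (->StatelessOutput) fake-read) ; no storage read happens
(recover-output (->LegacyOutput) fake-read)    ; falls back to reading
```

Under this sketch, a missing checkpoint key could only fail recovery for plugins that actually declare (or default to) using output checkpoints. The same predicate could guard the write path as well.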

Thoughts?

(Typical stack trace when I resume a job and output checkpoint recovery fails because I changed the job definition:)

ERROR 2019-05-23 09:23:29,085 service.data.job.core: {:message Onyx lifecycle exception,  :phase :lifecycle/recover-output}
com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: A07B7892DD6BE81F; S3 Extended Request ID: AOGX9hab+QAnQIuIk9C1gVVVSOPTZRUeBNkzRIbUE/vxnk7wlDS/OWqqquH/M9GNnWNUr4DWyF8=), S3 Extended Request ID: AOGX9hab+QAnQIuIk9C1gVVVSOPTZRUeBNkzRIbUE/vxnk7wlDS/OWqqquH/M9GNnWNUr4DWyF8=
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
        at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1409)
        at onyx.storage.s3$read_checkpointed_bytes.invokeStatic(s3.clj:102)
        at onyx.storage.s3$read_checkpointed_bytes.invoke(s3.clj:100)
        at onyx.storage.s3$eval49560$fn__49562$fn__49564.invoke(s3.clj:241)
        at onyx.storage.s3$eval49560$fn__49562.invoke(s3.clj:239)
        at clojure.lang.MultiFn.invoke(MultiFn.java:284)
        at onyx.peer.resume_point$read_checkpoint.invokeStatic(resume_point.clj:56)
        at onyx.peer.resume_point$read_checkpoint.invoke(resume_point.clj:51)
        at onyx.peer.resume_point$recover_output.invokeStatic(resume_point.clj:112)
        at onyx.peer.resume_point$recover_output.invoke(resume_point.clj:106)
        at onyx.peer.task_lifecycle$recover_output.invokeStatic(task_lifecycle.clj:486)
        at onyx.peer.task_lifecycle$recover_output.invoke(task_lifecycle.clj:479)
        at onyx.peer.task_lifecycle.TaskStateMachine.exec(task_lifecycle.clj:1070)
        at onyx.peer.task_lifecycle$run_task_lifecycle_BANG_.invokeStatic(task_lifecycle.clj:550)
        at onyx.peer.task_lifecycle$run_task_lifecycle_BANG_.invoke(task_lifecycle.clj:540)
        at onyx.peer.task_lifecycle$start_task_lifecycle_BANG_$fn__43880.invoke(task_lifecycle.clj:1155)
        at clojure.core.async$thread_call$fn__11217.invoke(async.clj:442)
        at clojure.lang.AFn.run(AFn.java:22)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
