
refactor(prover): combine proxy and worker #1688

Merged
Mirko-von-Leipzig merged 22 commits into next from mirko/refactor/prover on Feb 20, 2026

Conversation

@Mirko-von-Leipzig (Collaborator) commented Feb 18, 2026

This PR effectively rewrites the miden-remote-prover, removing the distinction between proxy and worker mode by adding a configurable request queue to the "worker".

This removes the need for pingora, which we now essentially replace with a basic queuing mechanism. This was proposed in #1171 (comment).
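The queuing mechanism described above can be sketched with a bounded channel: once the queue is at capacity, new requests are rejected immediately instead of being load-balanced by a proxy. This is an illustrative std-only sketch; `ProveRequest` and `try_enqueue` are hypothetical names, not the actual miden-remote-prover API.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for a proving job; the real request type lives in
// miden-remote-prover and is an assumption here.
struct ProveRequest(u32);

// A bounded channel provides back-pressure: once `capacity` requests are
// queued, senders get an immediate "queue full" error instead of waiting.
fn try_enqueue(
    tx: &mpsc::SyncSender<ProveRequest>,
    req: ProveRequest,
) -> Result<(), ProveRequest> {
    tx.try_send(req).map_err(|e| match e {
        mpsc::TrySendError::Full(r) => r,
        mpsc::TrySendError::Disconnected(r) => r,
    })
}

fn main() {
    // Capacity of 2: a stand-in for the PR's configurable queue size.
    let (tx, rx) = mpsc::sync_channel::<ProveRequest>(2);

    assert!(try_enqueue(&tx, ProveRequest(1)).is_ok());
    assert!(try_enqueue(&tx, ProveRequest(2)).is_ok());
    // Third request is rejected immediately: this is the back-pressure signal.
    assert!(try_enqueue(&tx, ProveRequest(3)).is_err());

    // A single worker thread drains the queue one request at a time.
    let worker = thread::spawn(move || {
        let mut served = 0;
        while let Ok(_req) = rx.recv() {
            served += 1; // a real worker would call prover.prove(req) here
        }
        served
    });
    drop(tx);
    assert_eq!(worker.join().unwrap(), 2);
}
```

The key design point is that rejection happens at enqueue time, so clients learn immediately that the worker is saturated rather than timing out behind a proxy.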

Given the scale of the refactor, I don't think the diff view will be of much use. Instead, a from-scratch review is probably necessary, though I think (hope) it should be quite straightforward.

Prover gRPC schema

I didn't touch the gRPC schema of the prover. The only real problem here is that we have both a proxy and a worker status definition, and it's now unclear which one each refers to.

This isn't such a big deal, though I will note that gRPC reflection on the miden-remote-prover now erroneously reports that it supports the proxy status, because it's part of the same file.

One potential option is to keep only a single status implementation, i.e. the proxy one, with the "worker" always sending a one-element array in its impl. This would also let us add this as a method instead of as a separate service. The original intention was to host these on internal ports, but I think at this point that intention no longer matters (imo).

Closes #1171
Closes #1341
Closes #935

@Mirko-von-Leipzig Mirko-von-Leipzig marked this pull request as ready for review February 18, 2026 13:06
@SantiagoPittella (Collaborator) left a comment

Looks amazing, I really like this simplification and the new design.

Left some minor nit comments.

Regarding the gRPC schema, should we remove the references to the proxy and worker from the schemas? It doesn't make much sense to keep them now IMO; just have one service definition. And we probably need to extend the service status schema to include the capacity.

@Mirko-von-Leipzig (Collaborator, Author)

Regarding the gRPC schema, should we remove the references to the proxy and worker from the schemas? It doesn't make much sense to keep them now IMO; just have one service definition. And we probably need to extend the service status schema to include the capacity.

I think we should but I'd like to do that in a different PR (if we get consensus). This would also impact the network monitor and I'd like to isolate that change.

Comment on lines +79 to +86
// Blocking in place is fairly safe since we guarantee that only a single request is
// processed at a time.
//
// This has the downside that requests being proven cannot be cancelled since we are now
// outside the async runtime. This could occur if the server timeout is exceeded, or
// the client cancels the request. A different approach is technically possible, but
// would require more complex logic to handle cancellation in tandem with sync.
tokio::task::block_in_place(|| prover.prove(request)).map(tonic::Response::new)
@Mirko-von-Leipzig (Collaborator, Author) commented Feb 19, 2026

So this is something I'd like to address somehow. But we can try that in a follow-up PR because I think it would complicate this review.

As a summary:

  • We have a server.timeout which should cancel a request if it's been enqueued for longer than the timeout.
  • This only works for tasks in the async runtime.
  • block_in_place moves the current thread off the async runtime, making it uncancelable.

So right now, if there is a very long prove request, it completes but likely all other requests in the queue get timed out, which is of course unfair and not really what we want.


I think what we want is to bound the time a single request spends actually proving. We already have "back-pressure" in the form of the bounded queue capacity so there is no real benefit to punishing those already in the queue.

Cancelling a sync task isn't trivial, but I think we can achieve this using a dedicated prover thread + a manager task which aborts the thread if it takes too long.
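The "dedicated prover thread + manager" idea can be sketched with a channel and `recv_timeout`. Note one caveat the sketch makes explicit: safe Rust cannot forcibly kill a thread, so on timeout the thread is detached and its eventual result discarded, which unblocks the caller but does not reclaim the CPU. All names here (`prove_with_deadline`, `ProveOutcome`) are illustrative assumptions, not the real API.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Illustrative outcome type for a deadline-bounded proving attempt.
enum ProveOutcome {
    Done(u64),
    TimedOut,
}

// Run `job` on a dedicated thread and give up on it after `limit`. Safe Rust
// cannot abort a thread, so on timeout the thread is detached; its eventual
// send fails harmlessly because the receiver has been dropped.
fn prove_with_deadline(
    job: impl FnOnce() -> u64 + Send + 'static,
    limit: Duration,
) -> ProveOutcome {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(job()); // receiver may be gone if we timed out
    });
    match rx.recv_timeout(limit) {
        Ok(proof) => ProveOutcome::Done(proof),
        Err(_) => ProveOutcome::TimedOut,
    }
}

fn main() {
    // A fast "proof" finishes inside the deadline.
    let fast = prove_with_deadline(|| 42, Duration::from_secs(1));
    assert!(matches!(fast, ProveOutcome::Done(42)));

    // A slow "proof" blows the deadline and is abandoned.
    let slow = prove_with_deadline(
        || {
            thread::sleep(Duration::from_millis(200));
            7
        },
        Duration::from_millis(50),
    );
    assert!(matches!(slow, ProveOutcome::TimedOut));
}
```

This bounds the time any single request can hold the worker from the caller's perspective; actually reclaiming the proving thread would need cooperative cancellation inside `prove()` or a separate process that can be killed.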

Contributor

Question: why do we need the task here to be sync? Is this because the prove() function is sync? If so, this will change shortly (in v0.21 VM, prove() is an async function).

@Mirko-von-Leipzig (Collaborator, Author)

Oh... yes, because prove() is (expensive) sync and would otherwise block the runtime.

In which case I'm happy to leave this as is until we get async prove.

@SantiagoPittella (Collaborator) left a comment

LGTM

Merge next
@Mirko-von-Leipzig Mirko-von-Leipzig merged commit c6ad6be into next Feb 20, 2026
19 checks passed
@Mirko-von-Leipzig Mirko-von-Leipzig deleted the mirko/refactor/prover branch February 20, 2026 06:51

Labels: None yet
Projects: None yet
4 participants