op-e2e: cleanup endpoints and dialing #11594

protolambda · 2024-08-24T03:50:17Z

Description

This cleans up the endpoint-dialing in op-e2e:

each service provides an endpoint.RPC (interface!), not a string.
the e2e System caches client RPC bindings
the e2e System no longer pre-dials clients (unused clients = slowdown of e2e)
the e2e System now closes all cached clients on shutdown
the GethInstance type is moved into the geth package now, to bundle the node and backend parts of in-process geth.
the Opnode type now encapsulates the op-e2e lifecycle idiosyncrasies of op-node.
the above two types now provide their RPC endpoints through the endpoint.RPC type, allowing them to specify both the RPC and HTTP, as well as in-process attach (possible with geth, soon also op-node and other services).
no tests manually dial an RPC anymore, unless really necessary. This also steers many tests away from not closing their RPC clients properly.
the endpoint type selection now always respects the global preference of using HTTP type RPC or not.
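
As a rough illustration of the caching behavior listed above, here is a minimal sketch of lazily dialed, cached clients that are all closed on shutdown. The names `RPC`, `URL`, `Client` and `ClientCache` are hypothetical stand-ins, not the actual op-e2e or endpoint package types, and dialing is faked by recording the address.

```go
package main

import (
	"fmt"
	"sync"
)

// RPC is a hypothetical stand-in for an endpoint interface:
// a service hands out an endpoint value rather than a raw URL string.
type RPC interface {
	RPC() string
}

// URL is the simplest possible endpoint: a plain address.
type URL string

func (u URL) RPC() string { return string(u) }

// Client is a fake RPC client that only records whether it was closed.
type Client struct {
	Addr   string
	Closed bool
}

func (c *Client) Close() { c.Closed = true }

// ClientCache dials lazily and remembers each client by service name.
type ClientCache struct {
	mu      sync.Mutex
	clients map[string]*Client
	Dials   int // number of dials actually performed
}

// Get returns the cached client for name, dialing only on first use.
func (c *ClientCache) Get(name string, ep RPC) *Client {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cl, ok := c.clients[name]; ok {
		return cl
	}
	if c.clients == nil {
		c.clients = make(map[string]*Client)
	}
	cl := &Client{Addr: ep.RPC()} // a real system would dial here
	c.Dials++
	c.clients[name] = cl
	return cl
}

// Close closes every cached client and empties the cache.
func (c *ClientCache) Close() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for name, cl := range c.clients {
		cl.Close()
		delete(c.clients, name)
	}
}

func main() {
	cache := &ClientCache{}
	ep := URL("ws://127.0.0.1:8546")
	a := cache.Get("sequencer", ep)
	b := cache.Get("sequencer", ep)
	fmt.Println("same client:", a == b, "dials:", cache.Dials)
	cache.Close()
	fmt.Println("closed:", a.Closed)
}
```

A client that is never requested is never dialed in this sketch, which is the property the description attributes to dropping pre-dialing.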

And I removed the defer sys.Close() statements, since the System already registers a sys.Close on test-cleanup.
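
The pattern of registering the close on test-cleanup can be sketched as follows. This is an assumption-laden illustration: `System`, `Start` and `fakeT` are hypothetical, `cleanupRegistrar` mirrors only the `Cleanup` method of `testing.TB`, and making `Close` idempotent with `sync.Once` is a choice of this sketch rather than something the PR states.

```go
package main

import (
	"fmt"
	"sync"
)

// cleanupRegistrar is the subset of testing.TB used by this sketch.
type cleanupRegistrar interface {
	Cleanup(func())
}

// System is a hypothetical stand-in for the e2e System.
type System struct {
	closeOnce sync.Once
	Closes    int // how many times the shutdown logic really ran
}

// Start registers Close with the test, so callers need no defer.
func Start(t cleanupRegistrar) *System {
	sys := &System{}
	t.Cleanup(sys.Close)
	return sys
}

// Close runs the shutdown logic at most once (an assumption of this sketch),
// so a leftover manual call would be harmless.
func (s *System) Close() {
	s.closeOnce.Do(func() { s.Closes++ })
}

// fakeT records cleanups and runs them last-in-first-out, as testing.T does.
type fakeT struct{ cleanups []func() }

func (f *fakeT) Cleanup(fn func()) { f.cleanups = append(f.cleanups, fn) }

func (f *fakeT) finish() {
	for i := len(f.cleanups) - 1; i >= 0; i-- {
		f.cleanups[i]()
	}
}

func main() {
	ft := &fakeT{}
	sys := Start(ft)
	ft.finish() // the test framework closes the system
	sys.Close() // a redundant manual close does nothing more
	fmt.Println("closes:", sys.Closes)
}
```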

In the future I hope we can refactor the op-batcher/op-proposer/op-node to all directly use the endpoint package, to configure their service. Then we can pass in the interface, rather than the string, and enjoy faster op-e2e testing (no system dials, websocket overhead, etc).

Tests

Test infra refactor, no new features that aren't already covered.

tynes · 2024-08-24T16:30:16Z

Refactor looks good to me but CI is hanging for some reason it seems

protolambda · 2024-08-24T20:15:33Z

@tynes I accidentally quoted one client name. And the uptime check for closed op-geth was borked because the in-process RPC does not go down like an HTTP/WS endpoint would. Also adjusted the proposer poll interval from 50ms to 500ms; at a 6 second proposer interval it's insane to poll 120 times per proposal. Hope that improves op-e2e performance a bit.
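
The arithmetic behind the poll-interval change can be checked with a short sketch. The function and constant names are hypothetical; only the 6 second, 50ms and 500ms values come from the comment above.

```go
package main

import (
	"fmt"
	"time"
)

// Values taken from the comment; the names are illustrative only.
const (
	proposalInterval = 6 * time.Second
	oldPoll          = 50 * time.Millisecond
	newPoll          = 500 * time.Millisecond
)

// PollsPerInterval returns how many whole polls fit in one proposal interval.
func PollsPerInterval(interval, poll time.Duration) int64 {
	return int64(interval / poll)
}

func main() {
	for _, poll := range []time.Duration{oldPoll, newPoll} {
		fmt.Printf("poll every %v: %d polls per proposal\n",
			poll, PollsPerInterval(proposalInterval, poll))
	}
}
```

With these values the old setting polls 120 times per proposal and the new one 12 times.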

tynes · 2024-08-24T20:19:07Z

Changes in fff7059 look good to me

tynes · 2024-08-24T20:48:20Z

Looks like it's still hanging someplace

protolambda · 2024-08-24T21:24:21Z

@tynes sorry about that, it was hanging in only one run mode, when using HTTP. The L1 endpoint that is initialized during the System Start apparently needs to support subscriptions, an exception to the HTTP RPC rule. This caused the tests to fail, but one test would then deadlock when it failed before batcher startup, as it was waiting for the batcher to report back before allowing the test to shut down fully. Tests should be fixed now.

Edit: forgot to remove some debug logs. Fixed. Also made the artifacts FS test run in parallel; I noticed it was not running in parallel like the other tests.
Edit 2: typo
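
The kind of deadlock described above, where shutdown waits on a report that never arrives, is commonly avoided by waiting on a shutdown signal at the same time. The sketch below shows that general pattern only; it is not the fix made in this PR, and `WaitReady` and `ErrShutdown` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrShutdown is returned when shutdown is requested before the service is ready.
var ErrShutdown = errors.New("shutdown before ready")

// WaitReady blocks until the service reports ready or shutdown is signaled.
// Waiting on ready alone would block forever if startup never happens.
func WaitReady(ready, shutdown <-chan struct{}) error {
	select {
	case <-ready:
		return nil
	case <-shutdown:
		return ErrShutdown
	}
}

func main() {
	ready := make(chan struct{})    // never closed: the service never started
	shutdown := make(chan struct{}) // closed when the test fails early
	close(shutdown)
	fmt.Println("wait result:", WaitReady(ready, shutdown))
}
```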

protolambda · 2024-08-24T21:49:11Z

Ugh, more tests failing, also in HTTP mode, when RPC subscriptions are used in other test settings. Think I'll change it so that the HTTP-only part only applies to the node setup, not to the clients used by tests.
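
A sketch of the selection rule proposed here, together with the subscription exception from the earlier comment, might look like the following. All names are hypothetical, and picking the websocket URL for test clients is an assumption made so that subscriptions work in the sketch.

```go
package main

import "fmt"

// Endpoint lists the URLs a service exposes (hypothetical type).
type Endpoint struct {
	WS   string
	HTTP string
}

// PickNodeURL chooses the URL a node is configured with.
// In HTTP mode nodes get the HTTP URL, unless the connection
// needs subscriptions, which a plain HTTP RPC cannot provide.
func PickNodeURL(ep Endpoint, httpMode, needsSubscriptions bool) string {
	if httpMode && !needsSubscriptions && ep.HTTP != "" {
		return ep.HTTP
	}
	return ep.WS
}

// PickTestClientURL ignores HTTP mode entirely: test clients may use any
// RPC type. This sketch prefers WS and falls back to HTTP if WS is missing.
func PickTestClientURL(ep Endpoint) string {
	if ep.WS != "" {
		return ep.WS
	}
	return ep.HTTP
}

func main() {
	ep := Endpoint{WS: "ws://127.0.0.1:8546", HTTP: "http://127.0.0.1:8545"}
	fmt.Println("node, HTTP mode:", PickNodeURL(ep, true, false))
	fmt.Println("node, HTTP mode, subscriptions:", PickNodeURL(ep, true, true))
	fmt.Println("test client:", PickTestClientURL(ep))
}
```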

* op-e2e: cleanup endpoints and dialing
* op-e2e: fix accidental wrong dial, fix endpoint-test, adjust proposer poll interval
* op-e2e: fix test deadlock, fix L1 RPC no-HTTP exception
* op-e2e: any RPC for test, HTTP mode only applied to nodes
* op-e2e: fix lint

op-e2e: cleanup endpoints and dialing

abd9b4c

protolambda requested a review from a team as a code owner August 24, 2024 03:50

protolambda requested a review from tynes August 24, 2024 03:50

tynes approved these changes Aug 24, 2024

op-e2e: fix accidental wrong dial, fix endpoint-test, adjust proposer…

fff7059

… poll interval

tynes approved these changes Aug 24, 2024

tynes enabled auto-merge August 24, 2024 20:19

protolambda force-pushed the e2e-dial-improvements branch from 44789b7 to 9b5d78f Compare August 24, 2024 21:22

protolambda force-pushed the e2e-dial-improvements branch from 9b5d78f to 650981b Compare August 24, 2024 21:26

op-e2e: fix test deadlock, fix L1 RPC no-HTTP exception

7a41528

protolambda force-pushed the e2e-dial-improvements branch from 650981b to 7a41528 Compare August 24, 2024 21:30

op-e2e: any RPC for test, HTTP mode only applied to nodes

cf8bc2d

tynes added this pull request to the merge queue Aug 24, 2024

protolambda added 2 commits August 24, 2024 17:41

Merge branch 'develop' into e2e-dial-improvements

a01ac85

op-e2e: fix lint

c4b2404

protolambda removed this pull request from the merge queue due to a manual request Aug 24, 2024

protolambda enabled auto-merge August 24, 2024 23:42

protolambda added this pull request to the merge queue Aug 25, 2024

Merged via the queue into develop with commit 978355d Aug 25, 2024
62 checks passed

protolambda deleted the e2e-dial-improvements branch August 25, 2024 00:19
