Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

op-e2e: cleanup endpoints and dialing #11594

Merged
merged 6 commits into from
Aug 25, 2024
Merged

Conversation

protolambda
Copy link
Contributor

@protolambda protolambda commented Aug 24, 2024

Description

This cleans up the endpoint-dialing in op-e2e:

  • each service provides an endpoint.RPC (interface!), not a string.
  • the e2e System caches client RPC bindings
  • the e2e System no longer pre-dials clients (unused clients = slow down of e2e)
  • the e2e System now will close all cached clients on shutdown
  • the GethInstance type is moved into the geth package now, to bundle the node and backend parts of in-process geth.
  • the Opnode type now encapsulates the op-e2e lifecycle idiosyncrasies of op-node.
  • the above two types now provides their RPC endpoints through the endpoint.RPC type, allowing them to specify both the RPC and HTTP, as well as in-process attach (possible with geth, soon also op-node and other services).
  • no tests manually dial an RPC anymore, unless really necessary. This also steers many tests away from not closing their RPC clients properly.
  • the endpoint type selection now always respects the global preference of using HTTP type RPC or not.

And I removed the defer sys.Close() statements, since the System already registers a sys.Close on test-cleanup.

In the future I hope we can refactor the op-batcher/op-proposer/op-node to all directly use the endpoint package, to configure their service. Then we can pass in the interface, rather than the string, and enjoy faster op-e2e testing (no system dials, websocket overhead, etc).

Tests

Test infra refactor, no new features that aren't already covered.

@protolambda protolambda requested a review from a team as a code owner August 24, 2024 03:50
@protolambda protolambda requested a review from tynes August 24, 2024 03:50
@tynes
Copy link
Contributor

tynes commented Aug 24, 2024

Refactor looks good to me but CI is hanging for some reason it seems

@protolambda
Copy link
Contributor Author

@tynes I accidentally quoted one client name. And the uptime check for closed op-geth was borked due to the in-process RPC not going down like HTTP/WS endpoint would. Also adjusted the proposer poll interval from 50ms to 500ms; at a 6 second proposer interval it's insane to poll 120 times per proposal, hope that improves op-e2e performance a bit.

@tynes
Copy link
Contributor

tynes commented Aug 24, 2024

Changes in fff7059 look good to me

@tynes tynes enabled auto-merge August 24, 2024 20:19
@tynes
Copy link
Contributor

tynes commented Aug 24, 2024

Looks like its still hanging someplace

@protolambda
Copy link
Contributor Author

protolambda commented Aug 24, 2024

@tynes sorry about that, it was hanging in only one run mode, when using HTTP. The L1 endpoint that is initialized during the System Start apparently needs to support subscriptions, an exception to the HTTP RPC rule. This caused the tests to fail, but one test would then deadlock when it failed before batcher startup, as it was waiting for the batcher to report back before allowing the test to shut down fully. Tests should be fixed now.

Edit: forgot to remove some debug logs. Fixed. And also made the artifacts FS test run in parallel, I noticed it was not running in parallel like the other tests.
Edit 2: typo

@protolambda
Copy link
Contributor Author

Ugh, more tests failing, also in http mode, when RPC subscriptions are used in other test settings. Think I'll change it so that the HTTP-only part only applies to the node setup, not to the clients used by tests.

@tynes tynes added this pull request to the merge queue Aug 24, 2024
@protolambda protolambda removed this pull request from the merge queue due to a manual request Aug 24, 2024
@protolambda protolambda added this pull request to the merge queue Aug 25, 2024
Merged via the queue into develop with commit 978355d Aug 25, 2024
62 checks passed
@protolambda protolambda deleted the e2e-dial-improvements branch August 25, 2024 00:19
samlaf pushed a commit to samlaf/optimism that referenced this pull request Nov 10, 2024
* op-e2e: cleanup endpoints and dialing

* op-e2e: fix accidental wrong dial, fix endpoint-test, adjust proposer poll interval

* op-e2e: fix test deadlock, fix L1 RPC no-HTTP exception

* op-e2e: any RPC for test, HTTP mode only applied to nodes

* op-e2e: fix lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants