-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
autoclean gains SUPER POWERS (and other optimizations) #5594
autoclean gains SUPER POWERS (and other optimizations) #5594
Conversation
bf1dbc6
to
7dc0cca
Compare
7dc0cca
to
0b15133
Compare
Rebased.... |
0b15133
to
4e32fef
Compare
4e32fef
to
7bc8854
Compare
Fixed up rust generated stuff, and also updated delpay docs (and added missing request schema!). |
7bc8854
to
59a205e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some doubts regarding the RPC batching support, which I don't see how it can work without jeopardizing correctness.
59a205e
to
528b688
Compare
Rebased on master, and:
|
528b688
to
4226be8
Compare
4226be8
to
1de8bca
Compare
Ack 1de8bca |
Extended timeouts in test_autoclean: CI is slow esp under valgrind! Ack e134332 |
e134332
to
441e1ba
Compare
b77f1fd
to
282c8b2
Compare
Trivial rebase for generated code clash (contrib/pyln-testing/pyln/testing/node_pb2.py). Ack 282c8b2 |
This can happen, and in fact does below in our test_autoclean_once test where we update the datastore, and return from the cmd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
contrib/giantnode.py shows we're spending a lot of time looking up payments by payment_hash: sendpays does it to see if we've already paid, and bookkeeper does it in listsendpays: ``` - 94.52% 0.00% lightningd lightningd [.] read_json - 94.52% read_json - 94.48% parse_request - 94.46% plugin_hook_call_rpc_command - plugin_hook_call_ - rpc_command_hook_final - 94.46% command_exec - 49.08% json_sendpay - 49.01% send_payment - 48.86% send_payment_core - 48.84% wallet_payment_list - 48.80% db_step + db_sqlite3_step - 45.38% json_listsendpays - 45.36% wallet_payment_list - 45.30% db_step + 45.30% db_sqlite3_step ``` This doesn't actually make much of a difference, so see next patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Filtering by status is rare, so we can do it in the caller; just let sqlite3 filter by payment_hash. With ~650,000 payments in db: Before: ``` 129/300000 complete 5.60/sec (33078 invs, 169 pays, 0 retries) in 30 seconds. 19 hours-14 hours remaining. 201/300000 complete 7.20/sec (43519 invs, 241 pays, 0 retries) in 40 seconds. 16 hours-11 hours remaining. 257/300000 complete 5.60/sec (54568 invs, 289 pays, 0 retries) in 50 seconds. 16 hours-14 hours remaining. 305/300000 complete 4.80/sec (65772 invs, 337 pays, 0 retries) in 60 seconds. 16 hours-17 hours remaining. 361/300000 complete 5.60/sec (75875 invs, 401 pays, 0 retries) in 70 seconds. 16 hours-14 hours remaining. ``` After: ``` 760/300000 complete 40.00/sec (19955 invs, 824 pays, 0 retries) in 20 seconds. 2 hours-2 hours remaining. 1176/300000 complete 41.60/sec (30082 invs, 1224 pays, 0 retries) in 30 seconds. 2 hours-119 minutes remaining. 1584/300000 complete 40.80/sec (40224 invs, 1640 pays, 0 retries) in 40 seconds. 2 hours-2 hours remaining. 1984/300000 complete 40.00/sec (49938 invs, 2048 pays, 0 retries) in 50 seconds. 2 hours-2 hours remaining. ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This one directly contains the scids of the channels involved, not references, so can outlive the channels. As a side-effect, however, it now never lists `payment_hash`. Having it listed (via join) is not possible as it is a *string* in the channels table, and difficult anyway because of channel aliases. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…_id_str, add db_bind_scid. Although it's deprecated already (because it stores as string), it's better to make the name explicit. And create a new helper which stores as BIGINT. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Without helpers, we were using the u64 functions. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Normally, we'd use the delete_columns function to remove the old `short_channel_id` string field, *but* we can't do that for sqlite, as there are other tables with references to it. So add a FIXME to do it once everyone has upgraded to an sqlite3 which has native support for column deletion. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…s table. And remove the now-unused string-based helper functions. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And document that we never know payment_hash. Changelog-Added: JSON-RPC: `listforwards` now shows `in_htlc_id` and `out_htlc_id` Changelog-Changed: JSON-RPC: `listforwards` now never shows `payment_hash`; use `listhtlcs`.
Using `listfowards` for this wrong; expose this directly if people care (and unlike listforwards, which could be deleted, we have to remember these while the channel is still open!). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Added: JSON-RPC: `listhtlcs` new command to list all known HTLCS.
Changelog-Added: JSON-RPC: `delforward` command to delete listforwards entries. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And take the opportunity to rename l0 and l1 in the tests to the more natural l1 l2 (since we add l3). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's more natural: we will eventually support dynamic config variables, so this will be quite nice. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Added: Plugins: `autoclean` can now delete old forwards, payments, and invoices automatically.
And remove deprecated autocleaninvoice docs. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Added: Plugins: `autoclean-status` command to see what autoclean is doing. Changelog-Deprecated: JSON-RPC: `autocleaninvoice` (use option `autoclean-expiredinvoices-age`)
Changelog-Added: Plugins: `autoclean-once` command for a single cleanup. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They slow down benchmarking, which is kind of unfair! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
autoclean was using 98% of its time in memmove; we should simply keep an offset, and memmove when it's empty. And also, only memmove the used region, not the entire buffer! Running on product of giantnodes.py: $ time l1-cli autoclean-once failedpays 1 { "autoclean": { "failedpays": { "cleaned": 26895, "uncleaned": 1000000 } } } Before: real 20m46.579s user 0m0.000s sys 0m0.001s After: real 2m10.568s user 0m0.001s sys 0m0.000s Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use linked list, not an array. ``` + 97.89% 0.01% autoclean autoclean [.] next_plan - 97.08% 0.01% autoclean autoclean [.] json_stream_output_write - 97.06% json_stream_output_write - 84.29% ld_stream_complete - 83.87% tal_arr_remove_ + 83.71% __memcpy_avx_unaligned_erms (inlined) + 12.76% rpc_stream_complete + 96.59% 0.03% autoclean autoclean [.] tal_arr_remove_ + 96.48% 0.00% autoclean libc.so.6 [.] __memcpy_avx_unaligned_erms (inlined) + 94.98% 94.98% autoclean libc.so.6 [.] __memmove_avx_unaligned_erms + 84.29% 0.01% autoclean autoclean [.] ld_stream_complete + 12.76% 0.00% autoclean autoclean [.] rpc_stream_complete ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This shows the benefits of batching when being slammed by autoclean: Before: ``` plugin-autoclean: last 1000 deletes took 6699022 nsec each plugin-autoclean: last 1000 deletes took 6734025 nsec each plugin-autoclean: last 1000 deletes took 6681189 nsec each plugin-autoclean: last 1000 deletes took 6597148 nsec each plugin-autoclean: last 1000 deletes took 6637085 nsec each plugin-autoclean: last 1000 deletes took 6801425 nsec each plugin-autoclean: last 1000 deletes took 6788572 nsec each plugin-autoclean: last 1000 deletes took 6603641 nsec each plugin-autoclean: last 1000 deletes took 6642947 nsec each plugin-autoclean: last 1000 deletes took 6590495 nsec each plugin-autoclean: last 1000 deletes took 6695076 nsec each plugin-autoclean: last 1000 deletes took 6477981 nsec each ``` After: ``` plugin-autoclean: last 1000 deletes took 342764 nsec each plugin-autoclean: last 1000 deletes took 375031 nsec each plugin-autoclean: last 1000 deletes took 357564 nsec each plugin-autoclean: last 1000 deletes took 381581 nsec each plugin-autoclean: last 1000 deletes took 337989 nsec each plugin-autoclean: last 1000 deletes took 329391 nsec each plugin-autoclean: last 1000 deletes took 328322 nsec each plugin-autoclean: last 1000 deletes took 372810 nsec each plugin-autoclean: last 1000 deletes took 351228 nsec each plugin-autoclean: last 1000 deletes took 413885 nsec each plugin-autoclean: last 1000 deletes took 348317 nsec each ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Previous commit was a hack which *always* batched where possible, this is a more sophisticated opt-in varaint, with a timeout sanity check. Final performance for cleaning up 1M pays/forwards/invoices: ``` $ time l1-cli autoclean-once succeededpays 1 { "autoclean": { "succeededpays": { "cleaned": 1000000, "uncleaned": 26895 } } } real 6m9.828s user 0m0.003s sys 0m0.001s $ time l2-cli autoclean-once succeededforwards 1 { "autoclean": { "succeededforwards": { "cleaned": 1000000, "uncleaned": 40 } } } real 3m20.789s user 0m0.004s sys 0m0.001s $ time l3-cli autoclean-once paidinvoices 1 { "autoclean": { "paidinvoices": { "cleaned": 1000000, "uncleaned": 0 } } } real 6m47.941s user 0m0.001s sys 0m0.000s ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Added: JSON-RPC: `batching` command to allow database transactions to cross multiple back-to-back JSON commands.
Schema was too loose since we did deprecations. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is actually what the autoclean plugin wants, especially since you can't otherwise delete a payment which has failed then succeeded. But insist on neither or both being specified, at least for now. Changelog-Added: JSON-RPC: `delpay` takes optional `groupid` and `partid` parameters to specify exactly what payment to delete. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
``` "label": "db/db_postgres.c:294:char[]", "backtrace": [ "ccan/ccan/tal/tal.c:442 (tal_alloc_)", "ccan/ccan/tal/tal.c:471 (tal_alloc_arr_)", "ccan/ccan/tal/str/str.c:91 (tal_vfmt_)", "ccan/ccan/tal/str/str.c:44 (tal_fmt_)", "db/db_postgres.c:294 (db_postgres_delete_columns)", "wallet/db.c:1546 (migrate_payments_scids_as_integers)", "wallet/db.c:986 (db_migrate)", "wallet/db.c:1019 (db_setup)", "wallet/wallet.c:101 (wallet_new)", "lightningd/lightningd.c:1023 (main)", "../csu/libc-start.c:308 (__libc_start_main)" ], "parents": [ "db/utils.c:317:struct db", "lightningd/lightningd.c:107:struct lightningd" ] ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…r migration Suggested-by: @cdecker Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
CI is really slow: it sees all three expire at once. But making the timeouts too long is painful in non-VALGRIND, so I ended up making it conditional. ``` # First it expires. wait_for(lambda: only_one(l3.rpc.listinvoices('inv1')['invoices'])['status'] == 'expired') # Now will get autocleaned wait_for(lambda: l3.rpc.listinvoices('inv1')['invoices'] == []) > assert l3.rpc.autoclean_status()['autoclean']['expiredinvoices']['cleaned'] == 1 E assert 3 == 1 tests/test_plugin.py:2975: AssertionError ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1b6e5ae
to
2ac334d
Compare
ACK 2ac334d |
autoclean is generalized to delete old listforwards/listpays/listinvoices (not just expired invoices). And much optimization is done, so that it takes only 3-6 minutes to delete 1M entries.
This should help with larger nodes, in particular.
There are some new APIs in here! listhtlcs and delforward in particular...