CPU exhaustion with address index HTTP API queries #589
Comments
Nice! This seems useful and would also go well with the #559 tx pagination issue/PR. Ideally we can do this without breaking the existing API, but if the exposed API isn't making use of the multiple-address handling, maybe it's okay, and this fix would mostly be around updating how these are stored in the db. Seems related to the updates proposed in #585.
For changes around how data is stored in the DB, it would be better to organize and sort it in the way it's most often going to be queried, and chronological order is generally that order. So including a key component such as `[height][index]` would keep it sorted that way.

Another option would be to deprecate these endpoints in favor of using the equivalent wallet API with imported addresses. A single-address wallet could be used for that use case, and the API would be the same between single- and multiple-address wallets, as well as calculating balances and so on.

Having an address index sorted chronologically would support the use case of using the index for wallet rescans, as mentioned earlier, avoiding the need to read thousands of blocks during a rescan. This may also relate to BIP157/BIP158 and the use of block filters there, as I think they serve a similar function.
Something else worth considering: BIP157 filters may also be very useful for aiding wallet rescans, even without an address index. This could be useful for performing a wallet rescan on a pruned node, where the filters are checked to find which blocks are relevant, and only those are downloaded and then eventually pruned again.
It will be nice to have general pagination for indexers. Currently indexers are similar to what TXDB was doing, but if we could provide some basic structure for pagination in indexers, I believe many other indexers would be able to easily add endpoints that don't cause CPU exhaustion, like #605.
Yep, with a change to add `[height][index]` to the index key, the results could be iterated in chronological order. Example:
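A minimal sketch of what that key could look like, using a `bdb`-style key definition (the key character and field types here are illustrative assumptions, not bcoin's actual layout):

```js
const bdb = require('bdb');

// Hypothetical chronological address index entry:
//   T[addr-hash][height][tx-index] -> txid
// Entries for an address then iterate in confirmation order, so a
// query can resume from the last (height, index) position seen
// instead of walking the entire set.
const layout = {
  T: bdb.key('T', ['hash', 'uint32', 'uint32'])
};
```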
To support querying using an `after` txid, the query could jump directly to that key and iterate from there. Example:
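A sketch of what such a request could look like; the query arguments follow the proposal in this issue and are not an existing endpoint:

```
GET /tx/address/:address?after=<txid>&limit=100
```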
However, calculating the balance for the address would require summing up all the coins and then staying synced with all the changes. It would be inefficient to periodically scan all coins to find those that have been spent in order to calculate the balance. Importing a single address (or multiple) as a watch-only wallet would provide the balance and pagination of history, which is another option. It may also be possible to add a balance to the index specifically for single-address use, so that all coins don't need to be queried in that case. Example:
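A sketch of what such a balance entry could look like (entirely hypothetical; no such key exists today):

```js
const bdb = require('bdb');

// Hypothetical running balance keyed by address:
//   B[addr-hash] -> value (in satoshis)
// Updated as blocks are connected and disconnected, so the balance
// can be read directly rather than summing every coin on request.
const layout = {
  B: bdb.key('B', ['hash'])
};
```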
Another option is to query for the deltas of coins through ranges of blocks, so that syncing clients would only ever need to query for the history they don't already have. The query would include a summary of coins spent and coins newly available. However, there may be some downsides to how reorgs are handled on the client in this case. Would need to dig into the code more to see what would be the best option.
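A rough sketch of what such a delta query could look like (the endpoint and field names are hypothetical):

```
GET /deltas/address/:address?start=<height>&end=<height>

{
  "start": 1000,
  "end": 1100,
  "spent": [{"txid": "...", "index": 0}],
  "received": [{"txid": "...", "index": 1, "value": 50000}]
}
```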
Is using the block height as the pagination entry point an option?
Using block height alone can be problematic because a single block can contain more transactions for an address than the page size. It could be an entry point though, similar to how time is handled in #605, where it points to the count and uses that to iterate from there.
I think there's value in having a tx and address indexer, but they should provide the same pagination API as the wallet does, as you described (at least for the address indexer). Having to import all the coins I'm interested in watching sometimes doesn't make sense, and you also have to reindex if you want to recover all the history.
Wallet rescanning/resetting/reindexing to recover the history could be accelerated using an address index (so there may still be a use case there), so as far as speed of importing goes, there wouldn't be a difference. I think it comes down to use cases for the API; here are some examples.

Creating and getting history for a watch-only wallet for a single address with many txs:
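(A sketch using the wallet HTTP API; the exact paths are from memory and should be double-checked against the API docs:)

```
PUT  /wallet/watch-1            {"watchOnly": true}
POST /wallet/watch-1/import     {"account": "default", "address": "<address>"}
GET  /wallet/watch-1/tx/history
```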
For multiple addresses, all with many txs:
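(Same sketch, importing each address into one watch-only wallet:)

```
PUT  /wallet/watch-n            {"watchOnly": true}
POST /wallet/watch-n/import     {"account": "default", "address": "<address-1>"}
POST /wallet/watch-n/import     {"account": "default", "address": "<address-2>"}
GET  /wallet/watch-n/tx/history
```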
An HD wallet with many txs:
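(Sketch: creating the watch-only wallet from an account extended public key; the `accountKey` option name is from memory and may differ:)

```
PUT /wallet/hd-1    {"watchOnly": true, "accountKey": "<xpub>"}
GET /wallet/hd-1/tx/history
```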
All of these examples could work for full, pruned and spv nodes. Applications can then query the API the same way regardless of how much historical data is stored. The speed at which it would recover history would depend on the type of node and which indexes are available.
For full archival nodes, there could be automation that would automatically add all addresses to a wallet; for example, the API could be:
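(Entirely hypothetical, no such endpoint exists; the `allAddresses` option is made up for illustration:)

```
PUT /wallet/all    {"watchOnly": true, "allAddresses": true}
GET /wallet/all/tx/history?after=<txid>&limit=100
```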
That would be the equivalent of querying a full archive node with:
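(That is, the paginated node endpoint proposed in this issue:)

```
GET /tx/address/:address?after=<txid>&limit=100
```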
Sure, it can work :) I was talking about the addr indexer; instead of removing it we can fix the CPU exhaustion.
Yeah, I agree. There is a use case for it, even without the wallet.
@pinheadmz It's not addressed in #605; this is specifically for the node API w/ the address index, unrelated to the wallet.
So I think there is a case for the address index for transactions. However, having looked into this more, I think it is also worth considering removing the address index of coins. For a wallet building on the address index, all that is necessary are the transactions; those can then be used to generate a coins database. For example, this is how TXDB works, as mentioned above.
Versions
bcoin v1.0.2
Description
HTTP Endpoints:
/coin/address/:address
/tx/address/:address
Using an address with thousands or millions of coins or transactions will overload these APIs; they do not respond with a result (I didn't wait to find out how long) and will max out the CPU for quite a while.
Details
Taking a look at the address index layout:
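(For reference, a sketch of the layout in `bdb`-style key notation; the key characters and field types are approximate, from memory:)

```js
const bdb = require('bdb');

// Approximate existing address index entries:
//   T[addr-hash][txid]        -> dummy  (transactions by address)
//   C[addr-hash][txid][index] -> dummy  (coins by address)
// The component after the address hash is the txid itself, so
// entries sort by hash rather than by confirmation order.
const layout = {
  T: bdb.key('T', ['hash', 'hash256']),
  C: bdb.key('C', ['hash', 'hash256', 'uint32'])
};
```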
The txids are sorted lexicographically by the txid/hash. For single address queries, it should be possible to introduce an `?after=<txid>&limit=<number>` query argument to be able to jump forward in the query in subsets.

Note: Sorting by hash may not be desirable, however that could be handled with additional indexes, or by adding `[count]` or `[height][index]` with the block height and transaction index to sort it in that order.

The exposed API only uses a single address, however the code is written to handle multiple addresses. It is problematic to optimize querying subsets with multiple addresses: a query is made to the address index to find matching txids, but the final order isn't known without querying all of the results for each address. To be able to return subsets, that order needs to be known, and it must be possible to skip to various locations quickly. So the entire set of txids is queried each time, and then only a portion of the txids is filled in. As this query is repeated often, it would be best if the data were already organized on disk and in memory in that order. This optimization can't be made when there is a near endless number of address combinations that isn't known ahead of time. A wallet is capable of making these optimizations.
A solution could be to write the associated methods to handle only a single address, and then to be able to query subsets of larger results, and to document that the wallet API should be used for any multiple-address query. There may also be ways to use an address index internally to speed up wallet rescans.
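A minimal sketch of what such a single-address paginated query could look like, assuming the chronological `[height][index]` layout suggested above (all names are illustrative, and `bdb`'s range options are used as I remember them):

```js
'use strict';

const bdb = require('bdb');

// Hypothetical chronological entry: T[addr-hash][height][tx-index] -> txid
const layout = {
  T: bdb.key('T', ['hash', 'uint32', 'uint32'])
};

// Return up to `limit` txids for one address, resuming just past the
// (height, index) position of the last entry the client already has.
// The values in this range hold the txids themselves.
async function getHashesByAddress(db, addrHash, after, limit) {
  return db.values({
    gt: layout.T.encode(addrHash, after.height, after.index),
    lte: layout.T.max(addrHash),
    limit
  });
}
```

The first page could pass `{height: 0, index: 0}` as the starting position, and each response would report the last (height, index) so a client can continue from where it left off.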