
CPU exhaustion with address index HTTP API queries #589

Closed
braydonf opened this issue Aug 25, 2018 · 14 comments · Fixed by #758
Labels
indexer, memory (Memory issues), stability / efficiency (Denial of service, better resource usage)

Comments

braydonf (Contributor) commented Aug 25, 2018

Versions

bcoin v1.0.2

Description

HTTP Endpoints:

  • /coin/address/:address
  • /tx/address/:address

Querying an address with thousands or millions of coins or transactions will overload these endpoints: they will not respond with a result (I didn't wait to see how long it would take), and the CPU will be maxed out for quite a while.

Details

Taking a look at the address index layout:

 *  T[addr-hash][hash] -> dummy (tx by address)
 *  C[addr-hash][hash][index] -> dummy (coin by address)

The txids are sorted lexicographically by txid/hash. For single-address queries, it should be possible to introduce an ?after=<txid>&limit=<number> query argument to jump forward through the result set in subsets. Note: sorting by hash may not be desirable; however, that could be handled with additional indexes, or by adding [count] or [height][index] (the block height and transaction index) to sort in that order.
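A rough sketch of the paging this would enable (plain JavaScript, with an in-memory sorted array standing in for the database iterator; a real implementation would seek directly with the iterator rather than scan):

```javascript
'use strict';

// Hypothetical sketch of `?after=<txid>&limit=<n>` paging over a
// lexicographically sorted set of txids. Each request returns a bounded
// page instead of the full (potentially huge) result set.
function pageTxids(sortedTxids, after, limit) {
  let start = 0;
  if (after) {
    // With a real db iterator this would be a seek, not a linear search.
    const idx = sortedTxids.indexOf(after);
    if (idx === -1)
      throw new Error('Unknown `after` txid.');
    start = idx + 1;
  }
  return sortedTxids.slice(start, start + limit);
}

// Usage: the client walks the set one bounded page at a time.
const txids = ['aa11', 'bb22', 'cc33', 'dd44'];
const page1 = pageTxids(txids, null, 2);     // ['aa11', 'bb22']
const page2 = pageTxids(txids, 'bb22', 2);   // ['cc33', 'dd44']
```

The key point is that each request does a bounded amount of work regardless of how many entries the address has.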

The exposed API only uses a single address; however, the code is written to handle multiple addresses, and it is problematic to optimize subset queries across multiple addresses. A query is made to the address index to find matching txids, but the final order isn't known without querying all of the results for each address. To support subsets, that order needs to be known in advance, with the ability to skip to various locations quickly. So the entire set of txids is queried each time, and only a portion of them is filled in. As this query is repeated often, it would be best if the data were already organized on disk and in memory in that order. That optimization can't be made when there is a near endless number of address combinations that isn't known ahead of time. A wallet, however, is capable of making these optimizations.

A solution could be to rewrite the associated methods to handle only a single address, with the ability to query subsets of larger results, and to document that the wallet API should be used for any multiple-address query. There may also be ways to use an address index internally to speed up wallet rescans.

bucko13 (Contributor) commented Aug 28, 2018

Nice! This seems useful and would also go well with the #559 tx pagination issue/PR. Ideally we can do this without breaking the existing API, but since the exposed API isn't making use of the multiple-address handling, that may be okay, and this fix would mostly be about updating how these records are stored in the db. Seems related to the updates proposed in #585.

braydonf (Contributor, Author) commented Aug 28, 2018

For changes to how data is stored in the DB, it would be better to organize and sort it in the way it's going to be queried most often, which is generally chronological order. So including a count, or height|index, as mentioned, would be an improvement there. It would be good to have a use case for it, and wallet rescans could be one.

Another option would be to deprecate these endpoints in favor of the equivalent wallet API with imported addresses. A wallet with a single address could be used for that case, the API would be the same for single- and multiple-address wallets, and it would also calculate balances and so on.

Having the address index sorted chronologically would support using it for wallet rescans, as mentioned earlier, avoiding the need to read thousands of blocks during a rescan. This may also be related to BIP157/BIP158 and their use of block filters, as I think they serve a similar function.

@braydonf braydonf changed the title Performance of address index HTTP API queries CPU exhaustion with address index HTTP API queries Sep 24, 2018
braydonf (Contributor, Author) commented Oct 25, 2018

Something also worth considering: BIP157 filters may be very useful for aiding wallet rescans, even without an address index.

This could be useful for performing a wallet rescan on a pruned node, where the filters are checked for which blocks are relevant and only those are downloaded and then eventually pruned again.
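The flow described here could look roughly like the following sketch (all function names are hypothetical stand-ins, not bcoin APIs): walk the chain, test each block's compact filter against the wallet's scripts, and fetch only the blocks that match.

```javascript
'use strict';

// Hypothetical sketch of a filter-driven rescan (names are illustrative,
// not bcoin's API). Only blocks whose BIP157/158 filter matches the
// wallet's scripts are downloaded and scanned; on a pruned node they
// could be discarded again afterwards.
function rescanWithFilters(heights, scripts, getFilter, downloadBlock, scanBlock) {
  const scanned = [];
  for (const height of heights) {
    const filter = getFilter(height);
    // A real GCS filter match is probabilistic: false positives are
    // possible, so a matched block may contain nothing relevant.
    if (!filter.match(scripts))
      continue;
    const block = downloadBlock(height); // fetched from peers when pruned/spv
    scanBlock(block);
    scanned.push(height);
  }
  return scanned; // heights whose blocks were actually fetched and scanned
}
```

The win is bandwidth and disk: the node only ever stores filters plus the handful of blocks that are relevant to the wallet.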

nodech (Member) commented Nov 15, 2018

It would be nice to have general pagination support for indexers. Currently indexers are similar to what TXDB was doing, but if we could provide some basic structure for pagination in indexers, I believe many other indexers would be able to easily add endpoints that don't cause CPU exhaustion.

Like: #605 (cc @tuxcanfly)

braydonf (Contributor, Author) commented Nov 15, 2018

Yep, adding a count to the db keys here will sort the transactions in the order they are queried, making it possible to quickly jump to various positions within the larger set.

Example:

 *  T[addr-hash][tx-count][hash] -> dummy (tx by address)
 *  C[addr-hash][coin-count][hash][index] -> dummy (coin by address)

To support querying with an after parameter (e.g. after=txhash), an additional index can be added to look up the count for a given transaction and address.

Example:

 *  X[addr-hash][hash] -> tx-count
 *  X[addr-hash][hash] -> coin-count
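A minimal sketch of how such count-based keys could support paging (the byte layout below is illustrative, not bcoin's actual serialization, and a sorted array stands in for the database iterator):

```javascript
'use strict';

// T[addr-hash][tx-count][hash]: encode the count big-endian so that
// lexicographic key order matches numeric (i.e. chronological) order.
function encodeTxKey(addrHash, count, txHash) {
  const key = Buffer.alloc(1 + 20 + 4 + 32);
  key.write('T', 0, 'ascii');
  addrHash.copy(key, 1);
  key.writeUInt32BE(count, 21);
  txHash.copy(key, 25);
  return key;
}

// Page through an address's txids starting after a known count (which an
// X[addr-hash][hash] -> tx-count lookup would supply for `after=txhash`).
function page(keys, addrHash, afterCount, limit) {
  const from = encodeTxKey(addrHash, afterCount + 1, Buffer.alloc(32));
  return keys
    .filter((k) => k.compare(from) >= 0)
    .slice(0, limit)
    .map((k) => k.subarray(25).toString('hex')); // extract the txid
}
```

Because the count bytes sort numerically, a real db iterator could seek straight to the `from` key and read `limit` entries, touching no other records.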

However, calculating the balance for an address would require summing all of the coins and then staying synced with all of the changes. It would be inefficient to periodically scan all coins to find those that have been spent in order to calculate the balance. Importing a single address (or multiple addresses) into a watch-only wallet would provide the balance and paginated history, which is another option.

It may also be possible to add a balance to the index specifically for the single-address use case, so that all coins don't need to be queried in that case.

Example:

 *  B[addr-hash] -> balance
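A sketch of how such a balance index could be kept current incrementally (a Map stands in for the database; the update rules are my assumption about how block connect/disconnect would drive it):

```javascript
'use strict';

// B[addr-hash] -> balance, maintained incrementally so the balance never
// has to be recomputed by summing every coin for the address.
const balances = new Map(); // addr-hash (hex) -> satoshis

function addCoin(addr, value) {
  balances.set(addr, (balances.get(addr) || 0) + value);
}

function spendCoin(addr, value) {
  balances.set(addr, (balances.get(addr) || 0) - value);
}

// On block connect: addCoin() for each new output paying the address and
// spendCoin() for each input spending one of its coins. On disconnect
// (reorg), apply the inverse operations.
```

Each block touches the balance only for the addresses it affects, so reading a balance becomes a single key lookup.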

Another option is to query for deltas of coins over ranges of blocks, so that syncing clients would only ever need to query for history they don't already have. The query would include a summary of coins spent and coins newly available. However, there may be some downsides to how reorgs are handled on the client in this case.

Would need to dig into the code more to see what would be the best option.

nodech (Member) commented Nov 15, 2018

Is after=txhash useful? I would argue block height or time is a better indexing option than txhash.
Yeah, we can have a balance index (or the indexer can implement it that way) -- we would just wrap this in two methods that handle the balance calculation and the additional database writes that are necessary.

braydonf (Contributor, Author):

Using block height can be problematic because a single height can include more transactions than the page size. It could be an entry point though, similar to how time is handled in #605, where it points to the count and uses that to iterate, with the after parameter and txid anchoring each page. However, that would be provided in #605 with a watch-only wallet (I still need to look into that code more to verify).

nodech (Member) commented Nov 16, 2018

I think there's value in having the tx and address indexers, but they should provide the same pagination API as the wallet does, like you described (at least for the address indexer). Having to import all the coins I am interested in watching sometimes does not make sense, and you also have to reindex if you want to recover the full history.

braydonf (Contributor, Author) commented Nov 19, 2018

Wallet rescanning/resetting/reindexing to recover history could be accelerated using an address index (so there may still be a use case there), so as far as import speed goes, there wouldn't be a difference. I think it comes down to use cases for the API; here are some examples:

Creating and getting history for a watch-only wallet for a single address with many txs:

PUT /wallet/:id
POST /wallet/:id/import
GET /wallet/:id/balance
GET /wallet/:id/tx/history

For multiple addresses, all with many txs:

PUT /wallet/:id
POST /wallet/:id/import
POST /wallet/:id/import
POST /wallet/:id/import
GET /wallet/:id/balance
GET /wallet/:id/tx/history

An HD wallet with many txs:

PUT /wallet/:id
GET /wallet/:id/balance
GET /wallet/:id/tx/history

All of these examples could work for full, pruned, and spv nodes. Applications can then query the API the same way regardless of how much historical data is stored. The speed at which history would be recovered would vary:

  • For a full node with an address index, it could be very fast, as it would be able to seek specific transactions within blocks rather than needing to read full blocks at all.
  • For a pruned node, assuming there is a blockfilter index (see Add support for neutrino #370), it would be slower as blocks would need to be requested from peers, and transactions scanned.
  • Similarly, for spv, transactions would need to be requested from peers.

For full archival nodes, there could be automation that would automatically add all addresses to a wallet, for example the API could be:

GET /wallet/*/tx/address/:address

That would be the equivalent of querying a full archive node with:

GET /tx/address/:address

nodech (Member) commented Nov 19, 2018

Sure, it can work :) I was talking about the addr indexer; instead of removing it we can fix the CPU exhaustion.

braydonf (Contributor, Author):

Yeah, I agree. There is a use case, even without the /tx/address/:address endpoint, and CPU exhaustion would still be an issue in those cases.

@pinheadmz pinheadmz added the stability / efficiency Denial of service, better resource usage label Jan 21, 2019
pinheadmz (Member):

@braydonf is this addressed in #605 ?

braydonf (Contributor, Author):

@pinheadmz It's not addressed in #605; this is specifically for the node API with the address index, unrelated to the wallet.

braydonf (Contributor, Author):

So I think there is a case for the address index of transactions; however, having looked into this more, I think it's also worth considering removing the address index of coins.

For a wallet building on the address index, all that is necessary are the transactions; those can then be used to generate a coins database. For example, this is how bwallet functions. This is preferable because only the latest transactions are necessary. In the case of coins, a coin can disappear from the result set at any point once spent, so a wallet needs to request all coins every time to know which are no longer available.
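As a sketch of that idea (the record shapes are illustrative, not bwallet's actual structure): replaying an address's transactions in order is enough to derive its current coin set, since each transaction adds its outputs and removes the coins its inputs spend.

```javascript
'use strict';

// Derive a coins (UTXO) map from a transaction history. Transactions
// must be applied in confirmation order.
function buildCoins(txs) {
  const coins = new Map(); // 'txid:index' -> output value
  for (const tx of txs) {
    // Spent coins drop out of the set.
    for (const input of tx.inputs)
      coins.delete(`${input.txid}:${input.index}`);
    // New outputs become spendable coins.
    tx.outputs.forEach((value, index) => {
      coins.set(`${tx.txid}:${index}`, value);
    });
  }
  return coins;
}
```

Since the wallet already tracks which transactions it has seen, it only ever needs the new ones to keep this set current, rather than re-fetching every coin.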
