-
Notifications
You must be signed in to change notification settings - Fork 36.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add address-based index (attempt 4?) #14053
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
2157893
to
196ff3d
Compare
54a8e72
to
4865275
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed most of the code, but just skimmed tests. It looks to me like this PR could be merged basically in its current form, so I'm curious if you're intending to make the improvements cited in the PR description above here or in a separate PRs.
I left a few minor suggestions about the code, which you should feel free to ignore. The only changes I would definitely like to see here are:
- adding some python code in
test/functional/
to call the new rpc method - adding a blurb in
doc/release notes.md
to describe the feature and maybe mention some use-cases
src/index/addrindex.cpp
Outdated
std::unique_ptr<AddrIndex> g_addrindex; | ||
|
||
/** | ||
* Access to the addrindex database (indexes/addrindex/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Introduce address index" (3c7cc3c)
Note: new index/addrindex.cpp
, index/addrindex.h
, and test/addrindex_tests.cpp
files in this commit mirror existing index/txindex.cpp
and index/txindex.h
, test/txindex_tests.cpp
files and have some code and comments in common. It can help to diff the addr
files against the tx
files when reviewing this PR.
src/rpc/rawtransaction.cpp
Outdated
"\nArguments:\n" | ||
"1. \"address\" (string, required) The address to search for\n" | ||
"2. \"verbose\" (bool, optional, default = false) If set to false, only returns data for hex-encoded `txid`s. \n" | ||
"3. \"skip\" (numeric, optional, default = 0) If set, the result skips this number of initial values. \n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skip
and count
probably make sense on an options object instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I'm glad someone's working on this. Concept ACK.
- The AddrIndex should return information about the outpoints and differentiate between outputs and spends, not just return the raw transactions. In fact, the AddrIndex could just return outpoints, then the client code could use the TxIndex could to fetch the tx bodies. It'd involve a separate lookup though.
- I don't think it's necessary to delete keys from the database when a block is disconnected. There's no harm in leaving it. The higher level methods to search the index can then filter for results that are on the main chain if that's what the client wants. It'd have to do this anyway to avoid races with reorgs and such.
- I'm worried about collisions on address IDs because they are only 64 bits. I can think of three options, 1) use a 32 byte cryptographic hash, 2) use a 20 byte cryptographic hash of the script plus some randomly generated, database-global salt, 3) use a 32- or 64-bit non-cryptographic hash (might as well use Murmur3 or SipHash, not truncated SHA256), then store the full script as the database value to double check against. Option 3 feels best to me.
- What's the purpose of having the first byte of the block hash as the value? It doesn't seem robust nor particularly useful.
Thanks for all the reviews. To answer some of @jimpo's questions: I think that returning just the outpoint is a better idea than the current choice so I'll switch to that and try to incorporate all the other feedback here. |
@marcinja will this allow other wallets like Electrum to utilize this feature (RPC credentials provided, of course) and avoid having to run ElecturmX (https://github.com/spesmilo/electrumx), EPS (https://github.com/chris-belcher/electrum-personal-server) or BWT (https://github.com/shesek/bwt) intermediary software? @ecdsa @SomberNight @chris-belcher @shesek could you provide your input on how this feature might be useful (or not)? |
The main issue AFAIU is that Electrum is using |
@Talkless note that while some Electrum users run their own bitcoind, many do not. Electrum wants to support both use cases, and in fact the suspicion is that most users just use a public server. When using a public server, the client cannot use bitcoind RPC, hence in that case I don't see how the middleware (e.g. ElectrumX) could be avoided. For the own bitcoind use case, maybe the client could have another optional mode of operation where it uses bitcoind RPC directly, which is I guess what you are asking about. For that, an address-index in bitcoind is the main thing missing indeed, however not the only one. For one, the electrum protocol (the client<->"middleware server" connection) has address subscriptions - the client gets a notification when a history of one its addresses changes. We are also planning on soon adding another method into the protocol that allows IMHO there are multiple upsides for having this middleware setup for Electrum:
Nevertheless if someone steps up and contributes patches, this kind of thing could be added.
That sounds much larger than expected.
Instead of having both Also, I expect most users of this index would also want txindex enabled. You might want to consider making address index dependent on txindex. Have you investigated how much space that would save? There would be no need to store Another trick that ElectrumX uses is that only the best chain is indexed. We have a |
Concept ACK (might have already done that) |
I'm concept -0 on this. My primary objection is that I think it's a bad idea for any infrastructure to be built all that relies on having fully-indexed blockchain data available (this also applies to txindex, but we can't just remove support...). However, it seems many people want something like this, and are going to use it anyway. The question is then whether it belongs in the bitcoin-core codebase. Alternative, and more performant presumably, like electrs exist already too, so it isn't exactly impossible to do this elsewhere. Still, given that we now have the indexes infrastructure, it means that things like this are easy to add in a fairly modular way without invading consensus code. So if people really want this, fine. Overall approach comment: I don't think MurmurHash should be used for anything new; there are strictly better hash functions available. I'd suggest SipHash if that's fast enough. |
I'm also a little more negative on having this in Core than I previously was. After working in a few industrial contexts on wallet stuff, it's clear to me that an address index is really only required if you want to implement a block explorer or do chain analysis. For both of these applications, using something like electrs seems sufficient. For personal wallet management, a full address index is not required. I think the origin of some confusion is that things like the Electrum Personal Server have become synonymous with this kind of usage, but in reality a full index is overkill when descriptor-specific rescans can be done once for a historical backfill and then per-block scanning can be done from there on out. I want to point out that this is a nice implementation and good work by @marcinja, but I'm leaning slightly against the inclusion of such an index in Core at this point. |
I agree on this. IMO the only use cases to ever use a full address index are:
1 (instant backup recovery) could be solved with either 2 (a backend for thousands of wallets): out of scope for this project. 3 (explore purposes): I think this is a valid use case. Though adding this PR to Bitcoin Core will lead to many many projects using it in production increasing the traffic in this project and eventually steal time from existing contributors (rebase, maintenance, drag-along) My main fear is that people are going to use this index (a full address index) to use it as an electrum(ish) backend for a handful of wallets. With multiwallet, watch-only-wallets, PSBT, we have all tools to server multiple wallets in a scalable way for external applications. I also think merging this as it is, would be in contradiction to the process- and repository-separation effort. Therefore I'm ~0 (slightly towards NACK) to add this. |
Hi all, thanks for the feedback and review. This was an enjoyable PR to work on and I learned a lot from all your comments. I'm closing this PR because its size probably requires stronger support from contributors to get in. It also seems more clear now that all of the practical use-cases are covered by existing features and some lightweight alternatives (#20664) . I also agree that it would be bad to incentivize using an address index to support an external electrum wallet, when it's not the intended use-case and would cause unnecessary burden on contributors and maintainers in this project, e.g. from users of those wallets wanting new features or updates. |
@marcinja thank you and I hope to see more contributions of this quality from you. |
8ed2f1e Remove unused includes (Marcin Jachymiak) cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak) Pull request description: Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053. ACKs for top commit: fanquake: ACK 8ed2f1e Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
8ed2f1e Remove unused includes (Marcin Jachymiak) cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak) Pull request description: Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053. ACKs for top commit: fanquake: ACK 8ed2f1e Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
7668db3 Move only: Move CDiskTxPos to its own file (Marcin Jachymiak) Pull request description: Moves `CDiskTxPos` it its own file so it can be used without the `txindex.h` include elsewhere. Originally part of bitcoin#14053. ACKs for top commit: jnewbery: utACK 7668db3 promag: ACK 7668db3. Tree-SHA512: b108e980ad04e43d1323410c3683a82bed70aee7795f5d8a2afbaf32a07ba598571f00b047bdde15048124b17178bcbd10654c48461beac988e9643cb2df664c
8ed2f1e Remove unused includes (Marcin Jachymiak) cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak) Pull request description: Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053. ACKs for top commit: fanquake: ACK 8ed2f1e Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
7668db3 Move only: Move CDiskTxPos to its own file (Marcin Jachymiak) Pull request description: Moves `CDiskTxPos` it its own file so it can be used without the `txindex.h` include elsewhere. Originally part of bitcoin#14053. ACKs for top commit: jnewbery: utACK 7668db3 promag: ACK 7668db3. Tree-SHA512: b108e980ad04e43d1323410c3683a82bed70aee7795f5d8a2afbaf32a07ba598571f00b047bdde15048124b17178bcbd10654c48461beac988e9643cb2df664c
8ed2f1e Remove unused includes (Marcin Jachymiak) cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak) Pull request description: Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053. ACKs for top commit: fanquake: ACK 8ed2f1e Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
7668db3 Move only: Move CDiskTxPos to its own file (Marcin Jachymiak) Pull request description: Moves `CDiskTxPos` it its own file so it can be used without the `txindex.h` include elsewhere. Originally part of bitcoin#14053. ACKs for top commit: jnewbery: utACK 7668db3 promag: ACK 7668db3. Tree-SHA512: b108e980ad04e43d1323410c3683a82bed70aee7795f5d8a2afbaf32a07ba598571f00b047bdde15048124b17178bcbd10654c48461beac988e9643cb2df664c
8ed2f1e Remove unused includes (Marcin Jachymiak) cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak) Pull request description: Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053. ACKs for top commit: fanquake: ACK 8ed2f1e Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
7668db3 Move only: Move CDiskTxPos to its own file (Marcin Jachymiak) Pull request description: Moves `CDiskTxPos` it its own file so it can be used without the `txindex.h` include elsewhere. Originally part of bitcoin#14053. ACKs for top commit: jnewbery: utACK 7668db3 promag: ACK 7668db3. Tree-SHA512: b108e980ad04e43d1323410c3683a82bed70aee7795f5d8a2afbaf32a07ba598571f00b047bdde15048124b17178bcbd10654c48461beac988e9643cb2df664c
8ed2f1e Remove unused includes (Marcin Jachymiak) cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak) Pull request description: Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053. ACKs for top commit: fanquake: ACK 8ed2f1e Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
7668db3 Move only: Move CDiskTxPos to its own file (Marcin Jachymiak) Pull request description: Moves `CDiskTxPos` it its own file so it can be used without the `txindex.h` include elsewhere. Originally part of bitcoin#14053. ACKs for top commit: jnewbery: utACK 7668db3 promag: ACK 7668db3. Tree-SHA512: b108e980ad04e43d1323410c3683a82bed70aee7795f5d8a2afbaf32a07ba598571f00b047bdde15048124b17178bcbd10654c48461beac988e9643cb2df664c
8ed2f1e Remove unused includes (Marcin Jachymiak) cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak) Pull request description: Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053. ACKs for top commit: fanquake: ACK 8ed2f1e Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
7668db3 Move only: Move CDiskTxPos to its own file (Marcin Jachymiak) Pull request description: Moves `CDiskTxPos` it its own file so it can be used without the `txindex.h` include elsewhere. Originally part of bitcoin#14053. ACKs for top commit: jnewbery: utACK 7668db3 promag: ACK 7668db3. Tree-SHA512: b108e980ad04e43d1323410c3683a82bed70aee7795f5d8a2afbaf32a07ba598571f00b047bdde15048124b17178bcbd10654c48461beac988e9643cb2df664c
8ed2f1e Remove unused includes (Marcin Jachymiak) cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak) Pull request description: Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053. ACKs for top commit: fanquake: ACK 8ed2f1e Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
Adds index to transactions by scriptPubKey. Based off of #2802. Stores hashes of scripts (hashed using Murmurhash3, with a hash seed that is stored in the index database), along with the
COutPoint
for the output which was spent/created, and theCDiskTxPos
for the transaction in which this happened.