-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TrieDBRawIterator::prefix_then_seek
#190
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for fixing this one (I did not go through the fuzz code yet but prefer to approve asap).
Refactor the rest of the fuzz tests to use the arbitrary crate instead of manually generating the fuzz input data.
sure is desirable.
Cut down on the genericity and simplify everything.
we should fork to have substrate only crate, this is something we want for a while.
my opinion was to wait for removing rocksdb support (which will also allow simplifying a lot), but probably a better idea to
do sooner.
@@ -0,0 +1,725 @@ | |||
// Copyright (C) Parity Technologies (UK) Ltd. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should remove test-support/reference-trie/src/substrate-like.rs (can be done later, might be some slight difference and would be better to get fix in)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we can remove that one if we're going to have the real thing. I just didn't want to needlessly bloat up this PR with more changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some slight modif in substrate_like (iirc only having the threshold size as a parameter), but generally would make sense to use the same names as substrate as in your file.
Like I said could be done later.
return Ok(()) | ||
} | ||
|
||
if !self.seek(db, prefix)? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the fix is to seek in two steps (and avoid doing it twice if prefix == seek actually just using prefix) with second seek that cannot go outside the prefix due to prechecks.
Indeed my code was broken on empty prefix, seek going after the prefix and trail being clean as if it is into the prefix (original code did not test the return value).
From what I read this bug is not in a host function (only clear_prefix v3 can go in the code path but it is "register_only" cc @bkchr is it possible to still use the function? (in this case I am not sure if it can be an issue since runtime will only call it with a cursor if it is not empty, and having cursor at last position do not make sense as we know it is the last position)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NodePlan::Leaf { partial, .. } => { | ||
len += partial.len(); | ||
}, | ||
NodePlan::Extension { partial, .. } => { | ||
len += partial.len(); | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cheme why are we not adding here len += 1
? What is the difference to the others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Branches contains a nibble that is not part of its partial key, the index to the child node.
a trie with key (nibbles not bytes) '1,0' and '1,2' will be a branch and two empty partial key children leafs. The partial key of the branch will be 1 and the branch contains 0 and 2 nibble as index to the leafs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh okay!
if !seek.starts_with(prefix) { | ||
// We're supposed to seek *after* the prefix, | ||
// so just return an empty iterator. | ||
self.trail.clear(); | ||
return Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like something that shouldn't be supported at all and the upperlying logic should forbid this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely sure what exactly you mean here, but without this check this method can give nonsensical results (testcase_4 will fail).
Basically I wanted this whole thing to work in the following way with no other special/corner cases:
- All of the keys returned by this are guaranteed to start with the
prefix
. - All of the keys returned by this are guaranteed to be equal or bigger lexicographically to
seek
.
And even if you give the method arguments which might seem silly (e.g. like here put a seek that's after the prefix and doesn't start with it) it's supposed to handle it in a way in which it makes sense without returning garbage results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about it from the sp_io perspective and api should be I think:
- prefix some prefix
- seek some seek key without the prefix (I mean after or bellow the prefix).
So the first value you access will not be seek
when into the prefix, but prefix ++ seek
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about it from the sp_io perspective and api should be I think:
prefix some prefix
seek some seek key without the prefix
So the first value you access will not be
seek
when into the prefix, butprefix ++ seek
So if I understand this right, are you suggesting that the subsequent seek after seeking to the prefix should be allowed to "break out" of the prefix and iterate over values which do not start with the prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is more an api suggestion that would avoid some corner cases (current code looks plain good to me):
Instead of calling prefix_then_seek(b"prefix", b"prefix_seek")
or prefix_then_seek(b"prefix", b"not_in_prefix")
.
We would call prefix_then_seek(b"prefix", b"_seek")
and could not do the second call at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But doing this change is breaking api. I think it would be more interesting at sp_io level also.
(was just a thought I had during lunch and wanted to share, not something that should really be part of this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... I guess we could do that, but that might be more awkward to use depending on the use case I guess? But then it could be easier in some other cases? And also, in this case I think it could make sense for it to return keys which are also have the prefix stripped, e.g. assume the following code:
loop {
let accounts = fetch_values(ACCOUNTS_PREFIX, cursor, 100);
// ...
cursor = match accounts.last() {
Some(account) => account,
None => break
}
}
For this code to work and be convenient to use it'll have to return the same "type" of key (stripped of the prefix or not stripped) as what the function accepts in cursor
, otherwise you'll have to do awkward concats or have to slice off the prefix yourself.
Also, only tangentially related, but maybe naming wise we could rename prefix
to only_starting_with
and seek
to skip_until_or_after
or something like that? It's a little wordy but it would explain what it does more clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah not evident it is better, if it does not sounds evidently better, then it is probably not worth the change.
Also, only tangentially related, but maybe naming wise we could rename prefix to only_starting_with and seek to skip_until_or_after or something like that? It's a little wordy but it would explain what it does more clearly.
Got no opinion for the trie crate, but for frame or host function, clearly would make sense but I think the api is somehow rather different already.
One idea that is hard to convey is that when we do prefix
operation, only nodes starting the prefix or bellow the prefix are accessed (we don t do the seek then iterate until a key that is not under the prefix is reached (avoiding adding the first key out of the prefix in the proof)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking twice could be good for trie to have better naming, but I got pretty use to the current one so my opinion is rather biased.
if !self.seek(db, prefix)? { | ||
// The database doesn't have a key with such a prefix. | ||
self.trail.clear(); | ||
return Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the actual fix or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because before we still returned some data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the actual fix or?
Because before we still returned some data?
I think @cheme already explained it pretty well here, but basically yes, previously we seeked to a key which might not start with the prefix anymore, and the code just checked that the length is good but didn't check that it's still within the prefix.
Ahh you also came to the same conclusion as me! Good :D |
Okay, since I want to put up a substrate PR with this fix I'll merge this now. If you'd still like for me to make any further changes feel free to leave them and I'll do them in a followup. |
After calling the
TrieDBRawIterator::prefix_then_seek
the iterator is not supposed to return any keys outside of the requested prefix; this unfortunately is not the case, and this PR fixes the issue.Implementation notes
filter
in the fuzz test I added).Potential future work
paste
crate and run each trie layout as their own test instead of running every layout inside of a single test.arbitrary
crate instead of manually generating the fuzz input data.