Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unintuitive bound handling on multiindex - failing test #429

Closed
wants to merge 1 commit into from

Conversation

hashedone
Copy link
Contributor

Prove test, the simplest I managed to provide.

The test scenario logic:

Have a multi index map indexed with some arbitrary index (&str in the example), and mapped to some value (u64 in the test). The map can be however indexed by value itself - the secondary multiindex is created for this case. On range call I want to take all items with secondary items up to particular value, but it won't work - as it requires to provide second part of key, which is also a part of a bound. In practice if I call range this way, I don't know the "upper bound" of the primary index (it would be basically the highest index in the map), and it makes Bound::inclusive unuseable with many multi-index cases. Obviously in this very test I can fix it by changing vec![] to b"one".to_owned(), or even b"z".to_owned(), but it is a trap, as there is no guarantee in real world, that there is no item of index "Zumba" which should be inlcuded in range.

Currently the only workaround I see is never use Bound::inclusive for multi-index, and always use Bound::exclusive(key+1) but it is completely unintuitive, and even possibly not easy for more complex multi-keys.

@maurolacy
Copy link
Contributor

maurolacy commented Sep 17, 2021

No, this is correct, because you're setting the max value of the full key.

(1, "one") comes lexicographically after (1, vec![]) (i.e. (1, "")).

To make it work, you need to set a value greater than the (current) full key, either inclusively or exclusively, i.e. (1, "z"), (2, ""). Or (1, b"\xff"), but that doesn't work (yet), because of joined_key limitations.

Also notice that this is limited / poor, as to include the full range inclusively in all cases it is not enough with setting "z". You would have to build some string with the max possible length of the key, will all "z"s (b"\xff"s, actually.)

Now, if you want to keep the first element of the index fixed, it's just simple / better to use a prefix:

            let items: Vec<_> = map
                .idx
                .secondary
                .prefix(U64Key::new(1))
                .range(
                    &store,
                    None,
                    None,
                    Order::Ascending,
                )
                .collect::<Result<_, _>>()
                .unwrap();

Comment on lines +736 to +737
None,
Some(Bound::inclusive((U64Key::new(1), vec![]).joined_key())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
None,
Some(Bound::inclusive((U64Key::new(1), vec![]).joined_key())),
Some(Bound::inclusive((U64Key::new(1), vec![]).joined_key())),
None,

or

Suggested change
None,
Some(Bound::inclusive((U64Key::new(1), vec![]).joined_key())),
None,
Some(Bound::inclusive((U64Key::new(2), vec![]).joined_key())),

are both correct. And "intuitive".

@hashedone
Copy link
Contributor Author

I think you misunderstood me - I exactly understand why it works such way. The problem is, that I find it faulty, that there is no proper tool to set exclusive upper bound for the secondary key while iterating over range. In most cases if I need this kind of upper bound I have no idea about the upper bound of primary index, as if I know - I iterate over secondary index, because I know nothing about the primary one and I don't care about it. And I cannot fix the prefix of the index - I have no knowledge about primary key at all, I want to fully disregard it.

Basically my point is: I would really appreciate atom which behaves like MultiIndex, but doesn't include the primary key in it.

Also the second point is: there is no usecase for using Bound::inclusive for multidex upper bound, because of how it behaves (you need to know your primary key anyway to properly use it, or at least its upper range), which is very confusing for someone who didn't spend time on its internal implementation.

@maurolacy
Copy link
Contributor

maurolacy commented Sep 17, 2021

I think you misunderstood me - I exactly understand why it works such way. The problem is, that I find it faulty, that there is no proper tool to set exclusive upper bound for the secondary key while iterating over range. In most cases if I need this kind of upper bound I have no idea about the upper bound of primary index, as if I know - I iterate over secondary index, because I know nothing about the primary one and I don't care about it. And I cannot fix the prefix of the index - I have no knowledge about primary key at all, I want to fully disregard it.

Basically my point is: I would really appreciate atom which behaves like MultiIndex, but doesn't include the primary key in it.

I agree it's a limited / poor implementation. I've said it above. But it's a working one (as far as I can tell). One thing is for it to be poor / limited / unintuitive, and another is for it to be broken.

That said, I would very much like to get rid of the pk as part of the key. I tried and failed to do it in the past. See #211. Finally after a lot of work and attempts we settled with the currrent impl.

The pk should be used internally. But that requires using some generics magic (something like nested generics / types) that I don't know how to do yet. Or even, if it's possible to do.

Also the second point is: there is no usecase for using Bound::inclusive for multidex upper bound, because of how it behaves (you need to know your primary key anyway to properly use it, or at least its upper range), which is very confusing for someone who didn't spend time on its internal implementation.

Perhaps a helper like max_value() / max() for these types (Addr, etc.) would improve on that.

@maurolacy
Copy link
Contributor

Now that we'll have deserialized keys (see #432 ) we can probably use them in the MultiIndex too.

We wouldn"t handle them fully internally, but at least there will be a way to avoid dealing with their internal representation.

@ethanfrey
Copy link
Member

After reflecting on this a bit more, I agree this is an error calling the wrong function. The big hint is you using joined_key to generate bounds, which means you should be responsible for all the key mangling.

range Bounds work properly when you give actual keys you want to iterate on. This also makes sense for secondary indexes during pagination - where you provide the secondary index and the last Addr/Vec returned for that, and it can continue with the same secondary index and any following Addr/Vec.

However, this PR demonstrates that we have a missing method in the API: prefix_range.

If we have (A, B) and want to iterate over a range of A, then we would need to call such a function. Happy if you want to implement this. (Note the underlying Storage always considers the start as inclusive and end as exclusive - but please double-check that with how we implement the Bound -> Vec for the range queries).

My reasoning for bounds, when translating them to &[u8] for the underlying storage are:

  • None -> None
  • start, Bound::inclusive(x) -> x
  • start, Bound::exclusive(x) -> x, last byte incremented by 1
  • end, Bound::inclusive(x) -> x, last byte incremented by 1
  • end, Bound:: exclusive(x) -> x

actually, that is only for Order::Ascending. Maybe for Order::Descending these must change??

Happy if you want to try implementing such a function, so we could just:

let items: Vec<_> = map.idx.secondary.range(&store, None, Some(Bound::inclusive(U64Key::new(1))), Order::Ascending).collect();

@ethanfrey
Copy link
Member

Now that we'll have deserialized keys (see #432 ) we can probably use them in the MultiIndex too.

I agree this will make things nicer for the caller and would be a good issue. But is a bit orthogonal issue to the issue with bounds.

@ethanfrey
Copy link
Member

This is resolved with #446

@ethanfrey ethanfrey closed this Sep 21, 2021
@ueco-jb ueco-jb deleted the multi-index-range-failure branch December 22, 2021 09:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants