Unintuitive bound handling on multiindex - failing test #429

hashedone · 2021-09-17T11:01:42Z

Prove test, the simplest I managed to provide.

The test scenario logic:

Have a multi index map indexed with some arbitrary index (&str in the example), and mapped to some value (u64 in the test). The map can be however indexed by value itself - the secondary multiindex is created for this case. On range call I want to take all items with secondary items up to particular value, but it won't work - as it requires to provide second part of key, which is also a part of a bound. In practice if I call range this way, I don't know the "upper bound" of the primary index (it would be basically the highest index in the map), and it makes Bound::inclusive unuseable with many multi-index cases. Obviously in this very test I can fix it by changing vec![] to b"one".to_owned(), or even b"z".to_owned(), but it is a trap, as there is no guarantee in real world, that there is no item of index "Zumba" which should be inlcuded in range.

Currently the only workaround I see is never use Bound::inclusive for multi-index, and always use Bound::exclusive(key+1) but it is completely unintuitive, and even possibly not easy for more complex multi-keys.

maurolacy · 2021-09-17T14:15:44Z

No, this is correct, because you're setting the max value of the full key.

(1, "one") comes lexicographically after (1, vec![]) (i.e. (1, "")).

To make it work, you need to set a value greater than the (current) full key, either inclusively or exclusively, i.e. (1, "z"), (2, ""). Or (1, b"\xff"), but that doesn't work (yet), because of joined_key limitations.

Also notice that this is limited / poor, as to include the full range inclusively in all cases it is not enough with setting "z". You would have to build some string with the max possible length of the key, will all "z"s (b"\xff"s, actually.)

Now, if you want to keep the first element of the index fixed, it's just simple / better to use a prefix:

            let items: Vec<_> = map
                .idx
                .secondary
                .prefix(U64Key::new(1))
                .range(
                    &store,
                    None,
                    None,
                    Order::Ascending,
                )
                .collect::<Result<_, _>>()
                .unwrap();

maurolacy · 2021-09-17T14:39:50Z

packages/storage-plus/src/indexed_map.rs

+                    None,
+                    Some(Bound::inclusive((U64Key::new(1), vec![]).joined_key())),


Suggested change

None,

Some(Bound::inclusive((U64Key::new(1), vec![]).joined_key())),

Some(Bound::inclusive((U64Key::new(1), vec![]).joined_key())),

None,

or

Suggested change

None,

Some(Bound::inclusive((U64Key::new(1), vec![]).joined_key())),

None,

Some(Bound::inclusive((U64Key::new(2), vec![]).joined_key())),

are both correct. And "intuitive".

hashedone · 2021-09-17T16:13:28Z

I think you misunderstood me - I exactly understand why it works such way. The problem is, that I find it faulty, that there is no proper tool to set exclusive upper bound for the secondary key while iterating over range. In most cases if I need this kind of upper bound I have no idea about the upper bound of primary index, as if I know - I iterate over secondary index, because I know nothing about the primary one and I don't care about it. And I cannot fix the prefix of the index - I have no knowledge about primary key at all, I want to fully disregard it.

Basically my point is: I would really appreciate atom which behaves like MultiIndex, but doesn't include the primary key in it.

Also the second point is: there is no usecase for using Bound::inclusive for multidex upper bound, because of how it behaves (you need to know your primary key anyway to properly use it, or at least its upper range), which is very confusing for someone who didn't spend time on its internal implementation.

maurolacy · 2021-09-17T16:46:22Z

I think you misunderstood me - I exactly understand why it works such way. The problem is, that I find it faulty, that there is no proper tool to set exclusive upper bound for the secondary key while iterating over range. In most cases if I need this kind of upper bound I have no idea about the upper bound of primary index, as if I know - I iterate over secondary index, because I know nothing about the primary one and I don't care about it. And I cannot fix the prefix of the index - I have no knowledge about primary key at all, I want to fully disregard it.

Basically my point is: I would really appreciate atom which behaves like MultiIndex, but doesn't include the primary key in it.

I agree it's a limited / poor implementation. I've said it above. But it's a working one (as far as I can tell). One thing is for it to be poor / limited / unintuitive, and another is for it to be broken.

That said, I would very much like to get rid of the pk as part of the key. I tried and failed to do it in the past. See #211. Finally after a lot of work and attempts we settled with the currrent impl.

The pk should be used internally. But that requires using some generics magic (something like nested generics / types) that I don't know how to do yet. Or even, if it's possible to do.

Also the second point is: there is no usecase for using Bound::inclusive for multidex upper bound, because of how it behaves (you need to know your primary key anyway to properly use it, or at least its upper range), which is very confusing for someone who didn't spend time on its internal implementation.

Perhaps a helper like max_value() / max() for these types (Addr, etc.) would improve on that.

maurolacy · 2021-09-17T17:41:48Z

Now that we'll have deserialized keys (see #432 ) we can probably use them in the MultiIndex too.

We wouldn"t handle them fully internally, but at least there will be a way to avoid dealing with their internal representation.

ethanfrey · 2021-09-20T07:26:45Z

After reflecting on this a bit more, I agree this is an error calling the wrong function. The big hint is you using joined_key to generate bounds, which means you should be responsible for all the key mangling.

range Bounds work properly when you give actual keys you want to iterate on. This also makes sense for secondary indexes during pagination - where you provide the secondary index and the last Addr/Vec returned for that, and it can continue with the same secondary index and any following Addr/Vec.

However, this PR demonstrates that we have a missing method in the API: prefix_range.

If we have (A, B) and want to iterate over a range of A, then we would need to call such a function. Happy if you want to implement this. (Note the underlying Storage always considers the start as inclusive and end as exclusive - but please double-check that with how we implement the Bound -> Vec for the range queries).

My reasoning for bounds, when translating them to &[u8] for the underlying storage are:

None -> None
start, Bound::inclusive(x) -> x
start, Bound::exclusive(x) -> x, last byte incremented by 1
end, Bound::inclusive(x) -> x, last byte incremented by 1
end, Bound:: exclusive(x) -> x

actually, that is only for Order::Ascending. Maybe for Order::Descending these must change??

Happy if you want to try implementing such a function, so we could just:

let items: Vec<_> = map.idx.secondary.range(&store, None, Some(Bound::inclusive(U64Key::new(1))), Order::Ascending).collect();

ethanfrey · 2021-09-20T07:27:37Z

Now that we'll have deserialized keys (see #432 ) we can probably use them in the MultiIndex too.

I agree this will make things nicer for the caller and would be a good issue. But is a bit orthogonal issue to the issue with bounds.

ethanfrey · 2021-09-21T21:12:52Z

This is resolved with #446

hashedone requested review from ethanfrey, maurolacy and ueco-jb September 17, 2021 11:01

hashedone mentioned this pull request Sep 17, 2021

Unintuitive behavior of range on multi-index #430

Closed

Failing test, bug proof

4cbecbe

hashedone force-pushed the multi-index-range-failure branch from 13a3009 to 4cbecbe Compare September 17, 2021 11:06

maurolacy reviewed Sep 17, 2021

View reviewed changes

ethanfrey added this to the v0.10.0 milestone Sep 20, 2021

ethanfrey mentioned this pull request Sep 21, 2021

Add proper prefix_range helper when you want to iterate over the prefix space #446

Merged

ethanfrey closed this Sep 21, 2021

ueco-jb deleted the multi-index-range-failure branch December 22, 2021 09:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unintuitive bound handling on multiindex - failing test #429

Unintuitive bound handling on multiindex - failing test #429

hashedone commented Sep 17, 2021

maurolacy commented Sep 17, 2021 •

edited

Loading

maurolacy Sep 17, 2021

hashedone commented Sep 17, 2021

maurolacy commented Sep 17, 2021 •

edited

Loading

maurolacy commented Sep 17, 2021

ethanfrey commented Sep 20, 2021

ethanfrey commented Sep 20, 2021

ethanfrey commented Sep 21, 2021

		None,
		Some(Bound::inclusive((U64Key::new(1), vec![]).joined_key())),

Unintuitive bound handling on multiindex - failing test #429

Unintuitive bound handling on multiindex - failing test #429

Conversation

hashedone commented Sep 17, 2021

maurolacy commented Sep 17, 2021 • edited Loading

maurolacy Sep 17, 2021

Choose a reason for hiding this comment

hashedone commented Sep 17, 2021

maurolacy commented Sep 17, 2021 • edited Loading

maurolacy commented Sep 17, 2021

ethanfrey commented Sep 20, 2021

ethanfrey commented Sep 20, 2021

ethanfrey commented Sep 21, 2021

maurolacy commented Sep 17, 2021 •

edited

Loading

maurolacy commented Sep 17, 2021 •

edited

Loading