Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Demo: Generalizing the HashMap constructor, extract stable store #301

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

nomeata
Copy link
Contributor

@nomeata nomeata commented Nov 3, 2021

this is to demonstrate what I meant in
#300 (comment)
and
#299 (comment),
and how to introduce this without breaking changes (although it’s kinda
ugly)

What I would want here is to introduce a second, more general,
constructor for the given class, but Motoko does not allow me to do that
easily. But I can hack around that by

  • Creating a new class, not public HashMap_ with the constructor I
    want
  • In HashMap’s constructor, call the HashMap_ constructor to create
    an inner object (i)
  • In HashMap, simply copy all the fields from the inner objects to
    the outer object.
  • A public module-level function (here wrapS) exposes the new
    constructor.

With this (generic, ugly) trick I can suppor the idiom

stable var userS : HashMap.S <UserId,UserData> = newS();
let user : HashMap.HashMap<UserId,UserData> = HashMap.wrapS(10, Nat.eq, Nat.hash, userS)

without changing the API.

But it is ugly, and the effect on documentation generation is probably
bad as well. So maybe a better course of action would be to have a midly
breaking change where we only have the “new” constructor, and people
will have to fix their code by passing HashMap.newS() as a new fourth
argument if they want their old behavior. Probably better than piling up
hacks like this.

In that case, simply rename class HashMap_ to HashMap, remove wrapS and
the old class HashMap.

this is to demonstrate what I meant in
dfinity#300 (comment)
and
dfinity#299 (comment),
and how to introduce this without breaking changes (although it’s kinda
ugly)

What I would _want_ here is to introduce a second, more general,
constructor for the given class, but Motoko does not allow me to do that
easily. But I can hack around that by

 * Creating a new class, not public `HashMap_` with the constructor I
   want
 * In `HashMap`’s constructor, call the `HashMap_` constructor to create
   an inner object (`i`)
 * In `HashMap`, simply copy all the fields from the inner objects to
   the outer object.
 * A public module-level function (here `wrapS`) exposes the new
   constructor.

With this (generic, ugly) trick I can suppor the idiom
```
stable var userS : HashMap.S <UserId,UserData> = newS();
let user : HashMap.HashMap<UserId,UserData> = HashMap.wrapS(10, Nat.eq, Nat.hash, userS)
```
without changing the API.

But it is ugly, and the effect on documentation generation is probably
bad as well. So maybe a better course of action would be to have a midly
breaking change where we only have the “new” constructor, and people
will have to fix their code by passing `HashMap.newS()` as a new fourth
argument if they want their old behavior. Probably better than piling up
hacks like this.

In that case, simply rename `class HashMap_` to `HashMap`, remove `wrapS` and
the old `class HashMap`.
@nomeata nomeata mentioned this pull request Nov 3, 2021
@nomeata
Copy link
Contributor Author

nomeata commented Nov 3, 2021

I just pushed a commit that removes the backward-compat-fluff, i.e. changes the constructor, but nothing else. You can look at the first commit (e851c09) explicitly if you want to see the backward compat trick in action.

@nomeata
Copy link
Contributor Author

nomeata commented Nov 4, 2021

Ok, new variant, which is fully backwards compatible, and still allows people to put the actual data store on stable memory.

WDYT, @matthewhammer

Copy link
Contributor

@rossberg rossberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, but of course, the big problem is that the ability to adopt a random backing store with setStore breaks encapsulation and can violate every possible invariant of the class.

But short of user-defined "stabilisation" hooks, that's perhaps the best we can do at the moment.

Nevertheless, setStore should at least come with a big fat warning saying that the external store ought to be empty (modulo some language regarding adoption from earlier instances through upgrades?), and should not be mutated afterwards by anybody else than the class.

@@ -26,16 +26,15 @@ module {

/// An imperative HashMap with a minimal object-oriented interface.
/// Maps keys of type `K` to values of type `V`.
public class HashMap<K, V>(
public class HashMap<K,V>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentionally undoing style guide conformance? ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, accidentally :-)

@@ -219,4 +230,18 @@ module {
h2
};

/// The mutable state of a `HashMap`
public type S<K,V> = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps clearer to move this to the top, since S is already used in the HashMap class (I was confused at first where S is coming from).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it could be a public type of the class itself, right? Then it could be near the end, but part of the class (if optimizing for documentation reader's preferred order, not implementation reader's.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it could be a public type of the class itself, right?

My understanding is that the type system (still) does not permit type members of classes, only modules, or maybe that changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, not sure. Why should it not? But if not, ignore me.

};

/// Creating the state of an empty `HashMap`
public func newS<K,V>() : S<K,V>{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could turn S into a class and avoid some code duplication.

@rossberg
Copy link
Contributor

rossberg commented Nov 5, 2021

Btw, I believe that technically, the stable var userS doesn't need to be a var, it could just be a let, which would potentially be cleaner. Or was there some issue with stable lets?

@nomeata
Copy link
Contributor Author

nomeata commented Nov 5, 2021

Since the S actually contains all mutable state of the data structure, and the class is more or less just bundling operations, I think it's actually safe to share the store between different class wrappers, for example (as long as the hash and eq parameters are the same).

Essentially, you get the same safety guarantees as with a procedural interface.

Indeed, modifying the store directly can break stuff… but that seems inherent to having to store it in stable memory.

BTW, I don't plan to expand this into mergable form, I'm traveling soon without a laptop. This should serve as inspiration and @matthewhammer or whoever is free to steal ideas and idioms. In particular, the names here are surely not perfect.

You are right about stable var being sufficient, I think.

@rossberg
Copy link
Contributor

rossberg commented Nov 5, 2021

Essentially, you get the same safety guarantees as with a procedural interface.

Well, yes, but the safe encapsulation of the hash/eq parameters is the only real reason to use a class in the first place.

Indeed, modifying the store directly can break stuff… but that seems inherent to having to store it in stable memory.

Not necessarily, if we had stable functions. But that's probably a pipe dream.

@nomeata
Copy link
Contributor Author

nomeata commented Nov 5, 2021

Then why do we have a class for Buffer? Unreal reasons ;-)

@matthewhammer
Copy link
Contributor

matthewhammer commented Nov 5, 2021

BTW, I don't plan to expand this into mergable form, I'm traveling soon without a laptop. This should serve as inspiration and @matthewhammer or whoever is free to steal ideas and idioms. In particular, the names here are surely not perfect.

I like this idea, since it gets us the benefits of the Stable- versions without having to duplicate code.

But, I realize the design space is less than clear.

If we choose to prioritize code concision, as this PR does (perhaps that has lots of other engineering benefits too), I think that breaking encapsulation, while unfortunate, is the price we have to pay at this state of the language's maturity as we continue to want these two things, which are fundamentally at odds:

  • flexible and stable variables whose upkeep does not over-burden the programmer of the canister (everyone wants to avoid authoring manual upgrade hooks, understandably).
  • strong abstractions (that is to say "abstraction", in the mathematical sense) for data structures holding state.

OTOH, having the Stable- versions as separate gets us "real OO" abstraction for the non-stable versions.

Which is worse, I wonder:

  1. Writing potentially-wrong upgrade hooks, or
  2. misusing the "trap door" of setStore, and having an accident?

At least setStore is very grepable, making audits pretty straightforward, one would hope.

WDYT @crusso and @rossberg ?

@rossberg
Copy link
Contributor

rossberg commented Nov 8, 2021

Maybe it's clearer if we called this class HashMapProxy or something like that?

@nomeata
Copy link
Contributor Author

nomeata commented Nov 8, 2021

And no HashMap class at all? This PR doesn't introduce a new class, it “merely” trades state encapsulation against upgrade persistence.

But maybe that is simply not desirable, at least not until we have a stable memory story for Motoko that actually scales to noticable amounts of data.

@crusso
Copy link
Contributor

crusso commented Nov 8, 2021

What I would want here is to introduce a second, more general,
constructor for the given class, but Motoko does not allow me to do that
easily. But I can hack around that by

Why not make the constructor take a variant of old, original arg or continuing, existing state and have a reader for exposing the state.

Then it could perhaps even check the invariant when continuing with old state on re-construction?

Breaks existing clients, but only with minor impact since they need to insert an injection in their constructor calls.

class HashMap<K,V> (arg : {#new: (capacity : Nat, keyEq: K -> K -> Bool, hash : K -> Nat32);
                                            #continue: (state: S<K,V>, keyEq: K -> K -> Bool, hash : K -> Nat32)}) {
    ...
    public func stabilise: () -> S<K,V> { ... }
}

@nomeata
Copy link
Contributor Author

nomeata commented Nov 9, 2021

I tried to break no clients. If that's ok, yes, a variant is good option.

@matthewhammer
Copy link
Contributor

matthewhammer commented Nov 9, 2021

Why not make the constructor take a variant of old, original arg or continuing, existing state and have a reader for exposing the state.

Yep, I like that design as well.

Some initial Buffer implementation did this precisely for this purpose, IIRC but got simplified during group review. As I recall, we didn't even have stable vars and the complexity seemed premature at the time. Now it doesn't, IMO. The breaking change seems worth it, now.

Perhaps we should discuss a more systematic change that could amend the interface design guide with something along these lines?

WDYT @rossberg ?

@matthewhammer
Copy link
Contributor

From the guide

The share/unshare functions of a class need to convert to a type that is designed for stability (potentially, including extensibility) and space efficiency, not for enabling efficient direct operations.

In particular, it seems like unshare should really be done by the constructor, in many cases.

@rossberg
Copy link
Contributor

Hm, a complicated constructor like that would just make usability worse in the common case. It was intentional that the guide proposes unshare as a method instead (that you'd typically invoke on a freshly created instance). It also is more symmetric with share.

(If you truly need multiple constructors, you can often create wrapper functions, e.g.:

class HashMapProxy<K, V>(state : S<K, V>, eq, hash) { ... };

func HashMap<K, V>(capacity : Nat, eq, hash) = HashMapProxy<K, V>(createState(), eq, hash);
type HashMap<K, V> = HashMapProxy<K, V>;
// would be nicer if we allowed
// class HashMap<K, V>(capacity : Nat, eq, hash) = HashMapProxy<K, V>(createState(), eq, hash)

but I'm not convinced that's desirable in this case.)

@nomeata
Copy link
Contributor Author

nomeata commented Nov 10, 2021

Ah, we actually can pun like this? I considered it, but disregarded it, assuming that even if it's supported, the docs will look bad. Thsts why the first iteration of this introduces a proxy class. (Even more reasons to derive docs from the type of a module, not it's source…)

@ggreif ggreif force-pushed the master branch 2 times, most recently from d52aecd to 08507fc Compare October 21, 2022 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants