Automatic unique CONSTR_IDs #239

Closed
@nielstron

Is your feature request related to a problem? Please describe.
When defining Union'd PlutusData in OpShin/PyCardano, one has to take care that constructor ids are unique across all elements of the Union. While other languages like PlutusTx take care of this automatically (assigning all elements of a Union unique ids at definition), in PyCardano we have to manually pick a set of unique CONSTR_IDs. We also want to benefit from the free declarability of Unions in Python, which lets us group types together across definitions (e.g. Union[A, B] and Union[A, C]; this is actually not possible with any other Cardano SC language!).
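To illustrate the problem, here is a minimal sketch with manually picked ids. The classes `A`, `B`, `C` and the helper `colliding_ids` are hypothetical stand-ins using plain dataclasses, not PyCardano's actual `PlutusData` API.

```python
from dataclasses import dataclass
from typing import ClassVar, Union, get_args


@dataclass
class A:
    CONSTR_ID: ClassVar[int] = 0


@dataclass
class B:
    CONSTR_ID: ClassVar[int] = 1


@dataclass
class C:
    CONSTR_ID: ClassVar[int] = 1  # fine next to A, but clashes with B


def colliding_ids(union_type) -> set:
    """Return the constructor ids used by more than one member of the Union."""
    seen, clashes = set(), set()
    for member in get_args(union_type):
        if member.CONSTR_ID in seen:
            clashes.add(member.CONSTR_ID)
        seen.add(member.CONSTR_ID)
    return clashes


AB = Union[A, B]  # unique ids
AC = Union[A, C]  # unique ids
BC = Union[B, C]  # both members use id 1
```

Here `colliding_ids(BC)` reports the clash on id 1, while `AB` and `AC` are fine. Whoever defines `C` has to know in advance every Union it might end up in.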

Describe the solution you'd like
We definitely want a few properties for the CONSTR_IDs:

  1. small: ideally the constr_id integer should be as small as possible, as smaller integers are encoded more efficiently in CBOR and save the end user minutxo and tx fees (constr_ids of up to 7 bits, i.e. 0 to 127, are encoded directly in the CBOR tag; larger ones are encoded as a generic integer)
  2. unique: there should be as little overlap with other values as possible, so that we can group classes together in Unions without having to worry about setting/overwriting the constr_id
  3. overwritable: in order to mimic external interfaces such as the ledger, we need to be able to manually set the CONSTR_ID. We may also want to manually set the ids in case of collisions
  4. deterministic: datatypes that are defined in libraries may be imported in arbitrary contexts. The constr_id must therefore not depend on e.g. what other Unions the datatype is used in or what other datatypes are declared in its surroundings
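To make the "small" property concrete, the following sketch estimates how many bytes the constructor part of a datum takes. It assumes the Plutus data constructor encoding as I understand it (tags 121 to 127 for ids 0 to 6, tags 1280 to 1400 for ids 7 to 127, and tag 102 wrapping `[id, fields]` otherwise); the function names are made up for this example and the field list itself is not counted.

```python
def cbor_head_size(value: int) -> int:
    """Bytes used by a CBOR head (unsigned integer or tag number) for `value`."""
    if value < 24:
        return 1
    if value < 2**8:
        return 2
    if value < 2**16:
        return 3
    if value < 2**32:
        return 5
    return 9


def constr_header_size(constr_id: int) -> int:
    """Bytes spent on encoding the constructor id, excluding the fields."""
    if constr_id < 0:
        raise ValueError("constructor ids must be non-negative")
    if constr_id <= 6:
        return cbor_head_size(121 + constr_id)  # compact tag
    if constr_id <= 127:
        return cbor_head_size(1280 + constr_id - 7)  # extended tag range
    # generic form: tag 102, a 2-element array header, then the id itself
    return cbor_head_size(102) + 1 + cbor_head_size(constr_id)
```

Under these assumptions an id below 7 costs 2 bytes, an id up to 127 costs 3 bytes, and an id close to 2**32 costs 8 bytes.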

Overwritability is an implementation detail that we simply need to bear in mind when adopting this change.
IMO the best solution is based on a hash of the class name. The main question now is how to process the hash so that the resulting id is as small as possible.
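A minimal sketch of a hash-based id that stays overwritable could look like this. Note that Python's builtin `hash()` on strings is salted per process by default, so the sketch swaps in SHA-256 from `hashlib` to keep the result deterministic. `AutoConstr`, `CONSTR_ID_OVERRIDE`, `MODULUS` and `derived_constr_id` are hypothetical names, not existing PyCardano API.

```python
import hashlib
from typing import Optional


def derived_constr_id(class_name: str, modulus: int) -> int:
    """Deterministically map a class name to an id in range(modulus)."""
    digest = hashlib.sha256(class_name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % modulus


class AutoConstr:
    """Hypothetical stand-in for a PlutusData-like base class."""

    CONSTR_ID_OVERRIDE: Optional[int] = None  # set this to pin the id manually
    MODULUS = 2**32  # assumption: use 128 instead for the 7-bit variant

    @classmethod
    def constr_id(cls) -> int:
        if cls.CONSTR_ID_OVERRIDE is not None:
            return cls.CONSTR_ID_OVERRIDE
        return derived_constr_id(cls.__name__, cls.MODULUS)


class MyDatum(AutoConstr):
    pass  # id derived from the name "MyDatum"


class LedgerCompatible(AutoConstr):
    CONSTR_ID_OVERRIDE = 0  # mimics an externally defined interface
```

The id depends only on the class name, so importing `MyDatum` into another module or Union does not change it.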

Alternative A: CONSTR_ID = hash(classname) % 128

This solution truncates the hash of the classname to the last 7 bits, giving us a constructor id between 0 and 127, which stays within the range that gets the compact CBOR tag encoding. However, due to the birthday paradox, a collision with another constructor id becomes more likely than not from about 14 defined classes onwards. In practice we see ~5-10 user-defined classes per contract, so this could be fine if alternative B is too expensive.
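The birthday-paradox estimate can be checked numerically. This is a small sketch that assumes the hash spreads class names uniformly over the available ids; the function name is made up for this example.

```python
def collision_probability(num_classes: int, num_ids: int) -> float:
    """Probability that at least two of `num_classes` classes share an id."""
    p_all_unique = 1.0
    for already_taken in range(num_classes):
        # chance that the next class lands on an id nobody has used yet
        p_all_unique *= max(num_ids - already_taken, 0) / num_ids
    return 1.0 - p_all_unique
```

Under this assumption, 10 classes sharing 128 ids collide with a probability of roughly 30%, and the probability passes 50% at 14 classes. With 2**32 ids the same 10 classes collide with a probability of about one in a hundred million.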

Alternative B: CONSTR_ID = hash(classname) % (2**32)

This solution truncates the hash of the classname to 32 bits, nicely fitting into a normal integer. This reduces the likelihood of conflicts to a negligible amount, but costs several bytes more than solution A per datum, since the id no longer fits the compact tag and is written out as a 4-byte integer in the generic constructor form (mind that these datums end up on chain). The impact on minutxo and script execution cost should be investigated.

Describe alternatives you've considered

Solutions similar to auto() in enum (a global incremental counter that increases with every definition of a PlutusData class) are not a good idea because they break determinism.
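A short sketch of why a counter breaks determinism: the id a class receives depends on what happened to be defined before it. `assign_counter_ids` is a hypothetical helper that mimics a global counter over the order of definitions.

```python
import itertools
from typing import Dict, List


def assign_counter_ids(definition_order: List[str]) -> Dict[str, int]:
    """Give each class name the next value of an incrementing counter."""
    counter = itertools.count()
    return {name: next(counter) for name in definition_order}


# The same library class "Foo" imported into two different contexts:
standalone = assign_counter_ids(["Foo", "Bar"])
after_other_import = assign_counter_ids(["Other", "Foo", "Bar"])
```

`Foo` gets id 0 in the first context and id 1 in the second, so the same datum would serialize differently depending on its surroundings.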

Additional context
OpShin lets you write smart contracts for Cardano in Python and uses the PlutusData class for native data.
In Haskell, data is usually defined together with all alternatives of the same type (e.g. A = B | C | D), while in Python/PyCardano/OpShin the classes are defined first and only later combined into Unions (e.g. class B; class C; class D; A = Union[B, C, D]).

Requesting comments from @cffls and @juliusfrost, but I will most likely just go ahead and implement this as soon as I have time for it (~1-2 weeks from now).

Labels: enhancement