Skip to content

MultiTable: thin wrapper around Table with sane API for duplicate keys; deprecate Table.add #200

Closed
@timotheecour

Description

@timotheecour

Table currently allows duplicate keys, which have issues, some fixable, others harder to fix without affecting code that doesn't care about duplicate keys (the common case is to not have duplicate keys and that should be as fast as possible); eg:

proposal

  • deprecate Table.add: would still work (for now at least) but would show a deprecation msg showing best course of action (including what this RFC introduces)
  • add a thin wrapper TableDup around Table to offer sane API to handle duplicate keys, including ability to access / re-assign / delete a specific duplicate
  • performance will be as fast as for regular tables witout duplicate keys for some operations involving duplicate keys, eg: access/re-assign/delete a specific duplicate
  • performance for values and add: I'm not sure but shouldn't be worse
  import tables

  type TableDup[A,B] =object
    timpl: Table[(A,int), B]

  template `[]=`[A,B](t: TableDup[A,B], a: A, b: B) =
    t.timpl[(a,-1)]=b

  template `[]`[A,B](t: TableDup[A,B], a: A): B =
    t.timpl[(a,-1)]

  template len[A,B](t: TableDup[A,B]): int = t.timpl.len

  template delIndex[A,B](t: var TableDup[A,B], a: A, index: int) =
    t.timpl.del((a, index))

  template add[A,B](t: var TableDup[A,B], a: A, b: B): int =
    var a2 = (a, 0)
    while true:
      if a2 in t.timpl:
        a2[1].inc
      else:
        t.timpl[a2] = b
        break
    a2[1]

  iterator pairs[A,B](t: TableDup[A,B]): tuple[key: A, val: B] =
    for k,v in t.timpl:
      yield (k[0], v)

  template dollarImpl(): untyped {.dirty.} = # FACTOR: same as in tableimpl
    if t.len == 0:
      result = "{:}"
    else:
      result = "{"
      for key, val in pairs(t):
        if result.len > 1: result.add(", ")
        result.addQuoted(key)
        result.add(": ")
        result.addQuoted(val)
      result.add("}")

  proc `$`*[A, B](t: TableDup[A, B]): string =
    ## The ``$`` operator for hash tables. Used internally when calling `echo`
    ## on a table.
    dollarImpl()

  proc main()=
    var t: TableDup[float, string]
    # echo t
    t[1.0] = "baz"
    doAssert t[1.0] == "baz"
    doAssert t.add(2.0, "goo0") == 0
    doAssert t.add(2.0, "goo1") == 1
    doAssert t.add(2.0, "goo2") == 2
    t.delIndex(2.0, 1)
    doAssert $t == """{2.0: "goo0", 1.0: "baz", 2.0: "goo2"}"""

  
main()

Note

it's a thin wrapper, not a re-implementation, so all the complex logic is reused instead of re-implemented; I know we have too many Table types but the chief concern I have is that these are not mere wrappers around a few basic table types but full blown implementations (with some amount of code reuse). Wheres this is just a wrapper.

alternatives considered

IMO this is worse because you pay a price for each key, not just duplicates (seq indirection); and there are other problems as well

  type TableDup2[A,B] =object
    timpl: Table[A, seq[B]]

refinements

I didn't want to obfuscate the main idea but it's possible to extend to OrderedTable or other types eg:

  • completely generic:
    (I haven't tested this)
  type TableDupGeneric[T] =object
    timpl: T[(T.A, int), T.B]

var t1: TableDupGeneric[OrderedTable[float, string]]
var t2: TableDupGeneric[Table[float, string]]
var t3: TableDupGeneric[TableRef[float, string]]
  • or via a flag
    (I haven't tested either)
type TableDup3[A, B, ordered: static bool] =object
  when ordered:
    timpl: OrderedTable[A,B]
  else:
    timpl: Table[A,B]

links

D20200304T004500

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions