Description
Table currently allows duplicate keys, which have issues, some fixable, others harder to fix without affecting code that doesn't care about duplicate keys (the common case is to not have duplicate keys and that should be as fast as possible); eg:
- OrderedTable crashes on
del
when there are repeated keys Nim#13500 ==
incorrect for tables with repeated keys Nim#13499- we can't delete a specific duplicate key,
del
just deletes "one of the duplicates" - we can't re-assign the value for a specific duplicate
- we can't get the value for a specific duplicate
- performance is bad for duplicate keys because of hash collisions; IIRC it's quadratic in
card(duplicates of a key)
proposal
- deprecate
Table.add
: would still work (for now at least) but would show a deprecation msg showing best course of action (including what this RFC introduces) - add a thin wrapper
TableDup
around Table to offer sane API to handle duplicate keys, including ability to access / re-assign / delete a specific duplicate - performance will be as fast as for regular tables witout duplicate keys for some operations involving duplicate keys, eg: access/re-assign/delete a specific duplicate
- performance for
values
andadd
: I'm not sure but shouldn't be worse
import tables
type TableDup[A,B] =object
timpl: Table[(A,int), B]
template `[]=`[A,B](t: TableDup[A,B], a: A, b: B) =
t.timpl[(a,-1)]=b
template `[]`[A,B](t: TableDup[A,B], a: A): B =
t.timpl[(a,-1)]
template len[A,B](t: TableDup[A,B]): int = t.timpl.len
template delIndex[A,B](t: var TableDup[A,B], a: A, index: int) =
t.timpl.del((a, index))
template add[A,B](t: var TableDup[A,B], a: A, b: B): int =
var a2 = (a, 0)
while true:
if a2 in t.timpl:
a2[1].inc
else:
t.timpl[a2] = b
break
a2[1]
iterator pairs[A,B](t: TableDup[A,B]): tuple[key: A, val: B] =
for k,v in t.timpl:
yield (k[0], v)
template dollarImpl(): untyped {.dirty.} = # FACTOR: same as in tableimpl
if t.len == 0:
result = "{:}"
else:
result = "{"
for key, val in pairs(t):
if result.len > 1: result.add(", ")
result.addQuoted(key)
result.add(": ")
result.addQuoted(val)
result.add("}")
proc `$`*[A, B](t: TableDup[A, B]): string =
## The ``$`` operator for hash tables. Used internally when calling `echo`
## on a table.
dollarImpl()
proc main()=
var t: TableDup[float, string]
# echo t
t[1.0] = "baz"
doAssert t[1.0] == "baz"
doAssert t.add(2.0, "goo0") == 0
doAssert t.add(2.0, "goo1") == 1
doAssert t.add(2.0, "goo2") == 2
t.delIndex(2.0, 1)
doAssert $t == """{2.0: "goo0", 1.0: "baz", 2.0: "goo2"}"""
main()
Note
it's a thin wrapper, not a re-implementation, so all the complex logic is reused instead of re-implemented; I know we have too many Table types but the chief concern I have is that these are not mere wrappers around a few basic table types but full blown implementations (with some amount of code reuse). Wheres this is just a wrapper.
alternatives considered
IMO this is worse because you pay a price for each key, not just duplicates (seq indirection); and there are other problems as well
type TableDup2[A,B] =object
timpl: Table[A, seq[B]]
refinements
I didn't want to obfuscate the main idea but it's possible to extend to OrderedTable or other types eg:
- completely generic:
(I haven't tested this)
type TableDupGeneric[T] =object
timpl: T[(T.A, int), T.B]
var t1: TableDupGeneric[OrderedTable[float, string]]
var t2: TableDupGeneric[Table[float, string]]
var t3: TableDupGeneric[TableRef[float, string]]
- or via a flag
(I haven't tested either)
type TableDup3[A, B, ordered: static bool] =object
when ordered:
timpl: OrderedTable[A,B]
else:
timpl: Table[A,B]
links
D20200304T004500