How can we get symbols to be written as dictionary-encoded strings? #31

The mechanism for doing this already seems to exist in the library.
For instance, given a symbol vector:

```q
sym:`a`b`c`a`a`c
dvalues:distinct sym
indices:dvalues?sym
/ ideally we would use the smallest index type that can hold the number of distinct symbols:
mt:(.arrowkdb.dt[`int8`int16`int32`int64])!im:floor 2 xexp 0 7 15 31
mkt:4 5 6 7h!im
indextype:mt bin c:count dvalues
indexktype:mkt bin c
datatype_symbol:.arrowkdb.dt.dictionary[.arrowkdb.dt.utf8[];indextype[]]
/we can even pretty print the type we want:
.arrowkdb.ar.prettyPrintArray[datatype_symbol;(dvalues;indexktype$indices);::]
```
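
To make the encoding step explicit, here is a minimal pure-q sketch of the values/indices split used above and its inverse. `encodeSym` and `decodeSym` are hypothetical helper names, not part of arrowkdb; the point is only that indexing the distinct values by the indices recovers the original symbols.

```q
/ hypothetical helpers, not part of arrowkdb
encodeSym:{[s] d:distinct s; (d; d?s)}   / returns (values; indices)
decodeSym:{[e] e[0] e 1}                 / index the values by the indices

sym:`a`b`c`a`a`c
enc:encodeSym sym        / (`a`b`c; 0 1 2 0 0 2)
sym~decodeSym enc        / 1b
```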

What's not clear is how to enhance the current inferSchema to do this calculation. As it stands, tables with symbol columns are not the same after a round trip: all the symbols are cast to strings.
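
One possible workaround, until inferSchema handles this, might be to start from the inferred schema and swap in a dictionary field for each symbol column. The following is an untested sketch under stated assumptions: `idxFor`, `ktypeFor`, `symField`, `dictSchema` and `dictData` are hypothetical names, not arrowkdb functions, and it assumes `.arrowkdb.sc.schemaFields`, `.arrowkdb.fd.field` and `.arrowkdb.sc.schema` behave as described in the arrowkdb reference. It reuses the thresholds and the byte/short/int/long index types from the snippet in this issue.

```q
/ sketch only: every name defined here is hypothetical, not part of arrowkdb
/ choose the index datatype name and kdb+ type from the distinct count
idxFor:{[n] `int8`int16`int32`int64 0|1 128 32768 2147483648 bin n}
ktypeFor:{[n] 4 5 6 7h 0|1 128 32768 2147483648 bin n}

/ field for one column: dictionary(utf8, index) for symbols, else the inferred field
symField:{[name;col;inferred]
  if[11h<>type col; :inferred];
  idx:.arrowkdb.dt[idxFor count distinct col][];
  .arrowkdb.fd.field[name;.arrowkdb.dt.dictionary[.arrowkdb.dt.utf8[];idx]]}

/ schema for a table, assuming inferSchema returns fields in column order
dictSchema:{[t]
  inferred:.arrowkdb.sc.schemaFields .arrowkdb.sc.inferSchema t;
  .arrowkdb.sc.schema symField'[cols t;value flip t;inferred]}

/ matching array data: symbol columns become (values; indices)
dictData:{[t]
  {[x] if[11h<>type x; :x]; d:distinct x; (d;ktypeFor[count d]$d?x)} each value flip t}
```

If those assumptions hold, something like `.arrowkdb.tb.prettyPrintTable[dictSchema t;dictData t;::]` should show the symbol columns as dictionary arrays. Whether the data comes back as symbols when read is a separate question: the column may well be returned as the (values; indices) pair and need decoding by hand.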

