
Is it possible to specify the ID encoding or encode by lexicographical order? #3

Open
jiguanglizipao opened this issue Nov 17, 2022 · 1 comment


@jiguanglizipao

In the Sample usage example, xcdat produces a string-to-ID encoding that appears random rather than lexicographically ordered. Is it possible to specify the string IDs, or to have them ranked in lexicographical order? If not, what strategy or order is used to generate the encoding?

@kampersanda
Owner

@jiguanglizipao Sorry for the late reply.

Is it possible to specify string IDs or make them ranked in lexicographical order?

No. Because of the underlying data structure, string IDs are necessarily assigned in an effectively random order. If you need a mapping in lexicographic order, you have to construct a permutation outside Xcdat.
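A minimal sketch of building such a permutation outside the library, written in Python for brevity. `lookup_id` is a hypothetical callback standing in for the dictionary's string-to-ID lookup, and `fake_ids` is toy data, not real Xcdat output. "Lexicographic" here means Python's code-point order on `str`, which matches the byte order of the keys' UTF-8 encodings.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def build_lex_permutation(
    keys: Sequence[str],
    lookup_id: Callable[[str], int],
) -> Tuple[List[int], Dict[int, int]]:
    """Map between dictionary-assigned IDs and lexicographic ranks.

    `lookup_id` is a hypothetical stand-in for the dictionary's
    string-to-ID lookup; it is not a real Xcdat API.
    """
    # Lexicographic order of the distinct keys.
    sorted_keys = sorted(set(keys))
    # rank_to_id[r] is the dictionary ID of the r-th key in lex order.
    rank_to_id = [lookup_id(k) for k in sorted_keys]
    # Inverse permutation: dictionary ID -> lex rank.
    id_to_rank = {key_id: rank for rank, key_id in enumerate(rank_to_id)}
    return rank_to_id, id_to_rank


# Toy stand-in for a dictionary whose IDs are not in lex order (hypothetical data).
fake_ids = {"Aachen": 2, "Aarhus": 0, "Abaco": 3, "Abadan": 1}
rank_to_id, id_to_rank = build_lex_permutation(list(fake_ids), fake_ids.__getitem__)
print(rank_to_id)  # [2, 0, 3, 1]
```

A dictionary is used for `id_to_rank` so the sketch does not assume the IDs are dense; if they are known to cover `0..n-1`, a plain list of length `n` would do.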

If not, what is the strategy/order for generating the encoding?

This is because Xcdat arranges the trie nodes in an array in an almost random order, following the double-array scheme, and assigns string IDs based on that arrangement.
