Skip to content

Move char/unified conversion into precompiled Util module #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mroth
Copy link
Owner

@mroth mroth commented Sep 2, 2014

Move the .char_to_unified and .unified_to_char fundamental conversions into their own module, and then precompile the results for all known Emoji character values. This makes these conversions effectively zero-cost for all known Emoji values. If nothing is matched, falls back to the actual conversion functions.

This adds a fair amount of code complexity, and because to precompile stuff you need any function it depends on to be in a different module, hence the nested Util modules here.

Speed comparison can be seen here, check out the Exmoji PR#1 column:
https://docs.google.com/spreadsheets/d/1T08I6dlyFNqqdtvQykNt43tdT85lJy4US2SqErcpZ7g/edit?usp=sharing

Most interesting benefit I see in the speed increases is the knock-on effects in making .scan/1 even faster, so the amount of text that could be scanned by a single node is even higher. That said, we were already pretty darn fast, so the code complexity (especially the potential to introduce ordering gotchas in compilation) might not be worth it here.

@mroth
Copy link
Owner Author

mroth commented Sep 2, 2014

Madness?
giphy

Although... even after spending a day working on this, I'm currently leaning towards the tradeoff in code complexity not being worth it until someone actually hits this as the performance bottleneck, and leaving this unmerged. My guess is even in a tight loop the I/O might be more of a bottleneck at this point?

Comments appreciated.

don’t actually need to define variants first for `char_to_unified`
because unlike `Scanner.bscan/1`, this is matching against an entire
binary rather than just head, so no need to worry about similar out of
order issues.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant