Move char/unified conversion into precompiled Util module #1

mroth · 2014-09-02T17:54:56Z

Move the fundamental .char_to_unified and .unified_to_char conversions into their own module, then precompile the results for all known Emoji character values. This makes these conversions effectively zero-cost for all known Emoji values. If no precompiled value matches, the call falls back to the actual conversion functions.

This adds a fair amount of code complexity. To precompile results, any function the precompilation depends on has to live in a different module, hence the nested Util modules here.
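For readers unfamiliar with the technique, here is a minimal sketch of what precompiling with a fallback can look like. It is not the code from this PR: the module names `Sketch.Util` and `Sketch.Util.Unified`, the three-character `@known` list, and the hex formatting are all assumptions made for illustration.

```elixir
defmodule Sketch.Util.Unified do
  # Hypothetical runtime conversion. It lives in its own module so that it is
  # already compiled when Sketch.Util calls it from its module body.
  def char_to_unified(char) do
    char
    |> String.to_charlist()
    |> Enum.map_join("-", fn cp ->
      cp |> Integer.to_string(16) |> String.pad_leading(4, "0")
    end)
  end
end

defmodule Sketch.Util do
  alias Sketch.Util.Unified

  # Hypothetical sample of "known" characters; a real list would be much longer.
  @known ["\u{1F600}", "\u{1F680}", "\u{263A}"]

  # Runs at compile time: one function clause per known character, with the
  # converted result embedded as a literal.
  for char <- @known do
    unified = Unified.char_to_unified(char)
    def char_to_unified(unquote(char)), do: unquote(unified)
  end

  # Fallback for anything that was not precompiled.
  def char_to_unified(char), do: Unified.char_to_unified(char)
end
```

Calling `Sketch.Util.char_to_unified/1` with a known character hits a literal clause and skips the conversion work; any other input takes the last clause and is converted at runtime.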

A speed comparison can be seen here; check out the Exmoji PR#1 column:
https://docs.google.com/spreadsheets/d/1T08I6dlyFNqqdtvQykNt43tdT85lJy4US2SqErcpZ7g/edit?usp=sharing

The most interesting benefit I see in the speed increase is the knock-on effect of making .scan/1 even faster, so the amount of text a single node can scan is even higher. That said, we were already pretty darn fast, so the code complexity (especially the potential to introduce ordering gotchas in compilation) might not be worth it here.

mroth · 2014-09-02T17:59:08Z

Madness?

Although... even after spending a day working on this, I'm currently leaning towards the tradeoff in code complexity not being worth it until someone actually hits this as the performance bottleneck, and leaving this unmerged. My guess is even in a tight loop the I/O might be more of a bottleneck at this point?

Comments appreciated.

mroth added 2 commits September 3, 2014 16:16

move char/unified conversion into precompiled Util module

ad15015

madness? THIS IS SPARTA

use new codepoint_ids function to shorten up defs

8905c0a

don’t actually need to define variants first for `char_to_unified` because unlike `Scanner.bscan/1`, this is matching against an entire binary rather than just head, so no need to worry about similar out of order issues.
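A rough illustration of the ordering point in that commit message, assuming a base character (U+263A) and its variant form (U+263A followed by U+FE0F). The module and function names are hypothetical and are not taken from Exmoji.

```elixir
defmodule OrderingSketch do
  # Matching on the head of a binary: clause order matters, because the
  # shorter pattern is also a prefix of the longer one.
  def head_wrong("\u{263A}" <> _rest), do: :base
  def head_wrong("\u{263A}\u{FE0F}" <> _rest), do: :variant
  def head_wrong(_other), do: :none

  # Same clauses with the longer (variant) pattern defined first.
  def head_right("\u{263A}\u{FE0F}" <> _rest), do: :variant
  def head_right("\u{263A}" <> _rest), do: :base
  def head_right(_other), do: :none

  # Matching against the entire binary: the two patterns can never both
  # match the same input, so definition order is irrelevant.
  def whole("\u{263A}"), do: :base
  def whole("\u{263A}\u{FE0F}"), do: :variant
  def whole(_other), do: :none
end
```

With a variant at the start of the input, `head_wrong/1` returns `:base` because the shorter prefix clause wins, while `head_right/1` and `whole/1` both return `:variant`.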

mroth force-pushed the performance branch from 1e5c165 to 8905c0a Compare September 3, 2014 20:26

mroth force-pushed the master branch from 1faf5e1 to 0982b7f Compare September 4, 2014 20:07

christhekeele mentioned this pull request Jul 30, 2016

Latest and greatest #10

Closed
