Fix flow-remove-types for input with multi-byte characters#7781
Fix flow-remove-types for input with multi-byte characters#7781mourner wants to merge 3 commits intofacebook:mainfrom
Conversation
|
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks! |
|
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks! |
|
@mroch this is ready for review. While the performance didn't regress, there's no point in optimizing the code much because almost all time is spent in |
facebook-github-bot
left a comment
There was a problem hiding this comment.
@motiz88 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
packages/flow-remove-types/index.js
Outdated
| // Creates a zero-width "node" with a value to splice at that position. | ||
| // WARNING: This is only safe to use when source maps are off! | ||
| function getSpliceNodeAtPos(context, pos, loc, value) { | ||
| context.bytesAdded += value.length; |
There was a problem hiding this comment.
This is technically mixing JS string length (UTF-16 code units) and UTF-8 bytes. Not a problem yet because the only call to getSpliceNodeAtPos uses a plain ASCII string (' =>'), but maybe we should future-proof this?
|
|
||
| return buf.toString('utf8', 0, offset); | ||
| }, | ||
| generateMap: function() { |
There was a problem hiding this comment.
How does this issue translate to source maps? I'm suddenly worried that the source map spec is silent on what "columns" mean, but I'd imagine it's meant to be interpreted more like "code points" than "bytes". So we might need to do some Unicode bookkeeping in generateSourceMappings as well.
| /* @flow */ | ||
| // @nolint | ||
|
|
||
| // multi-byte chars: Гарного дня, котики! |
There was a problem hiding this comment.
Can we add some non-BMP (multi UTF-16 code unit) characters to the test? e.g. emoji: '🐈'.length == 2, Buffer.from('🐈').length == 4
Also, I'd add a test case with non-ASCII characters in the actual code that we process, e.g.
var lambda: λ = (α: number): number => α;|
@mourner have you still got time to look into this and address the comments? |
|
@thehogfather apologies — can't get to it at the moment but hopefully I'll find some time in a week or two. The only comment left unaddressed is this one:
If anyone's up to helping out on the source maps part, I'd appreciate it! |
Closes #7779 by implementing the approach described by @mroch in #7779 (comment).
Remaining work:
Try eliminating all string manipulation andNot worth it — bottleneck is in parsing.buffer.writecalls in favor of purely byte-based operations for performance.