-
Notifications
You must be signed in to change notification settings - Fork 234
fix: make line_column_to_offset character-based #2203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: next
Are you sure you want to change the base?
fix: make line_column_to_offset character-based #2203
Conversation
Some(start + ByteOffset::from_str_len(pre)) | ||
|
||
// Determine byte offset within the line corresponding to the character column | ||
let byte_in_line = if col_chars == num_chars { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also redundant - char_indices().nth()
already handles this case, and removing this conditional avoids the need for us to compute num_chars
at all (and thus iterate the characters of the line twice).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also redundant -
char_indices().nth()
already handles this case, and removing this conditional avoids the need for us to computenum_chars
at all (and thus iterate the characters of the line twice).
all done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last change, and then we should be able to merge this. Could you also add an entry to CHANGELOG.md
for this fix?
// Single pass over chars: accumulate byte length until reaching the desired column | ||
let mut byte_in_line = 0usize; | ||
let mut count = 0usize; | ||
for ch in line_src.chars() { | ||
if count == column_index { | ||
break; | ||
} | ||
byte_in_line += ch.len_utf8(); | ||
count += 1; | ||
} | ||
if count != column_index { | ||
// Out of bounds: requested column is greater than number of characters in line | ||
return None; | ||
} | ||
let (pre, _) = line_src.split_at(column_index); | ||
let start = line_span.start; | ||
Some(start + ByteOffset::from_str_len(pre)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this can be replaced with just:
let byte_in_line = line_src.char_indices().nth(column_index).map(|(byte_index, _)| byte_index)?;
It looks like this exposes an issue with We want the semantics of |
seems like failling tests are unrelated |
Can you rebase on |
b91581c
to
fd09b99
Compare
Yes, CI feels good now |
This change updates SourceContent::line_column_to_offset to interpret columns as Unicode character positions (Rust char) rather than raw bytes. The previous implementation treated columns as byte offsets and could panic when a column landed in the middle of a UTF-8 code point. It also contradicted location(), which already reports columns based on character counts.
With this change:
These fixes prevent panics on valid UTF-8, ensure consistent column semantics across APIs, and better support LSP-style operations. If UTF-16 positions are needed by a client, conversion should occur at the integration boundary rather than within these core types.