Skip to content

Conversation

VolodymyrBg
Copy link
Contributor

This change updates SourceContent::line_column_to_offset to interpret columns as Unicode character positions (Rust char) rather than raw bytes. The previous implementation treated columns as byte offsets and could panic when a column landed in the middle of a UTF-8 code point. It also contradicted location(), which already reports columns based on character counts.
With this change:

  • line_column_to_offset now computes a safe byte offset via char_indices(), aligning semantics with location() and editor/LSP expectations.
  • select() and update() become correct for non-ASCII input since they build on line_column_to_offset.
  • A new unit test verifies round-trip consistency on a line containing multi-byte characters (aβ中💖).

These fixes prevent panics on valid UTF-8, ensure consistent column semantics across APIs, and better support LSP-style operations. If UTF-16 positions are needed by a client, conversion should occur at the integration boundary rather than within these core types.

Some(start + ByteOffset::from_str_len(pre))

// Determine byte offset within the line corresponding to the character column
let byte_in_line = if col_chars == num_chars {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also redundant - char_indices().nth() already handles this case, and removing this conditional avoids the need for us to compute num_chars at all (and thus iterate the characters of the line twice).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also redundant - char_indices().nth() already handles this case, and removing this conditional avoids the need for us to compute num_chars at all (and thus iterate the characters of the line twice).

all done

Copy link
Collaborator

@bitwalker bitwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last change, and then we should be able to merge this. Could you also add an entry to CHANGELOG.md for this fix?

Comment on lines 636 to 650
// Single pass over chars: accumulate byte length until reaching the desired column
let mut byte_in_line = 0usize;
let mut count = 0usize;
for ch in line_src.chars() {
if count == column_index {
break;
}
byte_in_line += ch.len_utf8();
count += 1;
}
if count != column_index {
// Out of bounds: requested column is greater than number of characters in line
return None;
}
let (pre, _) = line_src.split_at(column_index);
let start = line_span.start;
Some(start + ByteOffset::from_str_len(pre))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this can be replaced with just:

let byte_in_line = line_src.char_indices().nth(column_index).map(|(byte_index, _)| byte_index)?;

@bitwalker
Copy link
Collaborator

It looks like this exposes an issue with SourceContent::update, namely that having a Selection whose end specifies a line that doesn't exist/is empty, will fail, presumably because the attempt to resolve a ColumnIndex of 0 on that line will fail (char_indices() on that line will be an empty iterator, so None is returned, which causes SourceContent::update to fail).

We want the semantics of Selection::from(LineIndex(0)..LineIndex(1)) on a source file that has a single line of text with a trailing newline, to produce a Selection whose end specifies the byte offset of the trailing newline. I think we probably need to handle that case in line_column_to_offset, by checking that when char_indices().nth(column_index) produces None, if column_index == 0, then we should return the byte index of the start of the line.

@VolodymyrBg
Copy link
Contributor Author

It looks like this exposes an issue with SourceContent::update, namely that having a Selection whose end specifies a line that doesn't exist/is empty, will fail, presumably because the attempt to resolve a ColumnIndex of 0 on that line will fail (char_indices() on that line will be an empty iterator, so None is returned, which causes SourceContent::update to fail).

We want the semantics of Selection::from(LineIndex(0)..LineIndex(1)) on a source file that has a single line of text with a trailing newline, to produce a Selection whose end specifies the byte offset of the trailing newline. I think we probably need to handle that case in line_column_to_offset, by checking that when char_indices().nth(column_index) produces None, if column_index == 0, then we should return the byte index of the start of the line.

seems like failling tests are unrelated

@bitwalker
Copy link
Collaborator

Can you rebase on next? That should clear up the CI failures

@VolodymyrBg VolodymyrBg force-pushed the fix/debug-types-char-columns branch from b91581c to fd09b99 Compare September 29, 2025 15:12
@VolodymyrBg
Copy link
Contributor Author

Can you rebase on next? That should clear up the CI failures

Yes, CI feels good now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants