fix: make line_column_to_offset character-based #2203

VolodymyrBg · 2025-09-23T11:43:28Z

This change updates SourceContent::line_column_to_offset to interpret columns as Unicode character positions (Rust char) rather than raw bytes. The previous implementation treated columns as byte offsets and could panic when a column landed in the middle of a UTF-8 code point. It also contradicted location(), which already reports columns based on character counts.
With this change:

line_column_to_offset now computes a safe byte offset via char_indices(), aligning semantics with location() and editor/LSP expectations.
select() and update() become correct for non-ASCII input since they build on line_column_to_offset.
A new unit test verifies round-trip consistency on a line containing multi-byte characters (aβ中💖).

These fixes prevent panics on valid UTF-8, ensure consistent column semantics across APIs, and better support LSP-style operations. If UTF-16 positions are needed by a client, conversion should occur at the integration boundary rather than within these core types.
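To make the byte-versus-character distinction concrete, here is a minimal sketch of the round-trip property the new test checks, using the same sample line. The helper names `column_to_offset` and `offset_to_column` are hypothetical stand-ins, not the actual `SourceContent` API, and treating a column equal to the character count as end-of-line is an assumption of this sketch.

```rust
/// Hypothetical helper: character column -> byte offset within `line`.
/// In this sketch, a column equal to the number of characters maps to
/// the end of the line.
fn column_to_offset(line: &str, column: usize) -> Option<usize> {
    line.char_indices()
        .map(|(byte_index, _)| byte_index)
        .chain(std::iter::once(line.len()))
        .nth(column)
}

/// Hypothetical helper: byte offset within `line` -> character column.
/// Returns None if the offset is out of range or falls inside a
/// multi-byte code point.
fn offset_to_column(line: &str, byte_offset: usize) -> Option<usize> {
    line.get(..byte_offset).map(|prefix| prefix.chars().count())
}
```

In `aβ中💖` the characters occupy 1, 2, 3 and 4 bytes, so character column 2 corresponds to byte offset 3, while byte offset 2 falls inside `β`. That is the position where a byte-based `split_at` would panic.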

crates/debug/types/src/source_file.rs

bitwalker · 2025-09-24T15:12:36Z

crates/debug/types/src/source_file.rs

-        Some(start + ByteOffset::from_str_len(pre))
+
+        // Determine byte offset within the line corresponding to the character column
+        let byte_in_line = if col_chars == num_chars {


This is also redundant - char_indices().nth() already handles this case, and removing this conditional avoids the need for us to compute num_chars at all (and thus iterate the characters of the line twice).

all done

bitwalker

One last change, and then we should be able to merge this. Could you also add an entry to CHANGELOG.md for this fix?

bitwalker · 2025-09-24T17:50:26Z

crates/debug/types/src/source_file.rs

+        // Single pass over chars: accumulate byte length until reaching the desired column
+        let mut byte_in_line = 0usize;
+        let mut count = 0usize;
+        for ch in line_src.chars() {
+            if count == column_index {
+                break;
+            }
+            byte_in_line += ch.len_utf8();
+            count += 1;
+        }
+        if count != column_index {
+            // Out of bounds: requested column is greater than number of characters in line
            return None;
        }
-        let (pre, _) = line_src.split_at(column_index);
-        let start = line_span.start;
-        Some(start + ByteOffset::from_str_len(pre))
+


All of this can be replaced with just:

let byte_in_line = line_src.char_indices().nth(column_index).map(|(byte_index, _)| byte_index)?;
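For reference, a sketch of that one-liner wrapped in a standalone function so its edge cases are easy to check. The free function `byte_in_line` and its signature are assumptions for illustration only; in the PR this logic sits inside `line_column_to_offset`.

```rust
/// Illustrative wrapper around the suggested one-liner (hypothetical name
/// and signature). Maps a character column to a byte index within the line.
fn byte_in_line(line_src: &str, column_index: usize) -> Option<usize> {
    line_src
        .char_indices()
        .nth(column_index)
        .map(|(byte_index, _)| byte_index)
}
```

Note that `nth()` yields `None` as soon as `column_index` reaches the number of characters in the line, so both the position just past the last character and column 0 of an empty line come back as `None`.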

bitwalker · 2025-09-24T21:33:10Z

It looks like this exposes an issue with SourceContent::update: a Selection whose end specifies a line that doesn't exist or is empty will fail. Presumably this is because resolving a ColumnIndex of 0 on that line fails: char_indices() on that line is an empty iterator, so nth() returns None, which in turn causes SourceContent::update to fail.

For a source file that has a single line of text with a trailing newline, we want Selection::from(LineIndex(0)..LineIndex(1)) to produce a Selection whose end specifies the byte offset of the trailing newline. I think we probably need to handle that case in line_column_to_offset: when char_indices().nth(column_index) produces None and column_index == 0, we should return the byte index of the start of the line.
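A rough sketch of the fallback described above, assuming the caller has already resolved the line's starting byte offset and its text. The function name, the plain `usize` offsets, and the parameters are all hypothetical; the real code works with `ByteOffset` and the line span.

```rust
/// Hypothetical sketch: resolve a character column to an absolute byte
/// offset, falling back to the start of the line when column 0 is
/// requested on an empty (or past-the-end) line.
fn line_column_to_offset_sketch(
    line_start: usize,
    line_src: &str,
    column_index: usize,
) -> Option<usize> {
    match line_src.char_indices().nth(column_index) {
        // Normal case: the column names an existing character.
        Some((byte_index, _)) => Some(line_start + byte_index),
        // Empty line: column 0 still resolves to the line's start.
        None if column_index == 0 => Some(line_start),
        // Any other column past the end of the line is out of bounds.
        None => None,
    }
}
```

With this shape, only column 0 gets the special treatment; every other out-of-range column still returns `None`.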

VolodymyrBg · 2025-09-29T14:00:10Z

Seems like the failing tests are unrelated.

bitwalker · 2025-09-29T14:15:50Z

Can you rebase on next? That should clear up the CI failures

VolodymyrBg · 2025-09-29T17:10:17Z

Yes, CI is passing now.

bitwalker requested changes Sep 24, 2025

VolodymyrBg added 5 commits September 29, 2025 15:11

Make line_column_to_offset character-based; add non-ASCII round-trip …

9c7e5ee

…test

Update source_file.rs

2af62f5

Update source_file.rs

5540329

Update CHANGELOG.md

59a2d92

fix

fd09b99

VolodymyrBg force-pushed the fix/debug-types-char-columns branch from b91581c to fd09b99 on September 29, 2025 15:12
