Change character units from UTF-16 code unit to Unicode codepoint #376

Text document offsets are based on a UTF-16 string representation. This is odd, given that text contents are transmitted in UTF-8.

```
Text Documents
......... The offsets are based on a UTF-16 string representation.
```

Here in `TextDocumentContentChangeEvent`, `range` is specified in UTF-16 column offsets while `text` is transmitted in UTF-8.

```typescript
interface TextDocumentContentChangeEvent {
	range?: Range;
	rangeLength?: number;
	text: string;
}
```
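To make the mismatch concrete, here is a minimal sketch that measures one line of text in UTF-16 code units, Unicode code points, and UTF-8 bytes. The names `LineLengths` and `measureLine` are made up for illustration and are not part of the protocol; a lone surrogate is counted as one code point of three bytes, which is an assumption of this sketch.

```typescript
// Lengths of one line of text in the three units under discussion.
interface LineLengths {
  utf16CodeUnits: number;
  codePoints: number;
  utf8Bytes: number;
}

function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isLowSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

// Hypothetical helper: walks the UTF-16 string once and counts all three units.
function measureLine(line: string): LineLengths {
  let codePoints = 0;
  let utf8Bytes = 0;
  let i = 0;
  while (i < line.length) {
    const unit = line.charCodeAt(i);
    const isPair =
      isHighSurrogate(unit) &&
      i + 1 < line.length &&
      isLowSurrogate(line.charCodeAt(i + 1));
    if (isPair) {
      // A surrogate pair is one code point above U+FFFF: 4 bytes in UTF-8.
      utf8Bytes += 4;
      i += 2;
    } else {
      // Code points up to U+FFFF take 1, 2, or 3 bytes in UTF-8.
      // Assumption: a lone surrogate is also counted as 3 bytes.
      utf8Bytes += unit < 0x80 ? 1 : unit < 0x800 ? 2 : 3;
      i += 1;
    }
    codePoints += 1;
  }
  return { utf16CodeUnits: line.length, codePoints, utf8Bytes };
}

// "a", U+1F600 (written as a surrogate pair), "b"
const sample = measureLine("a\uD83D\uDE00b");
console.log(sample.utf16CodeUnits, sample.codePoints, sample.utf8Bytes);
```

For this sample the three counts differ (4 code units, 3 code points, 6 bytes), so a `range` column after the emoji depends on which unit both sides agree on.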

Would it be more reasonable to unify these, remove UTF-16 from the wording, and use UTF-8 as the sole encoding? Line/character could then be measured in Unicode code points instead of UTF-16 code units.
Lines are rarely very long, so the extra computation needed to find the N-th Unicode code point would not put much burden on editors and language servers.
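As a rough illustration of that extra computation, the sketch below converts a column between code points and UTF-16 code units with a single linear scan of the line. The function names are hypothetical, and clamping out-of-range columns to the end of the line is an assumption of this sketch, not something the protocol specifies.

```typescript
// True when a surrogate pair starts at `index` in `line`.
function startsSurrogatePair(line: string, index: number): boolean {
  if (index + 1 >= line.length) {
    return false;
  }
  const high = line.charCodeAt(index);
  const low = line.charCodeAt(index + 1);
  return high >= 0xd800 && high <= 0xdbff && low >= 0xdc00 && low <= 0xdfff;
}

// Code point column -> UTF-16 code unit offset.
// Assumption: columns past the end of the line clamp to line.length.
function codePointToUtf16Offset(line: string, codePointColumn: number): number {
  let offset = 0;
  let remaining = codePointColumn;
  while (remaining > 0 && offset < line.length) {
    offset += startsSurrogatePair(line, offset) ? 2 : 1;
    remaining -= 1;
  }
  return offset;
}

// UTF-16 code unit offset -> code point column.
// An offset that falls inside a surrogate pair is treated as lying after the pair.
function utf16ToCodePointOffset(line: string, utf16Column: number): number {
  const limit = Math.min(utf16Column, line.length);
  let offset = 0;
  let codePoints = 0;
  while (offset < limit) {
    offset += startsSurrogatePair(line, offset) ? 2 : 1;
    codePoints += 1;
  }
  return codePoints;
}

// "a", U+1F600 (written as a surrogate pair), "b"
const exampleLine = "a\uD83D\uDE00b";
console.log(codePointToUtf16Offset(exampleLine, 2)); // column of "b" in UTF-16 code units
console.log(utf16ToCodePointOffset(exampleLine, 3)); // column of "b" in code points
```

Both conversions are O(length of the line), which is the cost the proposal argues is acceptable.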

https://github.com/jacobdufault/cquery/issues/57

**Survey**: counting method of `Position.character` offsets supported by language servers/clients
https://docs.google.com/spreadsheets/d/168jSz68po0R09lO0xFK4OmDsQukLzSPCXqB6-728PXQ/edit#gid=0
