Skip to content

Change character units from UTF-16 code unit to Unicode codepoint #376

Closed
@MaskRay

Description

@MaskRay

Text document offsets are based on a UTF-16 string representation. This is strange enough in that text contents are transmitted in UTF-8.

Text Documents
......... The offsets are based on a UTF-16 string representation.

Here in TextDocumentContentChangeEvent, range is specified in UTF-16 column offsets while text is transmitted in UTF-8.

interface TextDocumentContentChangeEvent {
	range?: Range;
	rangeLength?: number;
	text: string;
}

Is it more reasonable to unify these, remove UTF-16 from the wording, and use UTF-8 as the solely used encoding? Line/character can be measured in units of Unicode codepoints, instead of UTF-16 code units.
A line cannot be too long and thus doing extra computing to get the N'th Unicode codepoint would not lay too much burden on editors and language servers.

jacobdufault/cquery#57

Survey: counting method of Position.character offsets supported by language servers/clients
https://docs.google.com/spreadsheets/d/168jSz68po0R09lO0xFK4OmDsQukLzSPCXqB6-728PXQ/edit#gid=0

Metadata

Metadata

Assignees

No one assigned

    Labels

    breakingfeature-requestRequest for new features or functionalityhelp wantedIssues identified as good community contribution opportunitiesinitialization

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions