LSIF spec could use some extra clarity around embedded contents

The LSIF spec [states that the contents of a file embedded in an LSIF index must be base64 encoded](https://github.com/Microsoft/language-server-protocol/blob/master/indexFormat/specification.md#embedding-contents):

> It can be valuable to embed the contents of a document or project file into the dump as well. For example, if the content of the document is a virtual document generated from program meta data. The index format, therefore, supports an optional contents property on the document and project vertex. If used the content needs to be base64 encoded.

Given that base64 is an encoding of a binary stream, this implies there's also a text encoding question. So, some questions:

1. Should the binary stream be the raw bytes of the file on disk, in whatever text encoding it happens to use? That makes every consumer responsible for encoding sniffing, which may reach a different conclusion (and therefore produce different contents) than the indexer did. The alternative is for the indexer to re-encode the text in some preferred or specified text encoding before base64 encoding it, although that still raises other fun questions around binary file inputs to compilers.
2. For files that have no "native" encoding because the indexer generated them directly in memory, which encoding should be preferred?
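
To make the ambiguity concrete, here is a minimal Python sketch. The helper names `embed_contents` and `read_contents` are hypothetical and not part of any LSIF tooling, and the choice of UTF-8, UTF-16LE, and Latin-1 is only an assumption for illustration. It shows that the same source text yields different `contents` payloads depending on the text encoding the indexer picked, and that a consumer guessing a different encoding gets different text back.

```python
import base64


def embed_contents(text: str, encoding: str) -> str:
    """Hypothetical indexer side: text -> bytes (text encoding) -> base64 string."""
    return base64.b64encode(text.encode(encoding)).decode("ascii")


def read_contents(payload: str, assumed_encoding: str) -> str:
    """Hypothetical consumer side: base64 string -> bytes -> text (assumed encoding)."""
    return base64.b64decode(payload).decode(assumed_encoding)


# A document containing one non-ASCII character.
source = "// café\n"

# Two indexers that disagree on the text encoding emit different payloads.
as_utf8 = embed_contents(source, "utf-8")
as_utf16 = embed_contents(source, "utf-16-le")
print(as_utf8 == as_utf16)

# A consumer that assumes the same encoding as the indexer round-trips the text.
print(read_contents(as_utf8, "utf-8") == source)

# A consumer that assumes a different encoding silently gets different contents.
print(read_contents(as_utf8, "latin-1") == source)
```

Nothing in the base64 payload itself records which text encoding was used, which is why the spec would need to either name one or say that the bytes are the raw file.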

LSIF spec could use some extra clarity around embedded contents #1139
