-
Notifications
You must be signed in to change notification settings - Fork 620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std/jsonc seems to strip comments #3502
Comments
@bjesuiter I'm not sure what you mean by an example? From esm-dev/esm.sh#687 (comment):
The parser. The comment on the |
Thanks @farsightsoftware for the explanation. After thinking about it, I also realized the problem with returning a JS Object vs. an AST. However, I find it a big roadblock for developers when this lib is not able to reconstruct the file correctly with comments. I have a conference this week, so I'd add this topic to my list of things to tackle the weeks after. But I think, the std lib should provide a correct implementation for stringifying jsonc in general, so I'd like to keep this issue open. |
Has there been any status update or progress on this bug lately? I find it to be quite an annoying developer experience with ESM not yet doing the right thing for updating my deno.jsonc automagically. So much so that I would love to design/prototype/contribute a PR if nobody is actively working on this now! I am crudely thinking of something like a ParseOption to slightly modify the return type of parse (or else to make a new function called e.g. parseTree), such that the returned type includes all details necessary to faithfully stringify the JSON value while reintroducing the comments. (In addition, the returned type could serve as an API for subsequent changes and/or stringification.) |
@mmuddasani I like this idea! I don't know though whether deno core team would accept the pr. but we should try! Step 1: we should look for a standard format for representing parsed jsonc as a concrete syntax tree, so that we avoid creating a new proprietary format. Step 2: adding this parseTree function you proposed (I find it more clear than adding a parameter to the parse function, since you'd have to use a union return type for the parse function otherwise) I cannot write it this week, but if you're interested, I'd do a review of the PR! :) |
Microsoft's jsonc module that @farsightsoftware found appears incredibly comprehensive. Someone has even ported it as a third-party Deno module: https://deno.land/x/jsonc It has an MIT license, so I'm wondering if Deno's standard library can leverage the existing implementation, with appropriate copyright attribution. Alternatively: does parsing JSONC have any reason to belong in the standard library, compared to folks reusing Microsoft's implementation? If backwards compatibility is important, and if there is reason for Deno to keep support, then merging the two APIs via method overloading seems feasible, despite the existing divergent
Aside: Deno's |
@mmuddasani @bjesuiter Thanks for considering jsonc support with retained comments.
Microsoft's jsonc module looks too complex to me. It looks exposing too much details about internals, such as scanner, parser, visiter, etc. I don't think the users need to access such lower level primitives. Also that is not aligned to other std modules such as yaml, toml, etc. If the goal here is to keep the comments, then how about following |
@kt3k Nice! The lib I found was just the first one I found for the limited purpose I was looking. |
I agree that the api of comment-json is very nice & simple to understand. (As long as you don't need to dig deeper into the representation). But I agree, using Symbols seems smart for me in this case, since we avoid a complete parseTree. @kt3k do you agree with "importing" this lib into deno std or do you simply propose usage of this lib? |
I imagine the only downside with incorporating the API of npm:comment-json as-is into deno std is the pollution of the global Symbol registry. Symbols are a very neat idea, and I imagine they can still be effectively used without resorting to I also imagine it would be re-implemented, to avoid its external dependencies like npm:esprima. |
Because we have If we include features like cc @ayame113 Do you have any opinion or advise to this issue? |
I'm in favor of including features like I think If you decide to add these to std...
console.log(typeof "hello world"); //=> string
console.log(typeof new String("hello world")); //=> object
// object is parsed like...
{
key: "value",
[Symbol.for("comments")]: {
key: {
before: {...},
after-prop: {...},
after-colon: {...},
after-value: {...},
after: {...},
},
},
};
// array is parsed like...
[
"foo",
"bar",
"baz",
[Symbol.for("comments")]: [
{
before: {...},
after-value: {...},
after: {...},
},
...
]
]; |
@ayame113 I am in agreement with both of your suggestions. I would also propose (crudely atm) something as follows: export const comments_api: unique symbol = Symbol("comments api")
export interface CommentsApi {
// methods to read/modify/write JsoncValue and consistently maintain comments
// including order, before/after, maybe whitespace indentation, etc.
}
export type JsoncValue =
| string
| number
| boolean
| null
| (
(
| JsoncValue[]
| { [key: string]: JsoncValue | undefined }
) & { [comments_api]: CommentsApi }
) For fun, I'll be prototyping an API. I'll keep an open eye for additional feedback / suggestions along the way, and for anyone else who wants to make it happen! If I can't come up with anything more friendly than the |
I discovered that In other news, I've so far been examining the types of comments that |
@kt3k @ayame113 Is it possible and/or worthwhile to implement the core business logic in Rust as part of the Pros
Cons
|
Core team is not in favor of exposing jsonc related features as Deno CLI feature. I think using Rust is not an option for this issue. |
Is the unicodeSets feature ( At the very least, I've developed an appreciation for parsing complexity and can follow along better with the current tokenizer logic. Since last week, I've discovered how to tokenize text input using modern Unicode-aware regexes. For example, it takes less than 400 characters of code to parse JSONC files when const get_jsonc_tokenizer = () =>
/([\p{ID_Continue}.+\-]+|[[\p{Pattern_Syntax}]--[\/\x22.+\-]]|[\r\n \t]+|\x22[^\x22\\]*(?:\\(?:$|[^])[^\x22\\]*)*(?:$|\x22))|\/(?:\/[^\r\n]*(?:$|\n|\r\n?)|\*[^*]*(?:$|\*+(?:$|\/|[^*\/][^*]*))+)?/gvy
function parse(text: string, { allowTrailingComma = false as const } = {}) {
return JSON.parse(text.replaceAll(get_jsonc_tokenizer(), "$1"))
} (Caveat: When parsing fails due to SyntaxError, positions inside error message do not correspond to positions of the original text.) And an example for tokenizing input: function* tokenize(text: string) {
let match
for (match of text.matchAll(get_jsonc_tokenizer())) yield match[0]
const position = match ? match.index! + match[0].length : 0
if (position !== text.length) {
const token = text[position]
const error = `Unexpected token ${token} in JSONC at position ${position}`
throw new SyntaxError(error)
}
} How It WorksThe regexp loosely considers a JSONC token to be the largest non-empty prefix of one of these:
When combined with sticky mode, The example regexp was designed to vet more identifier-like tokens, numbers, punctuation, and string escapes than JSON allows. Relying on The example regexp was also designed with a single capture group for all non-comment tokens, allowing (Side note: Using |
To avoid comment loss by default, perhaps the below scenarios provide a sensible default mode. In order to assess for any unforeseen ambiguity/unknowns, I will try to prototype an API and implementation. Perhaps the MVP should not care about preserving whitespace formatting beyond what is consistent with If a client program cares enough about whitespace formatting, then the client program could either:
Scenario: Allow deleting/replacing hereditarily uncommented values without a trace.Example before: { "k.keep": { "k.old": "v.old", }, "k.delete": "v.delete" } Example after modifications (whether done via a replace/update/rename): { "k.keep": { "k.new": "v.new", } } Scenario: Allow replacing/updating/renaming comment-containing values by preserving relevant portions as documentation.Example before: {"k.keep":/*keep.start*/{/*a*/"k.old"/*b*/:/*c*/"v.old"/*d*/,/*e*/},//keep.end
/*f*/"k.delete"/*g*/:/*h*/"v.delete"/*i*/,/*j*/} Example after a kept key is updated via value replacement: {
"k.keep":
// /*keep.start*/{/*a*/"k.old"/*b*/:/*c*/"v.old"/*d*/,/*e*/},//keep.end
{ "k.new": "v.new", },
// /*f*/"k.delete"/*g*/:/*h*/"v.delete"/*i*/,/*j*/
} Example after a kept value is updated: individual key-value pairs are deleted, new individual key-value pairs are introduced: {
"k.keep":/*keep.start*/{
// /*a*/"k.old"/*b*/:/*c*/"v.old"/*d*/,/*e*/
"k.new": "v.new",
},//keep.end
// /*f*/"k.delete"/*g*/:/*h*/"v.delete"/*i*/,/*j*/
} Example after a kept value has its keys and/or values renamed (i.e. to preserve semantic placement of associated comments): {
"k.keep":/*keep.start*/{/*a*/"k.renamed"/*b*/:/*c*/"v.renamed"/*d*/,/*e*/},//keep.end
// /*f*/"k.delete"/*g*/:/*h*/"v.delete"/*i*/,/*j*/
} |
Describe the bug
esm.sh is currently unable to update an existing deno.jsonc file. See: esm-dev/esm.sh#686.
While implementing, it came to light, that the jsonc parser in the deno std lib seems to discard comments.
(see this issue, comment by farsightsoftware: esm-dev/esm.sh#687)
I've not checked it myself yet, but I asked farsightsoftware to add an example to this issue.
The main usecase for this std lib is currently updating the deno.jsonc file, so it would make it a lot easier for all tools dealing with the deno.jsonc file if it would preserve comments automatically.
Steps to Reproduce
TODO
Expected behavior
The jsonc parser output is able to be stringified again and produce identical output to the original jsonc input.
In diagram:
jsonc file
=> jsonc parser
=> some intermediate representation
=> jsonc stringifier
=> a file identical to the original jsonc file
(especially with comments)
Environment
TODO
The text was updated successfully, but these errors were encountered: