-
Notifications
You must be signed in to change notification settings - Fork 409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Synopsis & description characters are escaped #9728
Comments
I'd certainly expect dune to not touch any values inside string fields. opam doesn't. |
I had a look at this a while ago, so let me comment on this (actually, even some of the Dune authors have non-ascii characters in their names!). In short, this is a CST vs AST issue: assuming a UTF-8 encoded To support that properly in We already have 2 parsers (for AST and CST), but instead of doing AST -> CST (which would be a lossy operation), we instead materialize CSTs from ASTs (where we "invent" which concrete syntax was used) for performance reasons. The situation is described here: https://github.com/ocaml/dune/blob/3.12.2/src/dune_sexp/parser.ml#L5-L15 It describes several options: do the conversion the other way around (probably too slow), write another parser, GADT tricks. |
Can we make dune not parse anything inside double-quotes ie treat it as opaque binary data? |
Just to clarify one thing: Dune is not mangling user data here. If you put The thing you're seeing is an issue with To make an analogy with ocaml code, |
Here's a high-level overview of how I think it could be solved: First, we need tests for this. For example, add the following to
Template payloads should be covered too, Then, to have a working roundtrip through
The first point is false at the moment: in (for performance, it is probably critical that This conversion ( Some interesting tests we can write at that stage is something to ensure that So, at that stage we have step 1 implemented: a CST parser that hold unescaped strings in let pp_simple t = Cst.abstract t |> Option.value_exn |> Ast.remove_locs |> Dune_sexp.pp And it doesn't feel right to go back to |
There's also an "easy" alternative to that, which is to only change What do you think @rgrinberg ? |
Personally, I don't see a problem with this; what would be the downside with enforcing UTF-8 encoding of Dune files? |
I was thinking of Windows and especially #9396. But reading that conversation, it seems that we're ready to commit to just utf8, which is a great step forward. |
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com>
Sticking to utf8 for encoding seems fine to me. We can always retroactively add an explicit toggle if it doesn't work for someone. |
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com>
* test: for non-ASCII characters (#9728) Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com> * Update test/blackbox-tests/test-cases/formatting/non-ascii-characters.t Co-authored-by: Etienne Millon <etienne.millon@gmail.com> Signed-off-by: Alpha Issiaga DIALLO <alpha@tarides.com> * test: evaluation through comparison of non-ascii Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com> * fix output Signed-off-by: Etienne Millon <me@emillon.org> --------- Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com> Signed-off-by: Alpha Issiaga DIALLO <alpha@tarides.com> Signed-off-by: Etienne Millon <me@emillon.org> Co-authored-by: Etienne Millon <etienne.millon@gmail.com> Co-authored-by: Etienne Millon <me@emillon.org>
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com>
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com>
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com>
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com>
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com>
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com>
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com>
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com>
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com>
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com>
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com> Signed-off-by: Etienne Millon <me@emillon.org>
* fix: handle utf8 characters in the dune files(#9728) Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com> Signed-off-by: Etienne Millon <me@emillon.org> * Add the changes file and revert the modification of the lexer Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com> Signed-off-by: Etienne Millon <me@emillon.org> * Use uutf Also leaving a note about using the stdlib later. Signed-off-by: Etienne Millon <me@emillon.org> * changelog Signed-off-by: Etienne Millon <me@emillon.org> --------- Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com> Signed-off-by: Etienne Millon <me@emillon.org> Co-authored-by: Etienne Millon <me@emillon.org>
CHANGES: ### Added - Add link flags to to `ocamlmklib` for ctypes stubs (ocaml/dune#8784, @frejsoya) - Remove some unnecessary limitations in the expansions of percent forms in install stanza. For example, the `%{env:..}` form can be used to select files to be installed. (ocaml/dune#10160, @rgrinberg) - Allow artifact expansion percent forms (`%{cma:..}`, `%{cmo:..}`, etc.) in more contexts. Previously, they would be randomly forbidden in some fields. (ocaml/dune#10169, @rgrinberg) - Allow `%{inline_tests}` in more contexts (ocaml/dune#10191, @rgrinberg) - Remove limitations on percent forms in the `(enabled_if ..)` field of libraries (ocaml/dune#10250, @rgrinberg) - Support dialects in `dune describe pp` (ocaml/dune#10283, @emillon) - Allow defining executables or melange emit stanzas with the same name in the same folder under different contexts. (ocaml/dune#10220, @rgrinberg, @jchavarri) ### Fixed - coq: Delay Coq rule setup checks so OCaml-only packages can build in hybrid Coq/OCaml projects when `coqc` is not present. Thanks to @vzaliva for the test case and report (ocaml/dune#9845, fixes ocaml/dune#9818, @rgrinberg, @ejgallego) - Fix conditional source selection with `select` on `bigarray` in OCaml 5 (ocaml/dune#10011, @moyodiallo) - melange: fix inconsistency in virtual library implementation. Concrete modules within a virtual library can now refer to its virtual modules too (ocaml/dune#10051, fixes ocaml/dune#7104, @anmonteiro) - melange: fix a bug that would cause stale `import` paths to be emitted when moving source files within `(include_subdirs ..)` (ocaml/dune#10286, fixes ocaml/dune#9190, @anmonteiro) - Dune file formatting: output utf8 if input is correctly encoded (ocaml/dune#10113, fixes ocaml/dune#9728, @moyodiallo) - Fix expanding dependencies and locks specified in the cram stanza. Previously, they would be installed in the context of the cram test, rather than the cram stanza itself (ocaml/dune#10165, @rgrinberg) - Fix bug with `dune exec --watch` where the working directory would always be set to the project root rather than the directory where the command was run (ocaml/dune#10262, @gridbugs) - Regression fix: sign executables that are promoted into the source tree (ocaml/dune#10263, fixes ocaml/dune#9272, @emillon) - Fix crash when decoding dune-package for libraries with `(include_subdirs qualified)` (ocaml/dune#10269, fixes ocaml/dune#10264, @emillon) ### Changed - Remove the `--react-to-insignificant-changes` option. (ocaml/dune#10083, @rgrinberg)
I believe #10113 fixed this. |
CHANGES: ### Added - Add link flags to to `ocamlmklib` for ctypes stubs (ocaml/dune#8784, @frejsoya) - Remove some unnecessary limitations in the expansions of percent forms in install stanza. For example, the `%{env:..}` form can be used to select files to be installed. (ocaml/dune#10160, @rgrinberg) - Allow artifact expansion percent forms (`%{cma:..}`, `%{cmo:..}`, etc.) in more contexts. Previously, they would be randomly forbidden in some fields. (ocaml/dune#10169, @rgrinberg) - Allow `%{inline_tests}` in more contexts (ocaml/dune#10191, @rgrinberg) - Remove limitations on percent forms in the `(enabled_if ..)` field of libraries (ocaml/dune#10250, @rgrinberg) - Support dialects in `dune describe pp` (ocaml/dune#10283, @emillon) - Allow defining executables or melange emit stanzas with the same name in the same folder under different contexts. (ocaml/dune#10220, @rgrinberg, @jchavarri) ### Fixed - coq: Delay Coq rule setup checks so OCaml-only packages can build in hybrid Coq/OCaml projects when `coqc` is not present. Thanks to @vzaliva for the test case and report (ocaml/dune#9845, fixes ocaml/dune#9818, @rgrinberg, @ejgallego) - Fix conditional source selection with `select` on `bigarray` in OCaml 5 (ocaml/dune#10011, @moyodiallo) - melange: fix inconsistency in virtual library implementation. Concrete modules within a virtual library can now refer to its virtual modules too (ocaml/dune#10051, fixes ocaml/dune#7104, @anmonteiro) - melange: fix a bug that would cause stale `import` paths to be emitted when moving source files within `(include_subdirs ..)` (ocaml/dune#10286, fixes ocaml/dune#9190, @anmonteiro) - Dune file formatting: output utf8 if input is correctly encoded (ocaml/dune#10113, fixes ocaml/dune#9728, @moyodiallo) - Fix expanding dependencies and locks specified in the cram stanza. Previously, they would be installed in the context of the cram test, rather than the cram stanza itself (ocaml/dune#10165, @rgrinberg) - Fix bug with `dune exec --watch` where the working directory would always be set to the project root rather than the directory where the command was run (ocaml/dune#10262, @gridbugs) - Regression fix: sign executables that are promoted into the source tree (ocaml/dune#10263, fixes ocaml/dune#9272, @emillon) - Fix crash when decoding dune-package for libraries with `(include_subdirs qualified)` (ocaml/dune#10269, fixes ocaml/dune#10264, @emillon) ### Changed - Remove the `--react-to-insignificant-changes` option. (ocaml/dune#10083, @rgrinberg)
Expected Behavior
Dune format does nothing:
Actual Behavior
Dune formats does:
Reproduction
Specifications
dune
(output ofdune --version
): 3.12.1ocaml
(output ofocamlc --version
): 4.14.1Additional information
Discovered here: https://discuss.ocaml.org/t/why-can-t-i-create-a-project-with-non-ascii-characters/13865/12
@yawaramin says this shouldn’t happen for certain fields so they can still be human-readable.
The text was updated successfully, but these errors were encountered: