fix a few failures when printing a modified AST #2659

v-gb · 2025-02-27T14:27:25Z

No description provided.

v-gb · 2025-02-27T15:03:25Z

The failure might be because of carriage returns on windows. But before figuring out what to do, I'd like to know if the approach is reasonable.

Also it might be useful to see what comes out of ocaml/ocaml#13833, because the parser is producing loc_ghost in cases that don't make sense to me, so I'm not sure that the use of loc_ghost to distinguish modifications made to the AST always work (I got a strange test change when using loc_ghost in Fmt_ast.sequence_blank_line).

Julow · 2025-03-03T11:04:05Z

lib/Cmts.ml

        match Source.loc_of_first_token_at src whole_loc LBRACKETPERCENT with
-        | Some loc -> loc
+        | Some _ as o -> o


Perhaps Source.loc_of_first_token_at shouldn't attempt to read the source at all and always return None for ghost loc ? I don't see other impossible cases related to these functions returning None.

Also, this relocate_ext_cmts shouldn't raise errors at all as it's just using heuristics to move comments around. The impossible case should be removed.

I removed the impossible case.

The problem with checking loc_ghost earlier is that the upstream parser sets loc_ghost incorrectly, and the bug is inherited in the ocamlformat parser, and the result is blank lines or comments moving around.
Specifically, the tests that break use begin%test e end in one case and new%foo e in the other, and in both case, we're exercising the same parser semantic action that creates Pexp_extension with a ghost location, and the inner expression with a real location, whereas the opposite is what would be correct.

In general, I think the use of loc_ghost is not unreasonable, but the most principled solution would be to make the printer work using the ast only, without the side information from Source.t or Cmts.t.

This can be seen with this thought experimental: imagine that you're combining files with print (parse Structure file1 @ parse Structure file2). In principle, this should have the same effect as In_channel.read_all file1 ^ "\n" ^ In_channel.read file2. But in practice, either you set loc_ghost on one ast and lose part of the formatting, or you don't and instead risk the printer raising exceptions.

Whereas if the ast was self contained, the concatenated asts would get printed the same as the concatenation of the source files, modulo the whitespace at the boundary. I don't know how complex that would be, but in this function, it would probably mean that the Pexp_extension should have a loc_of_first_token field.

Right!

but the most principled solution would be to make the printer work using the ast only, without the side information from Source.t or Cmts.t.

That was once a goal but we're not there yet as it's a lot of work. The modified AST is closer to the concrete syntax than the upstream parser but not enough to stop reading the source yet.
Any work in that direction is very welcome :)

Comments is a much more complex story. Representing comments at every nodes in the AST was deemed too hard since the beginning.

This can be seen with this thought experimental: imagine that you're combining files with print (parse Structure file1 @ parse Structure file2).

In theory, you can concatenate the Cmts.t as well and not loose the comments.
For the source part, we could make use the filename field of a location but that's just adhoc. We definitely want to get rid of it.

But keep in mind that OCamlformat is a string -> string function. In a much simpler use-case than yours I made it a string -> (ast -> ast) -> string function and haven't had to deal with source and comments. Could that help ?

but the most principled solution would be to make the printer work using the ast only, without the side information from Source.t or Cmts.t.

That was once a goal but we're not there yet as it's a lot of work. The modified AST is closer to the concrete syntax than the upstream parser but not enough to stop reading the source yet. Any work in that direction is very welcome :)

Yes, clearly doing this everywhere would be a lot of work. But it seems like the kind of work where you chip away at it, and the most common problems get solved, and then you solve the long tail of remaining problems over time.

I don't think it requires bringing the AST close to the concrete syntax though, it only requires capturing the relevant information from the concrete syntax into the tree.
ocamlformat's AST is actually sufficient difficult to manipulate due to some of these choices that I ended up rewriting the AST into a more usable form, processing the ast the way I want, then rewriting the AST back into a "normal" ocamlformat ast. For instance, the fact that functions are represented by 3 nodes (Pexp_function, Pexp_let, Pexp_letop), that opens are represented by 2 nodes, that applications are represented by 4 nodes (Pexp_apply, Pexp_infix, Pexp_prefix, Pexp_indexedop and potentially Pexp_letop) complicate things significantly.

Comments is a much more complex story. Representing comments at every nodes in the AST was deemed too hard since the beginning.

This can be seen with this thought experimental: imagine that you're combining files with print (parse Structure file1 @ parse Structure file2).

In theory, you can concatenate the Cmts.t as well and not loose the comments. For the source part, we could make use the filename field of a location but that's just adhoc. We definitely want to get rid of it.

Ok, good to know!

I agree that comments are likely much harder. ocamlformat handles comments in this way:

parse string into ast

assign comments to ast locations

print ast, considering the comments assigned to the locations

Without being very familiar with these steps, I was thinking that perhaps the second step could turn comments into attributes like [@ocamlformat.comment_before] in various places. But it's very possible that it's not close to good enough.

Anyway, I don't anything urgent to solve, it's just that I've been thinking that using loc_ghost is limited in general. Given:

let x = 1 in let y = 1 in (x, y)

If you inlined y, you'd end up with an undesired empty line:

let x = 1 in (x, 1)

because the position between in and the tuple are 2 lines apart, and yet every node has loc_ghost=false.
Whereas if you Pexp_let stored both the loc_of_in and also loc_of_body (which is redundant with the body, in the ast produced by the parser, but not when you start to modify the ast), then this spurious empty line wouldn't show up.

But keep in mind that OCamlformat is a string -> string function. In a much simpler use-case than yours I made it a string -> (ast -> ast) -> string function and haven't had to deal with source and comments. Could that help ?

Oh yeah, that's what I'm doing as well, but it sometimes causes problems, like the exceptions fixed in this pull request. But I was thinking more generally about "what would it take for ocamlformat to fully support printing modified asts". If the modifications are simple, ocamlformat is pretty reasonable already.

it only requires capturing the relevant information from the concrete syntax into the tree.

That's exactly what we called "closer to the concrete syntax" so far. What we called "CST" in older issues is not a tree full of lexer tokens and snippets of source code.

We found that having different nodes that represent things with similar semantic but different syntax to be extremely useful (Pexp_function, Pexp_let, Pexp_letop or Pexp_apply, Pexp_infix, Pexp_prefix, Pexp_indexedop).
We could change ocamlformat's AST again if that could help your usecase, though we don't want to make ocamlformat harder to maintain.

because the position between in and the tuple are 2 lines apart, and yet every node has loc_ghost=false.

I don't think loc_ghost is relevant here. This is a case where we tried to not use Source and used line numbers instead. To fix this issue, we could perhaps record whether there's an empty line by adding a boolean into the Pexp_let.

test/unit/test_fmt_ast.ml

Julow · 2025-03-06T10:45:46Z

test/unit/test_fmt_ast.ml

+        (* Ideally we'd improve two things about this test:
+
+           - check the new string parses, to the same AST as the original one
+           - use ppx_expect, so we have a nicer workflow and more readable


I'd prefer cram tests. ppx_expect requires defining a library in Dune, which becomes installable and is annoying to deal with during releases. You can build an executable with Dune that uses ocamlformat-lib within a cram test.

I'm confused. Libraries are only installable if they have a public name, isn't it?

Libraries become installable (even private libraries) when they have a (package) field. This field might not be needed for ocamlformat right now but I had enough problems with ppx_expect and so much success with cram tests that I really don't want to go back.

Julow

Thanks!

@Julow

….28.1) CHANGES: ### Highlight - \* Support for OCaml 5.4 (ocaml-ppx/ocamlformat#2717, ocaml-ppx/ocamlformat#2720, ocaml-ppx/ocamlformat#2732, ocaml-ppx/ocamlformat#2733, ocaml-ppx/ocamlformat#2735, @Julow, @Octachron, @cod1r, @EmileTrotignon) OCamlformat now supports OCaml 5.4 syntax. Module packing of the form `((module M) : (module S))` are no longer rewritten to `(module M : S)` because these are now two different syntaxes. - \* Reduce indentation after `|> map (fun` (ocaml-ppx/ocamlformat#2694, @EmileTrotignon) Notably, the indentation no longer depends on the length of the infix operator, for example: ```ocaml (* before *) v |>>>>>> map (fun x -> x ) (* after *) v |>>>>>> map (fun x -> x ) ``` `@@ match` can now also be on one line. ### Added - Added option `module-indent` option (ocaml-ppx/ocamlformat#2711, @HPRIOR) to control the indentation of items within modules. This affects modules and signatures. For example, module-indent=4: ```ocaml module type M = sig type t val f : (string * int) list -> int end ``` - `exp-grouping=preserve` is now the default in `default` and `ocamlformat` profiles. This means that its now possible to use `begin ... end` without tweaking ocamlformat. (ocaml-ppx/ocamlformat#2716, @EmileTrotignon) ### Deprecated - Starting in this release, ocamlformat can use cmdliner >= 2.0.0. When that is the case, the tool no longer accepts unambiguous option names prefixes. For example, `--max-iter` is not accepted anymore, you have to pass the full option `--max-iters`. This does not apply to the keys in the `.ocamlformat` configuration files, which have always required the full name. See dbuenzli/cmdliner#200. (ocaml-ppx/ocamlformat#2680, @emillon) ### Changed - \* The formatting of infix extensions is now consistent with regular formatting by construction. This reduces indentation in `f @@ match%e` expressions to the level of indentation in `f @@ match`. Other unknown inconsistencies might also be fixed. (ocaml-ppx/ocamlformat#2676, @EmileTrotignon) - \* The spacing of infix attributes is now consistent across keywords. Every keyword but `begin` `function`, and `fun` had attributes stuck to the keyword: `match[@A]`, but `fun [@A]`. Now its also `fun[@A]`. (ocaml-ppx/ocamlformat#2676, @EmileTrotignon) - \* The formatting of`let a = b in fun ...` is now consistent with other contexts like `a ; fun ...`. A check for the syntax `let a = fun ... in ...` was made more precise. (ocaml-ppx/ocamlformat#2705, @EmileTrotignon) - \* `|> begin`, `~arg:begin`, `begin if`, `lazy begin`, `begin match`, `begin fun` and `map li begin fun` can now be printed on the same line, with one less indentation level for the body of the inner expression. (ocaml-ppx/ocamlformat#2664, ocaml-ppx/ocamlformat#2666, ocaml-ppx/ocamlformat#2671, ocaml-ppx/ocamlformat#2672, ocaml-ppx/ocamlformat#2681, ocaml-ppx/ocamlformat#2685, ocaml-ppx/ocamlformat#2693, @EmileTrotignon) For example : ```ocaml (* before *) begin fun x -> some code end (* after *) begin fun x -> some code end ``` - \* `break-struct=natural` now also applies to `sig ... end`. (ocaml-ppx/ocamlformat#2682, @EmileTrotignon) ### Fixed - Fixed `wrap-comments=true` not working with the janestreet profile (ocaml-ppx/ocamlformat#2645, @Julow) Asterisk-prefixed comments are also now formatted the same way as with the default profile. - Fixed `nested-match=align` not working with `match%ext` (ocaml-ppx/ocamlformat#2648, @EmileTrotignon) - Fixed the AST generated for bindings of the form `let pattern : type = function ...` (ocaml-ppx/ocamlformat#2651, @v-gb) - Print valid syntax for the corner case (1).a (ocaml-ppx/ocamlformat#2653, @v-gb) - `Ast_mapper.default_mapper` now iterates on the location of `in` in `let+ .. in ..` (ocaml-ppx/ocamlformat#2658, @v-gb) - Fix missing parentheses in `let+ (Cstr _) : _ = _` (ocaml-ppx/ocamlformat#2661, @Julow) This caused a crash as the generated code wasn't valid syntax. - Fix bad indentation of `let%ext { ...` (ocaml-ppx/ocamlformat#2663, @EmileTrotignon) with `dock-collection-brackets` enabled. - ocamlformat is now more robust when used as a library to print modified ASTs (ocaml-ppx/ocamlformat#2659, @v-gb) - Fix crash due to edge case with asterisk-prefixed comments (ocaml-ppx/ocamlformat#2674, @Julow) - Fix crash when formatting `mld` files that cannot be lexed as ocaml (e.g. containing LaTeX or C code) (ocaml-ppx/ocamlformat#2684, @emillon) - \* Fix double parens around module constraint in functor application : `module M = F ((A : T))` becomes `module M = F (A : T)`. (ocaml-ppx/ocamlformat#2678, @EmileTrotignon) - Fix misplaced `;;` due to interaction with floating doc comments. (ocaml-ppx/ocamlformat#2691, @EmileTrotignon) - The formatting of attributes of expression is now aware of the attributes infix or postix positions: `((fun [@A] x -> y) [@b])` is formatted without moving attributes. (ocaml-ppx/ocamlformat#2676, @EmileTrotignon) - `begin%e ... end` and `begin [@A] ... end` nodes are always preserved. (ocaml-ppx/ocamlformat#2676, @EmileTrotignon) - `begin end` syntax for `()` is now preserved. (ocaml-ppx/ocamlformat#2676, @EmileTrotignon) - Fix a crash on `type 'a t = A : 'a. {a: 'a} -> 'a t`. (ocaml-ppx/ocamlformat#2710, @EmileTrotignon) - Fix a crash where `type%e nonrec t = t` was formatted as `type nonrec%e t = t`, which is invalid syntax. (ocaml-ppx/ocamlformat#2712, @EmileTrotignon) - Fix commandline parsing being quadratic in the number of arguments (ocaml-ppx/ocamlformat#2724, @let-def) - \* Fix `;;` being added after a documentation comment (ocaml-ppx/ocamlformat#2683, @EmileTrotignon) This results in more `;;` being inserted, for example: ```ocaml (* before *) print_endline "foo" let a = 3 (* after *) print_endline "foo" ;; let a = 3 ``` - Fix dropped comment in `if then (* comment *) begin .. end` (ocaml-ppx/ocamlformat#2734, @Julow)

Julow reviewed Mar 3, 2025

View reviewed changes

v-gb force-pushed the push-mwmryunvouso branch from 6d892ea to 3445f8f Compare March 5, 2025 23:04

Julow reviewed Mar 6, 2025

View reviewed changes

fix a few failures when printing a modified AST

5515457

v-gb force-pushed the push-mwmryunvouso branch from 3445f8f to 5515457 Compare March 6, 2025 15:34

Merge branch 'MAIN' into pr/2659

b631ad9

Julow approved these changes Mar 10, 2025

View reviewed changes

Julow merged commit 9a050fb into ocaml-ppx:main Mar 10, 2025
8 of 9 checks passed

davesnx pushed a commit to ocaml-mlx/ocamlformat-mlx that referenced this pull request May 6, 2025

fix a few failures when printing a modified AST (ocaml-ppx#2659)

676c1d1

Julow mentioned this pull request Oct 21, 2025

[new release] ocamlformat-lib, ocamlformat and ocamlformat-rpc-lib (0.28.0) ocaml/opam-repository#28752

Closed

Julow mentioned this pull request Oct 24, 2025

[new release] ocamlformat-lib, ocamlformat and ocamlformat-rpc-lib (0.28.1) ocaml/opam-repository#28774

Merged

fix a few failures when printing a modified AST #2659

fix a few failures when printing a modified AST #2659

Uh oh!

Conversation

v-gb commented Feb 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

v-gb commented Feb 27, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Julow left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

v-gb commented Feb 27, 2025 •

edited

Loading