Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/ast: Free-floating comments are single-biggest issue when manipulating the AST #20744

Open
griesemer opened this issue Jun 21, 2017 · 41 comments
Assignees
Labels
Refactoring Issues related to refactoring tools
Milestone

Comments

@griesemer
Copy link
Contributor

Reminder issue.

The original design of the go/ast makes it very difficult to update the AST and retain correct comment placement for printing. The culprit is that the comments are "free-floating" and printed based on their positions relative to the AST nodes. (The go/ast package is one of the earliest Go packages in existence, and this is clearly a major design flaw.)

Instead, a newer/updated ast should associate all comments (not just Doc and Comment comments) with nodes. That would make it much easier to manipulate a syntax tree and have comments correctly attached. The main complexity with that approach is the heuristic that decides which comments are attached to which nodes.

@griesemer griesemer added this to the Unreleased milestone Jun 21, 2017
@griesemer griesemer self-assigned this Jun 21, 2017
@jimmyfrasche
Copy link
Member

jimmyfrasche commented Jun 21, 2017 via email

@griesemer
Copy link
Contributor Author

@jimmyfrasche Thanks for your comment. I do have a design and prototype that is close, but not good enough for gofmt purposes. I'd be interested in how you integrate your comment nodes in the rest of a syntax tree.

@jimmyfrasche
Copy link
Member

jimmyfrasche commented Jun 21, 2017 via email

@jimmyfrasche
Copy link
Member

jimmyfrasche commented Jun 21, 2017 via email

@griesemer
Copy link
Contributor Author

@jimmyfrasche Thanks for the clarification. Because /* */ style comment can appear anywhere, they do add complexity as it's not always clear if they belong to the token before or after.

@jimmyfrasche
Copy link
Member

jimmyfrasche commented Jun 21, 2017 via email

@jimmyfrasche
Copy link
Member

I suppose one way around that would be, in expressions, to always have comments come after (where, for binops "after" is defined as after the binary operator so that

a /* 1 */ + /* 2 */ b /* 3 */

stores 1 on a, 2 on +, and 3 on b.

Stuff like CallExpr would need to store multiple comments for code like

f /* 1 */ () /* 2 */

and statements would need before and after comments for

/* before */ return /* after - but if expression is returned comment is attached to that */

With that and the "comments are nodes" approach plus the extra CommentGroup on the File, I think that would get everything.

Verbose and filled with cases but explicit and easy to manipulate.

@griesemer
Copy link
Contributor Author

@jimmyfrasche My approach is similar. On the one extreme, each token could have a potential comment attached; and on the other, there's one comment for each node. The former is too expensive in terms of representation but would probably preserve comment positioning perfectly; and the latter is relatively cheap but may not preserve all comments in place (depending on heuristics). The stored position is helpful, but relying on it introduces the same problems we have already. Thus, any solution will have to work based on the graph alone, and ignore positions (they are used only in the beginning for the heuristics to determine how to associate a node).

@quasilyte
Copy link
Contributor

Are there any advices in how this can be done today?

In my experience, trailing comments are particularly hard to handle.
Suppose a tool that sorts constants in alphabetical order:

const (
  C // C comment
  A
  B // B comment
)

With manual Pos update and format.Node, the best result I managed to achieve is:

const (
  A
  // C comment
  C 
  // B comment
  B
)

I am sorry if this is just me being lame.

#17108 had some progress, but it is unclear if it has any relation to this.

@ianlancetaylor
Copy link
Member

@quasilyte This issue is about fixing the problem. Let's not complicate the issue with a discussion of how to deal with the problem before it is fixed. Please take that discussion to a forum; see https://golang.org/wiki/Questions. Thanks.

@matthewmueller
Copy link

matthewmueller commented Jul 20, 2018

Hey guys, I'm wondering if I can help push this one along in some way or if there are any workarounds to this issue? I keep finding reasons to modify, append and delete pieces of the AST, but adding and positioning comments ends up biting pretty hard.

Right now I use text/template to generate Go code, but that doesn't work well if you want to make adjustments to existing code. My current use-case is that I want to automatically adjust function signatures and add new functions based on schema changes, without breaking or overriding these function's handwritten bodies. This kind of code surgery is really difficult to pull off in Go right now.

I'm sure you guys have thought about this a ton already, so I'm not sure how helpful this is, but I think the AST changes would have the following characteristics:

  • comment nodes implement ast.Decl and ast.Stmt, so they can be added anywhere
  • free-floating comments (e.g. comment nodes whose next sibling isn't an ast.GenDecl) would be scattered throughout the AST, in the same way declarations and statements are.
  • attached comments (on top of a declaration or to the side of a field), would be considered owned by the node and live in the Doc field.
  • ast/printer uses gofmt's spacing rules, rather than token position

And then, any changes to these comment nodes would be reflected when ast/printer traverses the AST and prints the code out in a consistent format. That would be amazing ✨

@griesemer
Copy link
Contributor Author

@matthewmueller This sounds about right to me, but the devil is in the details.

I think we would want each AST node to have an additional comment field (that is usually a nil pointer) and that field would then be used to attach comments that "belong" (are associated) with that node. There may be additional information about the comment's position needed (as in "above", "before", "after", or "below" the node's line) for good positioning. Such associated comments would be always attached with the relevant nodes. (It's not clear to me that we can just use the existing Doc fields and expand the concept of Doc fields to all nodes given that those have specific meaning at the moment.) Additionally, as you have observed, "free-floating" (non-associated) comments would need to act like Decls and Stmts (and presumably those would always have an empty line before and after them as otherwise they would be associated with a node.

It should be possible to do this is an backward-compatible way: Add a new *Comment (or similar) field to all nodes, and define the respective data structure. Then, a new function could take the existing list of comments from the parser and associate them with the nodes. This is where a good heuristic is needed (I've had a prototype for this a couple of years back but I didn't get to make it complete). Finally, go/printer would need to be adjusted to position comments based on node association (I'd copy go/printer and create a 2nd version that is appropriately modified - this would make sure the existing code will continue to run as always).

The hard part is getting the comment association heuristic work well.

@josharian
Copy link
Contributor

@matthewmueller some concrete examples of cases that are currently mishandled: #21755, #22371.

@dave
Copy link
Contributor

dave commented Sep 15, 2018

I'm currently working on a project that attempts to address this problem.

github.com/dave/dst is a fork of the go/ast package with a few changes. The decorator package converts from *ast.File + *token.FileSet to *dst.File and back again.

All the position fields have been removed from dst so it's just the location in the tree that determines the position of the tokens. Decorations (e.g. comments and newlines) are stored along with each node, and attached to the node at various points. The intention is that any place gofmt will allow a comment / new-line to be attached, dst will allow this.

I've finished a very rough prototype that works pretty well. (Take a look at restorer_test.go - all the tests pass apart from FuncDecl now).

There's several special cases that it doesn't currently handle. Right now I'm generating much of the code, so the special cases are non-trivial to implement. (e.g. Look at FuncDecl - the func token from the Type field is rendered before Recv and Name). Over the next few weeks I'll refactor and handle the special cases.

As @griesemer points out a big problem is where to attach the decorations so as you manipulate the tree they remain attached to the node you were expecting. My algorithm probably needs improvement here too (see decorator_test.go), but I think it currently works well enough to be useful.

This issue is concerned with a replacement for the go/ast package - dst isn't necessarily aiming to be this, so I'll invite further conversation in the dst repo issues or in the #dst Gophers Slack channel.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/137076 mentions this issue: cmd/goforward: add a tool for moving packages

@dave
Copy link
Contributor

dave commented Oct 20, 2018

I'm pretty happy with how development of https://github.com/dave/dst has come on in the past few weeks.

My main yardstick is a test that converts the entire Go standard library from ast to dst and back to ast. It prints the resultant ast and compares with the original source. Using this I've found a host of edge cases which are now all fixed. It's possible there's more bugs but I think the standard library is a pretty good corpus of code, and it's now working with 100% fidelity.

I'm also a lot happier with the algorithm that attaches comments to nodes. It gives acceptable results in the vast majority of cases.

Over the next few weeks I'll be dog-fooding the system in a couple of projects, so I'll get a better idea of how it works in the real-world.

I would invite you all to take a look. I may make some tweaks to the API that controls adding / removing comments from the attachment points, and I'd very much appreciate feedback on this.

@stapelberg
Copy link
Contributor

I would invite you all to take a look. I may make some tweaks to the API that controls adding / removing comments from the attachment points, and I'd very much appreciate feedback on this.

I have just gotten a chance to test the dst package, and it indeed deals with newlines/whitespace much better than the go/ast package in that dst allows you to insert line breaks/blank lines before/after inserted AST nodes.

I haven’t had the need to touch comments for my use-case (yet?), but the whitespace controls are very valuable, and I did not run into any correctness issues with dst yet.

Thanks for the helpful package!

@dave
Copy link
Contributor

dave commented Jan 29, 2019

Thanks @stapelberg! I worked really hard on getting the output perfect. Let me know if you find anything unexpected and I'll look into it.

@matttproud
Copy link
Contributor

matttproud commented Jan 29, 2019 via email

@ZhengHe-MD
Copy link

ZhengHe-MD commented Jun 11, 2019

@dave the dst package just saved my life. it's very helpful! and the dstutil module makes the migration seamless.

@Zemnmez
Copy link

Zemnmez commented Jul 30, 2019

@dave really appreciate dst. this was such an issue for generating good code

@mitar
Copy link
Contributor

mitar commented Aug 3, 2023

What is the status of this? Anything which made the work here stuck somewhere?

@adonovan
Copy link
Member

adonovan commented Aug 7, 2023

What is the status of this? Anything which made the work here stuck somewhere?

https://go.dev/cl/429639 is as far as I have gotten; I plan to return to it later this year. There are a couple of failing tests in the formatter which I think I could fix in a few hours; after that I will be convinced that the approach solves the problem of faithfully recording the stream of tokens-and-comments so that AST modifications preserve that order. However, two questions remain, one easy, one hard:

  1. Is the 17% space overhead--even when the feature is disabled--acceptable?
  2. Is this new API adequate for refactoring tools? Preserving token-and-comment order may turn out to be both unnecessary and insufficient to preserve the logical association between comments and declarations or expressions, and I can imagine that the new representation could cause comments to migrate in surprising ways during refactoring because it is concerned more with order than with semantic relations. Perhaps we would get a better result by taking one approach for expressions and another for statements and declarations: the latter are more naturally line-oriented and may benefit from a representation that preserves comment relations such as above, below, and after-on-same-line. Further research is needed.

@KingPuiWong
Copy link

Hello, I would like to ask my code 130 lines, so write, why the function comments are not generated?
This is my code address: https://github.com/KingPuiW....
If you know how to solve it, I look forward to hearing from you. Thank you.

@iamemilio
Copy link

iamemilio commented Apr 2, 2024

This seems to be hanging. Is there some sort of expectation for when this may be ready to release? It would be nice to be able to depend on the standard library AST tools to manipulate syntax tree nodes rather than needing to roll your own or use third party tools. Are there are any stuck-points or questions that the community could help with?

In terms of preserving comment order for different field types, I think that the primary goal, from what I understand from the comments and my own use case, is to preserve the association between comments, spacing, and the lines of code (AST nodes) that they are associated with. With that in mind, I think your API that links node types to their positionally associated comments makes sense.

@adonovan
Copy link
Member

adonovan commented Apr 2, 2024

Not hanging, but it's the third priority on my to-do list; I hope to start on it this month. Glad to know this will be as useful to others as it will be to us.

@josharian
Copy link
Contributor

Glad to know this will be as useful to others as it will be to us.

It will! I hit #22371 again just now.

@adonovan
Copy link
Member

I hope to start on it this month.

Not even close, unfortunately. This will have to wait for go1.24.

@MattiasMartens
Copy link

This continues to be a sore spot in Go’s otherwise great tooling system. A parser whose output can’t re-generate the file it parsed?
This issue turned what could have been a twenty-minute task into one lasting several hours. It hadn’t occurred to me to imagine a widely-used library of a widely-used language prized for its pragmatic effectiveness would contain such a basic oversight.

@joa23
Copy link

joa23 commented Dec 17, 2024

@MattiasMartens - agreed - did you try https://github.com/dave/dst ?

@MattiasMartens
Copy link

@MattiasMartens - agreed - did you try https://github.com/dave/dst ?

Great recommendation, thanks. To rebuild the code from tree is not needed for my case, and I can work around the problem without too much difficulty. I was just very surprised to encounter it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactoring Issues related to refactoring tools
Projects
None yet
Development

No branches or pull requests