Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework oxc_prettier #5068

Open
Boshen opened this issue Aug 22, 2024 · 17 comments
Open

Rework oxc_prettier #5068

Boshen opened this issue Aug 22, 2024 · 17 comments
Labels
A-prettier Area - Prettier C-enhancement Category - New feature or request E-Help Wanted Experience level - For the experienced collaborators

Comments

@Boshen
Copy link
Member

Boshen commented Aug 22, 2024

crates/oxc_prettier was my attempt at the prettier bounty.

I thought I could finish it in time, except the fact that I rushed too quickly without looking at all the requirements ... It was too late when I got blocked by printing comments.

In order to rework oxc_prettier, we need to understand at least:

As for the infrastructure, we already have most of the code:

Feel free to remove everything and start from scratch, and copy over the format code https://github.com/oxc-project/oxc/tree/main/crates/oxc_prettier/src/format

@Boshen Boshen added C-enhancement Category - New feature or request A-prettier Area - Prettier E-Help Wanted Experience level - For the experienced collaborators labels Aug 22, 2024
@parkin-lin

This comment was marked as off-topic.

@DonIsaac

This comment was marked as off-topic.

@Boshen
Copy link
Member Author

Boshen commented Sep 4, 2024

@leaysgur is writing a series of articles in preparation of this task:


I'm also working on comment attachments to unblock prettier.

@Boshen Boshen pinned this issue Sep 4, 2024
@leaysgur
Copy link
Collaborator

leaysgur commented Sep 5, 2024

I'm also working on comment attachments to unblock prettier.

Does this mean that you aim to implement Prettier equivalent which achieves with 60+ utils? 🫨

FYI: Babel also has relevant code(just 300 loc), but that did not seem to meet Prettier's requirement.

@Boshen
Copy link
Member Author

Boshen commented Sep 5, 2024

Does this mean that you aim to implement Prettier equivalent which achieves with 60+ utils? 🫨

We need to figure out what exactly is prettier doing with those 60+ utils 🥲

@IWANABETHATGUY
Copy link
Contributor

IWANABETHATGUY commented Sep 7, 2024

For those who are interested in algorithms under the hood, prettier is based on https://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf,
https://prettier.io/docs/en/technical-details

@leaysgur
Copy link
Collaborator

It has been 3 weeks since I started reading the Prettier source code.
It's still far from being complete, but I'd like to leave some progress and summary here.

How to debug Prettier

There are 3 ways:

  • Playground
    • Enable "Debug" options in the sidebar
  • CLI
    • With --debug-* args
  • Node.js API
    • Under __debug exports

https://leaysgur.github.io/posts/2024/09/03/111109/

It is written in Japanese, but it is all code, so you can understand it. 😉

I also recommend to run node --inspect-brk the code with debugger and inspect it from Chrome DevTools.

How to handle comments

I will post some topics for discussion in a few days.

@leaysgur
Copy link
Collaborator

How to handle comments

As you may know, Prettier's formatting process consists of roughly 3 phases:

  • P1: original text > AST
  • P2: AST > Doc
  • P3: Doc > formatted text

Comments are collected in P1 and used in P2.

In P1:

  • It simply parses to the AST with comments in the output
    • However, there are also some adjustments made to the AST nodes
    • Some parsers, such as Babel, already attach comments to nodes at this stage
      • However, Prettier does not use them
      • The reason is to support parsers other than Babel
  • As the final step of P1 (more technically, the beginning of P2), comments are classified and attached to nodes
    • First, it starts by finding nearby nodes for each comment
    • Based on that, it determines the placement(ownLine, endOfLine, remaining) from the lines before and after each comment
    • Then, it handles about 30(!) known special patterns for each placement
    • Finally, it finishes using unique tie-breaking algorithm

As a result, some AST nodes have comments property with array of Comment extended with leading, trailing and few more props.

In P2 (I haven’t read the code in detail here yet),

  • When each node is converted into a Doc, comments are also converted into Docs
    • Therefore, how they are output seems to have already been decided in P1

In OXC, part of the necessary information is already implemented and can be obtained. / #5785

However, just like with Babel, that information may be different from what Prettier requires...


So, I think I’ve generally understood "what" Prettier is doing.

However, as for "why" Prettier does it that way, I can only say it’s because that’s Prettier’s opinion.

Incidentally, there seem to be at least around 120 issues related to JS/TS at the moment, but

https://github.com/prettier/prettier/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen+label%3Alang%3Ajavascript%2Cjsx%2Ctypescript+label%3Atype%3Abug

about 50 of them are related to comments, with some remaining unresolved since as far back as 2017.

https://github.com/prettier/prettier/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen+label%3Alang%3Ajavascript%2Cjsx%2Ctypescript+label%3Atype%3Abug+label%3Aarea%3Acomments

@leaysgur
Copy link
Collaborator

So, what I would like to discuss(confirm?) is: where should oxc_prettier aim to go?
As the name suggests, should it strive for 100% compatibility with Prettier?

If the goal is to achieve compatibility, I think it would mean porting almost all of Prettier’s lines.

I believe the original aim of this issue was this, but again, is this an acceptable?

If we decide to proceed, let’s consider how we might specifically approach it...

Fortunately, the 3 phases have isolated inputs and outputs, so:

  • If we have AST + Comments (in .json), we can create a Doc
  • If we have a Doc (in .json), we can generate formatted text

I think reducing the gaps between each phase will be important.

For that, we’ll need some way to directly compare the result of Prettier.parse() with our implementation.

// OXC
let oxcASTWithComments = oxc_prettier::parse(text);
let oxcEstreeASTWithComments = estreeJSON(oxcASTWithComments);
// let oxcBabelASTWithComments = babelJSON(oxcASTWithComments);

// Prettier
let prettierEstreeASTWithComments = Prettier.parse(text, { parser: "meriyah" });
// let prettierBabelASTWithComments = Prettier.parse(text, { parser: "babel" });

// Diff!
assert(oxcEstreeASTWithComments, prettierEstreeASTWithComments);

What if we don’t aim for full compatibility?

  • For example, we could continue to use Prettier just as a reference?
    • But handle comments in our own way without focusing on compatibility
    • In my opinion as a user, it would be fine to simply restore comments as close to their original positions as possible without special handling
      • (More personally, I also dislike when parentheses are removed without permission)

But in that case, maybe it would be more like oxformat rather than oxc_prettier?

Anyway, this is what I was thinking these days. What do you think?

@Boshen
Copy link
Member Author

Boshen commented Sep 25, 2024

For the long run, I envision a more configurable and less opinionated formatter.

Under this assumption, we can relax the "100% compatibility" constraint.

My intention is to reach a high compatibility with prettier so we can bootstrap ourself, and then start deviating behavior.

I've also looked at the prettier and rustfmt issues before, comment positioning is a really difficult subject due to the natural of matching the original intent of the comment.

To move things forward, I suggest @leaysgur to start refactoring our prettier code in preparation for what to come, I believe you know more about prettier than I do. You may ask @Sysix for help given they have started working on this area as well.

Our code for P2 and P3 is also incomplete, I suggest to do a little bit of rework first, as well as adding more debug utilities.

@Boshen
Copy link
Member Author

Boshen commented Sep 25, 2024

I think reducing the gaps between each phase will be important.
For that, we’ll need some way to directly compare the result of Prettier.parse() with our implementation.

I don't think we need to do this. This will leave us too coupled with the prettier implementation, and it may end up being a waste of effort.

We are already matching half of the generated text, a few more iterations of refactoring and implementing details should close our gap to prettier.

@leaysgur
Copy link
Collaborator

I see, what I likely concerned about was: what exactly is meant by "high compatibility". 😅
Does it have to be 100%, or should aim for at least 80% if possible, or is something around 70% good enough? etc...

So, It's good to know that we're not necessarily aiming for 100% compatibility.

It's still uncertain how much the percentage will drop if we give up on porting completeness for comment handling, but for now, I'll move forward.

Following your suggestion:

  • Understanding and refactoring the oxc_prettier
  • Continuing with code reading of Prettier from P2

I’ll work on these whenever I have time! 🚀


For the long run, I envision a more configurable and less opinionated formatter.

I love this idea!

@magic-akari
Copy link
Collaborator

For the long run, I envision a more configurable and less opinionated formatter.

Does this mean that oxc_prettier will provide a wide range of configurable options, and offer a preset called prettier to match with prettier?

@leaysgur
Copy link
Collaborator

How to handle comments

(Follow up of #5068 (comment))

As I posted above, comments are collected and attached to AST nodes in P1.
Then in P2, comments are printed as Doc along with other AST nodes.

Most comments are printed with their attached nodes like:

[leadingCommentsDoc, nodeDoc]
// or
[nodeDoc, trailingCommentsDoc]

https://github.com/prettier/prettier/blob/e3b8a16c7fa247db02c483fb86fc2049d45763b8/src/main/ast-to-doc.js#L128-L138

But the rest of the comments are handled as needed.

  • Dangling comments
  • Not dangling(leading or trailing) but need special treatment

There are about 40 files for printing ESTree AST to Doc.

https://github.com/prettier/prettier/tree/main/src/language-js/print

And 15 files of them print comments on demand.

❯ rg 'print(Dangling)?Comments(Separately)?' -l src/language-js/print
estree.js
class.js
type-annotation.js
function-parameters.js
mapped-type.js
component.js
module.js
function.js
ternary.js
array.js
binaryish.js
property.js
call-arguments.js
block.js
jsx.js
arrow-function.js
object.js
member-chain.js
type-parameters.js

@Boshen
Copy link
Member Author

Boshen commented Oct 1, 2024

@leaysgur I was just informed that biome_formatter is a port of the Prettier IR and printer infrastructure. It may be possible to reduce repeated work by using biome_formatter. More investigation for the rework!

So instead of using our current unfinished IR and printer, we can replace them with biome_formatter.

I pinged you on discord, for full context.

@leaysgur
Copy link
Collaborator

leaysgur commented Oct 2, 2024

Thanks for the pinning!

biome_formatter is a port of the Prettier IR and printer infrastructure.

Yes, and I just started reading through biome_formatter and biome_js_formatter, so the timing is perfect. 😄

I hadn’t considered it as a viable option for OXC (though I’m not entirely sure why), but if we shift our goal to generating biome_formatter's IR, it could significantly reduce the amount of work, maybe?

I’ll look into this soon (after finishing the regex parser rework), including:

@Boshen
Copy link
Member Author

Boshen commented Oct 2, 2024

but if we shift our goal to generating biome_formatter's IR

biome_formatter IR, aka format_element

I think it's the same thing as prettiers IR. Also Doc in our code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-prettier Area - Prettier C-enhancement Category - New feature or request E-Help Wanted Experience level - For the experienced collaborators
Projects
None yet
Development

No branches or pull requests

6 participants