Not exactly sure why this is failing CI. My previous update to this extension was essentially the same; the only difference is that this update has a few more changes to the extension's grammar, etc.
I suspect it is a problem with building the Tree-sitter grammar. We have had some grammars that take too long to build on our GitHub Actions runners.
Even on my local machine (MacBook Pro M3 Max) it seems to get stuck compiling the Nu parser: I've been sitting here for several minutes and it still hasn't completed.
Interesting, that explains why I was having trouble testing this. I looked at the logs and assumed they just didn't show when compiling completed. I couldn't explain why sometimes the language syntax wasn't registered, but later on it showed as registered and finally worked. I guess it just took a long time to build. Is there any way to get more detailed diagnostics about what is going on under the covers? I searched and searched but can't find any verbose logging options for dev extensions.
Not sure what's going on. On my Mac I can easily build tree-sitter-nu in about 10 seconds (7 seconds for …). Maybe @JosephTLyons knows?
I remember we had problems building the … before. It seems that some of the tree-sitter-nu changes that have happened since then have slowed down the compile again.
Yup, I remember those changes fondly, @maxbrunsfeld: nushell/tree-sitter-nu#85. That was 100 PRs ago, wow, time flies, lol. @blindFS has made quite a few changes recently; maybe they can help get the WASM build performance to something tolerable?
@fdncred I'm looking at the Nu grammar, and I see it does something very odd: it loads a very big file of specific shell commands and builds them directly into the grammar, as a token called …. This results in the lexer function having thousands and thousands of states for matching all of these specific strings. I don't think this is good. IMO, this code should be removed. It's not efficient to have a lexer match all of these long, specific strings. I don't know why things like …. It looks like this code was introduced in nushell/tree-sitter-nu#161. That PR looks like it added about 1200 lex states. That's not the only reason the Nu lexer is huge, but it is one big piece.
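For intuition on why baking thousands of literal strings into the lexer blows up the state count: a DFA matching a set of literal strings needs roughly one state per distinct prefix of those strings. A toy illustration of that counting argument (this is not how tree-sitter's generator works internally, just the underlying arithmetic):

```javascript
// Count distinct prefixes of a set of literal strings -- a rough lower
// bound on the lexer states a DFA needs to match them exactly.
function prefixStates(commands) {
  const prefixes = new Set();
  for (const cmd of commands) {
    for (let i = 1; i <= cmd.length; i++) {
      prefixes.add(cmd.slice(0, i));
    }
  }
  return prefixes.size;
}

// A handful of multi-word built-ins, stand-ins for the big generated list:
const toy = ["str substring", "str replace", "str trim", "date format"];
console.log(prefixStates(toy)); // prints 35
```

Shared prefixes like `str ` are reused, but every divergence adds states, so a file of hundreds of multi-word commands easily contributes lex states in the thousands.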
To investigate, I'd look at the logs since nushell/tree-sitter-nu#85 and see which PRs caused the greatest increase in the number of cases in the generated …. I think the fix is going to involve simplifying the tokens by making things more permissive, or by extracting common tokens instead of having many very similar tokens.
Thanks for the investigation, @maxbrunsfeld. The key to the text file was to get tree-sitter-nu to recognize the nushell commands with spaces in them; each of the commands in the text file is a nushell command with spaces in it. We were having issues where "str substring" wouldn't be recognized properly, and highlighting was also broken: str was one color and substring was another. We're all for simplifying things, but it would be nice to be able to recognize commands with spaces in them as one command. That's all we could really figure out. There were also attempts in nushell/tree-sitter-nu#160 and nushell/tree-sitter-nu#156 to get the highlighting working right.
I think you can probably recognize commands with spaces in them in a generic way, as opposed to hard-coding a large set of specific commands. I'm guessing that Nu's production parser doesn't hard-code the commands.
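The generic approach could look roughly like this in the tree-sitter grammar DSL. This is a hypothetical sketch, not a tested tree-sitter-nu change; the rule and field names (`command`, `cmd_identifier`, `head`, `subcommand`) are made up for illustration:

```javascript
// Sketch: parse `str substring` as a head word plus subcommand words,
// generically, instead of generating a lexer token for every known
// multi-word command.
module.exports = grammar({
  name: 'nu_sketch',
  rules: {
    command: $ => prec.right(seq(
      field('head', $.cmd_identifier),
      // `substring` in `str substring` is just another bare word; deciding
      // which head+word pairs are real built-ins becomes a highlighting /
      // language-server concern, not a lexing concern.
      repeat(field('subcommand', $.cmd_identifier)),
      repeat($.argument),
    )),
    cmd_identifier: _ => /[a-zA-Z_][a-zA-Z0-9_-]*/,
    argument: $ => choice($.number, $.string),
    number: _ => /\d+/,
    string: _ => /"[^"]*"/,
  },
});
```

With something in this shape, the lexer only needs states for one identifier pattern rather than for every literal multi-word command.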
That was a questionable PR, and I did test the performance loss: parsing speed dropped 7-8% on my machine. We thought that was an acceptable trade-off, so we let it in. As for command recognition, AFAIK it's not possible to tell whether …. Anyway, I generally agree with you that generating a hard-coded list of commands as rules is too much of a hack. Moreover, the unquoted rules in the parser are also deeply involved; I once tried to migrate them to the external scanner, but that's a very heavy task and I gave up. You probably know some tricks to simplify that piece too.
I guess I don't understand the desired behavior.
Because the command name is …
This shouldn't be happening in the lexer/parser. There can be arbitrarily many commands that take arbitrary subcommands; they should all just be parsed as commands with arguments. In particular, https://www.nushell.sh/commands/docs/str.html
At the end of the day, I don't object to what we call it: cmd + args, cmd only, or whatever. However, I do care about how it works. As an example, you can see the highlighting in the repl. Below you see the command in cyan, the argument range in magenta, and the flags in blue. Again, I don't care about the colors, because those can be changed, but you'll notice that the command is all the same color, which is different from the other parameters. That's kind of a facade to help users understand better. BTW, I'm not arguing about things, just trying to help you understand how nushell works. I'm up for changing whatever we need to in order to make things work better.
Yeah, all of that can be done at a higher level than the parser, and that would better match what nushell is doing: their parser does not have a hard-coded list of commands like this, from what I can tell reading the code. I think we should look for 'str replace' and friends in the highlight query. The go-to-def is gonna be implemented in the language server, so I don't think it's affected by this. If it were, I think we could add hand-written logic that treats certain known built-in commands specially.
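The highlight-query idea could look roughly like this. A hypothetical `highlights.scm` sketch, where the `command`/`cmd_identifier` node names and the `head:` field are assumptions rather than the real tree-sitter-nu node names:

```scheme
; Color a known multi-word built-in's head word specially without the
; lexer knowing the command list; the list lives in the query instead.
(command
  head: (cmd_identifier) @function.builtin
  (#match? @function.builtin "^(str|date|math|into)$"))

; Subcommand words after a recognized head get the same capture, so
; `str substring` reads as one command in the editor.
(command
  subcommand: (cmd_identifier) @function.builtin)
```

This keeps the "command is all one color" behavior while letting the grammar itself stay generic.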
I just checked nushell/tree-sitter-nu#85 to learn how to optimize the rules. It seems to me the key is to reduce conflicts? Then this PR, nushell/tree-sitter-nu#139, introduced many new conflicts regarding common newline patterns. It might have something to do with the slowdown of the WASM compilation. It would be super helpful if you could take some time to demonstrate how the optimization could be done on the current …
In case we want to revert #161, as [discussed](zed-industries/extensions#2068 (comment)).
@blindFS I think that the biggest issue for using tree-sitter-nu in Zed is the time it takes to compile to WASM. For whatever reason, Clang's wasm backend takes much longer to compile the …. In order to reduce the lexer state count, the biggest opportunities I saw when I last looked were removing the many different copies of the …. In particular:
Is there any way to check the actual state count that directly affects the WASM compilation time? Or is directly testing …?
There are nuances in the sets of allowed characters in them. For example, …. I'll try to test it on real-world nu scripts to see whether keeping those nuances is worth the cost.
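On checking the state count without a full WASM build: the constants tree-sitter writes into the generated `src/parser.c` are one cheap proxy. A sketch using a toy stand-in file (in the grammar repo you'd run `tree-sitter generate` and point the same greps at the real `src/parser.c`; the numbers below are made up):

```shell
# Toy stand-in for a generated src/parser.c. The real file, produced by
# `tree-sitter generate`, declares its table sizes with #defines like these
# and contains a ts_lex function whose `case` labels are the lex states.
mkdir -p /tmp/ts_demo/src
cat > /tmp/ts_demo/src/parser.c <<'EOF'
#define STATE_COUNT 1200
#define LARGE_STATE_COUNT 400
#define SYMBOL_COUNT 300
#define TOKEN_COUNT 180
static bool ts_lex(TSLexer *lexer, TSStateId state) {
  switch (state) {
    case 0: break;
    case 1: break;
  }
  return false;
}
EOF

# Parse-table sizes (bigger numbers -> more C for clang's wasm backend):
grep -E '#define (STATE_COUNT|LARGE_STATE_COUNT)' /tmp/ts_demo/src/parser.c

# Rough lexer size: the number of `case` labels inside ts_lex:
grep -c '^    case ' /tmp/ts_demo/src/parser.c
```

Tracking these counts across PRs is cheaper than timing a WASM compile each time.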
I think it is possible to unify these rules without introducing any parse errors in valid code, but I think it would take some experimentation.
That one has me at my wits' end. Would you please lend a hand?
I'm going to go ahead and close this PR, as it sounds like there are some improvements to tree-sitter-nu which need to happen as a prerequisite. Feel free to open a subsequent PR in the future when it's ready to go. As always, thanks @maxbrunsfeld for your help with the tree-sitter internals. Learning every day!
Sorry @blindFS, it might be a little while before I can hack on tree-sitter-nu, just because there is Zed stuff we're launching that I need to focus on. For now, all I'd suggest is trying to make the grammar simpler by making it less restrictive. For example, maybe you don't need to explicitly forbid …
Yeah, it's the number of …
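The "less restrictive" direction could look something like this hypothetical grammar-DSL fragment. The rule names and character classes are made up for illustration, not taken from tree-sitter-nu:

```javascript
// Sketch: collapse several near-identical context-specific "unquoted"
// tokens into one permissive token. Every extra variant with a slightly
// different forbidden-character set multiplies lexer states.
//
// Before (sketch): one token per context:
//   _unquoted_in_list:   _ => token(/[^ \t\r\n\[\],]+/),
//   _unquoted_in_record: _ => token(/[^ \t\r\n{}:,]+/),
//   _unquoted_in_parens: _ => token(/[^ \t\r\n()]+/),
//
// After (sketch): one shared token; rely on the surrounding rules, or a
// later validation pass, to reject the rare genuinely-invalid cases:
_unquoted: _ => token(/[^ \t\r\n"'`]+/),
```

The trade-off is accepting some strings the stricter tokens would have rejected at lex time, in exchange for a much smaller lexer.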
Understood; I'll see what I can do before you have time.
Someone already gave the unquoted rules the lowest lexical precedence. But when it comes to, e.g., immediate …
There's no way for the nu parser to get anywhere close to the size of …
Yeah, an external scanner is an option too, although I wouldn't think it would be needed just to match unquoted arguments.
With the latest changes that @blindFS has made, the wasm is building in 2 min 7 sec, down from 6 min 49 sec originally. Is 2 min fast enough, or still too slow to pass this CI?
Try it out in CI and we can see. |
Looks like it compiled!
yay! all thanks to @blindFS
