-
Notifications
You must be signed in to change notification settings - Fork 80
Optimize getDiagnostics for speed #211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize getDiagnostics for speed #211
Conversation
src/DiagnosticsProvider.php
Outdated
* @return string[] | ||
*/ | ||
private static function initTokenKindToText() { | ||
self::$tokenKindToText = \array_flip(\array_merge( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could someone explain to me what the advantage of array_flip(array_merge([...]))
is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating a hashmap that can be looked up in O(1)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, that's cool. Nice trick, will use that when I need it :)
src/DiagnosticsProvider.php
Outdated
* @return string | ||
*/ | ||
public static function getTextForTokenKind($kind) { | ||
if (!isset(self::$tokenKindToText)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't we simply initialise this after the class declaration, like a static constructor in other languages would?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the maintainers are fine with it, I can do that and it would work. I wasn't sure if you had any style guidelines, e.g.
http://www.php-fig.org/psr/psr-1/
Files SHOULD either declare symbols (classes, functions, constants, etc.) or cause side-effects (e.g. generate output, change .ini settings, etc.) but SHOULD NOT do both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wouldn't consider a static constructor a side effect, as it only initialises the class defined in this file. It won't change any state outside and it's impossible to get into the uninitialised state.
Ideally of course, this would not use static properties at all, but only instance properties (and if some property is supposed to be long-lived than put it on an object that is supposed to be long-lived)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I would be ok with initializing it after the class declaration.
src/Node.php
Outdated
} | ||
} elseif ($child instanceof Token) { | ||
yield $child; | ||
} | ||
} | ||
} | ||
|
||
/** | ||
* Iterate over all descendant Nodes and Tokens, calling $callback. | ||
* This can often be faster than getDescendantNodesAndTokens if you just need to call something and don't be a generator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not quite clear when to use one or the other. Is it as simple as "convenient" vs "faster"? Should this be documented in the public API?
By public API, do you mean the markdown file with method descriptions? Also, I should have written "don't need" |
Yeah - I don't want to confuse people by documenting two methods which do basically the same thing. But, if this is significantly faster than the other one, then it makes sense for it to be documented. Btw thanks for tackling this! |
Sorry that this was left behind. I don't think the new method should be documented as public API at the moment. There are several methods that return generators, and we need to be consistent - we would need to decide, do we stick to generators officially, or provide alternates for all of them, or give up on generators entirely? If you remove that and fix the merge conflict, I'll merge this. |
I wonder what the real-project perf impact of this is. In general I think the Generator/Iterator pattern is better than the visitor pattern. @TysonAndre would you be willing to make a PR for the language server to use this and check the impact? |
Add a simple benchmarking script Extreme optimization, sacrificing code style in some areas. - This is meant for discussion, since the project is in the optimization phase. This commit reduces the time needed to getDiagnostics from 0.013 to 0.007 seconds. Average time to generate diagnostics (before): 0.013... Average time to generate diagnostics (after): 0.006921 time to parse (Without diagnostics): 0.036374 (E.g. getDiagnostics previously took 36% of the time compared to generating an AST, and now it takes 19% of that time.)
1ea6850
to
b6fb023
Compare
I'm not familiar with the benchmarks used for felixfbecker's php-language-server. https://github.com/felixfbecker/php-language-server/search?utf8=%E2%9C%93&q=benchmark&type= doesn't reveal anything. My own use case was similar to the benchmark I included in experiments/ (Parse the AST, then immediately call getDiagnostics. The summary of that was mentioned in the PR description: getDiagnostics previously took 36% of the time compared to generating an |
31.6 vs 31.4 seconds for php-language-server (Excluding the checks of tolerant-php-parser), there was a lot of variance, though. Using the same Benchmark.php with/without the updated composer.json and With the change from this PR (second of two consecutive runs)
Without the change (second of two consecutive runs):
|
You may be interested in looking at felixfbecker/php-language-server#451 |
The TreeAnalyzer iterates the tree on its own and calls checkDiagnostics, instead of getDiagnostics, so I guess this wouldn't impact the language server's perf, is that right? |
Well getDiagnostics is called by the language server when a document is opened or changed: https://github.com/felixfbecker/php-language-server/blob/master/src/Server/TextDocument.php#L138 so it would help there. But I don't think it will show an impact in that perf test. |
Never mind, that's not the same The changes to Anyway, I think it still makes sense to merge this, knowing that it doesn't really impact php-language-server's perf, but does help anyone else calling Does this summary seem correct? |
The summary seems correct. The new method on Node isn't documented as part of the public API. I could add I assume there's nothing else that needs to be added? |
Add a simple benchmarking script
Extreme optimization, sacrificing code style in some areas.
in the optimization phase. Avoiding nested generators did help performance.
This commit reduces the time needed to getDiagnostics
from 0.013 to 0.007 seconds. (On php 7.2.0RC5 with opcache enabled for CLI. Sample file was src/Token.php)
(E.g. getDiagnostics previously took 36% of the time compared to generating an
AST, and now it takes 19% of that time.)