Skip to content

Optimize getDiagnostics for speed #211

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 4, 2018

Conversation

TysonAndre
Copy link
Contributor

Add a simple benchmarking script

Extreme optimization, sacrificing code style in some areas.

  • This is meant for discussion, since the project is
    in the optimization phase. Avoiding nested generators did help performance.

This commit reduces the time needed to getDiagnostics
from 0.013 to 0.007 seconds. (On php 7.2.0RC5 with opcache enabled for CLI. Sample file was src/Token.php)

Average time to generate diagnostics (before): 0.013...
Average time to generate diagnostics (after):  0.006921
time to parse (Without diagnostics):           0.036374

(E.g. getDiagnostics previously took 36% of the time compared to generating an
AST, and now it takes 19% of that time.)

* @return string[]
*/
private static function initTokenKindToText() {
self::$tokenKindToText = \array_flip(\array_merge(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could someone explain to me what the advantage of array_flip(array_merge([...])) is?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a hashmap that can be looked up in O(1)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, that's cool. Nice trick, will use that when I need it :)

* @return string
*/
public static function getTextForTokenKind($kind) {
if (!isset(self::$tokenKindToText)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we simply initialise this after the class declaration, like a static constructor in other languages would?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the maintainers are fine with it, I can do that and it would work. I wasn't sure if you had any style guidelines, e.g.

http://www.php-fig.org/psr/psr-1/

Files SHOULD either declare symbols (classes, functions, constants, etc.) or cause side-effects (e.g. generate output, change .ini settings, etc.) but SHOULD NOT do both.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wouldn't consider a static constructor a side effect, as it only initialises the class defined in this file. It won't change any state outside and it's impossible to get into the uninitialised state.

Ideally of course, this would not use static properties at all, but only instance properties (and if some property is supposed to be long-lived than put it on an object that is supposed to be long-lived)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would be ok with initializing it after the class declaration.

src/Node.php Outdated
}
} elseif ($child instanceof Token) {
yield $child;
}
}
}

/**
* Iterate over all descendant Nodes and Tokens, calling $callback.
* This can often be faster than getDescendantNodesAndTokens if you just need to call something and don't be a generator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not quite clear when to use one or the other. Is it as simple as "convenient" vs "faster"? Should this be documented in the public API?

@TysonAndre
Copy link
Contributor Author

By public API, do you mean the markdown file with method descriptions?

Also, I should have written "don't need"

@roblourens
Copy link
Member

Yeah - I don't want to confuse people by documenting two methods which do basically the same thing. But, if this is significantly faster than the other one, then it makes sense for it to be documented.

Btw thanks for tackling this!

@roblourens
Copy link
Member

Sorry that this was left behind. I don't think the new method should be documented as public API at the moment. There are several methods that return generators, and we need to be consistent - we would need to decide, do we stick to generators officially, or provide alternates for all of them, or give up on generators entirely?

If you remove that and fix the merge conflict, I'll merge this.

@felixfbecker
Copy link

I wonder what the real-project perf impact of this is. In general I think the Generator/Iterator pattern is better than the visitor pattern.

@TysonAndre would you be willing to make a PR for the language server to use this and check the impact?

Add a simple benchmarking script

Extreme optimization, sacrificing code style in some areas.

- This is meant for discussion, since the project is
  in the optimization phase.

This commit reduces the time needed to getDiagnostics
from 0.013 to 0.007 seconds.

Average time to generate diagnostics (before): 0.013...
Average time to generate diagnostics (after):  0.006921
time to parse (Without diagnostics):           0.036374

(E.g. getDiagnostics previously took 36% of the time compared
to generating an AST,
and now it takes 19% of that time.)
@TysonAndre TysonAndre force-pushed the optimize-getDiagnostics-v2 branch from 1ea6850 to b6fb023 Compare December 20, 2017 05:57
@TysonAndre
Copy link
Contributor Author

I'm not familiar with the benchmarks used for felixfbecker's php-language-server. https://github.com/felixfbecker/php-language-server/search?utf8=%E2%9C%93&q=benchmark&type= doesn't reveal anything.

My own use case was similar to the benchmark I included in experiments/ (Parse the AST, then immediately call getDiagnostics.

The summary of that was mentioned in the PR description: getDiagnostics previously took 36% of the time compared to generating an
AST, and now it takes 19% of that time.)

@jens1o
Copy link
Contributor

jens1o commented Dec 20, 2017

@TysonAndre
Copy link
Contributor Author

31.6 vs 31.4 seconds for php-language-server (Excluding the checks of tolerant-php-parser), there was a lot of variance, though.
I'd guess that parsing the AST is cheaper than everything else that's being done in php-language-server (That was the case for my application).

Using the same Benchmark.php with/without the updated composer.json and composer.phar update : https://github.com/felixfbecker/php-language-server/compare/master...TysonAndre:use-optimized-getDiagnostics?expand=1

With the change from this PR (second of two consecutive runs)

time php -d opcache.enable_cli=1 Performance.php                                                             
0
1000
2000
3000
4000
5000
6000
------------------------------
Time [drupal]: 17.724359035492
0
------------------------------
Time [wordpress]: 1.3051180839539
0
------------------------------
Time [php-language-server]: 0.28266787528992
0
------------------------------
Time [math-php]: 0.61766982078552
0
1000
2000
3000
------------------------------
Time [symfony]: 8.395693063736
0
------------------------------
Time [codeigniter]: 0.61100316047668
0
------------------------------
Time [cakephp]: 2.2087850570679
php -d opcache.enable_cli=1 Performance.php  31.04s user 0.37s system 99% cpu 31.408 total

Without the change (second of two consecutive runs):

» time php -d opcache.enable_cli=1 Performance.php
0
1000
2000
3000
4000
5000
6000
------------------------------
Time [drupal]: 17.855414867401
0
------------------------------
Time [wordpress]: 1.3178341388702
0
------------------------------
Time [php-language-server]: 0.28475403785706
0
------------------------------
Time [math-php]: 0.62388896942139
0
1000
2000
3000
------------------------------
Time [symfony]: 8.4373149871826
0
------------------------------
Time [codeigniter]: 0.61493396759033
0
------------------------------
Time [cakephp]: 2.2180669307709
php -d opcache.enable_cli=1 Performance.php  31.23s user 0.38s system 99% cpu 31.614 total

@jens1o
Copy link
Contributor

jens1o commented Jan 1, 2018

I'd guess that parsing the AST is cheaper than everything else that's being done in php-language-server (That was the case for my application).

You may be interested in looking at felixfbecker/php-language-server#451

@roblourens
Copy link
Member

The TreeAnalyzer iterates the tree on its own and calls checkDiagnostics, instead of getDiagnostics, so I guess this wouldn't impact the language server's perf, is that right?

@roblourens
Copy link
Member

Well getDiagnostics is called by the language server when a document is opened or changed: https://github.com/felixfbecker/php-language-server/blob/master/src/Server/TextDocument.php#L138 so it would help there. But I don't think it will show an impact in that perf test.

@roblourens
Copy link
Member

Never mind, that's not the same getDiagnostics. DiagnosticsProvider::getDiagnostics is actually not called by the language server. The language server has the TreeAnalyzer which is already using a loop + recursion to walk the tree. https://github.com/roblourens/php-language-server/blob/master/src/TreeAnalyzer.php#L57

The changes to checkDiagnostics are good with or without the extra method to walk descendent nodes. If that other method was part of the parser's public API, then the language server would use it there, but I still don't want to support that. And it doesn't make sense for the language server to use getDiagnostics since it also has to walk the tree to collect definitions and references.

Anyway, I think it still makes sense to merge this, knowing that it doesn't really impact php-language-server's perf, but does help anyone else calling getDiagnostics a good amount (based on the original numbers).

Does this summary seem correct?

@TysonAndre
Copy link
Contributor Author

The summary seems correct.

The new method on Node isn't documented as part of the public API. I could add @internal if you felt it was necessary.

I assume there's nothing else that needs to be added?

@roblourens roblourens merged commit 42fe00b into microsoft:master Jan 4, 2018
@TysonAndre TysonAndre deleted the optimize-getDiagnostics-v2 branch February 17, 2018 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants