-
Notifications
You must be signed in to change notification settings - Fork 185
Autocomplete speedup #451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Autocomplete speedup #451
Conversation
Codecov Report
@@ Coverage Diff @@
## master #451 +/- ##
============================================
- Coverage 79.38% 79.36% -0.03%
- Complexity 837 868 +31
============================================
Files 56 56
Lines 1979 2069 +90
============================================
+ Hits 1571 1642 +71
- Misses 408 427 +19
|
Nice! I have a hard time believing the memory usage though - exactly the same amount of bytes used, even though there is a new array that duplicates the whole index? I wonder whether we could just always use the namespace tree. |
I've added an other commit to ceate an index representation of the global definitions (reduces the number of checks, depends on your dependencies (10 times faster on my project)). However, I have some other memory consumption metrics : on this branch : |
I saw that you had a discussion with the maintainer of php-integrator, it could be a good idea to store the indexes in a sqlite DB too instead of in-memory and serialized files (will reduce the RAM consumption IMHO). |
I switched to using this in vscode and it is significantly faster using 466MiB compared to 454MiB before. Before it meant this was unusable because autocompletes were over a second. So I'll keep this branch installed for now so I can actually use the add-on to see if I like it. |
Could you try always using the tree? I could imagine it is not significantly slower than the map |
Hello @felixfbecker , I'm not sure to get what you're saying 🤔 $this->definitions['Symfony\Component\HttpFoundation\Request'] => [
'Symfony\Component\HttpFoundation\Request->getLocale()' => $definition
] and to get rid of the |
I mean to get rid of |
Oh yeah I see, I'll try it then ;) |
OK @felixfbecker , I added 3 commits to save memory, here are the consumptions (on my testing project):
I also yield indexed definitions to save memory instead of copying them in an array when using |
Does using Generators have an impact on the time? I remember from looking at them earlier, that they can add a surprising overhead. |
@roblourens , here are some time records grabbed for the accessor ( With generators :
without :
These benchmarks were executed in the same way than the ones I posted on the description of this PR. |
Generators should actually not decrease memory usage because PHP variables are copy-on-write. But it will reduce time because you save an iteration. |
Iterator/Generator also can safe memory when you no longer need to build whole arrays but work item per item (stream like) |
That's true if the source is an IO stream, but here the source is an already in-memory array. |
src/Index/Index.php
Outdated
@@ -15,11 +15,19 @@ class Index implements ReadableIndex, \Serializable | |||
use EmitterTrait; | |||
|
|||
/** | |||
* An associative array that maps fully qualified symbol names to Definitions | |||
* An associative array that maps namespaces to | |||
* an associative array that maps fully qualified symbol names to global Definitions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add an example what an entry (the keys) could look like? Does it use a nested array for each namespace component?
src/Index/Index.php
Outdated
* @param string $namespace | ||
* @return Definition[] | ||
*/ | ||
private function doGetDefinitionsForNamespace(string $namespace): array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the point of this method over getDefinitionsForNamespace()
? The method name doesn't indicate any difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid logic duplication because I'm using it twice in this class :
- https://github.com/nicolasmure/php-language-server/blob/6ba3d6c0c52cd9177d68a3701ef81159dda13aa9/src/Index/Index.php#L130
- https://github.com/nicolasmure/php-language-server/blob/6ba3d6c0c52cd9177d68a3701ef81159dda13aa9/src/Index/Index.php#L145
Moreover, this private function returns an array, in opposition to the public function getDefinitionsForNamespace
which returns a generator. This array is then used in other public functions of this class to yield definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't fully see the need for this since it's only $this->namespaceDefinitions[$namespace] ?? []
but if you want to keep it please rename it to something like getDefinitionArrayForNamespace
- the current name has no semantic difference to the non-do
version
@staabm yes and it is definitively the case here : we're just looping over it, using each item one by one (no need to know how many items are in the iterator, nor to access to the ones next to the one currently used). @felixfbecker as the index used by the |
Master: 454MiB |
@robclancy yes, make sure that the cache has finished to be populated too before making any mesurements. The last commits have changed the cache version so the server has to rebuild all the cache items, so memory consumption will start from 0 to x until it has completed the project parsing. You can see when it has finished to built by watching at CPU consumption and also by listing the cache directory by last modification date : $ watch -n 0.5 ls -c ~/.phpls/ |
I'm getting my numbers from the output in vscode after the indexing is done. |
Master: [Info - 11:55:37 pm] All 8950 PHP files parsed in 25 seconds. 454 MiB allocated. So my numbers above are correct for this project. |
I now have... This branch: [Info - 1:02:48 am] All 8950 PHP files parsed in 17 seconds. 460 MiB allocated. This is more inline with your numbers, I'm not sure what happened before as this has come from the same cache. |
src/Index/Index.php
Outdated
* an associative array that maps fully qualified symbol names | ||
* to global Definitions, e.g. : | ||
* [ | ||
* 'Psr\Log\LoggerInterface' => [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment says the first level contains namespaces, but this is an interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you're right it's a bit confusing. What about renaming the namespaceDefinitions
to fqnDefinitions
? It's also confusing because there's already a
getDefinition(string $fqn, bool $globalFallback = false)
function where the $fqn
parameter is Psr\Log\LoggerInterface->log()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the thing is that everywhere in the app we're using $fqn
and referring to fully qualified namespace
, but we're using it for all symbols, ie Psr\Log\LoggerInterface->log()
and also Psr\Log\LoggerInterface
.
Which term could describe Psr\Log\LoggerInterface
? The left hand side of access operator
? Quite long for a name 😝 . What's your opinion on this point ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fqn
stands for fully qualified name, not namespace 😉
src/Index/Index.php
Outdated
* [ | ||
* 'Psr\Log\LoggerInterface' => [ | ||
* 'Psr\Log\LoggerInterface->log()' => $definition, | ||
* 'Psr\Log\LoggerInterface->debug()' => $definition, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using a full tree? e.g.
[
'Psr' => [
'\Log' => [
'\LoggerInterface' => [
'->log()' => $definition
]
]
]
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it worth it as we don't need this kind of representation for now. The actual representation is mainly used to get definitions of the resolved class directly (w/o having to parse its namespace).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean with "mainly"? Autocompletion for classes needs to get all classes in a namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if it's not a full tree, this should duplicate the prefix on the second level. That's redundant data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello,
I started to change the index to use a full tree as you suggested.
I still have to use two arrays to avoid definition collisions :
[
'Psr' => [
'\Log' => [
'\LoggerInterface' => $definition, // the non member definition
],
],
]
[
'Psr' => [
'\Log' => [
'\LoggerInterface' => [ // here it's an array, not a non-member definition
'->log()' => $definition, // the member definition
],
],
],
]
I haven't pushed it yet as the tests aren't passing on my machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks great! Instead of two arrays, you can just use the empty string as a key for the LoggerInterface itself
src/Index/Index.php
Outdated
* @param string $namespace | ||
* @return Definition[] | ||
*/ | ||
private function doGetDefinitionsForNamespace(string $namespace): array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't fully see the need for this since it's only $this->namespaceDefinitions[$namespace] ?? []
but if you want to keep it please rename it to something like getDefinitionArrayForNamespace
- the current name has no semantic difference to the non-do
version
src/CompletionProvider.php
Outdated
@@ -310,26 +314,22 @@ public function provideCompletion(PhpDocument $doc, Position $pos): CompletionLi | |||
// Suggest global symbols that either | |||
// - start with the current namespace + prefix, if the Name node is not fully qualified | |||
// - start with just the prefix, if the Name node is fully qualified | |||
foreach ($this->index->getDefinitions() as $fqn => $def) { | |||
foreach ($this->index->getGlobalDefinitions() as $fqn => $def) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only place where that method is used and then the FQN is checked for a prefix. Why doesn't it make use of the namespace tree here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because there was previously a loop on all definitions, with a check to keep only the global ones. By using this new function, we can reduce the loop size by having directly the global definitions (and so return a result faster).
I've added a commit to correct the terminology (as you pointed out @felixfbecker ). |
Hello @felixfbecker , any updates on this ? :) |
Omg, that would be so awesome! Please! 😅 |
src/Index/AbstractAggregateIndex.php
Outdated
|
||
/** | ||
* Returns a Generator providing an associative array [string => Definition] | ||
* that maps fully qualified symbol names to global Definitions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
global meaning non-members?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes :)
src/Index/Index.php
Outdated
* An associative array that maps fully qualified symbol names to Definitions | ||
* An associative array that maps fully qualified names to | ||
* an associative array that maps fully qualified symbol names | ||
* to global Definitions, e.g. : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "global" is incorrect here.
Yes, so the method can simply traverse into |
In the [
'SomeNamespace' => [
'\SomeClass' => [
'' => $def1, // the `SomeNamespace\SomeClass` class itself
'->method()' => $def2, // a method of this class
],
],
] When calling 'SomeNamespace\SomeClass' => $def1
'SomeNamespace\SomeClass->method()' => $def2 This is acceptable when we know what to look for (ie when we have some parts of a namespace), but in that case, there are too many places to look at (current namespace, global functions, global consts), and there's not always a way to get the defs in these places easily / efficiently. This is why I tried to reduce the number of places to look at by having the 2 arrays. I agree that it's not pleasant to have these 2 arrays, maybe the I might look insistent on this part but I'd like to keep the perfs improvements for the most cases :) So if you have an idea on how we could get the definitions for that case efficiently with a simple index 👍 |
So in that case, we have to look at the top namespace (first level of the array) and the current namespace (some level x in the array), so just 2 places to look at. That would be bad if it searched recursively, but if we make the method only search one level, it should be equal perf better as the second member array (actually better, because that doesn't help with the top namespace). |
Yes, that's why the To be continued so... |
I might take a stab at this and if it turns out to be too complicated, we can use the two arrays for the meantime. |
I changed it to only return direct children in my branch https://github.com/felixfbecker/php-language-server/tree/autocomplet-speedup
looks good? :) |
Yup, if tests pass then 👍 :) It would be good to have some methods in the index (as the I haven't tried your branch yet, just took a look at it for the moment. Thanks for the completion bench 👍 |
@felixfbecker Did you already worked on your own branch? I'd be interested in this. |
Hello @felixfbecker :) |
I'm on the fence whether to merge this in this state or not, and I am guilty of procrastinating the decision. You are right, there are some changes that need to happen at the bottom of CompletionProvider to also not O(n) iterate the whole index (note tests are failing on my branch atm). Which shouldn't be that hard, it might even be easy to just rewrite that part from scratch. If that is done, then this PR would be a very clean improvement for both UX and software architecture. However: In it's current form I feel like this PR adds a lot of complexity, for improving the completion only for members. The CompletionProvider and DefinitionResolver are definitely the most complex parts of the whole server and low complexity is key for bugfreeness and a low barrier for contributions. We kind of went in circles here by starting with an extra array, then going to a tree, then going back to the separate array. My branch was an attempt to base on the state of the PR a few commits ago when it was still a tree, I just didn't have the time to finish it. By no means I want to discredit your work, I highly respect the work you put into this and the improvement on the UX site is awesome. Now if anyone wants to bring that pure-tree implementation over the finish line that would of course make the decision easier :) Curious to hear your thoughts Merry Christmas! |
Thank you for your answer, I agree with you, there were a lot of decision changes in this PR. Anyway, thank you for your attention on this PR and for sharing your thoughts 👍 , and also for the maintenance work you're doing here ;) |
To clarify, I only meant rewriting that very last section of CompletionProvider (that provides completion for global names). |
Current stable is basically unusable because of how slow it is... I haven't been following the code here but shouldn't the end user experience be more important than how complicated the code is? Since this isn't directly an add-on but installed by the add-on would it be possible to put this on a beta branch and then in the add-on have a "use beta language server" option? |
I made the requested change in my branch. Turns out I can't push commits to this PR, and I wouldn't really like to open a new PR because that would make it seem like I did all the hard work, not @nicolasmure. Benchmarks before and after ( Before:
After:
|
@felixfbecker - I also started to add support for the case below (no commits yet) use Microsoft\PhpParser as TheParser;
TheParser\No| However, I feel like the code is currently pretty hard to read. Would you be against adding a class for representing a class or namespace name, and which could provide useful operations such as:
The |
I just realized one thing - the current Index stores both |
@Declspeck For that we just need to make sure FQNs for namespaces cannot clash with classes, which I would do by simply having namespace FQNs end in a backslash |
Re: QualifiedName class, we already have to interact with a lot of implementations of those from the parser, phpDocumentor etc. so I found just using strings much easier (and probably more performant) so far instead of introducing a third one (and, for example, you don't need |
@felixfbecker Pure functions would also work, I'll try that. As for equals - for pure strings, you'd still need to use Adding a trailing backslash to namespaces seems like a good idea, I'll try that and see where it leads. |
Whenever you have something you feel like is ready to be merged I would encourage you to open a PR! For case insensitivity please see #69 |
I'd love to see this getting merged into the master branch in the very near future. The current version that's also being used with Visual Studio Code, using the auto-completion, takes to long to come up with suggestions. |
I could not edit this PR, so I opened a new PR: #599 |
Thanks for your fantastic work @Declspeck @nicolasmure, merged in 24388bc |
Thank you @felixfbecker and the contributors :D |
Hello,
This PR speeds up the autocomplete response time for the object access operator (
->
) and the static access operator (::
).On large projects (with many definitions indexed), these autocomplete features were long to respond because all the indexes were parsed. I added a new way to represent the indexed definition by their namespace (eg
Symfony\Component\HttpFoundation\Request
). So, when requesting for autocomplete definitions of a symbol, we can use this namespace index directly instead of parsing all the indexes.The new index key is stored in memory only and is not written in the cache file. It is rebuilt when reading the cache file, so it has no negative impacts on memory consumption.
Some metrics (using a symfony 3.1.10 project, see the first commits to figure out how they were recorded) :
Autocomplete requests on
$request->|
:before :
after :
Autocomplete requests on
Response::H|
:before :
after :
The time to find the items was reduced from ~2s to ~0.0002s, which is 10,000 times faster 🚀 .
It also reduces a lot the CPU consumption because less checks are performed.
I know that there are some feature requests to avoid some directories indexation, but I think that this PR could be useful too to speed up response times ;)