parser gets paused when low_memory=false #85
So if you have low_memory=false then what happens is that searchcode will add processed files into a queue which is occasionally flushed to disk in a single transaction. When low_memory=true each one is written to disk as it is processed, which is slower (lots more disk IO) but, since there is no buffering, uses less memory. The indexing is currently limited to a single core, which is why you are seeing a pegged 25% (#34). The queue needs to be in place because, with enough repository processors or complex files, the indexer can lag behind the number of things to index. The lines in the log you are referring to show the inbuilt buffering portion of searchcode. You can, however, tweak these values through the properties file.
The first is the maximum number of files that will be added to the queue before the indexer pauses. By default it is set to 1,000 files. The second is the number of lines in those files. A count is kept of the number of lines in each file in the queue, and when the sum of them is over the value the indexer is paused. NB this is a soft cap that can be breached a few times before the pause kicks in. The last is the number of lines that searchcode will read from a file for indexing. Generally files over 10,000 lines are not that useful for search results and just slow things down. This value dictates how much the soft cap of max_document_queue_line_size can be breached. If you have enough memory on the machine (and this is a case by case basis) you can increase the first two to have a larger buffer. When the indexer is busy and falling behind, that could be because it ran into a bunch of complex/large files which cause it to do more work. Once it runs into a batch of files that it can work through quickly, it can then work on larger chunks for indexing. BTW the number of files that the indexer will work on at any one time is 1,000.
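For illustration, a minimal searchcode.properties sketch of the three values described above. Only max_document_queue_line_size is named in this thread; the other two property names and the default shown for the line-count cap are assumptions based on the description, so check your own properties file for the exact keys.

```
# Sketch only - names other than max_document_queue_line_size are assumed.
max_document_queue_size=1000         # files queued before the indexer pauses (default per the text)
max_document_queue_line_size=100000  # soft cap on total lines across queued files (assumed default)
max_file_line_depth=10000            # lines read from each file for indexing (assumed name)
```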
This is now configurable through the properties file with this change 1810b05. Let me know if you need more information on this one, otherwise feel free to close it down.
Going to close this as the question has been answered.
I still have this issue with initial indexing, using only file repos and settings like:
For two days it has been the same:
It seems to be stuck on 120706 and does not proceed any further. The only solution I found is to initially index with low_memory=true. Not sure what more I could provide to help you narrow down the case.
That is indeed odd. The only thing I can think of is that somehow the parser is choking on a piece of code and hitting the heap issue. Can you check the console output and see if you have a heap exception thrown? This would explain why it is stuck. If that is indeed the case you will either need to increase the amount of RAM that searchcode can get access to via the Xmx setting or use low_memory=true.
The log is on INFO level and that is the only message I'm getting (one set every second or so).
So it is running then. That is indeed odd. I am going to take a guess here that getting a copy of the code to run this against is impossible. I will have a look over the logic and see if there is anything that can be done. For the moment you can try using low_memory=true to get past it. Would you be willing to try out a special build? Can you try increasing the amount of RAM available to searchcode using Xmx? No idea what your current setting is, but if you set it to 2G that may resolve the issue.
Sure, just make a branch and I can run it. low_memory=true does the trick; it adds a few files every second, while without it no progress was visible at all after the initial ~1000 files. Xmx is already set to 8G, using 64-bit Java, and it never used more than 6G as far as I can tell.
Is this using a GIT, SVN, or File repository type?
about 30 file repos
OK cool. Can you build from the following branch https://github.com/boyter/searchcode-server/tree/issue85 then start fresh and run it again? Make sure the logs are set to INFO level. What we are interested in is the additional logging in the index thread b53dc62. We can grep the logs for "issue85", which will indicate the last file it ran into issues with and hopefully allow me to replicate whatever it is doing that is causing things to choke.
Seems to be doing its normal job (repeating this procedure 89 times):
up to a point; after indexing 89 files it goes nuts, and no issue85 appears anymore.
I can confirm that it does parallel indexing across different file repos (M:\ and C:\ appear), so I would say it is not a dodgy file but rather the indexer actually choking. Maybe it has something to do with my having more than one file repo? You could make the indexer more verbose, but I don't think it will help us much. Also, this 89 files is a different number from last time, which was about 1005, and I reverted those settings to default:
I used *10 previously. There are no new, unfinished
Would it be possible for you to report on what the last file you see is before the problem starts? I.e. have a look at the file that appears on the very last
and see if there is anything unusual about it. Also, can you set the following property,
This will ensure that only a single thread is processing the file repository. Preferably do 1 file repository at a time, waiting each time, or, if you are able to identify which file it is, just that file from the start.
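(The exact property in the elided snippet above is not shown here; as a hypothetical sketch, assuming searchcode exposes a file-processor count property along the lines of number_file_processors, the change would look something like the following.)

```
# Hypothetical sketch - property name assumed, not confirmed by this thread.
number_file_processors=1   # only one thread walks/processes File repositories
```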
Limiting it to a single processor solves this issue; hope it gives you some idea of where the problem can be located (still over 30 file repos).
That is an interesting result. A single file processor should make no difference; I only suggested it because it would make sure the logs were clean. How many were you running? Could you try pulling the latest build of branch issue85 and trying again please? This has a tweak to the file walker which may resolve the issue in full for you. You will need to ensure though that you have the searchcode.properties file value of
This is the default, but don't override it to true.
I was running 10. Maybe it's just too much? Anyway, with 10 it takes a few seconds to replicate this issue. Again there is a cycle:
And then about 50 lines like:
To finally start producing only this output:
So the "Pausing parser." output is to be expected. I think I know what the issue may be based on the above though. I suspect that the codeIndexLinesCount is not being reset when it is finding binary or minified files in this case. I will have a look to see if this is the cause.
So the issue is indeed that codeIndexLinesCount is not being reset for some reason when there are multiple threads for the parsers. Odd, because the calls that modify this state are synchronized. I have a feeling that the values being used to set and unset it in this case are not in sync, and somehow it ends up never being able to reduce the value below 100000, which blocks the indexing. I am going to add a button to allow resetting of this value from within the admin page. Would it be possible for you to set the value
in the properties file to something larger? Say 1000000. If you have a lot of RAM this should have little impact on how things are running, and should allow you to get past your issue for the moment. I am going to scrutinize the code in question and add some more tests for it.
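(For illustration only: assuming the elided property is the line-count soft cap discussed earlier, max_document_queue_line_size, the suggested change would look something like this.)

```
# Assumed to be the property referred to above; raise the soft cap so the
# indexer is not paused at the default line count.
max_document_queue_line_size=1000000
```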
I can survive with low_memory=true for the time being (as I have till now).
It's one of those things that I am slowly decoupling over time while adding more test coverage. As with all software projects it has evolved over time and the internal design needs to change. Once I have the refactor in place I should be able to resolve a lot of these issues (crosses fingers).
Should mention that the value is now displayed on the admin page as well, so it's possible to identify this issue without having to look in the logs.
Seems like no good news: with 10 file indexers and default values for the memory-related options it gets paused after a minute or two and never recovers.
So this has been added in. If you could have a look at the fine-level logs (not clickable, just set the level to fine) at http://localhost:8080/admin/logs/?level=fine that might help diagnose the issue. I am looking to see if there are any entries that take longer than, say, the example below:
FINE: Tue Sep 12 08:32:59 AEST 2017: ForkJoinPool.commonPool-worker-1 190: indexDocument:parallelStream:updateDocument:0.001057959
I would not expect there to be many above ~0.001, but if so there may be an issue there. Please ignore slowness from the "GIT Blame Info Time:" portion; I am aware of the performance issue there and am working on it now.
It stopped logging 10 minutes ago; file attached.
Yep, looks like that might be the issue. I will need to add some more debugging to be sure though. Will add that and comment back in here if you are able to verify for me. Sorry about this, but it's the only way I can debug without actually looking at the repository that you are using.
That is on me, sorry, because I can't show you the repository. I'm happy to help you track it down.
Ok going to start adding some very verbose logging to diagnose this one. Would you be comfortable with me logging out details such as the
I can probably get away with just the length but the others would be helpful as well.
Have added additional details to the fine log 35e3413. @quasarea if you could build from that commit on master and try it out, then we should be able to either work out which file is causing the issue or rule out one source of the issue. What I am hoping to see is a file with something like the below,
But with it missing the
This log has no new entries even though I left it running for about 30 minutes. There were 2 indexers per type set and quartz was set to 2 as well. There is exactly the same amount of
I did a few runs like this, and it is totally file independent; once it was indexing C# unit tests, another time some C++ project, and another time just text files.
That is really, really odd. It suggests that the issue is not with the indexing or the processing pipeline because they start and finish. When this happens, if you look at the admin screen, let me know what you see for the following values?
Although checking the above shows it looks fine. It just looks like the indexer is blocked for some reason. I have added some additional logging into master for this.
Another approach:
Console started (actually started a second time as I wanted to keep the index recreation out of the logs):
Restarted the service and ran it again. Config for that instance:
Then I switched to low_memory=true and restarted the service, and the whole thing started indexing happily.
I have to wonder if this is something that only happens on Windows. I certainly have never encountered anything like this on macOS or GNU/Linux to date. While the code was written such that it should work on Windows, it's not the main target platform. I might have a go at digging out a Windows machine and giving it a go there to see if I can replicate the issue.
So I was able to reliably cause this to happen. I am also pretty confident I have the fix. Strictly speaking it is not a bug in the implementation; you were running into a thread starvation issue.
I had assumed that quartz would yield to the other sleeping threads (which happens when they hit the file limits). Going to fiddle around with the priority of quartz for this. To get around this you can edit the quartz.properties file in the root directory like the below. Any value up to 100 should work.
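(The snippet referred to is not included above; as a sketch, assuming it raises the standard Quartz thread-pool size in quartz.properties, it would look something like this. The value 50 is taken from the later comment where it helped.)

```
# Sketch only - standard Quartz thread-pool setting; the exact value used
# in the original snippet is not shown in this thread.
org.quartz.threadPool.threadCount=50
```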
Try it again. It should resolve it.
Stupid GitHub. I didn't tell you to close this. Anyway, the commit that closed this issue has some changes.
The above is not required to resolve it in your case @quasarea, but it couldn't hurt to get the latest version anyway. I verified that this works by setting the number of quartz threads to 2 (one for the index and the other for everything else) and letting it run for a while. Going to stick this one in test since it looks like that has resolved the issue for me.
Hmm, I wanted to ask you how System Statistics / Threads relate to quartz threads, but never considered it important :) It is actually around 66-69 now, but setting quartz to 50 definitely helped. I was considering an issue with the Windows implementation too :) I will keep it running for a while and see if it still gets stuck, hopefully not ;)
The system statistics just show the estimated number of threads used by the JVM process, http://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html#activeCount() The internal web server reserves some (about 20-30 I think) and there are about 3 background processes that searchcode uses internally which will affect the number. You can then factor in the number of GIT/SVN/File processors you have configured to get a rough idea of how many you need. It will actually increase by as many CPUs as you have on occasion now that the indexing runs in parallel. If you have enough GIT/SVN/File processors and enough IO then you can actually peg 8 cores pretty easily (see below). In fact, if you hit the "Pausing parser." message in the log frequently, it suggests you need to improve the CPU of your server now :)
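(Purely illustrative: assuming the processor counts live in searchcode.properties under names like the following; the exact keys are not quoted in this thread. The 10 file processors matches the count used earlier in this discussion.)

```
# Hypothetical sketch - property names assumed. Each processor is roughly
# one extra background thread on top of the web server and indexer threads.
number_git_processors=2
number_svn_processors=2
number_file_processors=10
```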
I should mention a lot of the above text is now out of date. For example
The above check has been removed. Everything that is queued to be indexed will be done in a single batch now. The only time the queue will be limited is when it hits the soft caps of the below.
Going to make the call that this has been resolved. Before the fix I could certainly trigger or disable it at will, and with the new code that is no longer possible.
Indexing seems much faster with low_memory=false, but it makes indexing suspend and I'm not sure how to push it further.
In the log I can see:
INFO: Mon Mar 20 13:10:35 GMT 2017: codeIndexLinesCount 1148817 larger then 1000000
INFO: Mon Mar 20 13:10:32 GMT 2017: Pausing parser.
There is no such case with low_memory=true.
Also, with low_memory=true CPU usage is a steady 25% (one of 4 cores I guess), but with low_memory=false it stops using CPU at all when the buffer hits the limit.