Description
When indexing the Linux Git repository from scratch with the annotation cache enabled, the 2nd phase of history used the CPU sub-optimally.
Observing the thread states, 20 of the 48 indexer threads are blocked with a stack like this one:
"OpenGrok-index-worker-230" #230 prio=5 os_prio=64 cpu=3011328.50ms elapsed=17757.81s tid=0x0000000009f69000 nid=0x19e waiting for monitor entry [0x00007fef738f0000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.eclipse.jgit.internal.storage.file.WindowCache.getOrLoad(WindowCache.java:592)
- waiting to lock <0x00007fef8217d960> (a org.eclipse.jgit.internal.storage.file.WindowCache$Lock)
at org.eclipse.jgit.internal.storage.file.WindowCache.get(WindowCache.java:385)
at org.eclipse.jgit.internal.storage.file.WindowCursor.pin(WindowCursor.java:335)
at org.eclipse.jgit.internal.storage.file.WindowCursor.copy(WindowCursor.java:234)
at org.eclipse.jgit.internal.storage.file.Pack.readFully(Pack.java:602)
at org.eclipse.jgit.internal.storage.file.Pack.load(Pack.java:785)
at org.eclipse.jgit.internal.storage.file.Pack.get(Pack.java:273)
at org.eclipse.jgit.internal.storage.file.PackDirectory.open(PackDirectory.java:223)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:423)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:386)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObjectWithoutRestoring(ObjectDirectory.java:376)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:361)
at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:140)
at org.eclipse.jgit.treewalk.CanonicalTreeParser.reset(CanonicalTreeParser.java:191)
at org.eclipse.jgit.treewalk.TreeWalk.reset(TreeWalk.java:772)
at org.eclipse.jgit.revwalk.TreeRevFilter.include(TreeRevFilter.java:121)
at org.eclipse.jgit.revwalk.filter.AndRevFilter$Binary.include(AndRevFilter.java:104)
at org.eclipse.jgit.revwalk.PendingGenerator.next(PendingGenerator.java:108)
at org.eclipse.jgit.revwalk.RewriteGenerator.applyFilterToParents(RewriteGenerator.java:114)
at org.eclipse.jgit.revwalk.RewriteGenerator.next(RewriteGenerator.java:72)
at org.eclipse.jgit.revwalk.StartGenerator.next(StartGenerator.java:161)
at org.eclipse.jgit.revwalk.RevWalk.next(RevWalk.java:625)
at org.eclipse.jgit.revwalk.RevWalk.nextForIterator(RevWalk.java:1606)
at org.eclipse.jgit.revwalk.RevWalk.iterator(RevWalk.java:1630)
at org.opengrok.indexer.history.GitRepository.getFirstRevision(GitRepository.java:358)
at org.opengrok.indexer.history.GitRepository.annotate(GitRepository.java:334)
at org.opengrok.indexer.history.HistoryGuru.getAnnotationFromRepository(HistoryGuru.java:294)
at org.opengrok.indexer.history.HistoryGuru.createAnnotationCache(HistoryGuru.java:1189)
at org.opengrok.indexer.index.IndexDatabase.createAnnotationCache(IndexDatabase.java:1274)
at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:1253)
at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$8(IndexDatabase.java:1887)
at org.opengrok.indexer.index.IndexDatabase$$Lambda$751/0x00007fef76134168.call(Unknown Source)
at java.util.concurrent.FutureTask.run(java.base@11.0.7-internal/FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.7-internal/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.7-internal/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.7-internal/Thread.java:834)
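To confirm that the blocked workers are all waiting on the same monitor, a saved thread dump can be tallied by lock address. The following is a minimal sketch, assuming a HotSpot-style dump in a text file; the class `BlockedByLock` and its method `countWaiters` are hypothetical helpers written for this illustration and are not part of OpenGrok or JGit.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlockedByLock {
    // Matches the "- waiting to lock <0x...> (a some.Class)" line of a thread dump.
    private static final Pattern WAITING =
            Pattern.compile("waiting to lock <(0x[0-9a-fA-F]+)> \\(a ([^)]+)\\)");

    /** Returns "lockAddress className" -> number of matching threads waiting for that lock. */
    static Map<String, Integer> countWaiters(String dump, String namePrefix) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        boolean inMatchingThread = false;
        for (String line : dump.split("\\R")) {
            String s = line.trim();
            if (s.startsWith("\"")) {
                // Thread header, e.g. "OpenGrok-index-worker-230" #230 prio=5 ...
                inMatchingThread = s.startsWith("\"" + namePrefix);
                continue;
            }
            if (!inMatchingThread) {
                continue;
            }
            Matcher m = WAITING.matcher(s);
            if (m.find()) {
                counts.merge(m.group(1) + " " + m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java BlockedByLock <thread-dump-file>");
            return;
        }
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        countWaiters(dump, "OpenGrok-index-worker-")
                .forEach((lock, n) -> System.out.println(n + " thread(s) waiting for " + lock));
    }
}
```

If most of the blocked workers report the same `WindowCache$Lock` address, that would point at contention on a single cache lock rather than at unrelated waits.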
The indexer is running with:
3217: /jdk/openjdk11/bin/java -server -XX:-UseGCOverheadLimit -Dorg.opengrok.indexer.
argv[0]: /jdk/openjdk11/bin/java
argv[1]: -server
argv[2]: -XX:-UseGCOverheadLimit
argv[3]: -Dorg.opengrok.indexer.history.Subversion=/usr/bin/svn
argv[4]: -Xmx64g
argv[5]: -XX:HeapDumpPath=/data/jvm
argv[6]: -Dorg.opengrok.indexer.history.Mercurial=/usr/bin/hg
argv[7]: -Dorg.opengrok.indexer.history.SCCS=/usr/bin/sccs
argv[8]: -Djava.util.logging.config.file=/tmp/tmputw77pge
argv[9]: -XX:+HeapDumpOnOutOfMemoryError
argv[10]: -jar
argv[11]: /opengrok/dist/lib/opengrok.jar
argv[12]: -R
argv[13]: /tmp/tmp5gwneixa
argv[14]: --renamedHistory
argv[15]: on
argv[16]: -r
argv[17]: dirbased
argv[18]: -G
argv[19]: -m
argv[20]: 256
argv[21]: --leadingWildCards
argv[22]: on
argv[23]: -c
argv[24]: /usr/local/bin/ctags
argv[25]: --connectTimeout
argv[26]: 8
argv[27]: -U
argv[28]: http://localhost:8080/source
argv[29]: -o
argv[30]: /opengrok/etc/ctags.config
argv[31]: -H
argv[32]: linux
Perhaps JGit can be tuned:
- https://gerrit.googlesource.com/jgit/+/refs/tags/v5.13.0.202108250949-m3/Documentation/config-options.md
- https://stackoverflow.com/questions/18221987/how-to-tune-egit-for-large-repositories
Namely the packedGitLimit option, which is 10 MiB by default. Given the size of the heap (-Xmx64g), this is disproportionately small.
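One way to pick a less disproportionate value is to derive it from the configured heap. The sketch below only does the arithmetic. The class `PackedGitLimitSizing`, the method `packedGitLimitFor` and the divisor of 16 are assumptions made for illustration, not a tested recommendation and not existing OpenGrok code.

```java
public class PackedGitLimitSizing {
    static final long MIB = 1024L * 1024L;

    /**
     * Returns heap/divisor as a candidate packedGitLimit in bytes,
     * never going below the 10 MiB JGit default mentioned above.
     */
    static long packedGitLimitFor(long maxHeapBytes, int divisor) {
        if (divisor <= 0) {
            throw new IllegalArgumentException("divisor must be positive");
        }
        return Math.max(10 * MIB, maxHeapBytes / divisor);
    }

    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx setting of the running JVM.
        long heap = Runtime.getRuntime().maxMemory();
        long limit = packedGitLimitFor(heap, 16); // 16 is an arbitrary illustrative divisor
        System.out.printf("heap=%d MiB, candidate packedGitLimit=%d MiB%n",
                heap / MIB, limit / MIB);
    }
}
```

With the 64 GiB heap from the command line above, a divisor of 16 yields 4 GiB. In JGit the resulting value would presumably be handed to `WindowCacheConfig.setPackedGitLimit(long)` and applied with `install()` before any repository is opened. Whether this actually relieves the contention seen in `WindowCache.getOrLoad` has not been measured here.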