JGit tuning for large repositories #4729

Open
@vladak

When indexing the Linux Git repository from scratch with the annotation cache enabled, the second phase of history used the CPU sub-optimally:

[Image: CPU utilization graph]

Observing the thread states, 20 of the 48 indexer threads are blocked, with a stack trace like this one:

"OpenGrok-index-worker-230" #230 prio=5 os_prio=64 cpu=3011328.50ms elapsed=17757.81s tid=0x0000000009f69000 nid=0x19e waiting for monitor entry  [0x00007fef738f0000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.eclipse.jgit.internal.storage.file.WindowCache.getOrLoad(WindowCache.java:592)
        - waiting to lock <0x00007fef8217d960> (a org.eclipse.jgit.internal.storage.file.WindowCache$Lock)
        at org.eclipse.jgit.internal.storage.file.WindowCache.get(WindowCache.java:385)
        at org.eclipse.jgit.internal.storage.file.WindowCursor.pin(WindowCursor.java:335)
        at org.eclipse.jgit.internal.storage.file.WindowCursor.copy(WindowCursor.java:234)
        at org.eclipse.jgit.internal.storage.file.Pack.readFully(Pack.java:602)
        at org.eclipse.jgit.internal.storage.file.Pack.load(Pack.java:785)
        at org.eclipse.jgit.internal.storage.file.Pack.get(Pack.java:273)
        at org.eclipse.jgit.internal.storage.file.PackDirectory.open(PackDirectory.java:223)
        at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:423)
        at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:386)
        at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObjectWithoutRestoring(ObjectDirectory.java:376)
        at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:361)
        at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:140)
        at org.eclipse.jgit.treewalk.CanonicalTreeParser.reset(CanonicalTreeParser.java:191)
        at org.eclipse.jgit.treewalk.TreeWalk.reset(TreeWalk.java:772)
        at org.eclipse.jgit.revwalk.TreeRevFilter.include(TreeRevFilter.java:121)
        at org.eclipse.jgit.revwalk.filter.AndRevFilter$Binary.include(AndRevFilter.java:104)
        at org.eclipse.jgit.revwalk.PendingGenerator.next(PendingGenerator.java:108)
        at org.eclipse.jgit.revwalk.RewriteGenerator.applyFilterToParents(RewriteGenerator.java:114)
        at org.eclipse.jgit.revwalk.RewriteGenerator.next(RewriteGenerator.java:72)
        at org.eclipse.jgit.revwalk.StartGenerator.next(StartGenerator.java:161)
        at org.eclipse.jgit.revwalk.RevWalk.next(RevWalk.java:625)
        at org.eclipse.jgit.revwalk.RevWalk.nextForIterator(RevWalk.java:1606)
        at org.eclipse.jgit.revwalk.RevWalk.iterator(RevWalk.java:1630)
        at org.opengrok.indexer.history.GitRepository.getFirstRevision(GitRepository.java:358)
        at org.opengrok.indexer.history.GitRepository.annotate(GitRepository.java:334)
        at org.opengrok.indexer.history.HistoryGuru.getAnnotationFromRepository(HistoryGuru.java:294)
        at org.opengrok.indexer.history.HistoryGuru.createAnnotationCache(HistoryGuru.java:1189)
        at org.opengrok.indexer.index.IndexDatabase.createAnnotationCache(IndexDatabase.java:1274)
        at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:1253)
        at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$8(IndexDatabase.java:1887)
        at org.opengrok.indexer.index.IndexDatabase$$Lambda$751/0x00007fef76134168.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(java.base@11.0.7-internal/FutureTask.java:264)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.7-internal/ThreadPoolExecutor.java:1128)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.7-internal/ThreadPoolExecutor.java:628)
        at java.lang.Thread.run(java.base@11.0.7-internal/Thread.java:834)
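
For reference, a count like the one above could also be taken programmatically rather than read off a thread dump. The following is a minimal sketch using the standard java.lang.management API. The class name BlockedThreadCounter is hypothetical, the "OpenGrok-index-worker-" prefix is taken from the thread name in the dump, and the code would have to run inside the indexer JVM (or be adapted to a remote JMX connection), which is an assumption and not how the numbers in this report were obtained.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: counts threads that are BLOCKED on a monitor.
public final class BlockedThreadCounter {

    /** Counts live threads whose name starts with namePrefix and whose state is BLOCKED. */
    static long countBlocked(String namePrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long blocked = 0;
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread terminated between the two calls
            }
            if (info.getThreadName().startsWith(namePrefix)
                    && info.getThreadState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        // Prefix taken from the thread name shown in the dump above.
        System.out.println("blocked index workers: "
                + countBlocked("OpenGrok-index-worker-"));
    }
}
```

Sampling this periodically during the second history phase would show whether the contention on the WindowCache lock is persistent or only transient.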

The indexer is running with:

3217:   /jdk/openjdk11/bin/java -server -XX:-UseGCOverheadLimit -Dorg.opengrok.indexer.
argv[0]: /jdk/openjdk11/bin/java
argv[1]: -server
argv[2]: -XX:-UseGCOverheadLimit
argv[3]: -Dorg.opengrok.indexer.history.Subversion=/usr/bin/svn
argv[4]: -Xmx64g
argv[5]: -XX:HeapDumpPath=/data/jvm
argv[6]: -Dorg.opengrok.indexer.history.Mercurial=/usr/bin/hg
argv[7]: -Dorg.opengrok.indexer.history.SCCS=/usr/bin/sccs
argv[8]: -Djava.util.logging.config.file=/tmp/tmputw77pge
argv[9]: -XX:+HeapDumpOnOutOfMemoryError
argv[10]: -jar
argv[11]: /opengrok/dist/lib/opengrok.jar
argv[12]: -R
argv[13]: /tmp/tmp5gwneixa
argv[14]: --renamedHistory
argv[15]: on
argv[16]: -r
argv[17]: dirbased
argv[18]: -G
argv[19]: -m
argv[20]: 256
argv[21]: --leadingWildCards
argv[22]: on
argv[23]: -c
argv[24]: /usr/local/bin/ctags
argv[25]: --connectTimeout
argv[26]: 8
argv[27]: -U
argv[28]: http://localhost:8080/source
argv[29]: -o
argv[30]: /opengrok/etc/ctags.config
argv[31]: -H
argv[32]: linux

Perhaps JGit can be tuned, namely the packedGitLimit option, which is 10 MiB by default. Given the size of the heap (-Xmx64g), this is disproportionately small.
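
To make the proportion concrete, here is a minimal sketch of how a larger packedGitLimit might be derived from the JVM heap size. The class name PackedGitLimitSizing and the 25% heap fraction are assumptions chosen for illustration, not values proposed in this issue or recommended by JGit. The commented-out lines show how the result could be applied through JGit's WindowCacheConfig, assuming JGit is on the classpath and the call happens before any repository is opened.

```java
// Hypothetical sizing helper; the heap fraction is an illustrative assumption.
public final class PackedGitLimitSizing {

    static final long MIB = 1024L * 1024L;

    /** JGit's default packedGitLimit as stated in this issue: 10 MiB. */
    static final long JGIT_DEFAULT_LIMIT = 10 * MIB;

    /**
     * Proposes a packedGitLimit as a fraction of the maximum heap,
     * never going below the 10 MiB default.
     */
    static long limitFor(long maxHeapBytes, double heapFraction) {
        if (maxHeapBytes <= 0 || heapFraction <= 0.0 || heapFraction >= 1.0) {
            throw new IllegalArgumentException("invalid heap size or fraction");
        }
        long proposed = (long) (maxHeapBytes * heapFraction);
        return Math.max(JGIT_DEFAULT_LIMIT, proposed);
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        long limit = limitFor(maxHeap, 0.25); // 25% is an assumption
        System.out.printf("max heap: %d MiB, proposed packedGitLimit: %d MiB%n",
                maxHeap / MIB, limit / MIB);

        // With JGit available, the value could be installed process-wide:
        //   WindowCacheConfig cfg = new WindowCacheConfig();
        //   cfg.setPackedGitLimit(limit);
        //   cfg.install();
    }
}
```

With the 64 GiB heap from the command line above, a 25% fraction would give 16 GiB. Whether a larger window cache actually relieves the lock contention seen in the stack trace would still need to be measured.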
