Skip to content

file descriptor leak when creating history cache with renamed files #2050

Closed
@vladak

Description

@vladak

Indexing https://github.com/oracle/solaris-userland (its internal Mercurial flavor) on post 1.1-rc21 bits (cset 0ab5df4 , so it should have the fix for #1993) on Solaris 11.3SRU (with Java 1.8.0_152-b16) with ulimit -n 1024 and

23869:	/usr/jdk/latest/bin/java -agentlib:jdwp=transport=dt_socket,server=y,address=80
argv[0]: /usr/jdk/latest/bin/java
argv[1]: -agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=y
argv[2]: -d64
argv[3]: -XX:-UseGCOverheadLimit
argv[4]: -Xmx8192m
argv[5]: -server
argv[6]: -Dorg.opensolaris.opengrok.history.NumCacheRenamedThreads=48
argv[7]: -Dorg.opensolaris.opengrok.history.noFetchWhenNotInCache=1
argv[8]: -Dorg.opensolaris.opengrok.history.Mercurial=/usr/bin/hg
argv[9]: -Dorg.opensolaris.opengrok.history.cvs=/usr/bin/cvs
argv[10]: -Dorg.opensolaris.opengrok.history.Subversion=/usr/bin/svn
argv[11]: -Dorg.opensolaris.opengrok.history.git=/usr/bin/git
argv[12]: -Djava.util.logging.config.file=/opengrok/log/configs/userland-s11u4
argv[13]: -jar
argv[14]: /usr/opengrok/bin/../lib/opengrok.jar
argv[15]: -G
argv[16]: -r
argv[17]: on
argv[18]: -H
argv[19]: --renamedHistory
argv[20]: on
argv[21]: -c
argv[22]: /usr/local/bin/ctags
argv[23]: -o
argv[24]: /opengrok/etc/ctags.config
argv[25]: -m
argv[26]: 256
argv[27]: --leadingWildCards
argv[28]: on
argv[29]: -R
argv[30]: /tmp/opengrok-conf.3XqnGc
argv[31]: -U
argv[32]: localhost:2424
argv[33]: -H
argv[34]: userland-s11u4
argv[35]: userland-s11u4

hits the file limit after some 500+ hg log commands executed. The symptom looks like this in the log:

2018-03-28 15:45:12.743+0200 SEVERE t68 Executor.exec: Failed to read from process: /usr/bin/hg
java.io.IOException: Cannot run program "/usr/bin/hg" (in directory "/ws-local/userland-s11u4"): error=24, Too many open files
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at org.opensolaris.opengrok.util.Executor.exec(Executor.java:174)
        at org.opensolaris.opengrok.history.MercurialHistoryParser.parse(MercurialHistoryParser.java:81)
        at org.opensolaris.opengrok.history.MercurialRepository.getHistory(MercurialRepository.java:592)
        at org.opensolaris.opengrok.history.MercurialRepository.getHistory(MercurialRepository.java:578)
        at org.opensolaris.opengrok.history.FileHistoryCache.doFileHistory(FileHistoryCache.java:110)
        at org.opensolaris.opengrok.history.FileHistoryCache.access$000(FileHistoryCache.java:64)
        at org.opensolaris.opengrok.history.FileHistoryCache$1.run(FileHistoryCache.java:535)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=24, Too many open files
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 12 more


2018-03-28 15:45:12.744+0200 FINE t80 Executor.exec: Executing command [/usr/bin/hg, log, --follow, --template, changeset: {rev}:{node|short}\nuser: {author}\ndate: {date|isodate}\ndescrip
tion: {desc|strip|obfuscate}\nmercurial_history_end_of_entry\n, components/mercurial/mercurial-GENFRAG.p5m] in directory /ws-local/userland-s11u4
2018-03-28 15:45:12.745+0200 WARNING t68 Executor.exec: Non-zero exit status -1 from command [/usr/bin/hg, log, --follow, --template, changeset: {rev}:{node|short}\nuser: {author}\ndate: {
date|isodate}\ndescription: {desc|strip|obfuscate}\nmercurial_history_end_of_entry\n, components/tcl/tk/test/results-32-i386.master] in directory /ws-local/userland-s11u4
2018-03-28 15:45:12.747+0200 FINE t79 Executor.exec: Executing command [/usr/bin/hg, log, --follow, --template, changeset: {rev}:{node|short}\nuser: {author}\ndate: {date|isodate}\ndescrip
tion: {desc|strip|obfuscate}\nmercurial_history_end_of_entry\n, components/net-snmp-57/sun/agent/modules/healthMonitor/healthMonitor.h] in directory /ws-local/userland-s11u4
2018-03-28 15:45:12.747+0200 WARNING t68 FileHistoryCache$1.run: doFileHistory() got exception 
org.opensolaris.opengrok.history.HistoryException: Failed to get history for: "/ws-local/userland-s11u4/components/tcl/tk/test/results-32-i386.master" Exit code: -1
        at org.opensolaris.opengrok.history.MercurialHistoryParser.parse(MercurialHistoryParser.java:85)
        at org.opensolaris.opengrok.history.MercurialRepository.getHistory(MercurialRepository.java:592)
        at org.opensolaris.opengrok.history.MercurialRepository.getHistory(MercurialRepository.java:578)
        at org.opensolaris.opengrok.history.FileHistoryCache.doFileHistory(FileHistoryCache.java:110)
        at org.opensolaris.opengrok.history.FileHistoryCache.access$000(FileHistoryCache.java:64)
        at org.opensolaris.opengrok.history.FileHistoryCache$1.run(FileHistoryCache.java:535)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Looking at the open files by the indexer after couple of hg log commands:

23869:	/usr/jdk/latest/bin/java -agentlib:jdwp=transport=dt_socket,server=y,a
  Current rlimit: 1024 file descriptors
   0: S_IFCHR mode:0620 dev:555,0 ino:70432698 uid:200839 gid:7 rdev:188,1
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/1
   1: S_IFCHR mode:0620 dev:555,0 ino:70432698 uid:200839 gid:7 rdev:188,1
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/1
   2: S_IFCHR mode:0620 dev:555,0 ino:70432698 uid:200839 gid:7 rdev:188,1
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/1
   3: S_IFCHR mode:0666 dev:554,0 ino:76021788 uid:0 gid:3 rdev:145,12
      O_RDWR|O_LARGEFILE FD_CLOEXEC
      /devices/pseudo/mm@0:zero
      offset:0
   4: S_IFREG mode:0644 dev:285,65538 ino:181125 uid:0 gid:2 size:66368847
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/jdk/instances/jdk1.8.0/jre/lib/rt.jar
      offset:32382518
   5: S_IFREG mode:0444 dev:285,65538 ino:201310 uid:0 gid:2 size:4170300
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/opengrok/lib/opengrok.jar
      offset:3183851
   6: S_IFSOCK mode:0666 dev:565,0 ino:47908 uid:0 gid:0 size:0
      O_RDWR
	SOCK_STREAM
	SO_REUSEADDR,SO_SNDBUF(49152),SO_RCVBUF(261760)
	sockname: AF_INET 10.163.127.79  port: 8000
	peername: AF_INET 10.163.127.79  port: 39230
	congestion control: newreno
	peer: sshd[22680] zone: global[0]
   7: S_IFREG mode:0644 dev:285,65548 ino:27082 uid:80 gid:80 size:0
      O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE
      /opengrok/log/userland-s11u4/opengrok0.0.log.lck
      offset:0
   8: S_IFREG mode:0644 dev:285,65548 ino:27083 uid:80 gid:80 size:134873
      O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
      /opengrok/log/userland-s11u4/opengrok0.0.log
      offset:134873
   9: S_IFREG mode:0644 dev:285,65538 ino:181142 uid:0 gid:2 size:1179101
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/jdk/instances/jdk1.8.0/jre/lib/ext/localedata.jar
      offset:1179101
  10: S_IFSOCK mode:0666 dev:565,0 ino:58131 uid:0 gid:0 size:0
      O_RDWR
	SOCK_STREAM
	SO_SNDBUF(16384),SO_RCVBUF(5120)
	sockname: AF_UNIX 
	peer: java[23869] zone: global[0]
  11: S_IFREG mode:0644 dev:285,65538 ino:181312 uid:0 gid:2 size:3860502
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/jdk/instances/jdk1.8.0/jre/lib/ext/cldrdata.jar
      offset:3860502
  12: S_IFREG mode:0444 dev:285,65538 ino:155809 uid:0 gid:2 size:2036104
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/opengrok/lib/ant.jar
      offset:309
  13: S_IFREG mode:0444 dev:285,65538 ino:201314 uid:0 gid:2 size:62125
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/opengrok/lib/jrcs.jar
      offset:7929
  14: S_IFREG mode:0444 dev:285,65538 ino:168706 uid:0 gid:2 size:23737
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/opengrok/lib/json-simple-1.1.1.jar
      offset:370
  15: S_IFREG mode:0444 dev:285,65538 ino:201305 uid:0 gid:2 size:674125
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/opengrok/lib/bcel-6.2.jar
      offset:669
  16: S_IFREG mode:0444 dev:285,65538 ino:201303 uid:0 gid:2 size:1621932
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/opengrok/lib/lucene-analyzers-common-7.2.1.jar
      offset:388
  17: S_IFREG mode:0444 dev:285,65538 ino:201304 uid:0 gid:2 size:2792335
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/opengrok/lib/lucene-core-7.2.1.jar
      offset:1844662
  18: S_IFREG mode:0444 dev:285,65538 ino:201307 uid:0 gid:2 size:384677
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/opengrok/lib/lucene-queryparser-7.2.1.jar
      offset:384
  19: S_IFREG mode:0444 dev:285,65538 ino:201306 uid:0 gid:2 size:249543
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/opengrok/lib/lucene-suggest-7.2.1.jar
      offset:381
  20: S_IFREG mode:0644 dev:285,65538 ino:177525 uid:0 gid:2 size:73202
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/jdk/instances/jdk1.8.0/jre/lib/ext/ucrypto.jar
      offset:32087
  21: S_IFREG mode:0644 dev:285,65538 ino:181122 uid:0 gid:2 size:649092
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/jdk/instances/jdk1.8.0/jre/lib/jsse.jar
      offset:1373
  22: S_IFCHR mode:0644 dev:554,0 ino:102236164 uid:0 gid:3 rdev:195,0
      O_RDONLY|O_LARGEFILE
      /devices/pseudo/random@0:random
      offset:0
  23: S_IFCHR mode:0644 dev:554,0 ino:102236166 uid:0 gid:3 rdev:195,1
      O_RDONLY|O_LARGEFILE
      /devices/pseudo/random@0:urandom
      offset:8
  24: S_IFCHR mode:0644 dev:554,0 ino:102236164 uid:0 gid:3 rdev:195,0
      O_RDONLY|O_LARGEFILE
      /devices/pseudo/random@0:random
      offset:0
  25: S_IFCHR mode:0644 dev:554,0 ino:102236164 uid:0 gid:3 rdev:195,0
      O_RDONLY|O_LARGEFILE
      /devices/pseudo/random@0:random
      offset:0
  26: S_IFCHR mode:0644 dev:554,0 ino:102236166 uid:0 gid:3 rdev:195,1
      O_RDONLY|O_LARGEFILE
      /devices/pseudo/random@0:urandom
      offset:0
  27: S_IFCHR mode:0644 dev:554,0 ino:102236166 uid:0 gid:3 rdev:195,1
      O_RDONLY|O_LARGEFILE
      /devices/pseudo/random@0:urandom
      offset:0
  28: S_IFREG mode:0644 dev:285,65538 ino:181308 uid:0 gid:2 size:116988
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/jdk/instances/jdk1.8.0/jre/lib/jce.jar
      offset:96505
  29: S_IFREG mode:0644 dev:285,65538 ino:181146 uid:0 gid:2 size:248726
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /usr/jdk/instances/jdk1.8.0/jre/lib/ext/sunpkcs11.jar
      offset:227440
  30: S_IFIFO mode:0000 dev:552,0 ino:148903 uid:80 gid:80 size:0
      O_RDWR
  31: S_IFIFO mode:0000 dev:552,0 ino:148741 uid:80 gid:80 size:0
      O_RDWR
  32: S_IFIFO mode:0000 dev:552,0 ino:148742 uid:80 gid:80 size:0
      O_RDWR
  33: S_IFIFO mode:0000 dev:552,0 ino:148751 uid:80 gid:80 size:0
      O_RDWR
  34: S_IFIFO mode:0000 dev:552,0 ino:148743 uid:80 gid:80 size:0
      O_RDWR
  35: S_IFIFO mode:0000 dev:552,0 ino:148752 uid:80 gid:80 size:0
      O_RDWR
  36: S_IFIFO mode:0000 dev:552,0 ino:148761 uid:80 gid:80 size:0
      O_RDWR
  37: S_IFIFO mode:0000 dev:552,0 ino:148753 uid:80 gid:80 size:0
      O_RDWR

...


 602: S_IFIFO mode:0000 dev:552,0 ino:149682 uid:80 gid:80 size:0
      O_RDWR
 603: S_IFIFO mode:0000 dev:552,0 ino:149691 uid:80 gid:80 size:0
      O_RDWR
 604: S_IFIFO mode:0000 dev:552,0 ino:149683 uid:80 gid:80 size:0
      O_RDWR
 605: S_IFIFO mode:0000 dev:552,0 ino:149692 uid:80 gid:80 size:0
      O_RDWR
 606: S_IFIFO mode:0000 dev:552,0 ino:149698 uid:80 gid:80 size:0
      O_RDWR
 607: S_IFIFO mode:0000 dev:552,0 ino:149693 uid:80 gid:80 size:0
      O_RDWR

so there is bunch of pipe descriptors opened.

The process tree looks like this at the moment of being stopped in the debugger:

            23826 sudo -u webservd /usr/opengrok/bin/reindex-project.ksh /opengrok/etc/opengrok.c
              23827 /usr/bin/ksh -p /usr/opengrok/bin/reindex-project.ksh /opengrok/etc/opengrok.co
                23841 /bin/sh /usr/opengrok/bin/OpenGrok indexpart userland-s11u4
                  23869 /usr/jdk/latest/bin/java -agentlib:jdwp=transport=dt_socket,server=y,address=80

i.e. there are no hg log processes running at the moment the snapshot of opened fds was taken.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions