@vladak commented May 1, 2023

When indexing AOSP, I noticed that the indexer has a tail in which only a bunch of CPUs are executing. The stacks of these threads look like this:

"ForkJoinPool-1-worker-79" #3746 daemon prio=5 os_prio=0 cpu=1774894.92ms elapsed=1995.46s tid=0x00007efd9c09f800 nid=0x5d51f runnable  [0x00007efbbdedc000]
   java.lang.Thread.State: RUNNABLE
        at org.opengrok.indexer.analysis.plain.XMLXref.yylex(XMLXref.java:918)
        at org.opengrok.indexer.analysis.JFlexXref.write(JFlexXref.java:489)
        at org.opengrok.indexer.analysis.TextAnalyzer.writeXref(TextAnalyzer.java:82)
        at org.opengrok.indexer.analysis.plain.XMLAnalyzer.analyze(XMLAnalyzer.java:84)
        at org.opengrok.indexer.analysis.AnalyzerGuru.populateDocument(AnalyzerGuru.java:627)
        at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:1156)
        at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$4(IndexDatabase.java:1782)
        at org.opengrok.indexer.index.IndexDatabase$$Lambda$373/0x00007efdb56edd68.apply(Unknown Source)
        at java.util.stream.Collectors.lambda$groupingByConcurrent$59(java.base@11.0.18-ea/Collectors.java:1304)
        at java.util.stream.Collectors$$Lambda$375/0x00007efdb56ec908.accept(java.base@11.0.18-ea/Unknown Source)
        at java.util.stream.ReferencePipeline.lambda$collect$1(java.base@11.0.18-ea/ReferencePipeline.java:575)
        at java.util.stream.ReferencePipeline$$Lambda$376/0x00007efdb56ebcb0.accept(java.base@11.0.18-ea/Unknown Source)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(java.base@11.0.18-ea/ForEachOps.java:183)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(java.base@11.0.18-ea/ArrayList.java:1655)
        at java.util.stream.AbstractPipeline.copyInto(java.base@11.0.18-ea/AbstractPipeline.java:484)
        at java.util.stream.ForEachOps$ForEachTask.compute(java.base@11.0.18-ea/ForEachOps.java:290)
        at java.util.concurrent.CountedCompleter.exec(java.base@11.0.18-ea/CountedCompleter.java:746)
        at java.util.concurrent.ForkJoinTask.doExec(java.base@11.0.18-ea/ForkJoinTask.java:290)
        at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(java.base@11.0.18-ea/ForkJoinPool.java:1020)
        at java.util.concurrent.ForkJoinPool.scan(java.base@11.0.18-ea/ForkJoinPool.java:1656)
        at java.util.concurrent.ForkJoinPool.runWorker(java.base@11.0.18-ea/ForkJoinPool.java:1594)
        at java.util.concurrent.ForkJoinWorkerThread.run(java.base@11.0.18-ea/ForkJoinWorkerThread.java:183)

and are likely related to #907 or #3740.

Adding an xref timeout to XMLAnalyzer#analyze(), similar to what is already done in PlainAnalyzer#analyze(), reduced the indexing time significantly (from 1 hour and 40 minutes to 35 minutes).
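
For illustration, the general shape of such a timeout can be sketched with plain JDK classes (Java 9 or newer). This is not the OpenGrok code: `XrefTimeoutSketch`, `slowXref` and `xrefOrNull` are hypothetical names, and the sleeping task merely stands in for the lexer.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class XrefTimeoutSketch {
    /** Hypothetical stand-in for a slow xref writer; it sleeps instead of lexing. */
    static String slowXref(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "xref";
    }

    /** Returns the xref, or null if it did not finish within timeoutMillis. */
    static String xrefOrNull(ExecutorService executor, long workMillis, long timeoutMillis) {
        CompletableFuture<String> future = CompletableFuture
                .supplyAsync(() -> slowXref(workMillis), executor)
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            return future.get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return null; // give up on the xref, the caller carries on
            }
            throw new IllegalStateException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

Note that `orTimeout()` only completes the future exceptionally; it does not stop the thread that is running the task, so a stuck lexer would keep occupying its executor thread until it finishes.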

There are a bunch of other analyzers that override the analyze() method and do not use the xref timeout; however, XMLAnalyzer is probably the one that suffers from this problem the most. At this point I don't see a way of pushing the xref timeout upwards, because the analyze() methods differ. E.g. PlainAnalyzer#analyze() runs ctags, which has its own timeout.

Example of an AOSP file causing the timeout: /AOSP/hardware/qcom/sm8150/display/config/qdcm_calib_data_sw43404_amoled_cmd_mode_dsi_boe_panel_with_DSC.xml (1.8MB). This one has long strings of hexadecimal numbers as XML element values.
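
For experimenting without the AOSP checkout, a file of roughly this shape could be generated synthetically. The sketch below is only a guess at the structure: the element names, the `0x` prefix and the space separator are assumptions, and it has not been verified that such a file triggers the same slowdown.

```java
import java.util.Random;

class HexXmlSketch {
    /** Builds an XML document whose element values are long runs of hexadecimal numbers. */
    static String build(int elements, int numbersPerElement, long seed) {
        Random random = new Random(seed);
        // <calib> and <data> are made-up element names, not taken from the AOSP file.
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<calib>\n");
        for (int i = 0; i < elements; i++) {
            sb.append("  <data>");
            for (int j = 0; j < numbersPerElement; j++) {
                if (j > 0) {
                    sb.append(' ');
                }
                // %08X renders the int as exactly eight hex digits.
                sb.append(String.format("0x%08X", random.nextInt()));
            }
            sb.append("</data>\n");
        }
        return sb.append("</calib>\n").toString();
    }
}
```

Writing the result of something like `HexXmlSketch.build(100, 2000, 1L)` to a `.xml` file gives an input in the megabyte range.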

This change merely works around the problems described in the issues above; the lexer of XMLAnalyzer still needs a proper fix so that it does not choke on certain files.

Lastly, there is probably some refactoring potential: moving the CompletableFuture and the code setting the timeout into a method, and possibly reworking the XrefWork class (sic!), would avoid the code duplication in PlainAnalyzer and XMLAnalyzer.
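
One possible shape for such a shared method is sketched below. It is an assumption about how the refactoring could look, not the existing OpenGrok code: `TimedXrefSketch`, `XrefTask` and `runWithTimeout` are hypothetical names, and an `Optional` is used here in place of the XrefWork class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedXrefSketch {
    /** A unit of xref work that may fail with an I/O error. */
    @FunctionalInterface
    interface XrefTask<T> {
        T run() throws IOException;
    }

    /**
     * Runs the task on the executor. Returns an empty Optional on timeout and
     * rethrows an I/O failure of the task as UncheckedIOException.
     */
    static <T> Optional<T> runWithTimeout(XrefTask<T> task, Executor executor,
                                          long timeout, TimeUnit unit) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(() -> {
            try {
                return task.run();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, executor).orTimeout(timeout, unit);
        try {
            return Optional.ofNullable(future.join());
        } catch (CompletionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof TimeoutException) {
                return Optional.empty(); // xref abandoned, caller decides what to do
            }
            if (cause instanceof UncheckedIOException) {
                throw (UncheckedIOException) cause;
            }
            throw e;
        }
    }
}
```

With a helper along these lines, each analyze() method would pass only its own xref-writing lambda and keep whatever else it does (such as running ctags) to itself.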

@oracle-contributor-agreement bot added the OCA Verified label May 1, 2023
@vladak merged commit 4e4d206 into oracle:master May 1, 2023
@vladak deleted the xml_analyzer_timeout branch May 1, 2023 19:22
Labels

indexer, OCA Verified, performance