Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

maven indexing lucene renovation #3558

Merged
merged 7 commits into from
Apr 24, 2022
Merged

Conversation

mbien
Copy link
Member

@mbien mbien commented Feb 5, 2022

Check if the maven repo indexing module can be upgraded from apache lucene 5.x to 9.x.

  • lucene 9.x requires Java 11+, so we have to stick with 8.x
  • upgrading lucene requires upgrading maven-indexer to 6.10+
  • unfortunately it has to be done in one step since there don't appear to be any maven-indexer releases compatible with lucene 6.x or 7.x.

note:

  • code cleanup has its own commit, the interesting changes are in the other commits

progress:

...
[26,138s][info][class,load] org.apache.lucene.store.SimpleFSDirectory source: file:/home/mbien/NetBeansProjects/netbeans/nbbuild/netbeans/ide/modules/ext/lucene-core-3.6.2.jar
[26,138s][info][class,load] org.apache.lucene.search.BooleanQuery$TooManyClauses source: file:/home/mbien/NetBeansProjects/netbeans/nbbuild/netbeans/ide/modules/ext/lucene-core-3.6.2.jar
[26,138s][info][class,load] org.apache.maven.index.ArtifactInfoFilter source: file:/home/mbien/NetBeansProjects/netbeans/nbbuild/netbeans/java/modules/ext/maven/indexer-core-6.1.1.jar
[26,138s][info][class,load] org.apache.maven.index.updater.WagonHelper$WagonFetcher source: file:/home/mbien/NetBeansProjects/netbeans/nbbuild/netbeans/java/modules/ext/maven/indexer-core-6.1.1.jar
...
  • the same maven.apisupport tests are now timing out since they decided to download and index the entire maven central repo

@mbien mbien force-pushed the lucene-renovation branch from a6d54a7 to 086d6be Compare February 5, 2022 12:53
@mbien mbien changed the title lucene renovation maven indexing lucene renovation Feb 5, 2022
@mbien mbien force-pushed the lucene-renovation branch 2 times, most recently from 71170ed to f9fabe1 Compare February 11, 2022 16:45
@mbien mbien added Java [ci] enable extra Java tests (java.completion, java.source.base, java.hints, refactoring.java, form) Maven [ci] enable "build tools" tests Upgrade Library Library (Dependency) Upgrade labels Feb 11, 2022
@mbien mbien force-pushed the lucene-renovation branch from f9fabe1 to 98d4f33 Compare February 16, 2022 02:42
@mbien
Copy link
Member Author

mbien commented Feb 16, 2022

It turned out that the reason for findClassUsages() not returning any hits was maven-indexerproducing lower case based queries, while ClassDependencyIndexCreator encodes class dependencies in searchable Base64 encoded CRC32 Strings (-> case sensitive). This worked for some reason with old lucene (maybe searches were case insensitive?) but fails now after the lucene upgrade, since there is an asymmetry between the tokenizing phase and the query phase.

But even if it worked before, it is likely still a bug since it could produce false positives if I see this correctly. In any case a big part of the Base64 key space is wasted.

The hack 587084e simulates the old behavior and lets all tests pass locally.

potential fix: encode with Base32 instead (in lowercase, but this would be 7 chars (1 char longer) or find other solution preferably via lucene, unfortunately maven-indexer has it's NexusAnalyzer hard coded.

@ebarboni

This comment was marked as outdated.

@mbien mbien force-pushed the lucene-renovation branch 3 times, most recently from b7d1b18 to e14588c Compare February 18, 2022 09:49
@mbien
Copy link
Member Author

mbien commented Feb 19, 2022

regarding the timeout of NBMNativeMWITest
The junit tests expected the query to fail with a NPE in the indexer code

catch (NullPointerException ex) {
// This exceptions occurs during unit tests so default to LATEST_NBM_PLUGIN_VERSION
LOG.log(Level.WARNING, "Unable to get latest nbm-maven-plugin version number");
}
return LATEST_NBM_PLUGIN_VERSION;

post migration, the NPE does not occur. The query appears to work. However, this causes the download+indexing of the maven central repo. Which causes a timeout after 3 minutes.

So my question is: was the test originally supposed to download the index?

NPE suppression was added here: 57e21b8

suppressed exception for reference (pre-migration):

java.lang.NullPointerException: Cannot invoke "org.apache.maven.index.Indexer.createIndexingContext(String, String, java.io.File, java.io.File, String, String, boolean, boolean, java.util.List)" because "this.indexer" is null
	at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl.addIndexingContextForced(NexusRepositoryIndexerImpl.java:272)
	at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl.loadIndexingContext2(NexusRepositoryIndexerImpl.java:335)
	at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl.access$700(NexusRepositoryIndexerImpl.java:122)
	at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl$6.run(NexusRepositoryIndexerImpl.java:940)
	at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl$6.run(NexusRepositoryIndexerImpl.java:937)
	at org.netbeans.modules.openide.util.DefaultMutexImplementation.writeAccess(DefaultMutexImplementation.java:229)
	at org.openide.util.Mutex.writeAccess(Mutex.java:252)
	at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl.iterate(NexusRepositoryIndexerImpl.java:937)
	at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl.getVersions(NexusRepositoryIndexerImpl.java:1153)
	at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl.getVersions(NexusRepositoryIndexerImpl.java:1148)
	at org.netbeans.modules.maven.indexer.api.RepositoryQueries.lambda$getVersionsResult$4(RepositoryQueries.java:314)
	at org.netbeans.modules.maven.indexer.api.RepositoryQueries.getQueryResult(RepositoryQueries.java:217)
	at org.netbeans.modules.maven.indexer.api.RepositoryQueries.getVersionsResult(RepositoryQueries.java:311)
	at org.netbeans.modules.maven.apisupport.MavenNbModuleImpl.getLatestNbmPluginVersion(MavenNbModuleImpl.java:133)
	at org.netbeans.modules.maven.apisupport.NBMNativeMWI$AdditionalOperations.performOperation(NBMNativeMWI.java:250)
	at org.netbeans.modules.maven.apisupport.NBMNativeMWI$AdditionalOperations.performOperation(NBMNativeMWI.java:141)
	at org.netbeans.modules.maven.model.Utilities.performPOMModelOperations(Utilities.java:318)
	at org.netbeans.modules.maven.spi.newproject.CreateProjectBuilder.create(CreateProjectBuilder.java:146)
	at org.netbeans.modules.maven.apisupport.NBMNativeMWI.instantiate(NBMNativeMWI.java:69)
	at org.netbeans.modules.maven.apisupport.NBMNativeMWITest.testPathNoParent(NBMNativeMWITest.java:54)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at junit.framework.TestCase.runTest(TestCase.java:177)
	at org.netbeans.junit.NbTestCase.access$200(NbTestCase.java:77)
	at org.netbeans.junit.NbTestCase$2.doSomething(NbTestCase.java:476)
	at org.netbeans.junit.NbTestCase$1Guard.run(NbTestCase.java:402)
	at org.netbeans.junit.NbTestCase.runBare(NbTestCase.java:495)
	at junit.framework.TestResult$1.protect(TestResult.java:122)
	at junit.framework.TestResult.runProtected(TestResult.java:142)
	at junit.framework.TestResult.run(TestResult.java:125)
	at junit.framework.TestCase.run(TestCase.java:130)
	at org.netbeans.junit.NbTestCase.run(NbTestCase.java:291)
	at junit.framework.TestSuite.runTest(TestSuite.java:241)
	at junit.framework.TestSuite.run(TestSuite.java:236)

@mbien mbien force-pushed the lucene-renovation branch from e14588c to b1767a0 Compare February 19, 2022 02:38
@mbien

This comment was marked as outdated.

@mbien mbien force-pushed the lucene-renovation branch 2 times, most recently from dd73403 to a3e9997 Compare February 20, 2022 12:36
@mbien mbien added the do not merge Don't merge this PR, it is not ready or just demonstration purposes. label Feb 20, 2022
@mbien mbien marked this pull request as ready for review February 20, 2022 12:39
@mbien
Copy link
Member Author

mbien commented Feb 20, 2022

switched to lowercase base32 encoding since I couldn't find any other way to make it work.

index size wise, the base64->base32 switch did not make any noteworthy difference. Both versions however are ~800 MB smaller when compared to the old lucene 5 index (fresh maven central index).

13-rc4_cache/mavenindex 139 items, totalling 5,1 GB
dev_cache/mavenindex 161 items, totalling 4,3 GB

Comment on lines +40 to +48
@Override
protected Gav getGavFromPath(IndexingContext context, String repositoryPath, String artifactPath) {
// filter out maven-metadata[-repo_name].xml to avoid IndexOutOfBoundsExceptions in
// org.apache.maven.index.artifact.M2GavCalculator.getSnapshotGav(M2GavCalculator.java:187)
if (artifactPath.endsWith(".xml") && artifactPath.substring(artifactPath.lastIndexOf("/")+1).startsWith("maven-metadata")) {
return null;
}
return super.getGavFromPath(context, repositoryPath, artifactPath);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a workaround for the IOOBE in maven-indexer.

issue filed under:
https://issues.apache.org/jira/browse/MINDEXER-144

@ebarboni
Copy link
Contributor

Hi @mbien, I'm not sure my issue is related to your task but the Apache NetBeans indexer was not able to index Apache Archiva repository for a long time. It seems that your patch make it working again.
Typical stacktrace.
INFO [org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl]: Downloaded maven index file has size 38?912 (zipped). The usable space in **mydrive** is 348?321?529?856. INFO [org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl]: could not (re-)index **arepoid** java.lang.IllegalArgumentException: The legacy format is no longer supported by this version of maven-indexer. at org.apache.maven.index.updater.DefaultIndexUpdater.loadIndexDirectory(DefaultIndexUpdater.java:210) at org.apache.maven.index.updater.DefaultIndexUpdater.access$300(DefaultIndexUpdater.java:76) at org.apache.maven.index.updater.DefaultIndexUpdater$LuceneIndexAdaptor.setIndexFile(DefaultIndexUpdater.java:518) at org.apache.maven.index.updater.DefaultIndexUpdater.fetchAndUpdateIndex(DefaultIndexUpdater.java:752) at org.apache.maven.index.updater.DefaultIndexUpdater.fetchAndUpdateIndex(DefaultIndexUpdater.java:161) at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl.indexLoadedRepo(NexusRepositoryIndexerImpl.java:523) Caused: java.io.IOException: Failed to load maven-index for: **repourl** at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl.indexLoadedRepo(NexusRepositoryIndexerImpl.java:536) at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl.access$200(NexusRepositoryIndexerImpl.java:122) [catch] at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl$3.run(NexusRepositoryIndexerImpl.java:654) at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl$3.run(NexusRepositoryIndexerImpl.java:647) at org.netbeans.modules.openide.util.DefaultMutexImplementation.writeAccess(DefaultMutexImplementation.java:229) at org.openide.util.Mutex.writeAccess(Mutex.java:252) at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl.indexRepo(NexusRepositoryIndexerImpl.java:647) at org.netbeans.modules.maven.indexer.api.RepositoryIndexer.indexRepo(RepositoryIndexer.java:42) at org.netbeans.modules.maven.repository.RepositoryNode$RefreshIndexAction$1.run(RepositoryNode.java:228) at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:1418) at org.netbeans.modules.openide.util.GlobalLookup.execute(GlobalLookup.java:45) at org.openide.util.lookup.Lookups.executeWith(Lookups.java:278) at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:2033)

@mbien
Copy link
Member Author

mbien commented Feb 22, 2022

@ebarboni thanks Eric for testing. That is good news. This PR was a little bit of a minefield :)

@mbien
Copy link
Member Author

mbien commented Feb 25, 2022

for future reference, the assertion error in BooleanQuery.hashcode() attached below.

bq.add(new BooleanClause(setBooleanRewrite(q), occur));
} else {
//TODO when all fields, we need to create separate
//queries for each field.
}
}
IteratorSearchResponse resp = repeatedPagedSearch(bq.build(), Collections.singletonList(context), MAX_RESULT_COUNT);

Problem can be mitigated by not changing the rewrite method to MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE recursively via setBooleanRewrite(query), or not running with assertions enabled.

lucene issue filed:

PRs:

Issue can be triggered via UI, right click on project dependencies -> "Add Dependency..." -> use "query:" search field, e.g enter "org.eclipse.jetty".

java.lang.AssertionError
	at org.apache.lucene.search.BooleanQuery.hashCode(BooleanQuery.java:614)
	at java.base/java.util.Objects.hashCode(Objects.java:103)
	at java.base/java.util.HashMap$Node.hashCode(HashMap.java:298)
	at java.base/java.util.AbstractMap.hashCode(AbstractMap.java:527)
	at org.apache.lucene.search.Multiset.hashCode(Multiset.java:119)
	at java.base/java.util.EnumMap.entryHashCode(EnumMap.java:717)
	at java.base/java.util.EnumMap.hashCode(EnumMap.java:709)
	at java.base/java.util.Arrays.hashCode(Arrays.java:4498)
	at java.base/java.util.Objects.hash(Objects.java:133)
	at org.apache.lucene.search.BooleanQuery.computeHashCode(BooleanQuery.java:597)
	at org.apache.lucene.search.BooleanQuery.hashCode(BooleanQuery.java:611)
	at java.base/java.util.HashMap.hash(HashMap.java:340)
	at java.base/java.util.HashMap.put(HashMap.java:612)
	at org.apache.lucene.search.Multiset.add(Multiset.java:82)
	at org.apache.lucene.search.BooleanQuery.<init>(BooleanQuery.java:154)
	at org.apache.lucene.search.BooleanQuery.<init>(BooleanQuery.java:42)
	at org.apache.lucene.search.BooleanQuery$Builder.build(BooleanQuery.java:133)
	at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl$1.run(NexusRepositoryIndexerImpl.java:1457)
	at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl.lambda$iterate$11(NexusRepositoryIndexerImpl.java:890)
	at org.netbeans.modules.openide.util.DefaultMutexImplementation.writeAccess(DefaultMutexImplementation.java:229)
	at org.openide.util.Mutex.writeAccess(Mutex.java:252)
	at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl.iterate(NexusRepositoryIndexerImpl.java:872)
	at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl.find(NexusRepositoryIndexerImpl.java:1387)
	at org.netbeans.modules.maven.indexer.NexusRepositoryIndexerImpl.find(NexusRepositoryIndexerImpl.java:1382)
	at org.netbeans.modules.maven.indexer.api.RepositoryQueries.lambda$findResult$13(RepositoryQueries.java:505)
	at org.netbeans.modules.maven.indexer.api.RepositoryQueries.getQueryResult(RepositoryQueries.java:217)
	at org.netbeans.modules.maven.indexer.api.RepositoryQueries.findResult(RepositoryQueries.java:502)
	at org.netbeans.modules.maven.nodes.AddDependencyPanel$QueryPanel$2.run(AddDependencyPanel.java:1144)
	at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:1418)
(...)

@JaroslavTulach
Copy link

Maven continues to support JDK8 and even older. E.g. Maven itself must be using compatible version of Lucene. I suggest NetBeans to use the same version.

@mbien
Copy link
Member Author

mbien commented Mar 11, 2022

Maven continues to support JDK8 and even older. E.g. Maven itself must be using compatible version of Lucene. I suggest NetBeans to use the same version.

you mean maven-indexer? Thats exactly what this PR does. Latest 8.x is in use which is the same what maven-indexer 6.1x uses, Compatibility windows are described in the PR text and why this all had to be done in one step (5.x -> 8.x). The encountered problems are listed too.

The only potential showstopper remaining (IMO) is the discovered bug in lucene's hashcode logic, although it got quickly fixed thanks to lucene devs after I filed the issue, it doesn't really help us if it won't be backported to 8.x which we are stuck to due to the JDK 8 requirement of this module.

Copy link
Contributor

@matthiasblaesing matthiasblaesing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general looks good. A quick local test showed, that searching for dependencies (locally by class, remote by artifact/group name). I left a few inline comments though.

@@ -247,7 +247,7 @@ public void performOperation(POMModel model) {
//nbm-maven-plugin
boolean addPlugin = true;
String managedPVersion = null;
String pVersion = MavenNbModuleImpl.getLatestNbmPluginVersion();
String pVersion = MavenNbModuleImpl.LATEST_NBM_PLUGIN_VERSION;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the change in semantics intended? If I read the method correctly it will really fetch the latest version, while your change switches to a hard coded value. This looks fishy to me.

Copy link
Member Author

@mbien mbien Apr 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is intended, Its this commit: 5b9efc2

explanation is here:
#3558 (comment)

short version:
code swallowed a NPE which basically made the tests test nothing, post upgrade the NPE did not occur and the tests tried to download the maven repo index which caused them to time out

by returning a constant (again, i basically reverted an older commit) and not ignoring the exception, everything works again.

Yes, semantics changed a little to the old behavior before 57e21b8

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oyarzun could you please have a look at this - it is your commit, for which the behavior is changed, so this needs to be reevaluated.

@mbien mbien force-pushed the lucene-renovation branch 2 times, most recently from c03e479 to 5ebf50f Compare April 5, 2022 06:13
@mbien
Copy link
Member Author

mbien commented Apr 11, 2022

@matthiasblaesing what is your opinion on this. This is a fairly mission critical area so we really should make sure that everything works before integrating IMO... but we have only a week left till freeze - would be great to get more reviews.

@mbien mbien force-pushed the lucene-renovation branch from 5ebf50f to a8681f7 Compare April 11, 2022 21:09
@matthiasblaesing
Copy link
Contributor

@mbien sorry for the late reply: The outstanding question is the handling of the plugin version. Would it be possible to implement something like this:

  1. Check if maven index is available
  2. If maven index is available find the latest version and use that
  3. else fallback to the hardcoded value

That way we have the best of both worlds.

@mbien mbien added this to the NB14 milestone Apr 19, 2022
@mbien mbien removed the do not merge Don't merge this PR, it is not ready or just demonstration purposes. label Apr 19, 2022
@neilcsmith-net neilcsmith-net modified the milestones: NB14, NB15 Apr 20, 2022
Copy link
Contributor

@matthiasblaesing matthiasblaesing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to get this into master, once it settles down, the earlier, the long we have time to fix bugs ;-)

mbien added 6 commits April 24, 2022 18:52
 - added jdk/incubator to jdk class list test.
 - fix: move maven-indexer to the front in the classpath.
 - migration had to be done in one step since there are no maven-indexer
   releases available which support lucene 6 or 7.
lucene is case sensitive, maven-indexer however uses a lower case
filter on its hardcoded NexusAnalyzer. This means that most class
dependency queries wouldn't return anything.

Base32 uses only one case (usually upper case, but we use lower case).
…nloaded.

 - query latest nbm-version but don't wait for the index download
 - this reverts some of 57e21b8

why did this work before migration?

The code swallowed an exception during junit tests which caused the logic
to return a fallback constant as version. Post migration, this exception
does not occure and the method would correctly block -> tests time out.
- lambdas, diamonds, overrides, final fields, arm blocks
- if chain over Strings -> switch
- str.replaceAll -> str.replace for non-regex usages
- map.contains + map.get -> map.get + null check
- collection sizing where known
- deprecations and warning removals
- dead code removal
- formatting fixes
@mbien mbien force-pushed the lucene-renovation branch from b4842a1 to ec298fb Compare April 24, 2022 16:58
@mbien
Copy link
Member Author

mbien commented Apr 24, 2022

ok awesome. Travis is working here too. I squashed a few less important commits, i would like to merge the rest without squashing so that it simplifies debugging with git bisections if problems occur.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Code cleanup Java [ci] enable extra Java tests (java.completion, java.source.base, java.hints, refactoring.java, form) Maven [ci] enable "build tools" tests Upgrade Library Library (Dependency) Upgrade
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants