
Lucene Codecs #136

Closed
cdorfer opened this issue Aug 8, 2020 · 14 comments


@cdorfer

cdorfer commented Aug 8, 2020

I am aware that this is a duplicate, but it was not solved in the previous issue.

When using n-grams, I get:
Could not load codec 'Lucene410'. Did you forget to add lucene-backward-codecs.jar? N-gram rules will be ignored.

Is there a way to fix LanguageTool before it is packed into the TeXtidote JAR?

@bratekarate
Contributor

bratekarate commented Sep 18, 2020

Edit: Better skip this comment; it contains virtually no useful information.

It's actually quite easy to get hold of lucene-backward-codecs.jar: I built LanguageTool with Maven, and it appears in the target directory. But that still does not solve the issue. Now I'm even more confused, because I get an SPI error (this is getting way over my head; I don't even know what that means). Since this project is built with Ant rather than Maven or Gradle, this feels like a lottery to me; I don't know how to work with Ant.

The Lucene410 codec seems to be available now, and I can see Lucene50 files in the JAR. Apparently I did not manage to include everything:

    Exception in thread "main" java.util.ServiceConfigurationError: Cannot instantiate SPI class:
        org.apache.lucene.codecs.lucene50.Lucene50Codec

...with the root cause:

    Caused by: java.lang.IllegalArgumentException:
        An SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Lucene50' does not exist.
        You need to add the corresponding JAR file supporting this SPI to your classpath.
        The current classpath supports the following names: [Lucene40, Lucene41]
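Java's ServiceLoader, and Lucene's own SPI loader, discover codec and postings-format implementations through `META-INF/services/<interface name>` files inside the JARs on the classpath. A minimal diagnostic sketch for checking which names a packaged JAR actually registers is below; `ServiceEntries` is a hypothetical helper, not part of LanguageTool or TeXtidote, and it only reads the first matching services file in the one JAR you point it at.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: prints the providers a JAR registers for Lucene's codec SPIs. */
final class ServiceEntries {

    /**
     * Returns the provider class names listed in META-INF/services/<serviceName>
     * inside the given JAR, skipping blank lines and '#' comments.
     * An empty list means the JAR registers nothing for that service.
     */
    static List<String> providers(String jarPath, String serviceName) throws IOException {
        List<String> result = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry entry = jar.getJarEntry("META-INF/services/" + serviceName);
            if (entry == null) {
                return result;
            }
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    int hash = line.indexOf('#'); // everything after '#' is a comment
                    if (hash >= 0) {
                        line = line.substring(0, hash);
                    }
                    line = line.trim();
                    if (!line.isEmpty()) {
                        result.add(line);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ServiceEntries <path-to-jar>");
            return;
        }
        String[] services = {
            "org.apache.lucene.codecs.Codec",
            "org.apache.lucene.codecs.PostingsFormat"
        };
        for (String service : services) {
            System.out.println(service + ":");
            for (String provider : providers(args[0], service)) {
                System.out.println("  " + provider);
            }
        }
    }
}
```

Run it against the packaged JAR (compile with javac first on Java 8; Java 11+ can also launch the source file directly). If the PostingsFormat list lacks a Lucene50 entry, that would be consistent with the error above.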

So I somehow ended up with only parts of the JAR in the final result. To be honest, I have no idea what I did with Ant; if someone knows what to do, it is probably not even difficult. Everything seems to be available, but I don't know what I'm doing. I just added these lines to config.xml (which is probably terribly wrong):

EDIT: The following code is completely unrelated and does not make any sense. Skip ahead.

    <dependency>
        <!-- Lucene 410 -->
        <name>Lucene Codec 410</name>
        <classname>org.apache.lucene.codecs.lucene410.Lucene410Codec</classname>
        <bundle>true</bundle>
    </dependency>
    <dependency>
        <!-- Lucene 50 -->
        <name>Lucene Codec 50</name>
        <classname>org.apache.lucene.codecs.lucene50.Lucene50Codec</classname>
        <bundle>true</bundle>
    </dependency>

I hope someone with more knowledge of Ant and JAR packaging can help out.

@bratekarate
Contributor

I have a fix! Sorry for spamming instead of editing my old comment, but it was too chaotic to rescue.

I just managed to build LanguageTool locally with working n-gram checks, thanks to this great information.

TL;DR: a ServicesResourceTransformer in the maven-shade-plugin makes it work.

However, I did not use the fork. I checked the logs, and it looks the same as the original repository from two years ago? Anyway, I just checked out tag v4.6 from the original repository.

I configured languagetool-commandline to run the maven-shade-plugin. I don't really understand the project structure in its entirety; I just tried to get it to build from the perspective of needing languagetool-commandline to run standalone. The important thing is that a shaded JAR is somehow built with the right option.

All that needed to be done in the end was to replace the entire maven-jar-plugin block with the following maven-shade-plugin block:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
        <execution>
            <id>create-fat-jar</id>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <!-- See next line for the fix. -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>org.languagetool.commandline.Main</mainClass>
                    </transformer>
                </transformers>
                <finalName>languagetool-commandline-standalone</finalName>
            </configuration>
        </execution>
    </executions>
</plugin>

After running mvn install in the root project (there is presumably a quicker command), I verified with java -jar languagetool-commandline/target/languagetool-commandline-fat.jar --languagemodel ~/ngram example.tex that the error was gone. Then I just needed to copy languagetool-commandline-fat.jar into the dep folder of the TeXtidote project, and it worked.

What I configured probably makes no sense in the bigger picture. The important takeaway is that the ServicesResourceTransformer fixes the issue with the conflicting Codec SPIs.
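Why the transformer matters can be sketched conceptually. Several Lucene JARs ship a services file at the same path (for example `META-INF/services/org.apache.lucene.codecs.PostingsFormat`); when everything is packed into one JAR without special handling, effectively only one copy survives, so codecs registered by the other JARs disappear. The ServicesResourceTransformer instead concatenates same-named service files. The `ServiceMerge` class below is a hypothetical illustration with JARs modelled as in-memory maps from entry path to contents; it is not the plugin's actual code.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of two ways to combine JAR entries into one fat JAR. */
final class ServiceMerge {

    static final String PREFIX = "META-INF/services/";

    /** Naive packaging: the first JAR containing a path wins; later copies are dropped. */
    static Map<String, String> firstWins(List<Map<String, String>> jars) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map<String, String> jar : jars) {
            for (Map.Entry<String, String> e : jar.entrySet()) {
                out.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    /** Service-aware packaging: same-named service files are concatenated (duplicates removed). */
    static Map<String, String> mergeServices(List<Map<String, String>> jars) {
        Map<String, Set<String>> services = new LinkedHashMap<>();
        Map<String, String> out = new LinkedHashMap<>();
        for (Map<String, String> jar : jars) {
            for (Map.Entry<String, String> e : jar.entrySet()) {
                if (e.getKey().startsWith(PREFIX)) {
                    Set<String> lines = services.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>());
                    for (String line : e.getValue().split("\n")) {
                        String trimmed = line.trim();
                        if (!trimmed.isEmpty()) {
                            lines.add(trimmed);
                        }
                    }
                } else {
                    out.putIfAbsent(e.getKey(), e.getValue()); // other resources: first wins
                }
            }
        }
        for (Map.Entry<String, Set<String>> e : services.entrySet()) {
            out.put(e.getKey(), String.join("\n", e.getValue()) + "\n");
        }
        return out;
    }
}
```

Under first-wins packaging, if the PostingsFormat file kept is the one from lucene-backward-codecs, only the names it registers remain, which would match the [Lucene40, Lucene41] list in the original error.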

@sylvainhalle
Owner

Big thumbs up for all your work! This is exactly the recipe we need to build the "fat" JAR that should then be included to build TeXtidote. However, since this is related to building a JAR from LanguageTool, I think this fix should be posted in the other repo. I see from my e-mails that you already did it?

@bratekarate
Contributor

bratekarate commented Sep 22, 2020

Yes, I did. I only found the relevant discussions after posting here, so please excuse the noise; I was lost for a while. I created sylvainhalle/languagetool#2 in the LT fork as a working example, in which I also explain the limitations. I had to do some tinkering by hand with the META-INF of the JAR to get all languageClasses working, but maybe you know an easier way.

I can confirm that it works; I am currently using the master branch of TeXtidote with LT v5.0.1. I also tested it with #123 and with n-gram language models.
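The manual META-INF tinkering for languageClasses is probably needed for the same reason as the codec problem: each LanguageTool language module ships its own properties file at the same path, and only one survives in a merged JAR. Assuming those files live at `META-INF/org/languagetool/language-module.properties` with a comma-separated `languageClasses` key (my understanding of LanguageTool's layout; check the files in your own build), combining them could be sketched as follows. `LanguageClassesMerge` is hypothetical and not part of either project.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

/** Hypothetical helper: merges languageClasses values from several properties files. */
final class LanguageClassesMerge {

    // Assumed key name; verify against the language-module.properties files in your build.
    static final String KEY = "languageClasses";

    /** Returns one comma-separated value containing every class listed in any input, without duplicates. */
    static String merge(List<String> propertiesFiles) throws IOException {
        Set<String> classes = new LinkedHashSet<>();
        for (String text : propertiesFiles) {
            Properties props = new Properties();
            props.load(new StringReader(text));
            String value = props.getProperty(KEY);
            if (value == null) {
                continue; // this file does not declare any language classes
            }
            for (String cls : value.split(",")) {
                String trimmed = cls.trim();
                if (!trimmed.isEmpty()) {
                    classes.add(trimmed);
                }
            }
        }
        return String.join(",", classes);
    }
}
```

Simply appending the files would not be enough here: when java.util.Properties loads a file in which a key appears more than once, the last value replaces the earlier ones, so the values have to be combined into a single entry.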

@sylvainhalle
Owner

I am trying to reproduce your steps with LT v5.0.1. I checked out the branch, replaced the section in languagetool-commandline/pom.xml and ran mvn install. The build succeeds, but no file languagetool-commandline-fat.jar is produced.

@sylvainhalle
Owner

Made a mistake in copy/pasting, rebuilding now...

@sylvainhalle
Owner

Nope, still no such file

@bratekarate
Contributor

Have you used the code from my pull request? The code I pasted in this issue may not be optimal.

So languagetool-commandline/target/languagetool-commandline-fat.jar does not exist at all?

@bratekarate
Contributor

bratekarate commented Sep 24, 2020

Yes, the code in this issue names the file languagetool-commandline-standalone, whereas in the PR I called it languagetool-commandline-fat.jar to avoid confusion with the standalone module.

If the JAR still does not appear with the pom.xml from sylvainhalle/languagetool#2, then I have no clue. I just tested it again to verify, and had no issues.

@sylvainhalle
Owner

Thanks, I'll try directly from your pull request.

@bratekarate
Contributor

My environment, output from mvn --version:

Apache Maven 3.6.3 (NON-CANONICAL_2019-11-27T20:26:29Z_root)
Maven home: /opt/maven
Java version: 1.8.0_265, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-8-openjdk/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.8.9-arch2-1", arch: "amd64", family: "unix"

@sylvainhalle
Owner

I'm closing this, since the latest fat JAR used by TeXtidote has fixed this issue.

@wysiayg

wysiayg commented Nov 7, 2023

When building 0.9, I get the same errors. Do I need to configure anything so that the latest fat JAR is used?

@sylvainhalle
Copy link
Owner

This should not be happening, as the GitHub workflow compiles and runs fine with the latest commit on the repository:

https://github.com/sylvainhalle/textidote/actions/runs/6756084739

On your local system, you might try ant wipe to make sure all dependencies are cleared and that the next build re-downloads all the latest versions.


4 participants