Add OpenNLP Analysis capabilities as a module [LUCENE-2899] #3973

Closed

Now that OpenNLP is an ASF project and has a nice license, it would be good to have a submodule (under analysis) that exposes its capabilities. Drew Farris, Tom Morton and I have code that does:

  • Sentence Detection as a Tokenizer (could also be a TokenFilter, although it would have to change slightly to buffer tokens)
  • NamedEntity recognition as a TokenFilter
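
To illustrate the first item, here is a minimal sketch of what "sentence detection as a Tokenizer" means: each sentence becomes one token carrying its character offsets. This is not the attached code. It uses the JDK's rule-based `java.text.BreakIterator` as a stand-in for an OpenNLP sentence model, and `SentenceTokenizerSketch` and `SentenceToken` are hypothetical names that do not extend Lucene's `Tokenizer`.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: not Lucene's Tokenizer API and not OpenNLP's sentence detector.
final class SentenceTokenizerSketch {

    /** One sentence emitted as a single token, with character offsets into the input. */
    static final class SentenceToken {
        final String text;
        final int startOffset; // inclusive
        final int endOffset;   // exclusive

        SentenceToken(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Splits the input into sentence tokens using the JDK's rule-based BreakIterator. */
    static List<SentenceToken> tokenize(String input) {
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        boundaries.setText(input);
        List<SentenceToken> tokens = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            // BreakIterator keeps trailing whitespace with the sentence; trim it off.
            int trimmedEnd = end;
            while (trimmedEnd > start && Character.isWhitespace(input.charAt(trimmedEnd - 1))) {
                trimmedEnd--;
            }
            if (trimmedEnd > start) {
                tokens.add(new SentenceToken(input.substring(start, trimmedEnd), start, trimmedEnd));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (SentenceToken t : tokenize("OpenNLP is an ASF project. It has a nice license.")) {
            System.out.println(t.startOffset + "-" + t.endOffset + ": " + t.text);
        }
    }
}
```

A TokenFilter variant would instead consume word tokens from upstream and buffer them until a sentence boundary is seen, which is the slight change mentioned in the first item.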

We are also planning a Tokenizer/TokenFilter that can attach parts of speech to a token, either as payloads (PartOfSpeechAttribute?) or as separate tokens at the same position.
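
The two options for carrying part-of-speech tags can be sketched as follows. This is only a model of the idea, assuming a simplified `Token` class invented for illustration; in Lucene itself the corresponding data would live in attributes such as `PayloadAttribute` and `PositionIncrementAttribute`. A position increment of 0 is what places a tag "at the same position" as its word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a simplified token model, not Lucene's attribute API.
final class PosEncodingSketch {

    static final class Token {
        final String term;
        final int positionIncrement; // 1 = next position, 0 = same position as previous token
        final String payload;        // null when the token carries no payload

        Token(String term, int positionIncrement, String payload) {
            this.term = term;
            this.positionIncrement = positionIncrement;
            this.payload = payload;
        }
    }

    /** Option 1: one token per word, with the POS tag stored as that token's payload. */
    static List<Token> withPayloads(String[] words, String[] tags) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Token(words[i], 1, tags[i]));
        }
        return out;
    }

    /** Option 2: the POS tag is emitted as an extra token stacked on the word's position. */
    static List<Token> withStackedTags(String[] words, String[] tags) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Token(words[i], 1, null));
            out.add(new Token(tags[i], 0, null)); // increment 0 keeps the same position
        }
        return out;
    }

    /** Resolves absolute positions by summing position increments, starting before position 0. */
    static int[] positions(List<Token> tokens) {
        int[] result = new int[tokens.size()];
        int position = -1;
        for (int i = 0; i < tokens.size(); i++) {
            position += tokens.get(i).positionIncrement;
            result[i] = position;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"dogs", "bark"};
        String[] tags = {"NNS", "VBP"};
        System.out.println(Arrays.toString(positions(withPayloads(words, tags))));
        System.out.println(Arrays.toString(positions(withStackedTags(words, tags))));
    }
}
```

The trade-off, roughly: payloads keep the term stream unchanged but need payload-aware queries to use the tag, while stacked tokens make tags searchable as ordinary terms at the cost of a larger index.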

I'd propose it go under:
modules/analysis/opennlp


Migrated from LUCENE-2899 by Grant Ingersoll (@gsingers), 36 votes, resolved Dec 19 2017
Attachments: LUCENE-2899.patch (versions: 6), LUCENE-2899-6.1.0.patch, LUCENE-2899-RJN.patch, OpenNLPFilter.java, OpenNLPTokenizer.java