
Phrase Suggester: Suggesting on very frequent words can cause request failures #34282

Closed
@nik9000

Description


I have a stack trace that looks like:

Caused by: java.lang.IllegalArgumentException: Fractional absolute document frequencies are not allowed
    at org.apache.lucene.search.spell.DirectSpellChecker.setThresholdFrequency(DirectSpellChecker.java:182)
    at org.elasticsearch.search.suggest.phrase.DirectCandidateGenerator.drawCandidates(DirectCandidateGenerator.java:131)
    at org.elasticsearch.search.suggest.phrase.MultiCandidateGeneratorWrapper.drawCandidates(MultiCandidateGeneratorWrapper.java:52)

I do not have and cannot get the index that causes this failure. But it looks to me like the failure is caused by this series of events:

  1. DirectCandidateGenerator#thresholdFrequency returns a frequency larger than Integer.MAX_VALUE. This appears possible with the default configuration for very common words like "the" when the corpus has a couple of million documents and each document is large, say, the size of a Wikipedia page.
  2. We pass that number to DirectSpellChecker#setThresholdFrequency, which takes a float. Java helpfully widens the long returned by step 1 to a float, losing precision but keeping the magnitude of the number largely intact.
  3. Lucene validates that the float is either less than 1 or a whole number. The "is it a whole number" check is thresholdFrequency != (int) thresholdFrequency. Casting a float larger than Integer.MAX_VALUE to an int clamps it to Integer.MAX_VALUE, so the check treats floats that don't fit into an int as fractional. Most of the time, anyway.
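A minimal Java sketch of steps 2 and 3, runnable without Lucene on the classpath. The looksFractional method is a hypothetical helper that replicates the validation described in step 3 rather than calling Lucene itself, and the 5,000,000,000 threshold is an arbitrary example value, not one taken from the failing index.

```java
class ThresholdCastDemo {

    // Hypothetical helper replicating the check described in step 3:
    // values >= 1 are rejected unless they survive a round trip through int.
    static boolean looksFractional(float thresholdFrequency) {
        return thresholdFrequency >= 1f && thresholdFrequency != (int) thresholdFrequency;
    }

    public static void main(String[] args) {
        // Step 1: an example threshold larger than Integer.MAX_VALUE.
        long threshold = 5_000_000_000L;

        // Step 2: long -> float is an implicit widening conversion, the same one
        // that applies when the long is passed to a float parameter.
        float asFloat = threshold;

        // Step 3: (int) clamps to Integer.MAX_VALUE, so the comparison fails
        // even though asFloat is a whole number.
        System.out.println((int) asFloat);            // 2147483647
        System.out.println(looksFractional(asFloat)); // true, i.e. rejected

        // The "most of the time" caveat: a value that becomes exactly 2^31 as a float
        // compares equal to (float) Integer.MAX_VALUE, which also rounds to 2^31.
        float twoToThe31 = 2_147_483_648L;
        System.out.println(looksFractional(twoToThe31)); // false, i.e. accepted
    }
}
```

The last case shows why the failure is intermittent: only thresholds whose float value lands strictly above 2^31 trip the check.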

There is a workaround: set "suggest_mode": "always". In that mode we skip the math and just pick 0 for the frequency, which is both less than one and a whole number, so Lucene is quite happy with it.
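A small sketch of why the workaround avoids the exception, under the assumption that the threshold is chosen by a simple branch on the suggest mode. The SuggestMode enum and thresholdFor method are hypothetical stand-ins modeled on the description above, not Elasticsearch's actual code; looksFractional replicates the Lucene check described in step 3.

```java
class SuggestModeThresholdSketch {

    // Hypothetical stand-in named after the "suggest_mode" option values.
    enum SuggestMode { MISSING, POPULAR, ALWAYS }

    // Replicates the validation from step 3 (not the Lucene class itself).
    static boolean looksFractional(float f) {
        return f >= 1f && f != (int) f;
    }

    // Hypothetical: "always" skips the frequency math and uses 0;
    // any other mode widens the computed long threshold to a float.
    static float thresholdFor(SuggestMode mode, long computedThreshold) {
        return mode == SuggestMode.ALWAYS ? 0f : computedThreshold;
    }

    public static void main(String[] args) {
        long huge = 5_000_000_000L; // example value above Integer.MAX_VALUE
        System.out.println(looksFractional(thresholdFor(SuggestMode.MISSING, huge))); // true, rejected
        System.out.println(looksFractional(thresholdFor(SuggestMode.ALWAYS, huge)));  // false, accepted
    }
}
```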

It looks like either Elasticsearch should clamp the value to Integer.MAX_VALUE or Lucene should use a different check for fractional numbers.
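Both options can be sketched in a few lines of Java. The method names clampedThreshold and proposedCheckRejects are hypothetical and do not exist in Elasticsearch or Lucene; currentCheckRejects replicates the check from step 3 for comparison. Using Math.floor is just one possible replacement for the int round trip.

```java
class ThresholdFixSketch {

    // Option 1 (Elasticsearch side, hypothetical): clamp the long before it reaches Lucene.
    // Note that (float) Integer.MAX_VALUE rounds to 2^31, which the current check still accepts.
    static float clampedThreshold(long threshold) {
        return (float) Math.min(threshold, Integer.MAX_VALUE);
    }

    // The check from step 3, replicated for comparison.
    static boolean currentCheckRejects(float f) {
        return f >= 1f && f != (int) f;
    }

    // Option 2 (Lucene side, hypothetical): a fractional test that does not
    // depend on the value fitting into an int.
    static boolean proposedCheckRejects(float f) {
        return f >= 1f && f != (float) Math.floor(f);
    }

    public static void main(String[] args) {
        long huge = 5_000_000_000L; // example value above Integer.MAX_VALUE
        System.out.println(currentCheckRejects(clampedThreshold(huge))); // false, accepted
        System.out.println(proposedCheckRejects((float) huge));          // false, accepted
        System.out.println(proposedCheckRejects(1.5f));                  // true, still rejected
    }
}
```

Clamping keeps the fix local to Elasticsearch, while changing the Lucene check also protects other callers that might pass very large whole-number thresholds.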
