BeiderMorseFilter: TestRandomChains fails with IndexOutOfBounds on empty term text [LUCENE-10360]

Error seen:
```
  2> TEST FAIL: useCharFilter=true text='Uf?F ?wlu{0 <!--'a'
  2> Exception from random analyzer:
  2> charfilters=
  2> tokenizer=
  2>   org.apache.lucene.analysis.ja.JapaneseTokenizer(org.apache.lucene.util.AttributeFactory$1@4c00d592, null, false, true, NORMAL)
  2> filters=
  2>   Conditional:org.apache.lucene.analysis.pt.PortugueseLightStemFilter(OneTimeWrapper@3fad923e term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,flags=0,payload=null,baseForm=null,partOfSpeech=null,partOfSpeech (en)=null,reading=null,reading (en)=null,pronunciation=null,pronunciation (en)=null,inflectionType=null,inflectionType (en)=null,inflectionForm=null,inflectionForm (en)=null,keyword=false)
  2>   org.apache.lucene.analysis.phonetic.BeiderMorseFilter(ValidatingTokenFilter@43fbbeb0 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,flags=0,payload=null,baseForm=null,partOfSpeech=null,partOfSpeech (en)=null,reading=null,reading (en)=null,pronunciation=null,pronunciation (en)=null,inflectionType=null,inflectionType (en)=null,inflectionForm=null,inflectionForm (en)=null,keyword=false, org.apache.commons.codec.language.bm.PhoneticEngine@631e916d)
  2>   Conditional:org.apache.lucene.analysis.synonym.SynonymGraphFilter(OneTimeWrapper@77051976 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,flags=0,payload=null,baseForm=null,partOfSpeech=null,partOfSpeech (en)=null,reading=null,reading (en)=null,pronunciation=null,pronunciation (en)=null,inflectionType=null,inflectionType (en)=null,inflectionForm=null,inflectionForm (en)=null,keyword=false, org.apache.lucene.analysis.synonym.SynonymMap@69152718, true)
   >     java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
   >         at __randomizedtesting.SeedInfo.seed([1E22B4EE8663AE48:23C39D8FC171B388]:0)
   >         at org.apache.commons.codec@1.13/org.apache.commons.codec.language.bm.PhoneticEngine.encode(PhoneticEngine.java:433)
   >         at org.apache.commons.codec@1.13/org.apache.commons.codec.language.bm.PhoneticEngine.encode(PhoneticEngine.java:384)
   >         at org.apache.lucene.analysis.phonetic@10.0.0-SNAPSHOT/org.apache.lucene.analysis.phonetic.BeiderMorseFilter.incrementToken(BeiderMorseFilter.java:96)
```

The failure occurs under these conditions:
- PhoneticEngine uses NameType=SEPHARDIC
- The term is empty, or it becomes empty after the cleanup done by encode() (whitespace and dashes removed)

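The problem is that the encoder calls String.split() and assumes the resulting array always has at least one element. When the array is empty, accessing its last element fails with "Index -1 out of bounds for length 0", as seen in the trace above.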
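The String.split() pitfall can be shown in isolation. The following is a minimal sketch, not the commons-codec code: the class and the helpers `lastPartUnsafe` and `lastPartSafe` are hypothetical names made up for illustration. It relies only on the documented behavior that split() drops trailing empty strings, so an input consisting solely of delimiters yields a zero-length array.

```java
// Hypothetical demo class; it does not reproduce PhoneticEngine itself.
class SplitEdgeCase {

    // Mirrors the failing pattern: index the last element without a length check.
    static String lastPartUnsafe(String word, String regex) {
        String[] parts = word.split(regex);
        return parts[parts.length - 1]; // throws when parts.length == 0
    }

    // Guarded variant: fall back to an empty string when split() returns nothing.
    static String lastPartSafe(String word, String regex) {
        String[] parts = word.split(regex);
        return parts.length == 0 ? "" : parts[parts.length - 1];
    }

    public static void main(String[] args) {
        // Delimiter-only inputs: trailing empty strings are removed,
        // leaving a zero-length array.
        System.out.println(" ".split("\\s+").length); // 0
        System.out.println("'".split("'").length);    // 0

        try {
            lastPartUnsafe("'", "'");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unsafe access failed: " + e.getMessage());
        }

        System.out.println("safe result: [" + lastPartSafe("'", "'") + "]");
    }
}
```

A length check like the one in `lastPartSafe` is one possible guard. Whether the upstream fix should return an empty result or skip the word entirely is a decision for the library maintainers.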

A test that reproduces this is easy to write, but the bug is in Apache Commons Codec, so it has to be reported upstream.



---
Migrated from [LUCENE-10360](https://issues.apache.org/jira/browse/LUCENE-10360) by Uwe Schindler (@uschindler)
