[lucene.xml#query]: [Full Text Index]: Improvements in query and options parameters description #1063

Open
@daliboris

The description of the query and options parameters of the ft:query() function in the Querying the Index section can be improved.

To drill down by a given facet dimension and value, pass a key "facets" in the options map given in the third parameter of ft:query

  • It's not clear that the user can combine a query in XML with options given as a map.
  • In the documentation, the options parameter as a map contains only "facets" as a key, but it can contain other keys, such as "default-operator" and "leading-wildcard", that correspond to child elements of the <options> element, for example:
let $options := map { 
    "default-operator" : "or"
    }
  • The documentation should mention that the user can use the XML version of the query for full-text search of the fields associated with the element by adding a @field attribute to <term> and the other query child elements, except <near>. For example, the following query searches in the entire (dictionary) entry:
$collection//tei:entry[ft:query(., <query><term>dog</term></query>)]

In contrast, the following query searches only within the lemma field of the (dictionary) entry:

$collection//tei:entry[ft:query(., <query><term field="lemma">dog</term></query>)]

When set to yes, * or ? are allowed as the first character of a PrefixQuery and WildcardQuery. Note that this can produce very slow queries on big indexes.

  • The terms PrefixQuery and WildcardQuery are not mentioned anywhere else on this page and come from the source code. The definition should be simpler, for example:

When set to yes, * or ? are allowed as the first character of a query. Note that this can produce very slow queries on big indexes.

  • In my experience, <leading-wildcard>yes</leading-wildcard> or map { "leading-wildcard": "yes" } has an effect only if the query is given in Lucene query-string format, not in XML format.

For example, the following queries return the same results:

ft:query(., <query><wildcard field="lemma">*epes</wildcard></query>, <options><leading-wildcard>no</leading-wildcard></options>)
ft:query(., <query><wildcard field="lemma">*epes</wildcard></query>, <options><leading-wildcard>yes</leading-wildcard></options>)
ft:query(.,  "lemma:*epes", <options><leading-wildcard>yes</leading-wildcard></options>)

While the following query throws an error (Syntax error in Lucene query string: Cannot parse 'lemma:*epes': '*' or '?' not allowed as first character in WildcardQuery):

ft:query(.,  "lemma:*epes", <options><leading-wildcard>no</leading-wildcard></options>)
  • In the list of elements that can occur in the query description, the <fuzzy> element is missing. Proposed definition:

Will match terms with an edit distance of at most @max-edits to the term. The value of the @max-edits attribute is an integer between 0 and 2; the default is 2. The similarity measurement is based on the Damerau-Levenshtein (optimal string alignment) algorithm.
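To make the proposed definition concrete, here is a minimal Python sketch of the optimal string alignment distance, in which an insertion, deletion, substitution, or swap of two adjacent characters each counts as one edit. It only illustrates the measure and is not how Lucene evaluates fuzzy queries internally. The function names `osa_distance` and `within_max_edits` are hypothetical, chosen for this illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b (illustrative only)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between the first i chars of a and the first j chars of b.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def within_max_edits(term: str, candidate: str, max_edits: int = 2) -> bool:
    """Hypothetical helper mirroring the proposed @max-edits range of 0 to 2."""
    if not 0 <= max_edits <= 2:
        raise ValueError("max_edits must be an integer between 0 and 2")
    return osa_distance(term, candidate) <= max_edits
```

Under this measure "dgo" is one edit away from "dog" (a single transposition), so it would fall within the default limit of 2, while "cat" is three edits away and would not.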

<regex> A regular expression which will be matched against the terms of a document. Can be used instead of a <term> element. For example:

  • The documentation should mention that not all regular expression constructs are allowed, for example ^ for the beginning of the string or $ for the end of the string. A link to the Lucene documentation could help with this.
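One likely reason the anchors are unavailable is that a Lucene regular expression has to match the whole term, so the pattern is anchored implicitly at both ends. The Python sketch below is only an analogy: Python's `re` syntax is not Lucene's, and `matches_whole_term` is a hypothetical helper that uses `re.fullmatch` to imitate whole-term matching.

```python
import re


def matches_whole_term(pattern: str, term: str) -> bool:
    """Return True only if the pattern matches the entire term.

    Analogy for implicit anchoring; this uses Python regex syntax, not Lucene's.
    """
    return re.fullmatch(pattern, term) is not None
```

With whole-term matching, the pattern `dog` matches the term "dog" but not "dogs", and a prefix search has to be written as `dog.*` rather than `^dog`.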

Please provide the following

  • exist-db version: 6.2.0
  • documentation version: 6.2.0, 3Q21
