adds doc about range queries
robfrank committed Dec 22, 2016
1 parent 9a5b9d9 commit d978075
Showing 1 changed file with 111 additions and 75 deletions: `Full-Text-Index.md`
@@ -9,9 +9,9 @@ In addition to the standard FullText Index, which uses the SB-Tree index algorit

**Syntax**:

<pre>
orientdb> <code class="lang-sql userinput">CREATE INDEX <name> ON <class-name> (prop-names) FULLTEXT ENGINE LUCENE</code>
</pre>

The following SQL statement will create a FullText index on the property `name` for the class `City`, using the Lucene Engine.

@@ -26,68 +26,53 @@ orientdb> <code class="lang-sql userinput">CREATE INDEX City.name_description ON
FULLTEXT ENGINE LUCENE</code>
</pre>

The default analyzer used by OrientDB when a Lucene index is created is the [StandardAnalyzer](https://lucene.apache.org/core/6_3_0/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html).

### Analyzer

In addition to the StandardAnalyzer, FullText indexes can be configured to use a different analyzer through the `METADATA` clause of [`CREATE INDEX`](SQL-Create-Index.md).

Configure the index on `City.name` to use the `EnglishAnalyzer`:
<pre>
orientdb> <code class="lang-sql userinput">CREATE INDEX City.name ON City(name)
FULLTEXT ENGINE LUCENE METADATA {
"analyzer": "org.apache.lucene.analysis.en.EnglishAnalyzer"
}</code>
</pre>


Configure the index on `City.name` to use different analyzers for indexing and querying.

<pre>
orientdb> <code class="lang-sql userinput">CREATE INDEX City.name ON City(name)
FULLTEXT ENGINE LUCENE METADATA {
"index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
"query": "org.apache.lucene.analysis.standard.StandardAnalyzer"
}</code>
</pre>

`EnglishAnalyzer` will be used to analyze text while indexing and the `StandardAnalyzer` will be used to analyze query terms.
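Index-time and query-time analysis must produce the same tokens for a term to match. The toy sketch below illustrates this in plain Java. It does **not** use Lucene. `ToyAnalysis`, its stopword set and its tokenizer are hypothetical stand-ins. The index-time analyzer lowercases, splits and drops stopwords. The query-time analyzer only lowercases and splits. Real analyzers such as `EnglishAnalyzer` also stem words, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: these are NOT Lucene analyzers, just stand-ins showing
// why index-time and query-time analysis must agree for a term to match.
class ToyAnalysis {
    // Hypothetical stopword set used by the toy index-time analyzer.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("a", "an", "and", "the", "of"));

    // Lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Toy index-time analyzer: tokenize, then drop stopwords.
    static List<String> indexAnalyzer(String text) {
        List<String> out = new ArrayList<>();
        for (String t : tokenize(text)) {
            if (!STOPWORDS.contains(t)) out.add(t);
        }
        return out;
    }

    // Toy query-time analyzer: tokenize only, stopwords are kept.
    static List<String> queryAnalyzer(String text) {
        return tokenize(text);
    }

    // A query matches when every analyzed query token was produced at index time.
    static boolean matches(String document, String query) {
        Set<String> indexed = new HashSet<>(indexAnalyzer(document));
        List<String> q = queryAnalyzer(query);
        return !q.isEmpty() && indexed.containsAll(q);
    }

    public static void main(String[] args) {
        String doc = "The City of Rome";
        System.out.println(indexAnalyzer(doc));   // [city, rome]
        System.out.println(matches(doc, "ROME")); // true
        System.out.println(matches(doc, "the"));  // false: "the" was never indexed
    }
}
```

In this toy, the query for `the` fails because the index-time analyzer dropped that word. The same kind of mismatch is worth considering when choosing different `index` and `query` analyzers.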

Analyzers can also be configured per field. A detailed configuration for a multi-field index could be:

<pre>
orientdb> <code class="lang-sql userinput">CREATE INDEX City.name_description ON City(name, lyrics, title, author, description)
FULLTEXT ENGINE LUCENE METADATA {
"default": "org.apache.lucene.analysis.standard.StandardAnalyzer",
"index": "org.apache.lucene.analysis.core.KeywordAnalyzer",
"query": "org.apache.lucene.analysis.standard.StandardAnalyzer",
"name_index": "org.apache.lucene.analysis.standard.StandardAnalyzer",
"name_query": "org.apache.lucene.analysis.core.KeywordAnalyzer",
"lyrics_index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
"title_index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
"title_query": "org.apache.lucene.analysis.en.EnglishAnalyzer",
"author_query": "org.apache.lucene.analysis.core.KeywordAnalyzer",
"description_index": "org.apache.lucene.analysis.standard.StandardAnalyzer",
"description_index_stopwords": [
"the",
"is"
]
}</code>
</pre>

With this configuration, the underlying Lucene index works differently on each field:
@@ -98,14 +83,18 @@ With this configuration, the underlying Lucene index will works in different way
* *author*: indexed and searched with KeywordAnalyzer
* *description*: indexed with StandardAnalyzer with a given set of stopwords
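The per-field behaviour above is consistent with a simple lookup order. The resolver tries `<field>_index` or `<field>_query` first, then the generic `index` or `query` key, and finally `default`. That order is inferred from the example configuration and is not taken from OrientDB's source. The class below (`AnalyzerResolver`, hypothetical) is therefore only a sketch for reasoning about a METADATA document.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: resolves the analyzer for a field and mode ("index" or "query")
// using the lookup order inferred from the example METADATA:
// "<field>_<mode>", then "<mode>", then "default". Not OrientDB's own code.
class AnalyzerResolver {
    static final String STANDARD = "org.apache.lucene.analysis.standard.StandardAnalyzer";
    static final String KEYWORD = "org.apache.lucene.analysis.core.KeywordAnalyzer";
    static final String ENGLISH = "org.apache.lucene.analysis.en.EnglishAnalyzer";

    private final Map<String, Object> metadata;

    AnalyzerResolver(Map<String, Object> metadata) {
        this.metadata = metadata;
    }

    String analyzerFor(String field, String mode) {
        Object specific = metadata.get(field + "_" + mode); // e.g. "name_query"
        if (specific != null) return (String) specific;
        Object generic = metadata.get(mode);                 // e.g. "query"
        if (generic != null) return (String) generic;
        return (String) metadata.get("default");             // null if no default is set
    }

    // Stopwords for a field and mode, e.g. "description_index_stopwords"; empty if none.
    @SuppressWarnings("unchecked")
    List<String> stopwordsFor(String field, String mode) {
        Object words = metadata.get(field + "_" + mode + "_stopwords");
        return words == null ? Collections.<String>emptyList() : (List<String>) words;
    }

    // The METADATA document from the multi-field example, as a Java map.
    static Map<String, Object> exampleMetadata() {
        Map<String, Object> md = new HashMap<>();
        md.put("default", STANDARD);
        md.put("index", KEYWORD);
        md.put("query", STANDARD);
        md.put("name_index", STANDARD);
        md.put("name_query", KEYWORD);
        md.put("lyrics_index", ENGLISH);
        md.put("title_index", ENGLISH);
        md.put("title_query", ENGLISH);
        md.put("author_query", KEYWORD);
        md.put("description_index", STANDARD);
        md.put("description_index_stopwords", Arrays.asList("the", "is"));
        return md;
    }

    public static void main(String[] args) {
        AnalyzerResolver r = new AnalyzerResolver(exampleMetadata());
        for (String field : new String[] {"name", "lyrics", "title", "author", "description"}) {
            System.out.println(field + ": index=" + r.analyzerFor(field, "index")
                    + ", query=" + r.analyzerFor(field, "query"));
        }
        System.out.println("description stopwords: " + r.stopwordsFor("description", "index"));
    }
}
```

In this example, `default` is never reached because both `index` and `query` are set. It would only matter for a mode with no generic key.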

### Java API

The FullText Index with the Lucene Engine is configurable through the Java API.

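<pre><code class="">
import com.orientechnologies.orient.core.metadata.schema.OClass;
import com.orientechnologies.orient.core.metadata.schema.OSchema;
import com.orientechnologies.orient.core.metadata.schema.OType;

// databaseDocumentTx is an already opened ODatabaseDocumentTx instance
OSchema schema = databaseDocumentTx.getMetadata().getSchema();
OClass oClass = schema.createClass("Foo");
oClass.createProperty("name", OType.STRING);
// arguments: index name, index type, progress listener, metadata, algorithm, indexed fields
oClass.createIndex("Foo.name", "FULLTEXT", null, null, "LUCENE", new String[] { "name" });
</code>
</pre>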

## Query parser

@@ -161,16 +150,16 @@ SELECT from Person WHERE name LUCENE "name"
It is possible to fine-tune the behaviour of the underlying Lucene IndexWriter:

<pre>
orientdb> <code class="lang-sql userinput">CREATE INDEX City.name ON City(name)
FULLTEXT ENGINE LUCENE METADATA {
"directory_type": "nio",
"use_compound_file": false,
"ram_buffer_MB": "16",
"max_buffered_docs": "-1",
"max_buffered_delete_terms": "-1",
"ram_per_thread_MB": "1024",
"default": "org.apache.lucene.analysis.standard.StandardAnalyzer"
}
</code>
</pre>
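Before creating the index, the flush-related values can be sanity-checked. The sketch below mirrors constraints documented for Lucene's `IndexWriterConfig`. Lucene uses `-1` (`DISABLE_AUTO_FLUSH`) to switch a flush trigger off. The RAM-buffer and document-count triggers cannot both be disabled, and an enabled `maxBufferedDocs` must be at least 2. `WriterSettingsCheck` is hypothetical. It also assumes that OrientDB passes `ram_buffer_MB` and `max_buffered_docs` straight through to those Lucene settings, which has not been verified here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pre-flight check for the IndexWriter options shown above.
// Assumes OrientDB maps these METADATA keys directly onto IndexWriterConfig settings.
class WriterSettingsCheck {
    // Lucene's IndexWriterConfig.DISABLE_AUTO_FLUSH is -1.
    static final double DISABLED = -1;

    // Values in the example are strings such as "16" or "-1".
    static double number(Map<String, String> md, String key, double fallback) {
        String v = md.get(key);
        return v == null ? fallback : Double.parseDouble(v);
    }

    // Returns null when the flush settings look consistent, otherwise a problem description.
    static String validate(Map<String, String> md) {
        // Fallbacks are Lucene's defaults: a 16 MB RAM buffer and no document-count flush.
        double ramBufferMB = number(md, "ram_buffer_MB", 16);
        double maxBufferedDocs = number(md, "max_buffered_docs", DISABLED);
        if (ramBufferMB == DISABLED && maxBufferedDocs == DISABLED) {
            return "ram_buffer_MB and max_buffered_docs cannot both be -1";
        }
        if (ramBufferMB != DISABLED && ramBufferMB <= 0) {
            return "ram_buffer_MB must be positive or -1";
        }
        if (maxBufferedDocs != DISABLED && maxBufferedDocs < 2) {
            return "max_buffered_docs must be at least 2 or -1";
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> md = new HashMap<>();
        md.put("ram_buffer_MB", "16");
        md.put("max_buffered_docs", "-1");
        String problem = validate(md);
        System.out.println(problem == null ? "ok" : problem);

        md.put("ram_buffer_MB", "-1"); // now both flush triggers are disabled
        System.out.println(validate(md));
    }
}
```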

@@ -186,40 +175,87 @@ It is possible to fine tune the behaviour of the underlying Lucene's IndexWriter

For a detailed explanation of the configuration parameters and of the IndexWriter behaviour, see:

* IndexWriterConfig: https://lucene.apache.org/core/6_3_0/core/org/apache/lucene/index/IndexWriterConfig.html
* IndexWriter: https://lucene.apache.org/core/6_3_0/core/org/apache/lucene/index/IndexWriter.html

## Querying Lucene FullText Indexes

You can query the Lucene FullText Index using the custom operator `LUCENE` with the [Query Parser Syntax](http://lucene.apache.org/core/6_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description) from the Lucene Engine.

<pre>
orientdb> <code class='lang-sql userinput'>SELECT FROM V WHERE name LUCENE "test*"</code>
</pre>

This query searches for `test`, `tests`, `tester`, and so on from the property `name` of the class `V`.
The query can use the proximity operator (`~`), the required (`+`) and prohibit (`-`) operators, phrase queries, and regexp queries:

<pre>
orientdb> <code class='lang-sql userinput'>SELECT FROM Article WHERE content LUCENE "(+graph -rdbms) AND +cloud"</code>
</pre>
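When part of the query string comes from user input, characters such as `+`, `-`, `(` or `:` would otherwise be read as operators. The sketch below (`LuceneEscape`, hypothetical) prefixes each such character with a backslash. It uses the same character set that Lucene's own `QueryParser.escape` handles, which is available directly if Lucene is on the classpath. The sketch does not cover how the backslashes interact with quoting in the surrounding OrientDB SQL string literal.

```java
// Hypothetical helper: escape user-supplied text so that characters with a special
// meaning in Lucene's classic query syntax are matched literally.
class LuceneEscape {
    // Same character set that Lucene's QueryParser.escape handles.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String userInput = "C++ (graph)";
        // Only the user-typed words are escaped; the operators around them stay active.
        String query = "+cloud AND " + escape(userInput);
        System.out.println(query); // +cloud AND C\+\+ \(graph\)
    }
}
```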

### Working with multiple fields

In addition to the single-field queries above, you can also query multiple fields. For example:

<pre>
orientdb> <code class="lang-sql userinput">SELECT FROM Class WHERE [prop1, prop2] LUCENE "query"</code>
</pre>

In this case, if the word `query` is a plain string, the engine parses the query using [MultiFieldQueryParser](http://lucene.apache.org/core/6_3_0/queryparser/org/apache/lucene/queryparser/classic/MultiFieldQueryParser.html) on each indexed field.

To execute a more complex query on each field, surround your query with parentheses, which causes the query to address specific fields.

<pre>
orientdb> <code class="lang-sql userinput">SELECT FROM Article WHERE [content, author] LUCENE "(content:graph AND author:john)"</code>
</pre>

Here, the engine parses the query using the [QueryParser](http://lucene.apache.org/core/6_3_0/queryparser/org/apache/lucene/queryparser/classic/QueryParser.html).
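A field-scoped query like the one above can be assembled from field/term pairs. The sketch below (`FieldQueryBuilder`, hypothetical) joins `field:term` clauses with `AND` inside parentheses. It assumes each term is a plain word. Terms that may contain Lucene special characters would need escaping first.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: build "(content:graph AND author:john)" from field/term pairs.
// Terms are assumed to be plain words without Lucene special characters.
class FieldQueryBuilder {
    static String allOf(Map<String, String> fieldTerms) {
        StringJoiner joiner = new StringJoiner(" AND ", "(", ")");
        for (Map.Entry<String, String> e : fieldTerms.entrySet()) {
            joiner.add(e.getKey() + ":" + e.getValue()); // one field-scoped clause per entry
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the clauses in insertion order.
        Map<String, String> terms = new LinkedHashMap<>();
        terms.put("content", "graph");
        terms.put("author", "john");
        System.out.println("SELECT FROM Article WHERE [content, author] LUCENE \"" + allOf(terms) + "\"");
    }
}
```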

### Numeric and date range queries

If the index is defined over a numeric field (INTEGER, LONG, DOUBLE) or a date field (DATE, DATETIME), the engine supports [range queries](http://lucene.apache.org/core/6_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Range_Searches).

Suppose you have a `City` class with a multi-field Lucene index defined:

<pre>
orientdb> <code class="lang-sql userinput">
CREATE CLASS City EXTENDS V
CREATE PROPERTY City.name STRING
CREATE PROPERTY City.size INTEGER
CREATE INDEX City.name ON City(name, size) FULLTEXT ENGINE LUCENE
</code>
</pre>

Then query using ranges:

<pre>
orientdb> <code class="lang-sql userinput">
SELECT FROM City WHERE [name,size] LUCENE 'name:cas* AND size:[15000 TO 20000]'
</code>
</pre>
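The range clause can also be built programmatically. In Lucene's classic query syntax, square brackets mark inclusive bounds and curly braces mark exclusive ones. The sketch below (`RangeClause`, hypothetical) produces either form. The example above only shows inclusive bounds, so whether OrientDB's numeric handling honours exclusive bounds is an assumption worth checking.

```java
// Hypothetical helper: build a range clause in Lucene's classic query syntax.
// Square brackets mark inclusive bounds, curly braces exclusive ones.
class RangeClause {
    static String range(String field, long lower, long upper, boolean inclusive) {
        char open = inclusive ? '[' : '{';
        char close = inclusive ? ']' : '}';
        return field + ":" + open + lower + " TO " + upper + close;
    }

    public static void main(String[] args) {
        // Reproduces the query shown above.
        String q = "name:cas* AND " + range("size", 15000, 20000, true);
        System.out.println("SELECT FROM City WHERE [name,size] LUCENE '" + q + "'");
    }
}
```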

Ranges can be applied to DATE and DATETIME fields as well. Create a Lucene index over a property:

<pre>
orientdb> <code class="lang-sql userinput">
CREATE CLASS Article EXTENDS V
CREATE PROPERTY Article.createdAt DATETIME
CREATE INDEX Article.createdAt ON Article(createdAt) FULLTEXT ENGINE LUCENE
</code>
</pre>

Then query to retrieve articles published only in a given time range:

<pre>
orientdb> <code class="lang-sql userinput">
SELECT FROM Article WHERE createdAt LUCENE '[201612221000 TO 201612221100]'</code>
</pre>
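The bounds in the example above follow a `yyyyMMddHHmm` pattern (minute precision). The sketch below (`DateRange`, hypothetical) builds such a range from `java.time` values. Two points are assumptions inferred from that single example. One is the pattern itself. The other is whether the stored DATETIME values and the bounds share the same time zone. Check both against your OrientDB version.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: format a date range as in '[201612221000 TO 201612221100]'.
class DateRange {
    // Pattern inferred from the example above (year, month, day, hour, minute).
    private static final DateTimeFormatter MINUTES = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    static String between(LocalDateTime from, LocalDateTime to) {
        return "[" + MINUTES.format(from) + " TO " + MINUTES.format(to) + "]";
    }

    public static void main(String[] args) {
        String range = between(LocalDateTime.of(2016, 12, 22, 10, 0),
                               LocalDateTime.of(2016, 12, 22, 11, 0));
        System.out.println("SELECT FROM Article WHERE createdAt LUCENE '" + range + "'");
    }
}
```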



### Retrieve the Score

When the Lucene index is used in a query, the result set carries a context variable for each record representing the score.
To display the score, add `$score` to the projections.

```
SELECT *,$score FROM V WHERE name LUCENE "test*"
```

## Creating a Manual Lucene Index

The Lucene Engine supports index creation without the need for a class.

**Syntax**:
