Commit d978075

adds doc about range queries
refs orientechnologies/orientdb#6534
1 parent 9a5b9d9 commit d978075

1 file changed

+111
-75
lines changed

Full-Text-Index.md

Lines changed: 111 additions & 75 deletions
Original file line number | Diff line number | Diff line change
@@ -9,9 +9,9 @@ In addition to the standard FullText Index, which uses the SB-Tree index algorit
99

1010
**Syntax**:
1111

12-
```sql
13-
CREATE INDEX <name> ON <class-name> (prop-names) FULLTEXT ENGINE LUCENE
14-
```
12+
<pre>
13+
orientdb> <code class="lang-sql userinput">CREATE INDEX <name> ON <class-name> (prop-names) FULLTEXT ENGINE LUCENE</code>
14+
</pre>
1515

1616
The following SQL statement will create a FullText index on the property `name` for the class `City`, using the Lucene Engine.
1717

@@ -26,68 +26,53 @@ orientdb> <code class="lang-sql userinput">CREATE INDEX City.name_description ON
2626
FULLTEXT ENGINE LUCENE</code>
2727
</pre>
2828

29+
The default analyzer used by OrientDB when a Lucene index is created is the [StandardAnalyzer](https://lucene.apache.org/core/6_3_0/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html).
2930

3031
### Analyzer
3132

32-
This creates a basic FullText Index with the Lucene Engine on the specified properties. In the even that you do not specify the analyzer, OrientDB defaults to [StandardAnalyzer](http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html).
33-
34-
In addition to the StandardAnalyzer, you can also create indexes that use different analyzer, using the `METADATA` operator through [`CREATE INDEX`](SQL-Create-Index.md).
35-
36-
<pre>
37-
orientdb> <code class="lang-sql userinput">CREATE INDEX City.name ON City(name) FULLTEXT ENGINE LUCENE METADATA
38-
{"analyzer": "org.apache.lucene.analysis.en.EnglishAnalyzer"}</code>
39-
</pre>
40-
41-
**(from 2.1.16)**
42-
43-
From version 2.1.16 it is possible to provide a set of stopwords to the analyzer to override the default set of the analyzer:
33+
In addition to the StandardAnalyzer, full text indexes can be configured to use a different analyzer by passing the `METADATA` operator to [`CREATE INDEX`](SQL-Create-Index.md).
4434

35+
Configure the index on `City.name` to use the `EnglishAnalyzer`:
4536
<pre>
46-
orientdb> <code class="lang-sql userinput">CREATE INDEX City.name ON City(name) FULLTEXT ENGINE LUCENE METADATA
47-
{
48-
"analyzer": "org.apache.lucene.analysis.en.EnglishAnalyzer",
49-
"analyzer_stopwords": ["a", "an", "and", "are", "as", "at", "be", "but", "by" ]
50-
}
51-
52-
</code>
37+
orientdb> <code class="lang-sql userinput">CREATE INDEX City.name ON City(name)
38+
FULLTEXT ENGINE LUCENE METADATA {
39+
"analyzer": "org.apache.lucene.analysis.en.EnglishAnalyzer"
40+
}</code>
5341
</pre>
5442

5543

56-
**(from 2.2)**
57-
58-
Starting from 2.2 it is possible to configure different analyzers for indexing and querying.
44+
Configure the index on `City.name` to use different analyzers for indexing and querying:
5945

6046
<pre>
61-
orientdb> <code class="lang-sql userinput">CREATE INDEX City.name ON City(name) FULLTEXT ENGINE LUCENE METADATA
62-
{
63-
"index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
64-
"query": "org.apache.lucene.analysis.standard.StandardAnalyzer"
47+
orientdb> <code class="lang-sql userinput">CREATE INDEX City.name ON City(name)
48+
FULLTEXT ENGINE LUCENE METADATA {
49+
"index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
50+
"query": "org.apache.lucene.analysis.standard.StandardAnalyzer"
6551
}</code>
6652
</pre>
6753

68-
EnglishAnalyzer will be used to analyze text while indexing and the StandardAnalyzer will be used to analyze query terms.
54+
`EnglishAnalyzer` will be used to analyze text while indexing and the `StandardAnalyzer` will be used to analyze query terms.
6955

70-
It is posssbile to configure analyzers at field level
56+
A more detailed configuration for a multi-field index, with analyzers set per field, could be:
7157

7258
<pre>
73-
orientdb> <code class="lang-sql userinput">CREATE INDEX City.name_description ON City(name, lyrics, title,author, description) FULLTEXT ENGINE LUCENE METADATA
74-
{
75-
"default": "org.apache.lucene.analysis.standard.StandardAnalyzer",
76-
"index": "org.apache.lucene.analysis.core.KeywordAnalyzer",
77-
"query": "org.apache.lucene.analysis.standard.StandardAnalyzer",
78-
"name_index": "org.apache.lucene.analysis.standard.StandardAnalyzer",
79-
"name_query": "org.apache.lucene.analysis.core.KeywordAnalyzer",
80-
"lyrics_index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
81-
"title_index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
82-
"title_query": "org.apache.lucene.analysis.en.EnglishAnalyzer",
83-
"author_query": "org.apache.lucene.analysis.core.KeywordAnalyzer",
84-
"description_index": "org.apache.lucene.analysis.standard.StandardAnalyzer",
85-
"description_index_stopwords": [
86-
"the",
87-
"is"
88-
]
89-
90-
}</code>
59+
orientdb> <code class="lang-sql userinput">CREATE INDEX City.name_description ON City(name, lyrics, title,author, description)
60+
FULLTEXT ENGINE LUCENE METADATA {
61+
"default": "org.apache.lucene.analysis.standard.StandardAnalyzer",
62+
"index": "org.apache.lucene.analysis.core.KeywordAnalyzer",
63+
"query": "org.apache.lucene.analysis.standard.StandardAnalyzer",
64+
"name_index": "org.apache.lucene.analysis.standard.StandardAnalyzer",
65+
"name_query": "org.apache.lucene.analysis.core.KeywordAnalyzer",
66+
"lyrics_index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
67+
"title_index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
68+
"title_query": "org.apache.lucene.analysis.en.EnglishAnalyzer",
69+
"author_query": "org.apache.lucene.analysis.core.KeywordAnalyzer",
70+
"description_index": "org.apache.lucene.analysis.standard.StandardAnalyzer",
71+
"description_index_stopwords": [
72+
"the",
73+
"is"
74+
]
75+
}</code>
9176
</pre>
9277

9378
With this configuration, the underlying Lucene index will work in a different way on each field:
@@ -98,14 +83,18 @@ With this configuration, the underlying Lucene index will works in different way
9883
* *author*: indexed and searched with KeywordAnalyzer
9984
* *description*: indexed with StandardAnalyzer with a given set of stopwords
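
The per-field behaviour listed above follows from how the `METADATA` keys appear to be looked up. The sketch below is an illustration only, not OrientDB code: `AnalyzerLookup`, `resolve` and `sampleMetadata` are hypothetical names, and the lookup order (`<field>_index` or `<field>_query` first, then `index` or `query`, then `default`) is an assumption inferred from the example configuration and the list above. Stopword lists are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper, NOT part of the OrientDB API.
 * It mimics the lookup order implied by the METADATA example:
 * "<field>_<mode>", then "<mode>", then "default".
 */
final class AnalyzerLookup {

    enum Mode { INDEX, QUERY }

    static final String STANDARD = "org.apache.lucene.analysis.standard.StandardAnalyzer";
    static final String KEYWORD = "org.apache.lucene.analysis.core.KeywordAnalyzer";
    static final String ENGLISH = "org.apache.lucene.analysis.en.EnglishAnalyzer";

    /** Returns the analyzer class name that would apply to a field in the given mode. */
    static String resolve(Map<String, String> metadata, String field, Mode mode) {
        String suffix = (mode == Mode.INDEX) ? "index" : "query";

        // 1. Most specific: a per-field, per-mode key such as "name_index".
        String perField = metadata.get(field + "_" + suffix);
        if (perField != null) {
            return perField;
        }
        // 2. A per-mode key shared by all fields: "index" or "query".
        String perMode = metadata.get(suffix);
        if (perMode != null) {
            return perMode;
        }
        // 3. The "default" key; StandardAnalyzer is assumed if it is missing too.
        return metadata.getOrDefault("default", STANDARD);
    }

    /** The analyzer keys from the multi-field example (stopwords omitted). */
    static Map<String, String> sampleMetadata() {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("default", STANDARD);
        metadata.put("index", KEYWORD);
        metadata.put("query", STANDARD);
        metadata.put("name_index", STANDARD);
        metadata.put("name_query", KEYWORD);
        metadata.put("lyrics_index", ENGLISH);
        metadata.put("title_index", ENGLISH);
        metadata.put("title_query", ENGLISH);
        metadata.put("author_query", KEYWORD);
        metadata.put("description_index", STANDARD);
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = sampleMetadata();
        for (String field : new String[] {"name", "lyrics", "title", "author", "description"}) {
            System.out.println(field
                + ": index=" + resolve(metadata, field, Mode.INDEX)
                + ", query=" + resolve(metadata, field, Mode.QUERY));
        }
    }
}
```

Under this assumed order, `author` has no `author_index` key, so it falls back to the general `index` analyzer (KeywordAnalyzer), which agrees with the list above.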
10085

101-
You can also use the FullText Index with the Lucene Engine through the Java API.
10286

103-
```java
104-
OSchema schema = databaseDocumentTx.getMetadata().getSchema();
105-
OClass oClass = schema.createClass("Foo");
106-
oClass.createProperty("name", OType.STRING);
107-
oClass.createIndex("City.name", "FULLTEXT", null, null, "LUCENE", new String[] { "name"});
108-
```
87+
### Java API
88+
89+
The FullText Index with the Lucene Engine is configurable through the Java API.
90+
91+
<pre><code class="">
92+
OSchema schema = databaseDocumentTx.getMetadata().getSchema();
93+
OClass oClass = schema.createClass("City");
94+
oClass.createProperty("name", OType.STRING);
95+
oClass.createIndex("City.name", "FULLTEXT", null, null, "LUCENE", new String[] { "name"});
96+
</code>
97+
</pre>
10998

11099
## Query parser
111100

@@ -161,16 +150,16 @@ SELECT from Person WHERE name LUCENE "name"
161150
It is possible to fine-tune the behaviour of the underlying Lucene IndexWriter:
162151

163152
<pre>
164-
<code class="lang-sql userinput">CREATE INDEX City.name ON City(name) FULLTEXT ENGINE LUCENE METADATA
165-
{
166-
"directory_type": "nio",
167-
"use_compound_file": false,
168-
"ram_buffer_MB": "16",
169-
"max_buffered_docs": "-1",
170-
"max_buffered_delete_terms": "-1",
171-
"ram_per_thread_MB": "1024",
172-
"default": "org.apache.lucene.analysis.standard.StandardAnalyzer"
173-
}
153+
<code class="lang-sql userinput">CREATE INDEX City.name ON City(name)
154+
FULLTEXT ENGINE LUCENE METADATA {
155+
"directory_type": "nio",
156+
"use_compound_file": false,
157+
"ram_buffer_MB": "16",
158+
"max_buffered_docs": "-1",
159+
"max_buffered_delete_terms": "-1",
160+
"ram_per_thread_MB": "1024",
161+
"default": "org.apache.lucene.analysis.standard.StandardAnalyzer"
162+
}
174163
</code>
175164
</pre>
176165

@@ -186,40 +175,87 @@ It is possible to fine tune the behaviour of the underlying Lucene's IndexWriter
186175

187176
For a detailed explanation of the config parameters and IndexWriter behaviour, see:
188177

189-
* indexWriterConfig : https://lucene.apache.org/core/5_0_0/core/org/apache/lucene/index/IndexWriterConfig.html
190-
* indexWriter: https://lucene.apache.org/core/5_0_0/core/org/apache/lucene/index/IndexWriter.html
178+
* indexWriterConfig : https://lucene.apache.org/core/6_3_0/core/org/apache/lucene/index/IndexWriterConfig.html
179+
* indexWriter: https://lucene.apache.org/core/6_3_0/core/org/apache/lucene/index/IndexWriter.html
191180

192181
## Querying Lucene FullText Indexes
193182

194-
You can query the Lucene FullText Index using the custom operator `LUCENE` with the [Query Parser Syntax](http://lucene.apache.org/core/5_4_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description) from the Lucene Engine.
183+
You can query the Lucene FullText Index using the custom operator `LUCENE` with the [Query Parser Syntax](http://lucene.apache.org/core/6_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description) from the Lucene Engine.
195184

196185
<pre>
197186
orientdb> <code class='lang-sql userinput'>SELECT FROM V WHERE name LUCENE "test*"</code>
198187
</pre>
199188

200189
This query searches for `test`, `tests`, `tester`, and so on from the property `name` of the class `V`.
190+
The query can use the proximity operator (_~_), the required (_+_) and prohibit (_-_) operators, phrase queries, and regexp queries:
191+
192+
<pre>
193+
orientdb> <code class='lang-sql userinput'>SELECT FROM Article WHERE content LUCENE "(+graph -rdbms) AND +cloud"</code>
194+
</pre>
201195

202-
### Working with Multiple Fields
196+
197+
### Working with multiple fields
203198

204199
In addition to the standard Lucene query above, you can also query multiple fields. For example,
205200

206201
<pre>
207202
orientdb> <code class="lang-sql userinput">SELECT FROM Class WHERE [prop1, prop2] LUCENE "query"</code>
208203
</pre>
209204

210-
In this case, if the word `query` is a plain string, the engine parses the query using [MultiFieldQueryParser](http://lucene.apache.org/core/4_7_0/queryparser/org/apache/lucene/queryparser/classic/MultiFieldQueryParser.html) on each indexed field.
205+
In this case, if the word `query` is a plain string, the engine parses the query using [MultiFieldQueryParser](http://lucene.apache.org/core/6_3_0/queryparser/org/apache/lucene/queryparser/classic/MultiFieldQueryParser.html) on each indexed field.
211206

212207
To execute a more complex query on each field, surround your query with parentheses, which causes the query to address specific fields.
213208

214209
<pre>
215-
orientdb> <code class="lang-sql userinput">SELECT FROM CLass WHERE [prop1, prop2] LUCENE "(prop1:foo AND prop2:bar)"</code>
210+
orientdb> <code class="lang-sql userinput">SELECT FROM Article WHERE [content, author] LUCENE "(content:graph AND author:john)"</code>
211+
</pre>
212+
213+
Here, the engine parses the query using the [QueryParser](http://lucene.apache.org/core/6_3_0/queryparser/org/apache/lucene/queryparser/classic/QueryParser.html).
214+
215+
### Numeric and date range queries
216+
217+
If the index is defined over a numeric field (INTEGER, LONG, DOUBLE) or a date field (DATE, DATETIME), the engine supports [range queries](http://lucene.apache.org/core/6_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Range_Searches).
218+
Suppose you have a `City` class with a multi-field Lucene index defined:
219+
220+
<pre>
221+
orientdb> <code class="lang-sql userinput">
222+
CREATE CLASS City EXTENDS V
223+
CREATE PROPERTY City.name STRING
224+
CREATE PROPERTY City.size INTEGER
225+
CREATE INDEX City.name ON City(name,size) FULLTEXT ENGINE LUCENE
226+
</code>
216227
</pre>
217228

218-
Here, hte engine parses the query using the [QueryParser](http://lucene.apache.org/core/4_7_0/queryparser/org/apache/lucene/queryparser/classic/QueryParser.html)
229+
Then query using ranges:
230+
231+
<pre>
232+
orientdb> <code class="lang-sql userinput">
233+
SELECT FROM City WHERE [name,size] LUCENE 'name:cas* AND size:[15000 TO 20000]'
234+
</code>
235+
</pre>
236+
237+
Ranges can be applied to DATE/DATETIME fields as well. Create a Lucene index over a DATETIME property:
238+
239+
<pre>
240+
orientdb> <code class="lang-sql userinput">
241+
CREATE CLASS Article EXTENDS V
242+
CREATE PROPERTY Article.createdAt DATETIME
243+
CREATE INDEX Article.createdAt ON Article(createdAt) FULLTEXT ENGINE LUCENE
244+
</code>
245+
</pre>
246+
247+
Then retrieve only the articles created within a given time range:
248+
249+
<pre>
250+
orientdb> <code class="lang-sql userinput">
251+
SELECT FROM Article WHERE createdAt LUCENE '[201612221000 TO 201612221100]'</code>
252+
</pre>
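
When these range queries are issued from application code, the range expressions can be assembled from typed values instead of hand-written strings. The following is a minimal sketch using only the Java standard library. `LuceneRanges`, `numericRange` and `dateRange` are hypothetical names, and the `yyyyMMddHHmm` pattern is an assumption inferred from the `[201612221000 TO 201612221100]` example above, not a documented OrientDB format.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper, NOT part of OrientDB or Lucene: builds range expressions as strings. */
final class LuceneRanges {

    // Assumption: minute-resolution timestamps, as in the example query above.
    private static final DateTimeFormatter MINUTE_FORMAT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    /** Builds an inclusive numeric range on a field, e.g. size:[15000 TO 20000]. */
    static String numericRange(String field, long min, long max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not be greater than max");
        }
        return field + ":[" + min + " TO " + max + "]";
    }

    /** Builds an inclusive date range without a field prefix. */
    static String dateRange(LocalDateTime from, LocalDateTime to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("to must not be before from");
        }
        return "[" + MINUTE_FORMAT.format(from) + " TO " + MINUTE_FORMAT.format(to) + "]";
    }

    public static void main(String[] args) {
        // Prints: name:cas* AND size:[15000 TO 20000]
        System.out.println("name:cas* AND " + numericRange("size", 15000, 20000));

        // Prints: [201612221000 TO 201612221100]
        System.out.println(dateRange(
            LocalDateTime.of(2016, 12, 22, 10, 0),
            LocalDateTime.of(2016, 12, 22, 11, 0)));
    }
}
```

The resulting strings go on the right-hand side of the `LUCENE` operator, as in the queries above.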
253+
254+
219255

220256
### Retrieve the Score
221257

222-
When the lucene index is used in a query, the results set carries a context variable for each record rappresenting the score.
258+
When the Lucene index is used in a query, the result set carries a context variable for each record representing the score.
223259
To display the score, add `$score` to the projections.
224260

225261
```
@@ -228,7 +264,7 @@ SELECT *,$score FROM V WHERE name LUCENE "test*"
228264

229265
## Creating a Manual Lucene Index
230266

231-
Beginning with version 2.1, the Lucene Engine supports index creation without the need for a class.
267+
The Lucene Engine supports index creation without the need for a class.
232268

233269
**Syntax**:
234270
