This repository was archived by the owner on Apr 4, 2023. It is now read-only.
Tags: meilisearch/milli
Merge #780
780: Update version for the next release (v0.41.3) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check that the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Merge #776
776: Reduce incremental indexing time of `words_prefix_position_docids` DB r=curquiza a=loiclec

Partially fixes #605

The `words_prefix_position_docids` DB can easily contain millions of entries, so iterating over it can be very expensive. But we do so needlessly for every document addition task. This can cause indexing performance issues when:
- a user sends many `documentAdditionOrUpdate` tasks that cannot all be batched together (for example because they are interspersed with `documentDeletion` tasks)
- the documents contain long, diverse text fields, which increases the number of entries in `words_prefix_position_docids`
- the index has accumulated many soft-deleted documents, further increasing the size of `words_prefix_position_docids`
- the machine running Meilisearch does not have great IO performance (e.g. a slow SSD, or IO quota-limited by the cloud provider)

Note, before approving the PR: the only changed file should be `milli/src/update/words_prefix_position_docids.rs`.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
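The idea behind the change is that only the `(prefix, position)` entries touched by the newly added documents need to be written, rather than iterating the whole `words_prefix_position_docids` database on every task. Below is a minimal in-memory Rust sketch of that idea; the `PrefixPositionDocids` type and the `update_incrementally` function are hypothetical illustrations, not milli's actual LMDB-backed implementation.

```rust
// Illustrative, in-memory sketch of incremental prefix maintenance.
// The type names and function are hypothetical; the real update code
// lives in milli/src/update/words_prefix_position_docids.rs and works
// on LMDB databases.
use std::collections::{BTreeMap, BTreeSet};

type DocId = u32;
type Position = u32;
/// (prefix, position) -> documents containing a word with that prefix
/// at that position.
type PrefixPositionDocids = BTreeMap<(String, Position), BTreeSet<DocId>>;

/// Merge (word, position, docid) triples extracted from *newly added*
/// documents into the existing map, touching only the affected
/// (prefix, position) keys instead of re-reading the whole database.
fn update_incrementally(
    db: &mut PrefixPositionDocids,
    prefix_cache: &BTreeSet<String>, // prefixes common enough to be indexed
    new_entries: &[(String, Position, DocId)],
) {
    for (word, position, docid) in new_entries {
        // Assumption for this sketch: only short (1- or 2-character)
        // prefixes are cached.
        for len in 1..=word.chars().count().min(2) {
            let prefix: String = word.chars().take(len).collect();
            if prefix_cache.contains(&prefix) {
                db.entry((prefix, *position)).or_default().insert(*docid);
            }
        }
    }
}
```

Run per addition task, the write set of such an update stays proportional to the size of the new documents, whereas re-iterating the whole prefix database is proportional to the size of the index, which is the cost the PR description points at.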
Merge #765
765: Update version for the next release (v0.39.1) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check that the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Merge #762
762: Update version for the next release (v0.39.0) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check that the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Merge #733
733: Avoid a prefix-related worst-case scenario in the proximity criterion r=loiclec a=loiclec

# Pull Request

## Related issue
Somewhat fixes (until merged into meilisearch) meilisearch/meilisearch#3118

## What does this PR do?
When a query ends with a word and a prefix, such as:
```
word pr
```
we first determine whether `pr` *could possibly* be in the proximity prefix database before querying it. There are then three possibilities:

1. `pr` is not in any prefix cache because it is not the prefix of many words. We don't query the proximity prefix database. Instead, we list all the word derivations of `pr` through the FST and query the regular proximity databases.
2. `pr` is in the prefix cache but cannot be found in the proximity prefix databases. **In this case, we partially disable the proximity ranking rule for the pair `word pr`.** This is done as follows:
   1. Only find the documents where `word` is in proximity to `pr` **exactly** (no derivations)
   2. Otherwise, assume that their proximity in all the documents in which they coexist is >= 8
3. `pr` is in the prefix cache and can be found in the proximity prefix databases. In this case we simply query the proximity prefix databases.

Note that if a prefix is longer than 2 bytes, then it cannot be in the proximity prefix databases. Also, proximities larger than 4 are not present in these databases either.

Therefore, the impact on relevancy is:

1. For common prefixes of one or two letters: we no longer distinguish between proximities from 4 to 8
2. For common prefixes of more than two letters: we no longer distinguish between any proximities
3. For uncommon prefixes: nothing changes

Regarding (1), it means that these two documents would be considered equally relevant according to the proximity rule for the query `heard pr` (IF `pr` is the prefix of more than 200 words in the dataset):
```json
[
    { "text": "I heard there is a faster proximity criterion" },
    { "text": "I heard there is a faster but less relevant proximity criterion" }
]
```
Regarding (2), it means that these two documents would be considered equally relevant according to the proximity rule for the query `faster pro`:
```json
[
    { "text": "I heard there is a faster but less relevant proximity criterion" },
    { "text": "I heard there is a faster proximity criterion" }
]
```
But the following document would be considered more relevant than the two documents above:
```json
{ "text": "I heard there is a faster swimmer who is competing in the pro section of the competition" }
```
Note, however, that this change of behaviour only occurs when using the set-based version of the proximity criterion. In cases where there are fewer than 1000 candidate documents when the proximity criterion is called, this PR does not change anything.

---

## Performance

I couldn't use the existing search benchmarks to measure the impact of the PR, but I did some manual tests with the `songs` benchmark dataset.

```
1. 10x 'a':
   - 640ms ⟹ 630ms = no significant difference
2. 10x 'b':
   - set-based: 4.47s ⟹ 7.42s = bad, ~2x regression
   - dynamic: 1s ⟹ 870ms = no significant difference
3. 'Someone I l':
   - set-based: 250ms ⟹ 12ms = very good, x20 speedup
   - dynamic: 21ms ⟹ 11ms = good, x2 speedup
4. 'billie e':
   - set-based: 623ms ⟹ 2ms = very good, x300 speedup
   - dynamic: ~4ms ⟹ 4ms = no difference
5. 'billie ei':
   - set-based: 57ms ⟹ 20ms = good, ~2x speedup
   - dynamic: ~4ms ⟹ ~2ms = no significant difference
6. 'i am getting o':
   - set-based: 300ms ⟹ 60ms = very good, 5x speedup
   - dynamic: 30ms ⟹ 6ms = very good, 5x speedup
7. 'prologue 1 a 1':
   - set-based: 3.36s ⟹ 120ms = very good, 30x speedup
   - dynamic: 200ms ⟹ 30ms = very good, 6x speedup
8. 'prologue 1 a 10':
   - set-based: 590ms ⟹ 18ms = very good, 30x speedup
   - dynamic: 82ms ⟹ 35ms = good, ~2x speedup
```

Performance is often significantly better, but there is also one regression in the set-based implementation with the query `b b b b b b b b b b`.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
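The three-way dispatch described in this PR (uncommon prefix; common prefix absent from the proximity prefix database; common prefix present in it) can be pictured with a small in-memory model. The struct fields and the method below (`word_prefix_proximity`, `docids_for_word_and_prefix`, and so on) are illustrative assumptions, not milli's actual types or database layout.

```rust
// Illustrative model of the prefix handling in the proximity criterion.
// All names are hypothetical; milli uses LMDB databases and an FST, which
// are replaced here by plain BTree collections.
use std::collections::{BTreeMap, BTreeSet};

type DocIds = BTreeSet<u32>;
type Proximity = u8;

struct Index {
    /// Prefixes common enough to have been cached.
    prefix_cache: BTreeSet<String>,
    /// (word, prefix, proximity) -> docids: the proximity prefix database.
    /// Per the PR, only prefixes of at most 2 bytes and proximities <= 4
    /// ever appear here.
    word_prefix_proximity: BTreeMap<(String, String, Proximity), DocIds>,
    /// (word1, word2, proximity) -> docids: the regular proximity database.
    word_pair_proximity: BTreeMap<(String, String, Proximity), DocIds>,
    /// All indexed words (stands in for the FST).
    words: BTreeSet<String>,
}

impl Index {
    /// Documents where `word` appears at `proximity` from a word starting
    /// with `prefix`, following the three cases of the PR description.
    fn docids_for_word_and_prefix(&self, word: &str, prefix: &str, proximity: Proximity) -> DocIds {
        if !self.prefix_cache.contains(prefix) {
            // Case 1: uncommon prefix. Enumerate its derivations from the
            // word list and query the regular proximity database for each.
            let mut result = DocIds::new();
            for derived in self.words.iter().filter(|w| w.starts_with(prefix)) {
                if let Some(docids) = self
                    .word_pair_proximity
                    .get(&(word.to_owned(), derived.clone(), proximity))
                {
                    result.extend(docids.iter().copied());
                }
            }
            result
        } else if self
            .word_prefix_proximity
            .keys()
            .any(|(_, p, _)| p.as_str() == prefix)
        {
            // Case 3: common prefix present in the proximity prefix database:
            // query it directly.
            self.word_prefix_proximity
                .get(&(word.to_owned(), prefix.to_owned(), proximity))
                .cloned()
                .unwrap_or_default()
        } else {
            // Case 2: common prefix, but absent from the proximity prefix
            // database. Partially disable the rule: match the prefix exactly
            // (no derivations); every other document where the two words
            // coexist is treated by the caller as if the proximity were >= 8.
            self.word_pair_proximity
                .get(&(word.to_owned(), prefix.to_owned(), proximity))
                .cloned()
                .unwrap_or_default()
        }
    }
}
```

As the PR notes, this logic only matters in the set-based code path; when fewer than 1000 candidate documents reach the proximity criterion, the dynamic implementation is used and behaviour is unchanged.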