Skip to content
This repository was archived by the owner on Apr 4, 2023. It is now read-only.

Tags: meilisearch/milli

Tags

v0.41.3

Toggle v0.41.3's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Merge #780

780: Update version for the next release (v0.41.3) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>

v0.41.2

Toggle v0.41.2's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Merge #777

777: Fix attributes set candidates r=ManyTheFish a=ManyTheFish

Fixes meilisearch/meilisearch#3483 for v1.0.1


Co-authored-by: Many the fish <many@meilisearch.com>

v0.41.1

Toggle v0.41.1's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Merge #776

776: Reduce incremental indexing time of `words_prefix_position_docids` DB r=curquiza a=loiclec

Fixes partially #605

The `words_prefix_position_docids` can easily contain millions of entries. Thus, iterating
over it can be very expensive. But we do so needlessly for every document addition tasks.

It can sometimes cause indexing performance issues when :
- a user sends many `documentAdditionOrUpdate` tasks that cannot be all batched together (for example if they are interspersed with `documentDeletion` tasks)
- the documents contain long, diverse text fields, thus increasing the number of entries in `words_prefix_position_docids`
- the index has accumulated many soft-deleted documents, further increasing the size of `words_prefix_position_docids`
- the machine running Meilisearch does not have great IO performance (e.g. slow SSD, or quota-limited by the cloud provider)

Note, before approving  the PR: the only changed file should be `milli/src/update/words_prefix_position_docids.rs`.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>

v0.41.0

Toggle v0.41.0's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Merge #773

773: bump milli r=Kerollmops a=irevoire

I need a new release of milli for meilisearch/meilisearch#3415

Co-authored-by: Tamo <tamo@meilisearch.com>

v0.40.0

Toggle v0.40.0's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Merge #770

770: Update deserr v0.3.0 r=irevoire a=ManyTheFish

related to meilisearch/meilisearch#3391


Co-authored-by: Many the fish <many@meilisearch.com>

v0.39.1

Toggle v0.39.1's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Merge #765

765: Update version for the next release (v0.39.1) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>

v0.39.0

Toggle v0.39.0's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Merge #762

762: Update version for the next release (v0.39.0) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>

v0.38.0

Toggle v0.38.0's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Merge #733

733: Avoid a prefix-related worst-case scenario in the proximity criterion r=loiclec a=loiclec

# Pull Request

## Related issue
Somewhat fixes (until merged into meilisearch) meilisearch/meilisearch#3118

## What does this PR do?
When a query ends with a word and a prefix, such as:
```
word pr
```
Then we first determine whether `pre` *could possibly* be in the proximity prefix database before querying it. There are then three possibilities:

1. `pr` is not in any prefix cache because it is not the prefix of many words. We don't query the proximity prefix database. Instead, we list all the word derivations of `pre` through the FST and query the regular proximity databases.

2. `pr` is in the prefix cache but cannot be found in the proximity prefix databases. **In this case, we partially disable the proximity ranking rule for the pair `word pre`.** This is done as follows:
   1. Only find the documents where `word` is in proximity to `pre` **exactly** (no derivations)
   2. Otherwise, assume that their proximity in all the documents in which they coexist is >= 8

3. `pr` is in the prefix cache and can be found in the proximity prefix databases. In this case we simply query the proximity prefix databases.

Note that if a prefix is longer than 2 bytes, then it cannot be in the proximity prefix databases. Also, proximities larger than 4 are not present in these databases either. Therefore, the impact on relevancy is:

1. For common prefixes of one or two letters: we no longer distinguish between proximities from 4 to 8
2. For common prefixes of more than two letters: we no longer distinguish between any proximities
3. For uncommon prefixes: nothing changes

Regarding (1), it means that these two documents would be considered equally relevant according to the proximity rule for the query `heard pr` (IF `pr` is the prefix of more than 200 words in the dataset):
```json
[
    { "text": "I heard there is a faster proximity criterion" },
    { "text": "I heard there is a faster but less relevant proximity criterion" }
]
```

Regarding (2), it means that two documents would be considered equally relevant according to the proximity rule for the query "faster pro":
```json
[
    { "text": "I heard there is a faster but less relevant proximity criterion" }
    { "text": "I heard there is a faster proximity criterion" },
]
```
But the following document would be considered more relevant than the two documents above:
```json
{ "text": "I heard there is a faster swimmer who is competing in the pro section of the competition " }
```

Note, however, that this change of behaviour only occurs when using the set-based version of the proximity criterion. In cases where there are fewer than 1000 candidate documents when the proximity criterion is called, this PR does not change anything. 

---

## Performance

I couldn't use the existing search benchmarks to measure the impact of the PR, but I did some manual tests with the `songs` benchmark dataset.   

```
1. 10x 'a': 
	- 640ms ⟹ 630ms                  = no significant difference
2. 10x 'b':
	- set-based: 4.47s ⟹ 7.42        = bad, ~2x regression
	- dynamic: 1s ⟹ 870 ms           = no significant difference
3. 'Someone I l':
	- set-based: 250ms ⟹ 12 ms       = very good, x20 speedup
	- dynamic: 21ms ⟹ 11 ms          = good, x2 speedup 
4. 'billie e':
	- set-based: 623ms ⟹ 2ms         = very good, x300 speedup 
	- dynamic: ~4ms ⟹ 4ms            = no difference
5. 'billie ei':
	- set-based: 57ms ⟹ 20ms         = good, ~2x speedup
	- dynamic: ~4ms ⟹ ~2ms.          = no significant difference
6. 'i am getting o' 
	- set-based: 300ms ⟹ 60ms        = very good, 5x speedup
	- dynamic: 30ms ⟹ 6ms            = very good, 5x speedup
7. 'prologue 1 a 1:
	- set-based: 3.36s ⟹ 120ms       = very good, 30x speedup
	- dynamic: 200ms ⟹ 30ms          = very good, 6x speedup
8. 'prologue 1 a 10':
	- set-based: 590ms ⟹ 18ms        = very good, 30x speedup 
	- dynamic: 82ms ⟹ 35ms           = good, ~2x speedup
```

Performance is often significantly better, but there is also one regression in the set-based implementation with the query `b b b b b b b b b b`.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>

v0.37.5

Toggle v0.37.5's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Merge #754

754: Fix hard-deletion of an external id that was soft-deleted and then reimported - release-v0.37.5 r=irevoire a=loiclec

This is a PR that cherry-picks the commit from #750 onto `release-v0.37.5`

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>

v0.37.4

Toggle v0.37.4's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Merge #745

745: Cherry pick placeholder search fix r=curquiza a=curquiza



Co-authored-by: ManyTheFish <many@meilisearch.com>