This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Auto-batching - Enable feature by default and remove unwanted options #162

Merged
merged 5 commits into from
Oct 3, 2022
10 changes: 1 addition & 9 deletions open-api.yaml
Expand Up @@ -277,7 +277,6 @@ components:
examples:
- uid: 0
indexUid: movies
batchUid: 0
status: succeeded
type: documentAdditionOrUpdate
details:
Expand All @@ -294,9 +293,6 @@ components:
indexUid:
type: string
description: The unique identifier of the index where this task is operated
batchUid:
type: integer
description: Identify in which batch a task has been grouped by auto-batching. It corresponds to the first task uid grouped within a batch.
status:
type: string
description: The status of the task
Expand Down Expand Up @@ -326,7 +322,7 @@ components:
description: Number of documents received for documentAdditionOrUpdate task.
indexedDocuments:
type: integer
description: Number of documents finally indexed for documentAdditionOrUpdate task or if batched, in the batchUid.
description: Number of documents finally indexed for documentAdditionOrUpdate task or a documentAdditionOrUpdate batch of tasks.
receivedDocumentsIds:
type: integer
description: Number of document ids received for documentDeletion task.
Expand Down Expand Up @@ -362,7 +358,6 @@ components:
required:
- uid
- indexUid
- batchUid
- status
- type
- enqueuedAt
Expand Down Expand Up @@ -3041,7 +3036,6 @@ paths:
results:
- uid: 1
indexUid: movies
batchUid: 1
status: succeeded
type: documentAdditionOrUpdate
details:
Expand All @@ -3053,7 +3047,6 @@ paths:
finishedAt: '2021-01-01T09:39:02.000000Z'
- uid: 0
indexUid: movies_Review
batchUid: 0
status: failed
type: documentAdditionOrUpdate
details:
Expand Down Expand Up @@ -3098,7 +3091,6 @@ paths:
value:
uid: 1
indexUid: movies
batchUid: 1
status: succeeded
type: documentAdditionOrUpdate
details:
10 changes: 2 additions & 8 deletions text/0034-telemetry-policies.md
Expand Up @@ -82,10 +82,7 @@ The collected data is sent to [Segment](https://segment.com/). Segment is a plat
| `infos.max_index_size` | Value of `--max-index-size`/`MEILI_INDEX_SIZE` in bytes | 336042103 | Every Hour |
| `infos.max_task_db_size` | Value of `--max-task-db-size`/`MEILI_MAX_TASK_DB_SIZE` in bytes | 336042103 | Every Hour |
| `infos.http_payload_size_limit` | Value of `--http-payload-size-limit`/`MEILI_HTTP_PAYLOAD_SIZE_LIMIT` in bytes | 336042103 | Every Hour |
| `infos.enable_auto_batching` | `true` if `--enable-auto-batching` is specified to true, otherwise `false` | `true` | Every Hour |
| `infos.max_batch_size` | Value of `--max-batch-size` in integer, otherwise `null` | 1000 | Every Hour |
| `infos.max_documents_per_batch` | Value of `--max-documents-per-batch` in integer, otherwise `null` | 1000 | Every Hour |
| `infos.debounce_duration_sec` | Value of `--debounce-duration-sec` in seconds, otherwise `0` | 3600 | Every Hour |
| `infos.disable_auto_batching` | `true` if `--disable-auto-batching`/`MEILI_DISABLE_AUTO_BATCHING` is set, otherwise `false` | `true` | Every Hour |
| `infos.log_level` | Value of `--log-level`/`MEILI_LOG_LEVEL` | debug | Every Hour |
| `infos.max_indexing_memory` | Value of `--max-indexing-memory`/`MEILI_MAX_INDEXING_MEMORY` in bytes | 336042103 | Every Hour |
| `infos.max_indexing_threads` | Value of `--max-indexing-threads`/`MEILI_MAX_INDEXING_THREADS` in integer | 4 | Every Hour |
Expand Down Expand Up @@ -175,10 +172,7 @@ This property allows us to gather essential information to better understand on
| infos.max_index_size | Value of `--max-index-size`/`MEILI_INDEX_SIZE` in bytes | `336042103` |
| infos.max_task_db_size | Value of `--max-task-db-size`/`MEILI_MAX_TASK_DB_SIZE` in bytes | `336042103` |
| infos.http_payload_size_limit | Value of `--http-payload-size-limit`/`MEILI_HTTP_PAYLOAD_SIZE_LIMIT` in bytes | `336042103` |
| infos.enable_autobatching | `true` if `--enable-autobatching` is specified to true, otherwise `false` | `true` |
| infos.max_batch_size | Value of `--max-batch-size` in integer, otherwise `null` | `1000` |
| infos.max_documents_per_batch | Value of `--max-documents-per-batch` in integer, otherwise `null` | `1000` |
| infos.debounce_duration_sec | Value of `--debounce-duration-sec`in seconds, otherwise `0` | `3600` |
| infos.disable_auto_batching | `true` if `--disable-auto-batching`/`MEILI_DISABLE_AUTO_BATCHING` is set, otherwise `false` | `true` |
| infos.log_level | Value of `--log-level`/`MEILI_LOG_LEVEL` | `debug` |
| infos.max_indexing_memory | Value of `--max-indexing-memory`/`MEILI_MAX_INDEXING_MEMORY` in bytes | `336042103` |
| infos.max_indexing_threads | Value of `--max-indexing-threads`/`MEILI_MAX_INDEXING_THREADS` in integer | `4` |
8 changes: 0 additions & 8 deletions text/0060-tasks-api.md
Expand Up @@ -22,7 +22,6 @@ As writing is asynchronous for most of Meilisearch's operations, this API makes
|------------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| uid | integer | Unique sequential identifier |
| indexUid | string | Unique index identifier. This field is `null` when the task type is `dumpCreation`. |
| batchUid | integer | Identify in which batch a task has been grouped by auto-batching. It corresponds to the first task uid grouped within a batch. See [0096-auto-batching.md](0096-auto-batching.md) |
| status | string | Status of the task. Possible values are `enqueued`, `processing`, `succeeded`, `failed` |
| type | string | Type of the task. Possible values are `indexCreation`, `indexUpdate`, `indexDeletion`, `documentAdditionOrUpdate`, `documentDeletion`, `settingsUpdate`, `dumpCreation` |
| details | object | Details information for a task payload. See Task Details part. |
Expand Down Expand Up @@ -138,7 +137,6 @@ e.g. A fully qualified `task` object in an `enqueued` state.
{
"uid": 0,
"indexUid": "movies",
"batchUid": 0,
"status": "enqueued",
"type": "settingsUpdate",
"details": {
Expand All @@ -164,7 +162,6 @@ e.g. A fully qualified `task` object in a `processing` state.
{
"uid": 0,
"indexUid": "movies",
"batchUid": 0,
"status": "processing",
"type": "settingsUpdate",
"details": {
Expand All @@ -190,7 +187,6 @@ e.g. A fully qualified `task` object in a `succeeded` state.
{
"uid": 0,
"indexUid": "movies",
"batchUid": 0,
"status": "succeeded",
"type": "settingsUpdate",
"details": {
Expand All @@ -216,7 +212,6 @@ e.g. A fully qualified `task` object in a `failed` state.
{
"uid": 0,
"indexUid": "movies",
"batchUid": 0,
"status": "failed",
"type": "settingsUpdate",
"details": {
Expand Down Expand Up @@ -273,7 +268,6 @@ Allows users to list tasks globally regardless of the indexes involved. Particul
{
"uid": 1,
"indexUid": "movies_reviews",
"batchUid": 1,
"status": "enqueued",
"type": "documentAdditionOrUpdate",
"duration": null,
Expand All @@ -284,7 +278,6 @@ Allows users to list tasks globally regardless of the indexes involved. Particul
{
"uid": 0,
"indexUid": "movies",
"batchUid": 0,
"status": "succeeded",
"type": "documentAdditionOrUpdate",
"details": {
Expand Down Expand Up @@ -331,7 +324,6 @@ Allows users to get a detailed `task` object retrieved by the `uid` field regard
{
"uid": 1,
"indexUid": "movies",
"batchUid": 1,
"status": "enqueued",
"type": "documentAdditionOrUpdate",
"duration": null,
51 changes: 13 additions & 38 deletions text/0096-auto-batching.md
Expand Up @@ -6,27 +6,27 @@

Meilisearch can automatically group consecutive asynchronous `documentAddition` or `documentPartial` tasks for the same index via an automatic batching mechanism.

The user can enable this auto-batching behavior through various command flag options.
The user can disable this auto-batching behavior. See [3.2. Auto-batching mechanisms options](#32-auto-batching-mechanisms-options) section.

## 2. Motivation

Over the last year, users have regularly reported slow indexing as a pain point. We have repeatedly advised them to send as many documents as possible in each addition or update, because larger batches reduce the time spent indexing certain data structures.

To make Meilisearch easier to use, we explored the idea of automatically creating these batches within Meilisearch before indexing users’ documents.

## 3. Functional Specification
## 3. Functional Specification

### 3.1. Explanations

All `tasks` are part of a batch identified by a `batchUid`. A task batch preserves the logical order of the tasks for a given index.
A batch preserves the logical order of the tasks for a given index.

Only consecutive `documentAddition` and `documentPartial` tasks for the same index can have the same `batchUid`. All `tasks` concerning other operations will also be part of a `batchUid` having only one task.
Only consecutive `documentAdditionOrUpdate` tasks for the same index can be in the same batch. Every task for any other operation is also part of a batch, but that batch contains only that one task.

#### 3.1.1. Grouping tasks to a single batch

The scheduling program that groups tasks within a single batch is triggered when the asynchronous `task` currently being processed reaches a terminal state (`succeeded` or `failed`).

In other words, when a scheduled `documentAddition` task for a given index is picked from the task queue, the scheduler fetches and groups all `documentAddition` tasks for that same index in a batch.
In other words, when a scheduled `documentAdditionOrUpdate` task for a given index is picked from the task queue, the scheduler fetches and groups all `documentAdditionOrUpdate` tasks for that same index in a batch.

The more similar tasks the user sends consecutively, the more likely it is that the batching mechanism can group them.
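
The grouping rule described above can be sketched as follows. This is a minimal illustration, not Meilisearch's scheduler: the `Task` class, the `next_batch` function, and the in-memory list used as a queue are all hypothetical names. It also assumes that tasks for other indexes do not interrupt a sequence, while a task of another type on the same index does.

```python
from dataclasses import dataclass
from typing import List

# Task type named in the specification as eligible for grouping.
BATCHABLE_TYPE = "documentAdditionOrUpdate"


@dataclass
class Task:
    """Hypothetical, simplified view of an enqueued task."""
    uid: int
    index_uid: str
    type: str


def next_batch(queue: List[Task]) -> List[Task]:
    """Pick the oldest task and group it with the consecutive
    documentAdditionOrUpdate tasks that target the same index."""
    if not queue:
        return []
    first = queue[0]
    # Any other operation forms a batch containing only itself.
    if first.type != BATCHABLE_TYPE:
        return [first]
    batch = [first]
    for task in queue[1:]:
        if task.index_uid != first.index_uid:
            continue  # assumption: other indexes do not break the sequence
        if task.type != BATCHABLE_TYPE:
            break  # a different operation on this index ends the sequence
        batch.append(task)
    return batch


queue = [
    Task(0, "movies", BATCHABLE_TYPE),
    Task(1, "books", BATCHABLE_TYPE),
    Task(2, "movies", BATCHABLE_TYPE),
    Task(3, "movies", "settingsUpdate"),
    Task(4, "movies", BATCHABLE_TYPE),
]
print([t.uid for t in next_batch(queue)])  # [0, 2]
```

Task `4` is left for a later batch because the `settingsUpdate` task on the same index must run before it, which keeps the logical order of tasks for that index.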

Expand All @@ -36,53 +36,28 @@ The more similar consecutive tasks the user sends in a row, the more likely the

##### 3.1.1.2. `batchUid` generation

The batch identifiers are unique and strictly increasing.
All tasks are part of a batch identified by an internal `batchUid` field. A task batch preserves the logical order of the tasks for a given index. The batch identifiers are unique and strictly increasing. The `batchUid` field is internal and therefore not visible on a `task` resource.

#### 3.1.2. Impacts on `task` API resource

- The different tasks grouped in a batch are processed within the same transaction. If a task fails within a batch, the whole batch fails.
- A `batchUid` field is only added on fully-qualified `task` API objects. `batchUid` values are unique and strictly increasing.
- The different tasks grouped in a batch are processed within the same transaction. However, if a task fails within a batch, only that task fails; the rest of the batch does not.
- Tasks within the same batch share the same values for the `startedAt`, `finishedAt`, `duration` fields, and the same `error` object if an error occurs for a `task` during the batch processing.
- If a batch contains many `tasks`, the `task` `details` `indexedDocuments` is identical in all `tasks` belonging to the same processed `batch`.
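
As a rough illustration of the revised behavior in the list above, the sketch below stamps the shared timing and `indexedDocuments` values onto every task of a processed batch and marks only the tasks that reported an error as `failed`. The `TaskView` class, the `finish_batch` function, and the `errors` mapping keyed by task `uid` are assumptions made for this example; the specification does not say how errors are attached internally.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskView:
    """Hypothetical subset of the fields exposed on a `task` resource."""
    uid: int
    status: str = "processing"
    started_at: Optional[str] = None
    finished_at: Optional[str] = None
    indexed_documents: Optional[int] = None
    error: Optional[dict] = None


def finish_batch(
    tasks: List[TaskView],
    started_at: str,
    finished_at: str,
    indexed_documents: int,
    errors: Dict[int, dict],
) -> List[TaskView]:
    """Apply the outcome of one processed batch to all of its tasks."""
    for task in tasks:
        # Timing and indexedDocuments are shared by every task of the batch.
        task.started_at = started_at
        task.finished_at = finished_at
        task.indexed_documents = indexed_documents
        # Assumption: errors are reported per task uid, so a failure
        # affects only the related task and not the whole batch.
        error = errors.get(task.uid)
        if error is not None:
            task.status = "failed"
            task.error = error
        else:
            task.status = "succeeded"
    return tasks


batch = [TaskView(uid=0), TaskView(uid=1), TaskView(uid=2)]
finish_batch(
    batch,
    started_at="2021-01-01T09:39:01.000000Z",
    finished_at="2021-01-01T09:39:02.000000Z",
    indexed_documents=100,
    errors={1: {"message": "hypothetical error payload"}},
)
print([t.status for t in batch])  # ['succeeded', 'failed', 'succeeded']
```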

### 3.2. Auto-batching mechanisms options

### 3.2.1. `--enable-auto-batching`
### 3.2.1. `--disable-auto-batching`

By default, the auto-batching feature is disabled.
By default, the auto-batching feature is enabled.

The auto-batching feature can be activated by passing the command flag `--enable-auto-batching` to Meilisearch at launch.

### 3.2.2. `--max-batch-size`

`--max-batch-size <NUM>` allows setting the maximum number `NUM` of tasks that can be processed together within a single batch.

If `0` is set it will be replaced by `1`, since such a value would prevent any task from ever being processed.

If not specified, this is unlimited.

### 3.2.3. `--max-documents-per-batch`

`--max-documents-per-batch <NUM>` allows setting a limit to the maximum number `NUM` of documents that can be indexed together within a single batch.

Since the batch can't split one update in half, this value is rounded up to the number of documents in the last document addition.

If not specified, this is unlimited.
### 3.2.4. `--debounce-duration-sec`

`--debounce-duration-sec <SECS>` wait at least `SECS` seconds between the time the scheduler is notified of a new `task` and the processing of the next batch.

Snapshots and dumps are impacted by this debounce duration. It means that they will be processed at the end of the current debounce duration.

Defaults to `0`secs (process immediately).
The auto-batching feature can be deactivated by passing the command flag `--disable-auto-batching` (or setting the environment variable `MEILI_DISABLE_AUTO_BATCHING`) to Meilisearch at launch.

## 4. Technical Aspects
N/A

## 5. Future Possibilities

- Extend it to all consecutive payload types.
- Add a filter capability by `batchUid` on the `/tasks` endpoints.
- Do not fail the entire transaction if a document is not valid. Report the documents that could not be indexed to the user.
- Enable auto-batching by default.
- Optimize some tasks sequence, for example if there is a document addition followed by an index deletion, we could skip the document addition
- Expose the `batchUid` field and add a filter capability on it on the `/tasks` endpoints.
- Report the documents that could not be indexed to the user in a more precise manner.
- Optimize some task sequences. For example, if a document addition is followed by an index deletion, the document addition could be skipped.
44 changes: 26 additions & 18 deletions text/0119-instance-options.md
Expand Up @@ -96,16 +96,14 @@ The expected behavior of each flag is described in the list above.
- [HTTP address & port binding](#333-http-address--port-binding)
- [Master key](#334-master-key)
- [Disable analytics](#335-disable-analytics)
- [Dumps](#336-dumps-destination)
- [Dumps destination](#336-dumps-destination)
- [Import dump](#337-import-dump)
- [Ignore missing dump](#338-ignore-missing-dump)
- [Ignore dump if DB exists](#339-ignore-dump-if-db-exists)
- [Log level](#3310-log-level)
- [Max index size](#3311-max-index-size)
- [Max TASK_DB size](#3312-max-taskdb-size)
- [Max TASK_DB size](#3312-max-task_db-size)
- [Payload limit size](#3313-payload-limit-size)
- [Snapshots](#3314-schedule-snapshot-creation)
- [Schedule snapshot creation](#3314-schedule-snapshot-creation)
- [Snapshot destination](#3315-snapshot-destination)
- [Snapshot interval](#3316-snapshot-interval)
Expand All @@ -114,14 +112,14 @@ The expected behavior of each flag is described in the list above.
- [Ignore snapshot if DB exists](#3319-ignore-snapshot-if-db-exists)
- [Max memory usage when indexing](#3320-max-memory-usage-when-indexing)
- [Max indexing threads](#3321-max-indexing-threads)
- [SSL configuration](#3322-ssl-authentication-path)
- [SSL authentication path](#3322-ssl-authentication-path)
- [SSL certificates path](#3323-ssl-certificates-path)
- [SSL key path](#3324-ssl-key-path)
- [SSL OCSP path](#3325-ssl-ocsp-path)
- [SSL require auth](#3326-ssl-require-auth)
- [SSL resumption](#3327-ssl-resumption)
- [SSL tickets](#3328-ssl-tickets)
- [Disable auto-batching](#3322-disable-auto-batching)
- [SSL authentication path](#3323-ssl-authentication-path)
- [SSL certificates path](#3324-ssl-certificates-path)
- [SSL key path](#3325-ssl-key-path)
- [SSL OCSP path](#3326-ssl-ocsp-path)
- [SSL require auth](#3327-ssl-require-auth)
- [SSL resumption](#3328-ssl-resumption)
- [SSL tickets](#3329-ssl-tickets)

#### 3.3.1. Database path

Expand Down Expand Up @@ -375,7 +373,17 @@ Obviously, multi-threading is not possible in machines with only one processor c

If the number set is higher than the number of cores actually available on the machine, Meilisearch will use the maximum number of available cores.

#### 3.3.22. SSL authentication path
#### 3.3.22. Disable auto-batching

**Environment variable**: `MEILI_DISABLE_AUTO_BATCHING`
**CLI option**: `--disable-auto-batching`
**Default**: Enabled

⚠️ This command-line option does not take any values. Assigning a value will throw an error.

Disable the [auto-batching feature](./0096-auto-batching.md).
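
The intended resolution of this option can be sketched as below. This is an illustrative model rather than Meilisearch's actual argument parser: the `auto_batching_enabled` function is hypothetical, and treating the environment variable as active when its value is `true` is an assumption, since the text above only specifies that the CLI flag takes no value.

```python
from typing import List, Mapping

FLAG = "--disable-auto-batching"
ENV_VAR = "MEILI_DISABLE_AUTO_BATCHING"


def auto_batching_enabled(argv: List[str], env: Mapping[str, str]) -> bool:
    """Return True unless auto-batching was disabled at launch."""
    for arg in argv:
        # The flag takes no value; assigning one is an error.
        if arg.startswith(FLAG + "="):
            raise ValueError(f"{FLAG} does not take a value")
    if FLAG in argv:
        return False
    # Assumption: the variable disables the feature when set to "true".
    if env.get(ENV_VAR, "").lower() == "true":
        return False
    # Default: the feature is enabled.
    return True


print(auto_batching_enabled([], {}))                 # True
print(auto_batching_enabled([FLAG], {}))             # False
print(auto_batching_enabled([], {ENV_VAR: "true"}))  # False
```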

#### 3.3.23. SSL authentication path

**Environment variable**: `MEILI_SSL_AUTH_PATH`
**CLI option**: `--ssl-auth-path`
Expand All @@ -384,7 +392,7 @@ If the number set is higher than the real number of core available in the machin

Enables client authentication in the specified path.

#### 3.3.23. SSL certificates path
#### 3.3.24. SSL certificates path

**Environment variable**: `MEILI_SSL_CERT_PATH`
**CLI option**: `--ssl-cert-path`
Expand All @@ -395,7 +403,7 @@ Sets the server's SSL certificates.

Value must be a path to PEM-formatted certificates. The first certificate should certify the KEYFILE supplied by `--ssl-key-path`. The last certificate should be a root CA.

#### 3.3.24. SSL key path
#### 3.3.25. SSL key path

**Environment variable**: `MEILI_SSL_KEY_PATH`
**CLI option**: `--ssl-key-path`
Expand All @@ -406,7 +414,7 @@ Sets the server's SSL keyfiles.

Value must be a path to an RSA private key or PKCS8-encoded private key, both in PEM format.

#### 3.3.25. SSL OCSP path
#### 3.3.26. SSL OCSP path

**Environment variable**: `MEILI_SSL_OCSP_PATH`
**CLI option**: `--ssl-ocsp-path`
Expand All @@ -417,7 +425,7 @@ Sets the server's OCSP file. *Optional*

Reads a DER-encoded OCSP response from OCSPFILE and staples it to the certificate.

#### 3.3.26. SSL require auth
#### 3.3.27. SSL require auth

**Environment variable**: `MEILI_SSL_REQUIRE_AUTH`
**CLI option**: `--ssl-require-auth`
Expand All @@ -429,7 +437,7 @@ Makes SSL authentication mandatory.

Sends a fatal alert if the client does not complete client authentication.

#### 3.3.27. SSL resumption
#### 3.3.28. SSL resumption

**Environment variable**: `MEILI_SSL_RESUMPTION`
**CLI option**: `--ssl-resumption`
Expand All @@ -439,7 +447,7 @@ Sends a fatal alert if the client does not complete client authentication.

Activates SSL session resumption.

#### 3.3.28. SSL tickets
#### 3.3.29. SSL tickets

**Environment variable**: `MEILI_SSL_TICKETS`
**CLI option**: `--ssl-tickets`