docs: msq autocompaction #16681

Merged
merged 28 commits into from
Oct 17, 2024
7d5c8d3
docs: msq autocompaction docs
317brian Jul 2, 2024
63bc906
cleanup
317brian Jul 2, 2024
4c204e7
Merge branch 'master' into msq-autocompact-docs
317brian Jul 2, 2024
0f96df3
Merge branch 'master' into msq-autocompact-docs
317brian Sep 18, 2024
d118847
update for overlord-based autocompact
317brian Sep 19, 2024
b698acf
parallelism
317brian Sep 19, 2024
0699168
update list in ki
317brian Sep 19, 2024
c58b26b
update supervisor docs
317brian Sep 20, 2024
5e08d07
fix typos
317brian Sep 20, 2024
802af43
Apply suggestions from code review
317brian Sep 27, 2024
30db2ba
address comments
317brian Sep 27, 2024
56ede75
fix typo
317brian Sep 30, 2024
ea6c767
address comments
317brian Oct 1, 2024
9defd6e
Apply suggestions from code review
317brian Oct 3, 2024
4fe38a7
address comments
317brian Oct 3, 2024
790dcf0
fix link etc
317brian Oct 4, 2024
ae5829b
update aggregator section
317brian Oct 7, 2024
ba1434a
fix link
317brian Oct 8, 2024
1dfbb6b
Merge branch 'master' into msq-autocompact-docs
317brian Oct 9, 2024
52dfd51
Apply suggestions from code review
317brian Oct 10, 2024
a0e2b61
Update docs/data-management/automatic-compaction.md
317brian Oct 15, 2024
3e1bbf3
Apply suggestions from code review
317brian Oct 15, 2024
82baf46
Apply suggestions from code review
317brian Oct 15, 2024
632da59
address review comments
317brian Oct 15, 2024
390dd6a
Apply suggestions from code review
317brian Oct 16, 2024
1ea4fbd
update config page
317brian Oct 16, 2024
9667d59
fix link
317brian Oct 16, 2024
7fda723
update spelling file
317brian Oct 16, 2024
address review comments
317brian committed Oct 15, 2024
commit 632da5962a2c9a3b585caac8d1bc3b9a974bb92e
8 changes: 7 additions & 1 deletion docs/api-reference/automatic-compaction-api.md
Expand Up @@ -27,7 +27,13 @@ import TabItem from '@theme/TabItem';
~ under the License.
-->

This topic describes the status and configuration API endpoints for [automatic compaction](../data-management/automatic-compaction.md) in Apache Druid. You can configure automatic compaction in the Druid web console or API.
This topic describes the status and configuration API endpoints for [automatic compaction using Coordinator duties](../data-management/automatic-compaction.md#auto-compaction-using-coordinator-duties) in Apache Druid. You can configure automatic compaction in the Druid web console or API.

:::info Experimental

Instead of the automatic compaction API, you can use the supervisor API to submit auto-compaction jobs using compaction supervisors. For more information, see [Auto-compaction using compaction supervisors](../data-management/automatic-compaction.md#auto-compaction-using-compaction-supervisors).

:::

In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router service address and port. Replace it with the information for your deployment. For example, use `http://localhost:8888` for quickstart deployments.

44 changes: 27 additions & 17 deletions docs/data-management/automatic-compaction.md
Expand Up @@ -48,12 +48,17 @@ The automatic compaction system uses the following syntax:
"granularitySpec": <compaction task granularitySpec>,
"skipOffsetFromLatest": <time period to avoid compaction>,
"taskPriority": <compaction task priority>,
"taskContext": <task context>,
"engine": <native|msq>
"taskContext": <task context>
}
```

For Coordinator-based automatic compaction, you submit the spec to the [Compaction config UI](#manage-auto-compaction-using-the-web-console) or the [Compaction configuration API](#manage-auto-compaction-using-coordinator-apis).
:::info Experimental

The MSQ task engine is available as a compaction engine when you run automatic compaction as a compaction supervisor. For more information, see [Auto-compaction using compaction supervisors](#auto-compaction-using-compaction-supervisors).

:::

For automatic compaction using Coordinator duties, you submit the spec to the [Compaction config UI](#manage-auto-compaction-using-the-web-console) or the [Compaction configuration API](#manage-auto-compaction-using-coordinator-apis).

Most fields in the auto-compaction configuration correlate to a typical [Druid ingestion spec](../ingestion/ingestion-spec.md).
The following properties only apply to auto-compaction:
Expand Down Expand Up @@ -226,7 +231,7 @@ The following auto-compaction configuration compacts updates the `wikipedia` seg
## Auto-compaction using compaction supervisors

:::info Experimental
Compaction supervisors are experimental. For production use, we recommend [Coordinator-based auto-compaction](#auto-compaction-using-coordinator-duties).
Compaction supervisors are experimental. For production use, we recommend [auto-compaction using Coordinator duties](#auto-compaction-using-coordinator-duties).
:::

You can run automatic compaction using compaction supervisors on the Overlord rather than Coordinator duties. Compaction supervisors provide the following benefits over Coordinator duties:
Expand All @@ -240,11 +245,12 @@ You can run automatic compaction using compaction supervisors on the Overlord ra

To use compaction supervisors, set the following properties in your Overlord runtime properties:
* `druid.supervisor.compaction.enabled` to `true` so that compaction tasks can be run as supervisor tasks
* `druid.supervisor.compaction.engine` to `msq` to specify the MSQ task engine as the compaction engine or to `native` to use the native engine.
* `druid.supervisor.compaction.engine` to `msq` to specify the MSQ task engine as the compaction engine or to `native` to use the native engine. The engine you set here is the default for any compaction config that omits the `engine` field.

Compaction supervisors use the same syntax as auto-compaction using Coordinator duties with one key difference: you submit the auto-compaction as a supervisor spec. In the spec, set the `type` to `autocompact` and include the auto-compaction config in the `spec`.

To submit an automatic compaction task, you can submit a supervisor spec through the [web console](#manage-compaction-supervisors-with-the-web-console) or the [supervisor API](#manage-compaction-supervisors-with-supervisor-apis).

Compaction uses the same syntax as Coordinator-based auto-compaction with some differences. Specifically, you submit a supervisor spec with the `type` set to `autocompact` and the auto-compaction config in the `spec` to configure auto-compaction.

For information about the syntax, see [automatic compaction syntax](#auto-compaction-syntax).
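
As a rough illustration of the wrapping described above, the following Python sketch builds a supervisor spec from an existing auto-compaction config. The helper name `build_compaction_supervisor_spec` and the sample config values are hypothetical; only the `type`, `suspended`, `spec`, `dataSource`, and `engine` fields are taken from the examples in this change.

```python
import json


def build_compaction_supervisor_spec(compaction_config, suspended=False):
    """Wrap an auto-compaction config in a supervisor spec (hypothetical helper)."""
    if "dataSource" not in compaction_config:
        raise ValueError("the auto-compaction config needs a dataSource")
    return {
        "type": "autocompact",
        "suspended": suspended,
        # Shallow copy so later changes to the spec don't mutate the caller's config.
        "spec": dict(compaction_config),
    }


# Illustrative config only; the values are assumptions, not recommendations.
example_config = {
    "dataSource": "wikipedia",
    "skipOffsetFromLatest": "P1D",
    "engine": "msq",
}

supervisor_spec = build_compaction_supervisor_spec(example_config)
print(json.dumps(supervisor_spec, indent=2))
```

The printed JSON could then be used as the request body for the supervisor endpoint (`/druid/indexer/v1/supervisor`) or pasted into the web console.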

### Manage compaction supervisors with the web console

Expand All @@ -260,7 +266,10 @@ To submit a supervisor spec for MSQ task engine automatic compaction, perform th
"type": "autocompact",
"spec": {
"dataSource": YOUR_DATASOURCE,
...
"tuningConfig": {...},
"granularitySpec": {...},
"engine": <native|msq>,
...
}
```
1. Submit the supervisor.
Expand All @@ -277,13 +286,13 @@ The following example configures auto-compaction for the `wikipedia` datasource:
curl --location --request POST 'http://localhost:8081/druid/indexer/v1/supervisor' \
--header 'Content-Type: application/json' \
--data-raw '{
"type": "autocompact", // required
"suspended": false, // optional
"spec": { // required
"dataSource": "wikipedia", // required
"tuningConfig": {...}, // optional
"granularitySpec": {...}, // optional
"engine": <native|msq>, //optional
"type": "autocompact", // required
"suspended": false, // optional
"spec": { // required
"dataSource": "wikipedia", // required
"tuningConfig": {...}, // optional
"granularitySpec": {...}, // optional
"engine": <native|msq>, // optional
...
Contributor


We should also add engine parameter here

"granularitySpec": {...},
"engine": <native|msq>,            // optional
...

Contributor


I understand that skipping it here and specifying it later in supervisor-based spec may be confusing. If we keep it here, just want to make sure that users realize that it's only supported with supervisors.

Also, we need to add this field to the Automatic compaction dynamic configuration page. Maybe this info can reside there, similar to the row below:

| `engine` | Engine for compaction. Can be either `native` or `msq`. MSQ is only supported with compaction supervisors. | No (default = `native`) |

Contributor


Shall we also include the above on Automatic compaction dynamic configuration page?

}
}'
Expand All @@ -304,11 +313,12 @@ The MSQ task engine is available as a compaction engine if you configure auto-co
* Have at least two compaction task slots available or set `compactionConfig.taskContext.maxNumTasks` to two or more. The MSQ task engine requires at least two tasks to run, one controller task and one worker task.

You can use [MSQ task engine context parameters](../multi-stage-query/reference.md#context-parameters) in `spec.taskContext` when configuring your datasource for automatic compaction, such as setting the maximum number of tasks using the `spec.taskContext.maxNumTasks` parameter. Some of the MSQ task engine context parameters overlap with automatic compaction parameters. When these settings overlap, set one or the other.
To submit an automatic compaction task, you submit a supervisor spec through the UI or API with the type `autocompact` and the `spec` where you define the compaction behavior using the [automatic compaction syntax](#auto-compaction-syntax). You can also use the [web console](#manage-compaction-supervisors-with-the-web-console).
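
The two-task requirement above can be sketched as a small pre-submission check. This is a minimal sketch that reads the requirement literally (either two available task slots, or `taskContext.maxNumTasks` of two or more); the function name `meets_msq_task_minimum` and the `available_task_slots` argument are hypothetical, and Druid itself may apply stricter rules.

```python
MIN_MSQ_TASKS = 2  # one controller task plus at least one worker task


def meets_msq_task_minimum(compaction_config, available_task_slots):
    """Return True if the config appears to satisfy the two-task minimum."""
    task_context = compaction_config.get("taskContext") or {}
    # Treat a missing or null maxNumTasks as "not set".
    max_num_tasks = task_context.get("maxNumTasks") or 0
    return (
        available_task_slots >= MIN_MSQ_TASKS
        or max_num_tasks >= MIN_MSQ_TASKS
    )


# Hypothetical usage: one free slot, but maxNumTasks raised in the task context.
config = {"dataSource": "wikipedia", "taskContext": {"maxNumTasks": 2}}
print(meets_msq_task_minimum(config, available_task_slots=1))
```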


#### MSQ task engine limitations

<!--This list also exists in multi-stage-query/known-issues-->

When using the MSQ task engine for auto-compaction, keep the following limitations in mind:

- The `metricSpec` field is only supported for certain aggregators. For more information, see [Supported aggregators](#supported-aggregators).
4 changes: 2 additions & 2 deletions docs/multi-stage-query/known-issues.md
Expand Up @@ -75,8 +75,8 @@ where native engine has a non-empty `leafOperator`.

The following known issues and limitations affect automatic compaction with the MSQ task engine:

- The `metricSpec` field is only supported for idempotent aggregators. For more information, see [Idempotent aggregators](../data-management/automatic-compaction.md#supported-aggregators).
- Only dynamic and range-based partitioning are supported
- The `metricSpec` field is only supported for certain aggregators. For more information, see [Supported aggregators](../data-management/automatic-compaction.md#supported-aggregators).
- Only dynamic and range-based partitioning are supported.
- Set `rollup` to `true` if and only if `metricSpec` is not empty or null.
- You can only partition on string dimensions. However, multi-valued string dimensions are not supported.
- The `maxTotalRows` config is not supported in `DynamicPartitionsSpec`. Use `maxRowsPerSegment` instead.
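
Some of the limitations listed above can be checked before submitting a config. The following is a minimal sketch, not part of Druid: the function name `find_msq_compaction_issues` is hypothetical, the field paths (`tuningConfig.partitionsSpec`, `granularitySpec.rollup`) and the default metrics key `metricsSpec` are assumptions about where these settings live in the compaction config, and a missing `rollup` is treated as `false` for simplicity. The aggregator and dimension-type limitations are not covered.

```python
SUPPORTED_PARTITION_TYPES = {"dynamic", "range"}


def find_msq_compaction_issues(config, metrics_key="metricsSpec"):
    """Return a list of problems found; an empty list means none were detected."""
    issues = []

    partitions_spec = (config.get("tuningConfig") or {}).get("partitionsSpec") or {}
    partition_type = partitions_spec.get("type")
    if partition_type is not None and partition_type not in SUPPORTED_PARTITION_TYPES:
        issues.append(f"unsupported partitioning type: {partition_type}")
    if partition_type == "dynamic" and partitions_spec.get("maxTotalRows") is not None:
        issues.append("maxTotalRows is not supported; use maxRowsPerSegment")

    # Rollup must be enabled exactly when a non-empty metrics spec is present.
    rollup = bool((config.get("granularitySpec") or {}).get("rollup", False))
    has_metrics = bool(config.get(metrics_key))
    if rollup != has_metrics:
        issues.append("rollup must be true if and only if the metrics spec is non-empty")

    return issues


# Hypothetical config that breaks two of the rules.
bad_config = {
    "dataSource": "wikipedia",
    "tuningConfig": {"partitionsSpec": {"type": "hashed"}},
    "granularitySpec": {"rollup": True},
}
for issue in find_msq_compaction_issues(bad_config):
    print(issue)
```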