Conversation

Contributor

@jepett0 jepett0 commented Mar 12, 2024

KIKIMR-18302

Users pay differently for HDD and SSD storage. They create storage pools for their databases, differentiated by the underlying storage kind. Moreover, they can specify the preferred storage kind for each column in a table (see [column groups](https://ydb.tech/docs/en/yql/reference/syntax/create_table#column-family) in the docs for the CREATE TABLE statement).

However, up until this PR they could not see how much storage was used on each of the storage pool kinds. (And we had no per-storage-pool-kind quotas to disable writes to a database that exceeded the limit on one of its storage pools.)

We would like to provide users with an aggregate of the disk space usage of the database so they can order additional disks before the space is physically depleted. This is done by aggregating the [per-channel disk space usage statistics](https://github.com/ydb-platform/ydb/blob/7a673cf01feefbe95bf5e7396d9179a5f283aeba/ydb/core/protos/table_stats.proto#L57) that the SchemeShard receives from the data shards (see [TEvPeriodicTableStats](https://github.com/ydb-platform/ydb/blob/7a673cf01feefbe95bf5e7396d9179a5f283aeba/ydb/core/protos/tx_datashard.proto#L789)). Channels are mapped to the corresponding storage pool kinds via the information that the SchemeShard has about the database (in code, databases are subdomains) and the storage pools it was created with. Aggregation is done on two levels: by table and by database. The per-table aggregate can be seen in the UI in the path description of the table under the Describe -> PathDescription -> TableStats -> StoragePools field. The per-database aggregate can be seen in the UI in the Describe -> PathDescription -> DomainDescription -> DiskSpaceUsage -> StoragePoolsUsage field.
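
To illustrate the mapping described above, here is a minimal, self-contained C++ sketch of folding per-channel stats into per-storage-pool-kind usage. The types and names are simplified and hypothetical (the real structures live in table_stats.proto and the SchemeShard subdomain/table info types); this is not the actual SchemeShard code.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins for the per-channel stats reported by a datashard.
struct TChannelUsage {
    uint32_t Channel;   // channel id reported by the datashard
    uint64_t DataSize;  // bytes of data stored in this channel
    uint64_t IndexSize; // bytes of index stored in this channel
};

struct TPoolUsage {
    uint64_t DataSize = 0;
    uint64_t IndexSize = 0;
};

// Aggregate per-channel stats into per-storage-pool-kind usage.
// channelToPoolKind is built from the subdomain description: each channel is bound
// to a storage pool, and each storage pool has a kind (e.g. "ssd", "hdd").
std::unordered_map<std::string, TPoolUsage> AggregateByPoolKind(
        const std::vector<TChannelUsage>& channels,
        const std::unordered_map<uint32_t, std::string>& channelToPoolKind)
{
    std::unordered_map<std::string, TPoolUsage> byKind;
    for (const auto& ch : channels) {
        auto it = channelToPoolKind.find(ch.Channel);
        if (it == channelToPoolKind.end()) {
            continue; // unknown channel: skip it (real code may log or count this)
        }
        auto& usage = byKind[it->second];
        usage.DataSize += ch.DataSize;
        usage.IndexSize += ch.IndexSize;
    }
    return byKind;
}
```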

In addition, we implemented "storage_pools_quotas", which the user can specify in the "DatabaseQuotas" section of the configuration of the database they would like to create. There are 3 parameters in each [storage pool quota](https://github.com/jepett0/ydb/blob/a19c3b4dcc28fb1da6d04ecfb139ffdfe90c72fb/ydb/public/api/protos/ydb_cms.proto#L98):

  • pool kind,
  • hard quota (if any storage pool exceeds its hard quota, writes to the **whole** database (not just the storage pool that has exceeded the quota!) are restricted),
  • soft quota (if all storage pools use less storage than the corresponding soft quota, then the database opens for writing again).

"storage_pools_quotas" can be used together with the existing "data_size_hard_quota" and "data_size_soft_quota" that do not differentiate between storage pools. Exceedance of any hard quota (either the storage pool one, or the entire "data_size_hard_quota") disables writes to the database. To reenable writes, all disk space usage (either the storage pool one, or the aggregated TotalSize) must be below the corresponding soft quota.

One important thing to note about the storage pool usage statistics is that they are delivered to the SchemeShard with a considerable delay (about 1 minute). This means that the storage pool usage will be checked against the storage pool quotas with a delay, and some data can be written above the hard limit. (The other way around holds too: deleting some data to reopen the database for writes will be noticed by the SchemeShard with a considerable delay (about 420 seconds in my tests with the default compaction policy; I don't know where this number comes from). This is due to the fact that the new data is stored in the LSM tree (I guess) and is written to the appropriate storage pool later, after compaction.)

@jepett0 jepett0 requested review from ijon and snaury March 13, 2024 13:09
@jepett0 jepett0 force-pushed the SchemeShard.per_channel_storage_limits.1 branch from a19c3b4 to 7009c75 on March 14, 2024 09:07

Collaborator

@ijon ijon left a comment

More explicit tests on the permutation cases of (database, pool, hard, soft quotas, defined, not defined) would be nice

Comment on lines 520 to 523
```cpp
LOG_DEBUG_S(ctx, NKikimrServices::FLAT_TX_SCHEMESHARD,
            "Got periodic table stats at tablet " << TabletID()
            << " from shard " << datashardId
            << " pathId " << pathId
            << " raw table stats:\n" << tableStats.DebugString());
```

Collaborator

Is it purely dev time related output? Should it be removed by now?

Contributor Author

It was helpful during development, and I think there is no such message in the debug logs now. I would like to keep it. It is a debug-level log, so it should not bother others much.

Contributor Author

@jepett0 jepett0 Mar 29, 2024

Changed the level to TRACE and switched to ShortDebugString() so that it takes up less space in the output log.
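
For reference, after that change the line presumably looks roughly like this (a sketch based on the snippet above and this comment, not the exact diff):

```cpp
LOG_TRACE_S(ctx, NKikimrServices::FLAT_TX_SCHEMESHARD,
            "Got periodic table stats at tablet " << TabletID()
            << " from shard " << datashardId
            << " pathId " << pathId
            << " raw table stats: " << tableStats.ShortDebugString());
```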

@jepett0 jepett0 force-pushed the SchemeShard.per_channel_storage_limits.1 branch from 7009c75 to 958757e on March 25, 2024 07:26

@jepett0 jepett0 force-pushed the SchemeShard.per_channel_storage_limits.1 branch from 627815f to 7afab1c on March 25, 2024 13:04

@jepett0 jepett0 force-pushed the SchemeShard.per_channel_storage_limits.1 branch from 7afab1c to ee5f289 on March 25, 2024 22:17

Member

@snaury snaury left a comment

Mostly LGTM.

@jepett0 jepett0 force-pushed the SchemeShard.per_channel_storage_limits.1 branch from 5cc7e0d to fce55f8 on April 1, 2024 06:53

@jepett0 jepett0 force-pushed the SchemeShard.per_channel_storage_limits.1 branch from fce55f8 to 8af4a09 on April 1, 2024 10:49

jepett0 added 4 commits April 3, 2024 08:38
- change log level to trace to not pollute the SchemeShard log
- change comment wording to emphasize the real values seen in practice
@jepett0 jepett0 force-pushed the SchemeShard.per_channel_storage_limits.1 branch from 8af4a09 to 5d15404 on April 3, 2024 08:38

github-actions bot commented Apr 3, 2024

2024-04-03 08:39:52 UTC Pre-commit check for bc745ee has started.
2024-04-03 08:39:55 UTC Build linux-x86_64-release-asan is running...
🟢 2024-04-03 09:18:47 UTC Build successful.
2024-04-03 09:20:37 UTC Tests are running...
🔴 2024-04-03 10:56:31 UTC Some tests failed, follow the links below.

Test history

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|------:|-------:|-------:|-------:|--------:|-------:|
| 14229 | 13932  | 0      | 56     | 220     | 21     |

github-actions bot commented Apr 3, 2024

2024-04-03 08:42:09 UTC Pre-commit check for bc745ee has started.
2024-04-03 08:42:12 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-04-03 09:20:57 UTC Build successful.
2024-04-03 09:22:45 UTC Tests are running...
🔴 2024-04-03 11:10:49 UTC Some tests failed, follow the links below.

Test history

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|------:|-------:|-------:|-------:|--------:|-------:|
| 69153 | 57992  | 0      | 20     | 11117   | 24     |

github-actions bot commented Apr 3, 2024

2024-04-03 08:42:10 UTC Pre-commit check for bc745ee has started.
2024-04-03 08:42:13 UTC Build linux-x86_64-release-clang14 is running...
🟢 2024-04-03 09:15:37 UTC Build successful.

@jepett0 jepett0 merged commit 24926f6 into ydb-platform:main Apr 3, 2024
@shnikd shnikd mentioned this pull request Apr 11, 2024
jepett0 added a commit to jepett0/ydb that referenced this pull request Jun 17, 2024
jepett0 added a commit to jepett0/ydb that referenced this pull request Jun 17, 2024