Skip to content

Conversation

@MBkkt
Copy link
Contributor

@MBkkt MBkkt commented Mar 28, 2025

Changelog category

  • Not for changelog (changelog entry is not required)

Description for reviewers

  • Fix debug log message for prefixed vector index
  • Use real table size in prefixed vector index
  • Use O(1) restore for vector index construction and fix it for prefixed vector index

MBkkt added 4 commits March 28, 2025 07:50
…ents + childs from single shard), because constant used instead of real value for table size
@MBkkt MBkkt requested a review from a team as a code owner March 28, 2025 07:55
@github-actions
Copy link

🟢 2025-03-28 07:56:06 UTC The validation of the Pull Request description is successful.

@github-actions
Copy link

github-actions bot commented Mar 28, 2025

2025-03-28 07:58:44 UTC Pre-commit check linux-x86_64-release-asan for 7f7acc9 has started.
2025-03-28 07:59:16 UTC Artifacts will be uploaded here
2025-03-28 08:02:50 UTC ya make is running...
🟡 2025-03-28 09:22:03 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
12222 12098 0 73 17 34

2025-03-28 09:23:25 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-03-28 09:35:55 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
172 (only retried tests) 130 0 5 7 30

2025-03-28 09:36:04 UTC ya make is running... (failed tests rerun, try 3)
🟡 2025-03-28 09:47:55 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
67 (only retried tests) 29 0 4 6 28

🟢 2025-03-28 09:48:05 UTC Build successful.
🟢 2025-03-28 09:48:37 UTC ydbd size 3.8 GiB changed* by +57.1 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: af288e5 merge: 7f7acc9 diff diff %
ydbd size 4 118 509 880 Bytes 4 118 568 312 Bytes +57.1 KiB +0.001%
ydbd stripped size 1 423 529 416 Bytes 1 423 546 856 Bytes +17.0 KiB +0.001%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@github-actions
Copy link

github-actions bot commented Mar 28, 2025

2025-03-28 07:59:08 UTC Pre-commit check linux-x86_64-relwithdebinfo for 7f7acc9 has started.
2025-03-28 07:59:23 UTC Artifacts will be uploaded here
2025-03-28 08:02:19 UTC ya make is running...
🟡 2025-03-28 09:08:09 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
19764 18401 0 5 1243 115

2025-03-28 09:09:49 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-03-28 09:22:55 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
168 (only retried tests) 63 0 0 1 104

🟢 2025-03-28 09:23:04 UTC Build successful.
🟢 2025-03-28 09:23:28 UTC ydbd size 2.2 GiB changed* by +30.9 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: af288e5 merge: 7f7acc9 diff diff %
ydbd size 2 346 856 544 Bytes 2 346 888 184 Bytes +30.9 KiB +0.001%
ydbd stripped size 490 756 096 Bytes 490 756 032 Bytes -64 Bytes -0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@MBkkt MBkkt requested review from CyberROFL and kungasc March 28, 2025 09:17
@MBkkt MBkkt self-assigned this Mar 28, 2025
const auto shardIndex = buildInfo.Shards.at(shardIdx).Index;
ev->Record.SetChild(buildInfo.KMeans.ChildBegin + (1 + TableSize) * shardIndex);
// about 2 * TableSize see comment in PrefixIndexDone
ev->Record.SetChild(buildInfo.KMeans.ChildBegin + (2 * buildInfo.KMeans.TableSize) * shardIndex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Это чтобы номера не пересекались нужно? Выглядит не очень надёжно :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Да все так.
Ну не очень очевидно как по другому сделать.

const auto shardIndex = buildInfo.Shards.at(shardIdx).Index;
ev->Record.SetChild(buildInfo.KMeans.ChildBegin + (1 + TableSize) * shardIndex);
// about 2 * TableSize see comment in PrefixIndexDone
ev->Record.SetChild(buildInfo.KMeans.ChildBegin + (2 * buildInfo.KMeans.TableSize) * shardIndex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Не очень понимаю что тут происходит

Вот мы хотим построить индекс, запускаем на каждом шарде prefix kmeans

Запускаем последовательно и TableSize это число строк в шардах слева?

Что значит Y_ASSERT(buildInfo.KMeans.Level == 2), почему уровень второй?

Copy link
Contributor Author

@MBkkt MBkkt Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Не очень понимаю что тут происходит
Вот мы хотим построить индекс, запускаем на каждом шарде prefix kmeans
Запускаем последовательно и TableSize это число строк в шардах слева?

Комментарий в другом месте подробный

Что значит Y_ASSERT(buildInfo.KMeans.Level == 2), почему уровень второй?

У нас построение выглядит так.
1ый уровень: построили вторичный индекс для префикса (в covered column запихнули embedding + vector index covered column)
2ой уровень: запустили читая из этого индекса n prefix_kmeans, они будут работать параллельно и важно чтобы их id не пересеклись. К сожалению мы заранее не знаем сколько уникальных префиксов всего и не знаем сколько их на конкретном шарде

NTableIndex::TClusterId childBegin, NTableIndex::TClusterId child,
ui32 state, ui64 tableSize) {
Level = level;
ParentBegin = parentBegin;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Могу только догадываться что эти числа используются для нумерации кластеров, но явно не хватает какого-то комментария с тем картинкой как они используются и что значат

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ну они не в первый раз возникли)
Вообще идея такая:
У нас есть процесс построения, по сути это цикл в цикле в цикле развернутый в конечный автомат.

циклы выглядят так

for (;level < levels; ++level) {
  for (parent = parentBegin; parent < parentEnd; ++parent) {
    do something
  }
}

parentEnd ~ childBegin (там функции есть, по ним видно)
child/childBegin нужны для удобства, в целом их посчитать можно из Parent+ParentEnd

Бтв
prefixed vector index levels setting транслируется в levels+1 (собственно + 1 уровень на вторичный индекс безусловно)

@MBkkt MBkkt merged commit 0610a4d into ydb-platform:main Mar 28, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants