INTERNAL: Add sampling method to random get in set #825

Merged
merged 1 commit into from
Feb 21, 2025

Conversation

jeesup0103
@jeesup0103 jeesup0103 commented Feb 19, 2025

🔗 Related Issue

⌨️ What I did

  • Added a selection sampling method to the random get operation on Set.
    • do_set_elem_traverse_sampling()
  • If the requested count is 10% or less of the total Set size, the hash table method is used.
  • If the requested count is more than 10% of the total Set size, the sampling method is used.
  • After all count elements have been selected, they are shuffled.
  • Selection probability on visiting an element = (number of elements still needed) / (number of elements left to visit)

Selection probability example

When 3 elements are requested from a total of 10, indexes 0~9 are visited in order and selected as follows.
- Probability of picking index 0 = 3/10

- Probability of picking index 1
   - If index 0 was picked,     probability = 2/9
   - If index 0 was not picked, probability = 3/9

...

The traversal stops as soon as all 3 have been picked.
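
The rule described above is what is commonly called selection sampling. Below is a minimal sketch over a plain index range rather than the PR's hash tree. The names `sample_indices` and `shuffle_indices` are hypothetical, and the Fisher-Yates shuffle is only one common way to do the final shuffle step, not necessarily what the PR does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Selection sampling: visit items 0..n-1 once and pick item i with
 * probability (still needed) / (still unvisited).
 * Writes the chosen indexes to out[] in ascending order and returns
 * how many were chosen (equal to count whenever count <= n). */
static uint32_t sample_indices(uint32_t n, uint32_t count, uint32_t *out)
{
    assert(count <= n);
    uint32_t fcnt = 0;     /* picked so far */
    uint32_t remain = n;   /* not yet visited */
    for (uint32_t i = 0; i < n && fcnt < count; i++) {
        /* rand() % remain lies in [0, remain); modulo bias is ignored here */
        if (((uint32_t)rand() % remain) < (count - fcnt)) {
            out[fcnt++] = i;
        }
        remain--;
    }
    return fcnt;
}

/* Fisher-Yates shuffle, so the result order does not follow visit order. */
static void shuffle_indices(uint32_t *a, uint32_t len)
{
    for (uint32_t i = len; i > 1; i--) {
        uint32_t j = (uint32_t)rand() % i;
        uint32_t tmp = a[i - 1];
        a[i - 1] = a[j];
        a[j] = tmp;
    }
}
```

For the example above, calling `sample_indices(10, 3, out)` followed by `shuffle_indices(out, 3)` would leave three distinct indexes in `out` in random order.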

@jhpark816 jhpark816 requested a review from ing-eoking February 19, 2025 06:09
@jeesup0103 jeesup0103 marked this pull request as ready for review February 19, 2025 06:25
@jhpark816 jhpark816 requested a review from namsic February 20, 2025 08:34
@jhpark816
Collaborator

@ing-eoking @namsic
This issue needs to be wrapped up quickly, so please go ahead with the review.

If the review is delayed,
the condition that chooses between the existing method and the random method should be changed to if (1) so that the existing method is always selected.

    if (count >= info->ccnt || count == 0) { /* Return all */
        fcnt = do_set_elem_traverse_dfs(info, info->root, count, delete, elem_array);
    } else { /* Return some */
        fcnt = do_set_elem_traverse_rand(info, count, delete, elem_array);
    }

@ing-eoking ing-eoking self-assigned this Feb 20, 2025
@ing-eoking ing-eoking force-pushed the sampling-set branch 3 times, most recently from 6216aa1 to d85e176 Compare February 21, 2025 02:01
Collaborator

@jhpark816 jhpark816 left a comment

Review complete

        } else if (node->hcnt[hidx] > 0) {
            set_elem_item *elem = node->htab[hidx];
            while (elem != NULL) {
                if (rand() % *remain < count - fcnt) {
Collaborator

์งˆ๋ฌธ์ด ์žˆ์Šต๋‹ˆ๋‹ค.
ํ™•๋ฅ ์ ์œผ๋กœ ์„ ํƒํ•˜๊ฒŒ ๋˜๋Š” ๋ฐ, ์ตœ์ข…์ ์œผ๋กœ count ๊ฐœ๋ฅผ ์„ ํƒํ•œ๋‹ค๋Š” ๊ฒƒ์ด ๋ณด์žฅ๋˜๋‚˜์š”?

์ฐธ๊ณ  ์‚ฌํ•ญ์œผ๋กœ, ์•„๋ž˜์™€ ๊ฐ™์ด ๊ด„ํ˜ธ๋ฅผ ๋„ฃ์–ด์ฃผ๋ฉด ๋‚˜์„ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

((rand() % *remain) < (count - fcnt))

Collaborator

@ing-eoking ing-eoking Feb 21, 2025

ํ™•๋ฅ ์ ์œผ๋กœ ์„ ํƒํ•˜๊ฒŒ ๋˜๋Š” ๋ฐ, ์ตœ์ข…์ ์œผ๋กœ count ๊ฐœ๋ฅผ ์„ ํƒํ•œ๋‹ค๋Š” ๊ฒƒ์ด ๋ณด์žฅ๋˜๋‚˜์š”?

elem์„ ์ง€๋‚˜๊ฐˆ ๋•Œ๋งˆ๋‹ค remain ๊ฐ’์€ ๋ฌด์กฐ๊ฑด 1์”ฉ ๊ฐ์†Œํ•˜๋ฉฐ, ๊ฒฐ๊ตญ remain ๊ฐ’์ด ๋‚จ์€ ๋ฝ‘์•„์•ผ ํ•  ๊ฐœ์ˆ˜(count-fcnt)์— ๊ฐ€๊นŒ์›Œ์ง€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

remain ๊ฐ’์ด count - fcnt ๊ณผ ๊ฐ™์•„์ง€๋ฉด, rand() % remain ์˜ ๊ฒฐ๊ณผ๊ฐ€ ํ•ญ์ƒ remain๋ณด๋‹ค ์ž‘์€ ๊ฐ’์„ ๊ฐ–๊ฒŒ ๋˜์–ด elem์ด count ๊ฐœ์ˆ˜๋งŒํผ ๋  ๋•Œ๊นŒ์ง€ 100% ํ™•๋ฅ ๋กœ elem์ด ๋ฝ‘ํžˆ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

๊ทธ๋ž˜์„œ remain์ด 0์— ๊ฐ€๊นŒ์›Œ์งˆ์ˆ˜๋ก ํ™•๋ฅ ๊ฐ’์ด 100%๊นŒ์ง€ ์ฆ๊ฐ€ํ•˜๊ธฐ ๋•Œ๋ฌธ์— count ๋งŒํผ ๋ฝ‘ํžˆ๋Š” ๊ฒƒ์ด ๋ณด์žฅ๋ฉ๋‹ˆ๋‹ค.
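
The guarantee argued above can be checked with a small sketch that swaps `rand()` for an injected draw function. This is an illustration only: `draw_fn`, `worst_case_draw`, `best_case_draw` and `count_selected` are hypothetical names that do not appear in the PR. The worst case is a draw that always returns the largest possible value of `% remain`, so an element is skipped whenever skipping is still allowed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for `rand() % remain`: returns a value in [0, remain). */
typedef uint32_t (*draw_fn)(uint32_t remain);

/* Least favourable draw: the largest value `% remain` can produce. */
static uint32_t worst_case_draw(uint32_t remain) { return remain - 1; }

/* Most favourable draw: always 0, so the first count elements are taken. */
static uint32_t best_case_draw(uint32_t remain) { (void)remain; return 0; }

/* Same selection test as in the reviewed code, with the random draw injected.
 * Returns how many of `total` elements end up selected. */
static uint32_t count_selected(uint32_t total, uint32_t count, draw_fn draw)
{
    assert(count <= total);
    uint32_t fcnt = 0;
    uint32_t remain = total;
    while (remain > 0 && fcnt < count) {
        if (draw(remain) < (count - fcnt)) {
            fcnt++;   /* unavoidable once remain == count - fcnt */
        }
        remain--;
    }
    return fcnt;
}
```

With `worst_case_draw`, nothing is selected until `remain` has dropped to `count - fcnt`, after which every remaining element is taken, so the function still returns `count`.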

Collaborator

@jhpark816 jhpark816 left a comment

Review complete

                if ((rand() % *remain) < (count - fcnt)) {
                    elem->refcount++;
                    elem_array[fcnt] = elem;
                    fcnt++;
Collaborator

Only the following condition is needed at this point.

if (fcnt >= count) break;

                }
                *remain -= 1;
                elem = elem->next;
                if (fcnt >= count || *remain == 0) break;
Collaborator

The *remain == 0 condition is not needed.

                if (fcnt >= count || *remain == 0) break;
            }
        }
        if (fcnt >= count || *remain == 0) break;
Collaborator

์—ฌ๊ธฐ๋„ fcnt์— ๋Œ€ํ•œ ์กฐ๊ฑด๋งŒ ์žˆ์œผ๋ฉด ๋ฉ๋‹ˆ๋‹ค.

@@ -532,6 +532,36 @@ static int do_set_elem_traverse_dfs(set_meta_info *info, set_hash_node *node,
    return fcnt;
}

static int do_set_elem_traverse_sampling(set_meta_info *info, set_hash_node *node,
                                         const uint32_t count, uint32_t *remain,
                                         set_elem_item **elem_array)
Collaborator

Please swap the order of the remain and count arguments,
and make remain a uint32_t rather than a pointer type.
In exchange, once the traversal of a child node has finished, remain only needs to be decreased by the number of elements in that child node.

    } else { /* Use sampling */
        uint32_t remain = info->ccnt;
        fcnt = do_set_elem_traverse_sampling(info, info->root,
                                             count, &remain, elem_array);
Collaborator

With the remain type changed, the call becomes the following.

        fcnt = do_set_elem_traverse_sampling(info, info->root, info->ccnt,
                                             count, elem_array);
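
The by-value variant suggested in this thread could look roughly like the sketch below. It is a simplified stand-in, not the merged code: the `node` and `elem` structs, `FANOUT`, `traverse_sampling`, and passing `fcnt` as a parameter are all assumptions made for illustration. The one point taken from the review is that `remain` is passed by value and reduced by the child's element count after the child has been traversed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FANOUT 4   /* hypothetical slot count, not the real hash table width */

typedef struct elem { struct elem *next; int value; } elem;

typedef struct node {
    uint32_t tot_elem_cnt;        /* elements stored in this whole subtree */
    struct node *child[FANOUT];   /* non-NULL: the slot holds a child node */
    elem *list[FANOUT];           /* otherwise: chained elements (may be NULL) */
} node;

/* remain: number of unvisited elements on entry to this node (by value).
 * fcnt: number of elements already selected. Returns the updated fcnt. */
static uint32_t traverse_sampling(node *nd, uint32_t remain, uint32_t count,
                                  uint32_t fcnt, elem **out)
{
    for (int h = 0; h < FANOUT && fcnt < count; h++) {
        if (nd->child[h] != NULL) {
            fcnt = traverse_sampling(nd->child[h], remain, count, fcnt, out);
            /* the whole subtree has been visited (or sampling is finished) */
            remain -= nd->child[h]->tot_elem_cnt;
        } else {
            for (elem *e = nd->list[h]; e != NULL && fcnt < count; e = e->next) {
                assert(remain > 0);   /* e itself is still counted in remain */
                if (((uint32_t)rand() % remain) < (count - fcnt)) {
                    out[fcnt++] = e;
                }
                remain--;
            }
        }
    }
    return fcnt;
}
```

A top-level call would pass the total element count as the initial `remain`, for example `traverse_sampling(root, root->tot_elem_cnt, count, 0, out)`. Only the `fcnt` check is needed to stop early, since `remain` cannot reach 0 before `count` elements have been selected.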

@jhpark816 jhpark816 merged commit 21d8bb8 into naver:develop Feb 21, 2025
1 check passed
@jeesup0103 jeesup0103 deleted the sampling-set branch February 21, 2025 12:46