Skip to content

Conversation

senhuang42
Copy link
Contributor

@senhuang42 senhuang42 commented Sep 15, 2021

This PR moves some of the block splitter stuff that was using too much stack space

I see no speed delta for levels 16-19 (have block splitter enabled by default) on enwik7 with gcc.

contrib/linux-kernel test
old: Maximum stack size: 2760
new: Maximum stack size: 1848

Todo: Fix fuzzer errors

@senhuang42 senhuang42 marked this pull request as draft September 15, 2021 20:28
@senhuang42 senhuang42 force-pushed the blocksplit_stack_reduce branch from 99ddad5 to e63fa39 Compare September 15, 2021 20:32
/* block splitter ctx */
if (ZSTD_CParams_useBlockSplitter(&params->cParams)) {
/* Silence -Wcast-align with cast to void* */
zc->blockSplitCtx = (ZSTD_blockSplitCtx*)(void*)ZSTD_cwksp_reserve_buffer(ws, sizeof(ZSTD_blockSplitCtx));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-Wcast-align has a point here. You'll need correct alignment here or certain platforms will fail.

This probably needs to go in the ZSTD_cwksp_reserve_object() space, or if that doesn't work just put it directly into the CCtx.

Copy link
Contributor Author

@senhuang42 senhuang42 Sep 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right. I think this will just rest directly in the CCtx. If I want to save space when we're not using block splitting, a wksp object doesn't seem to work properly with the different cctx reuse cases, since objects are assumed to remain the same throughout compressions.

If I wanted to make it work somehow, it would have to be the same as the other objects in that they stay the same throughout different cctx re-uses. And if we have to do that anyways, we may as well just keep it in the cctx. Though, this does mean we will just have to eat the >1K additional heap cost.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, this does mean we will just have to eat the >1K additional heap cost.

That should be fine, thats much less than the total CCtx allocation. E.g. we have 500KB space for sequences, which is way overkill in most cases. So there is lower hanging fruit.

@senhuang42
Copy link
Contributor Author

Rebased on top of #2788. For review, just consider the last commit in the stack.

Copy link
Contributor

@terrelln terrelln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Before merging, you should rebase on top of dev, so you don't get any extra commits in the diff.

@senhuang42 senhuang42 force-pushed the blocksplit_stack_reduce branch from 7f0f526 to 1d8143c Compare September 23, 2021 04:02
@senhuang42 senhuang42 merged commit c7afbec into facebook:dev Sep 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants