Skip to content

SvPV_shrink_to_cur: include space for COW #23290

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: blead
Choose a base branch
from

Conversation

richardleach
Copy link
Contributor

The SvPV_shrink_to_cur macro shrinks an allocation to SvCUR(sv) + 1, which does not include an additional byte for Copy-On-Write (COW).

GH#22116 - a902d92 - short-circuited constant folding on CONST OPs, as this should be unnecessary. However, Dave Mitchell noticed that it had the inadvertent effect of disabling COW on SVs holding UTF8 string literals (e.g. "\x{100}abcd").

When a CONST OP is created, Perl_ck_svconst should mark its SV as being COW-able. But SVs built via S_scan_const, when that has called SvPV_shrink_to_cur, have resulting SvLEN(sv) values that fail the SvCANCOW(sv) test. Previously, constant folding had the effect of copying the literal into a buffer large enough for COW.

This commit modifies SvPV_shrink_to_cur so that it accounts for an additional byte for COW.

Additionally, this macro will no longer try to shrink the buffer below PERL_STRLEN_NEW_MIN, since that is likely to have no effect.


  • This set of changes does not require a perldelta entry.

@richardleach richardleach requested a review from iabyn May 15, 2025 17:09
@iabyn
Copy link
Contributor

iabyn commented May 16, 2025

While this change seems sensible in general, it's changing the behaviour of an API macro: which makes me think it's not a suitable candidate for a last-minute change. Would it possible to have a more narrow fix which only affects utf8 literals, then apply the change in the PR after 5.42.0 is released?
Apart from that, I can confirm that this PR fixes the issue at hand, and also the performance regression which alerted me to the issue initially.

@jkeenan jkeenan added the defer-next-dev This PR should not be merged yet, but await the next development cycle label May 16, 2025
@richardleach
Copy link
Contributor Author

Would it possible to have a more narrow fix which only affects utf8 literals, then apply the change in the PR after 5.42.0 is released?

Yes, the short circuit check in Perl_op_convert_list can be made to check to see if the CONST OP's SV has IsCOW, which should do it. I'll swap that for an assertion at the start of the next dev cycle.

richardleach added a commit that referenced this pull request May 17, 2025
Currently, a CONST OP's SV will not have the `IsCOW` flag set if
the PV buffer was truncated such that it is too small to be COWed.
In which case, not short circuting `Perl_op_convert_list` causes
the OP to undergo constant folding, resulting in a CONST OP with
an SV that can participate in COW.

This means that the commit which introduced the short-circuit for
CONST OPs accidentally introduced a regression:
a902d92

This commit adds an additional check to ensure that short circuiting
does not happen when the CONST OP SV cannot be COWed. This is a
short term workaround given the proximity to the next stable release.
#23290 seems like the more
appropriate fix, but is a bigger change, so will be held until the
next development cycle.
sv.h Outdated
#define SvPV_shrink_to_cur(sv) STMT_START { \
const STRLEN _lEnGtH = SvCUR(sv) + 2; \
if (PERL_STRLEN_NEW_MIN < _lEnGtH) \
SvPV_renew(sv, _lEnGtH); \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic seems odd to me. Consider this pseudocode example:

my $foo = "aaa" x 1000; # CUR = 3000, LEN = 3002

$foo = "a"; # CUR = 1, LEN = 3002

SvPV_shrink_to_cur($foo); # will do nothing because CUR < PERL_STRLEN_NEW_MIN

I think it should be looking at SvLEN, not SvCUR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Now that there's no rush to get this PR merged, I'll revisit it.

Similar SvCUR vs SvLEN considerations in #23111, where at least one of the callers does indeed set SvCUR to 0 before calling S_invlist_trim.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try to tie in ilmari's comment too: #23293 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the macro now checks SvLEN and the toke.c changes from 3dc75b6 have been added as a second, now related, commit.

I haven't changed the callers in pp_hot.c and pp_sys.c, since they are guarded by a 20 byte difference check and we haven't bottomed out if they need to be different and/or what a reasonable minimum saving looks like.

PS Not suggesting we think about using malloc_good_size et al here at this point. Or measuring the difference between "fastbins" and using that as the minimum saving.

The `SvPV_shrink_to_cur` macro shrinks an allocation to `SvCUR(sv) + 1`,
which does not include an additional byte for Copy-On-Write (COW).

GH#22116 - a902d92 - short-circuited
constant folding on CONST OPs, as this should be unnecessary. However,
Dave Mitchell noticed that it had the inadvertent effect of disabling
COW on SVs holding UTF8 string literals (e.g. `"\x{100}abcd"`).

When a CONST OP is created, `Perl_ck_svconst` should mark its SV as
being COW-able. But SVs built via `S_scan_const`, when that has
called `SvPV_shrink_to_cur`, have resulting `SvLEN(sv)` values that
fail the `SvCANCOW(sv)` test. Previously, constant folding had the
effect of copying the literal into a buffer large enough for COW.

This commit modifies `SvPV_shrink_to_cur` to: allocate an additional
byte to allow for subsequent COWing directly.

The macro has also been modified such that:
 * No reallocation will be attempted if the saving is unrealistically
   small, or otherwise no saving is likely to be achievable.
 * The intended saving is compared with SvLEN(sv), which enables checks
   at call sites to be simplified.
The previous commit modified the definition of `SvPV_shrink_to_cur` so that
it does not attempt to reallocate to make unrealistic savings. That made
some of the condition checks in toke.c redundant, so this commit removes
those.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defer-next-dev This PR should not be merged yet, but await the next development cycle
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants