Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions MANIFEST
Original file line number Diff line number Diff line change
Expand Up @@ -5995,6 +5995,7 @@ t/op/sub.t See if subroutines work
t/op/sub_lval.t See if lvalue subroutines work
t/op/substr.t See if substr works
t/op/substr_thr.t See if substr works in another thread
t/op/svflags.t See if POK is set as expected.
t/op/svleak.pl Test file for svleak.t
t/op/svleak.t See if stuff leaks SVs
t/op/switch.t See if switches (given/when) work
Expand Down
139 changes: 139 additions & 0 deletions pod/perldelta.pod
Original file line number Diff line number Diff line change
Expand Up @@ -425,6 +425,145 @@ with C<..._flags>-suffixed variants. These expose a simple and consistent API
to perform numerical or string comparison which is aware of operator
overloading.

=item *

Reading the string form of an integer value no longer sets the flag C<SVf_POK>.
The string form is still cached internally, and still re-read directly by the
macros C<SvPV(sv)> I<etc> (inline, without calling a C function). XS code that
already calls the APIs to get values will not be affected by this change. XS
code that accesses flags directly instead of using API calls to express its
intent I<might> break, but such code likely is already buggy if passed some
other values, such as floating point values or objects with string overloading.

This small change permits code (such as JSON serializers) to reliably determine
between

=over 4

=item *

a value that was initially B<written> as an integer, but then B<read> as a string

my $answer = 42;
print "The answer is $answer\n";

=item *

that same value that was initially B<written> as a string, but then B<read> as an integer

my $answer = "42";
print "That doesn't look right\n"
unless $answer == 6 * 9;

=back

For the first case (originally written as an integer), we now have:

use Devel::Peek;
my $answer = 42;
Dump ($answer);
my $void = "$answer";
print STDERR "\n";
Dump($answer)


SV = IV(0x562538925778) at 0x562538925788
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 42

SV = PVIV(0x5625389263c0) at 0x562538925788
REFCNT = 1
FLAGS = (IOK,pIOK,pPOK)
IV = 42
PV = 0x562538919b50 "42"\0
CUR = 2
LEN = 10

For the second (originally written as a string), we now have:

use Devel::Peek;
my $answer = "42";
Dump ($answer);
my $void = $answer == 6 * 9;
print STDERR "\n";
Dump($answer)'


SV = PV(0x5586ffe9bfb0) at 0x5586ffec0788
REFCNT = 1
FLAGS = (POK,IsCOW,pPOK)
PV = 0x5586ffee7fd0 "42"\0
CUR = 2
LEN = 10
COW_REFCNT = 1

SV = PVIV(0x5586ffec13c0) at 0x5586ffec0788
REFCNT = 1
FLAGS = (IOK,POK,IsCOW,pIOK,pPOK)
IV = 42
PV = 0x5586ffee7fd0 "42"\0
CUR = 2
LEN = 10
COW_REFCNT = 1

(One can't rely on the presence or absence of the flag C<SVf_IsCOW> to
determine the history of operations on a scalar.)

Previously both cases would be indistinguishable, with all 4 flags set:

SV = PVIV(0x55d4d62edaf0) at 0x55d4d62f0930
REFCNT = 1
FLAGS = (IOK,POK,pIOK,pPOK)
IV = 42
PV = 0x55d4d62e1740 "42"\0
CUR = 2
LEN = 10

(and possibly C<SVf_IsCOW>, but not always)

This now means that if XS code I<really> needs to determine which form a value
was first written as, it should implement logic roughly

if (flags & SVf_IOK|SVf_NOK) && !(flags & SVf_POK)
serialize as number
else if (flags & SVf_POK)
serialize as string
else
the existing guesswork ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know there’s been discussion of this, but FWIW this might be bit nicer if it were wrapped up some new API call … maybe SvORIGTYPE()? Then the delta could just mention that API call and maybe some of the flag-manipulation changes, with most of these other details in some sort of .pod document. This discussion may be of especial interest outside the scope of perldelta.

It just seems like a lot of explanation for a changelog.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding something like that would be useful, but I don't think it needs to hold up this particular PR. We'd verymuch like to get this change in before 5.35.9 (due in 3 days from now). Extending the XS-author API later would be a fine addition which we can do afterwards, perhaps as part of a wider look at "how we do type-like information".


Note that this doesn't cover "dualvars" - scalars that report different
values when asked for their string form or number form (such as C<$!>).
Most serialization formats cannot represent such duplicity.

I<The existing guesswork> remains because as well as dualvars, values might
be C<undef>, references, overloaded references, typeglobs and other things that
Perl itself can represent but do not map one-to-one into external formats, so
need some amount of approximation or encapsulation.

=item *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still related here? This looks to be unrelated change from a different commit elsewhen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you try hard enough you can dereference a Regexp as a scalar. (Assignable) LVALUEs are also scalars:

$ perl -MDevel::Peek -e '$a = \(substr $^X, 0, 1); Dump $$a'
SV = PVLV(0x20ae4e8) at 0x2085fb0
  REFCNT = 1
  FLAGS = (GMG,SMG)
  IV = 0
  NV = 0
  PV = 0
  MAGIC = 0x209d648
    MG_VIRTUAL = &PL_vtbl_substr
    MG_TYPE = PERL_MAGIC_substr(x)
  TYPE = x
  TARGOFF = 0
  TARGLEN = 1
  TARG = 0x20860e8
  FLAGS = 0
  SV = PV(0x20869c0) at 0x20860e8
    REFCNT = 2
    FLAGS = (POK,pPOK)
    PV = 0x209d6d0 "/usr/bin/perl"\0
    CUR = 13
    LEN = 15

I'm not confident that these are the only two cases, so didn't want to claim a definitive list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems in particular like it ought to be a different section from the much-more-public original-types change.

The audiences for these two things are different: C API callers on the one, Perl-maintainer Jedi wizards on the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, probably we can do better. But for a dev release perldelta, I didn't think we needed to get this perfect.

(As in, I don't think that this should block this PR)


Memory for hash iterator state (C<struct xpvhv_aux>) is now allocated as part
of the hash body, instead of as part of the block of memory allocated for the
main hash array.

Nothing else changes - memory for this structure is still allocated only when
needed, is still accessed via the C<HvAUX()> macro, and the macro should only
be called when C<SvOOK()> is true. Hashes are still always of type C<SVt_PVHV>,
hash bodies are still allocated from arenas, and the address of the hash
doesn't change, because the address is the pointer to the head structure, which
never moves.

Internally, a second arena (the arena with index 1) is used to allocate larger
bodies with space for C<struct xpvhv_aux>, the body "upgraded", and the "head"
structure updated to reflect this (much the same way that the bodies of scalars
are upgraded). We already re-purpose arenas - arena with index 0 is used for
C<HE *>s.

None of this affects documented public XS interfaces. The only code changes are
in F<hv.c> and F<sv.c>. As the rest of the core itself uses these macros but
needed no changes, likely no code on CPAN will be affected either.

=back

=head1 Selected Bug Fixes
Expand Down
24 changes: 24 additions & 0 deletions pod/perlguts.pod
Original file line number Diff line number Diff line change
Expand Up @@ -307,6 +307,30 @@ in it, you can use the following macros to check the type of SV you have.
SvNOK(SV*)
SvPOK(SV*)

Be aware that retrieving the numeric value of an SV can set IOK or NOK
on that SV, even when the SV started as a string. Prior to Perl
5.36.0 retrieving the string value of an integer could set POK, but
this can no longer occur. From 5.36.0 this can be used to distinguish
the original representation of an SV and is intended to make life
simpler for serializers:

/* references handled elsewhere */
if (SvIsBOOL(sv)) {
/* originally boolean */
...
}
else if (SvPOK(sv)) {
/* originally a string */
...
}
else if (SvNIOK(sv)) {
/* originally numeric */
...
}
else {
/* something special or undef */
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above: wrapping this up as SvORIGTYPE or some such would be amazing and simplify both the documentation and the lives of folks (like me!) who want to use this new hotness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agree, but I wasn't sure on exactly what, and I didn't see it as a blocker on this PR.


You can get and set the current length of the string stored in an SV with
the following macros:

Expand Down
23 changes: 22 additions & 1 deletion sv.c
Original file line number Diff line number Diff line change
Expand Up @@ -3309,7 +3309,22 @@ Perl_sv_2pv_flags(pTHX_ SV *const sv, STRLEN *const lp, const U32 flags)
Move(ptr, s, len, char);
s += len;
*s = '\0';
SvPOK_on(sv);
/* We used to call SvPOK_on(). Whilst this is fine for (most) Perl code,
it means that after this stringification is cached, there is no way
to distinguish between values originally assigned as $a = 42; and
$a = "42"; (or results of string operators vs numeric operators)
where the value has subsequently been used in the other sense
and had a value cached.
This (somewhat) hack means that we retain the cached stringification,
but don't set SVf_POK. Hence if a value is SVf_IOK|SVf_POK then it
originated as "42", whereas if it's SVf_IOK then it originated as 42.
(ignore SVp_IOK and SVp_POK)
The SvPV macros are now updated to recognise this specific case
(and that there isn't overloading or magic that could alter the
cached value) and so return the cached value immediately without
re-entering this function, getting back here to this block of code,
and repeating the same conversion. */
SvPOKp_on(sv);
}
else if (SvNOK(sv)) {
if (SvTYPE(sv) < SVt_PVNV)
Expand Down Expand Up @@ -4877,6 +4892,12 @@ Perl_sv_setsv_flags(pTHX_ SV *dsv, SV* ssv, const I32 flags)
SvIV_set(dsv, SvIVX(ssv));
if (sflags & SVf_IVisUV)
SvIsUV_on(dsv);
if ((sflags & SVf_IOK) && !(sflags & SVf_POK)) {
/* Source was SVf_IOK|SVp_IOK|SVp_POK but not SVf_POK, meaning
a value set as an integer and later stringified. So mark
destination the same: */
SvFLAGS(dsv) &= ~SVf_POK;
}
}
SvFLAGS(dsv) |= sflags & (SVf_IOK|SVp_IOK|SVf_NOK|SVp_NOK|SVf_UTF8);
{
Expand Down
25 changes: 18 additions & 7 deletions sv.h
Original file line number Diff line number Diff line change
Expand Up @@ -1859,19 +1859,30 @@ scalar.
#define SvPV_const(sv, len) SvPV_flags_const(sv, len, SV_GMAGIC)
#define SvPV_mutable(sv, len) SvPV_flags_mutable(sv, len, SV_GMAGIC)

/* This test is "is there a cached PV that we can use directly?"
* We can if
* a) SVf_POK is true and there's definitely no get magic on the scalar
* b) SVp_POK is true, there's no get magic, and we know that the cached PV
* came from an IV conversion.
* For the latter case, we don't set SVf_POK so that we can distinguish whether
* the value originated as a string or as an integer, before we cached the
* second representation. */
#define SvPOK_or_cached_IV(sv) \
(((SvFLAGS(sv) & (SVf_POK|SVs_GMG)) == SVf_POK) || ((SvFLAGS(sv) & (SVf_IOK|SVp_POK|SVs_GMG)) == (SVf_IOK|SVp_POK)))
Copy link
Member

@xenu xenu Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what cases testing for just SVp_POK|SVs_GM == SVp_POK wouldn't be enough?

I always thought that SVp_[INP]OK flags were supposed to mean "this scalar has a string/numeric representation" while SV_[INP]OK mean "this scalar has a canonical (lossless) string/numeric representation". If my understanding is correct, SvPV should have been testing for SVp_POK instead of SVf_POK even without the changes introduced in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SvPV() has been testing only SVf_POK for a very long time, since 463ee0b in 1993, if we do change it, I don't think it belongs in this PR.


#define SvPV_flags(sv, len, flags) \
(SvPOK_nog(sv) \
(SvPOK_or_cached_IV(sv) \
? ((len = SvCUR(sv)), SvPVX(sv)) : sv_2pv_flags(sv, &len, flags))
#define SvPV_flags_const(sv, len, flags) \
(SvPOK_nog(sv) \
(SvPOK_or_cached_IV(sv) \
? ((len = SvCUR(sv)), SvPVX_const(sv)) : \
(const char*) sv_2pv_flags(sv, &len, (flags|SV_CONST_RETURN)))
#define SvPV_flags_const_nolen(sv, flags) \
(SvPOK_nog(sv) \
(SvPOK_or_cached_IV(sv) \
? SvPVX_const(sv) : \
(const char*) sv_2pv_flags(sv, 0, (flags|SV_CONST_RETURN)))
#define SvPV_flags_mutable(sv, len, flags) \
(SvPOK_nog(sv) \
(SvPOK_or_cached_IV(sv) \
? ((len = SvCUR(sv)), SvPVX_mutable(sv)) : \
sv_2pv_flags(sv, &len, (flags|SV_MUTABLE_RETURN)))

Expand All @@ -1896,16 +1907,16 @@ scalar.
: sv_pvn_force_flags(sv, &len, flags|SV_MUTABLE_RETURN))

#define SvPV_nolen(sv) \
(SvPOK_nog(sv) \
(SvPOK_or_cached_IV(sv) \
? SvPVX(sv) : sv_2pv_flags(sv, 0, SV_GMAGIC))

/* "_nomg" in these defines means no mg_get() */
#define SvPV_nomg_nolen(sv) \
(SvPOK_nog(sv) \
(SvPOK_or_cached_IV(sv) \
? SvPVX(sv) : sv_2pv_flags(sv, 0, 0))

#define SvPV_nolen_const(sv) \
(SvPOK_nog(sv) \
(SvPOK_or_cached_IV(sv) \
? SvPVX_const(sv) : sv_2pv_flags(sv, 0, SV_GMAGIC|SV_CONST_RETURN))

#define SvPV_nomg(sv, len) SvPV_flags(sv, len, 0)
Expand Down
30 changes: 30 additions & 0 deletions t/op/svflags.t
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
#!./perl

BEGIN {
chdir 't' if -d 't';
require './test.pl';
set_up_inc('../lib');
skip_all("need B, need full perl") if is_miniperl();
}

# Tests the new documented mechanism for determining the original type
# of an SV.

plan tests => 4;
use strict;
use B qw(svref_2object SVf_IOK SVf_NOK SVf_POK);

my $x = 10;
my $xobj = svref_2object(\$x);
is($xobj->FLAGS & (SVf_IOK | SVf_POK), SVf_IOK, "correct base flags on IV");

my $y = $x . "";

is($xobj->FLAGS & (SVf_IOK | SVf_POK), SVf_IOK, "POK not set on IV used as string");

$x = "10";
is($xobj->FLAGS & (SVf_IOK | SVf_POK), SVf_POK, "correct base flags on PV");

$y = $x + 10;

is($xobj->FLAGS & (SVf_IOK | SVf_POK), (SVf_IOK | SVf_POK), "POK still set on PV used as number");
1 change: 1 addition & 0 deletions t/porting/known_pod_issues.dat
Original file line number Diff line number Diff line change
Expand Up @@ -389,6 +389,7 @@ pod/perlandroid.pod Verbatim line length including indents exceeds 78 by 3
pod/perlbook.pod Verbatim line length including indents exceeds 78 by 1
pod/perldebguts.pod Verbatim line length including indents exceeds 78 by 24
pod/perldebtut.pod Verbatim line length including indents exceeds 78 by 2
pod/perldelta.pod line containing nothing but whitespace in paragraph 6
pod/perldtrace.pod Verbatim line length including indents exceeds 78 by 7
pod/perlgit.pod ? Should you be using F<...> or maybe L<...> instead of 3
pod/perlgit.pod Verbatim line length including indents exceeds 78 by 1
Expand Down