-
Notifications
You must be signed in to change notification settings - Fork 601
IV vs PV serialisation #18958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IV vs PV serialisation #18958
Changes from all commits
eaa259d
2cd2d84
8cc4f46
64ff1fb
ab341e1
e90c7df
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -425,6 +425,145 @@ with C<..._flags>-suffixed variants. These expose a simple and consistent API | |
| to perform numerical or string comparison which is aware of operator | ||
| overloading. | ||
|
|
||
| =item * | ||
|
|
||
| Reading the string form of an integer value no longer sets the flag C<SVf_POK>. | ||
| The string form is still cached internally, and still re-read directly by the | ||
| macros C<SvPV(sv)> I<etc> (inline, without calling a C function). XS code that | ||
| already calls the APIs to get values will not be affected by this change. XS | ||
| code that accesses flags directly instead of using API calls to express its | ||
| intent I<might> break, but such code likely is already buggy if passed some | ||
| other values, such as floating point values or objects with string overloading. | ||
|
|
||
| This small change permits code (such as JSON serializers) to reliably determine | ||
| between | ||
|
|
||
| =over 4 | ||
|
|
||
| =item * | ||
|
|
||
| a value that was initially B<written> as an integer, but then B<read> as a string | ||
|
|
||
| my $answer = 42; | ||
| print "The answer is $answer\n"; | ||
|
|
||
| =item * | ||
|
|
||
| that same value that was initially B<written> as a string, but then B<read> as an integer | ||
|
|
||
| my $answer = "42"; | ||
| print "That doesn't look right\n" | ||
| unless $answer == 6 * 9; | ||
|
|
||
| =back | ||
|
|
||
| For the first case (originally written as an integer), we now have: | ||
|
|
||
| use Devel::Peek; | ||
| my $answer = 42; | ||
| Dump ($answer); | ||
| my $void = "$answer"; | ||
| print STDERR "\n"; | ||
| Dump($answer) | ||
|
|
||
|
|
||
| SV = IV(0x562538925778) at 0x562538925788 | ||
| REFCNT = 1 | ||
| FLAGS = (IOK,pIOK) | ||
| IV = 42 | ||
|
|
||
| SV = PVIV(0x5625389263c0) at 0x562538925788 | ||
| REFCNT = 1 | ||
| FLAGS = (IOK,pIOK,pPOK) | ||
| IV = 42 | ||
| PV = 0x562538919b50 "42"\0 | ||
| CUR = 2 | ||
| LEN = 10 | ||
|
|
||
| For the second (originally written as a string), we now have: | ||
|
|
||
| use Devel::Peek; | ||
| my $answer = "42"; | ||
| Dump ($answer); | ||
| my $void = $answer == 6 * 9; | ||
| print STDERR "\n"; | ||
| Dump($answer)' | ||
|
|
||
|
|
||
| SV = PV(0x5586ffe9bfb0) at 0x5586ffec0788 | ||
| REFCNT = 1 | ||
| FLAGS = (POK,IsCOW,pPOK) | ||
| PV = 0x5586ffee7fd0 "42"\0 | ||
| CUR = 2 | ||
| LEN = 10 | ||
| COW_REFCNT = 1 | ||
|
|
||
| SV = PVIV(0x5586ffec13c0) at 0x5586ffec0788 | ||
| REFCNT = 1 | ||
| FLAGS = (IOK,POK,IsCOW,pIOK,pPOK) | ||
| IV = 42 | ||
| PV = 0x5586ffee7fd0 "42"\0 | ||
| CUR = 2 | ||
| LEN = 10 | ||
| COW_REFCNT = 1 | ||
|
|
||
| (One can't rely on the presence or absence of the flag C<SVf_IsCOW> to | ||
| determine the history of operations on a scalar.) | ||
|
|
||
| Previously both cases would be indistinguishable, with all 4 flags set: | ||
|
|
||
| SV = PVIV(0x55d4d62edaf0) at 0x55d4d62f0930 | ||
| REFCNT = 1 | ||
| FLAGS = (IOK,POK,pIOK,pPOK) | ||
| IV = 42 | ||
| PV = 0x55d4d62e1740 "42"\0 | ||
| CUR = 2 | ||
| LEN = 10 | ||
|
|
||
| (and possibly C<SVf_IsCOW>, but not always) | ||
|
|
||
| This now means that if XS code I<really> needs to determine which form a value | ||
| was first written as, it should implement logic roughly | ||
|
|
||
| if (flags & SVf_IOK|SVf_NOK) && !(flags & SVf_POK) | ||
| serialize as number | ||
| else if (flags & SVf_POK) | ||
| serialize as string | ||
| else | ||
| the existing guesswork ... | ||
|
|
||
| Note that this doesn't cover "dualvars" - scalars that report different | ||
| values when asked for their string form or number form (such as C<$!>). | ||
| Most serialization formats cannot represent such duplicity. | ||
|
|
||
| I<The existing guesswork> remains because as well as dualvars, values might | ||
| be C<undef>, references, overloaded references, typeglobs and other things that | ||
| Perl itself can represent but do not map one-to-one into external formats, so | ||
| need some amount of approximation or encapsulation. | ||
|
|
||
| =item * | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this still related here? This looks to be unrelated change from a different commit elsewhen
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you try hard enough you can dereference a I'm not confident that these are the only two cases, so didn't want to claim a definitive list.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems in particular like it ought to be a different section from the much-more-public original-types change. The audiences for these two things are different: C API callers on the one, Perl-maintainer Jedi wizards on the other.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, probably we can do better. But for a dev release perldelta, I didn't think we needed to get this perfect. (As in, I don't think that this should block this PR) |
||
|
|
||
| Memory for hash iterator state (C<struct xpvhv_aux>) is now allocated as part | ||
| of the hash body, instead of as part of the block of memory allocated for the | ||
| main hash array. | ||
|
|
||
| Nothing else changes - memory for this structure is still allocated only when | ||
| needed, is still accessed via the C<HvAUX()> macro, and the macro should only | ||
| be called when C<SvOOK()> is true. Hashes are still always of type C<SVt_PVHV>, | ||
| hash bodies are still allocated from arenas, and the address of the hash | ||
| doesn't change, because the address is the pointer to the head structure, which | ||
| never moves. | ||
|
|
||
| Internally, a second arena (the arena with index 1) is used to allocate larger | ||
| bodies with space for C<struct xpvhv_aux>, the body "upgraded", and the "head" | ||
| structure updated to reflect this (much the same way that the bodies of scalars | ||
| are upgraded). We already re-purpose arenas - arena with index 0 is used for | ||
| C<HE *>s. | ||
|
|
||
| None of this affects documented public XS interfaces. The only code changes are | ||
| in F<hv.c> and F<sv.c>. As the rest of the core itself uses these macros but | ||
| needed no changes, likely no code on CPAN will be affected either. | ||
|
|
||
| =back | ||
|
|
||
| =head1 Selected Bug Fixes | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -307,6 +307,30 @@ in it, you can use the following macros to check the type of SV you have. | |
| SvNOK(SV*) | ||
| SvPOK(SV*) | ||
|
|
||
| Be aware that retrieving the numeric value of an SV can set IOK or NOK | ||
| on that SV, even when the SV started as a string. Prior to Perl | ||
| 5.36.0 retrieving the string value of an integer could set POK, but | ||
| this can no longer occur. From 5.36.0 this can be used to distinguish | ||
| the original representation of an SV and is intended to make life | ||
| simpler for serializers: | ||
|
|
||
| /* references handled elsewhere */ | ||
| if (SvIsBOOL(sv)) { | ||
| /* originally boolean */ | ||
| ... | ||
| } | ||
| else if (SvPOK(sv)) { | ||
| /* originally a string */ | ||
| ... | ||
| } | ||
| else if (SvNIOK(sv)) { | ||
| /* originally numeric */ | ||
| ... | ||
| } | ||
| else { | ||
| /* something special or undef */ | ||
| } | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same comment as above: wrapping this up as SvORIGTYPE or some such would be amazing and simplify both the documentation and the lives of folks (like me!) who want to use this new hotness.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, agree, but I wasn't sure on exactly what, and I didn't see it as a blocker on this PR. |
||
|
|
||
| You can get and set the current length of the string stored in an SV with | ||
| the following macros: | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1859,19 +1859,30 @@ scalar. | |
| #define SvPV_const(sv, len) SvPV_flags_const(sv, len, SV_GMAGIC) | ||
| #define SvPV_mutable(sv, len) SvPV_flags_mutable(sv, len, SV_GMAGIC) | ||
|
|
||
| /* This test is "is there a cached PV that we can use directly?" | ||
| * We can if | ||
| * a) SVf_POK is true and there's definitely no get magic on the scalar | ||
| * b) SVp_POK is true, there's no get magic, and we know that the cached PV | ||
| * came from an IV conversion. | ||
| * For the latter case, we don't set SVf_POK so that we can distinguish whether | ||
| * the value originated as a string or as an integer, before we cached the | ||
| * second representation. */ | ||
| #define SvPOK_or_cached_IV(sv) \ | ||
| (((SvFLAGS(sv) & (SVf_POK|SVs_GMG)) == SVf_POK) || ((SvFLAGS(sv) & (SVf_IOK|SVp_POK|SVs_GMG)) == (SVf_IOK|SVp_POK))) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In what cases testing for just I always thought that
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
|
|
||
| #define SvPV_flags(sv, len, flags) \ | ||
| (SvPOK_nog(sv) \ | ||
| (SvPOK_or_cached_IV(sv) \ | ||
| ? ((len = SvCUR(sv)), SvPVX(sv)) : sv_2pv_flags(sv, &len, flags)) | ||
| #define SvPV_flags_const(sv, len, flags) \ | ||
| (SvPOK_nog(sv) \ | ||
| (SvPOK_or_cached_IV(sv) \ | ||
| ? ((len = SvCUR(sv)), SvPVX_const(sv)) : \ | ||
| (const char*) sv_2pv_flags(sv, &len, (flags|SV_CONST_RETURN))) | ||
| #define SvPV_flags_const_nolen(sv, flags) \ | ||
| (SvPOK_nog(sv) \ | ||
| (SvPOK_or_cached_IV(sv) \ | ||
| ? SvPVX_const(sv) : \ | ||
| (const char*) sv_2pv_flags(sv, 0, (flags|SV_CONST_RETURN))) | ||
| #define SvPV_flags_mutable(sv, len, flags) \ | ||
| (SvPOK_nog(sv) \ | ||
| (SvPOK_or_cached_IV(sv) \ | ||
| ? ((len = SvCUR(sv)), SvPVX_mutable(sv)) : \ | ||
| sv_2pv_flags(sv, &len, (flags|SV_MUTABLE_RETURN))) | ||
|
|
||
|
|
@@ -1896,16 +1907,16 @@ scalar. | |
| : sv_pvn_force_flags(sv, &len, flags|SV_MUTABLE_RETURN)) | ||
|
|
||
| #define SvPV_nolen(sv) \ | ||
| (SvPOK_nog(sv) \ | ||
| (SvPOK_or_cached_IV(sv) \ | ||
| ? SvPVX(sv) : sv_2pv_flags(sv, 0, SV_GMAGIC)) | ||
|
|
||
| /* "_nomg" in these defines means no mg_get() */ | ||
| #define SvPV_nomg_nolen(sv) \ | ||
| (SvPOK_nog(sv) \ | ||
| (SvPOK_or_cached_IV(sv) \ | ||
| ? SvPVX(sv) : sv_2pv_flags(sv, 0, 0)) | ||
|
|
||
| #define SvPV_nolen_const(sv) \ | ||
| (SvPOK_nog(sv) \ | ||
| (SvPOK_or_cached_IV(sv) \ | ||
| ? SvPVX_const(sv) : sv_2pv_flags(sv, 0, SV_GMAGIC|SV_CONST_RETURN)) | ||
|
|
||
| #define SvPV_nomg(sv, len) SvPV_flags(sv, len, 0) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| #!./perl | ||
|
|
||
| BEGIN { | ||
| chdir 't' if -d 't'; | ||
| require './test.pl'; | ||
| set_up_inc('../lib'); | ||
| skip_all("need B, need full perl") if is_miniperl(); | ||
| } | ||
|
|
||
| # Tests the new documented mechanism for determining the original type | ||
| # of an SV. | ||
|
|
||
| plan tests => 4; | ||
| use strict; | ||
| use B qw(svref_2object SVf_IOK SVf_NOK SVf_POK); | ||
|
|
||
| my $x = 10; | ||
| my $xobj = svref_2object(\$x); | ||
| is($xobj->FLAGS & (SVf_IOK | SVf_POK), SVf_IOK, "correct base flags on IV"); | ||
|
|
||
| my $y = $x . ""; | ||
|
|
||
| is($xobj->FLAGS & (SVf_IOK | SVf_POK), SVf_IOK, "POK not set on IV used as string"); | ||
|
|
||
| $x = "10"; | ||
| is($xobj->FLAGS & (SVf_IOK | SVf_POK), SVf_POK, "correct base flags on PV"); | ||
|
|
||
| $y = $x + 10; | ||
|
|
||
| is($xobj->FLAGS & (SVf_IOK | SVf_POK), (SVf_IOK | SVf_POK), "POK still set on PV used as number"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know there’s been discussion of this, but FWIW this might be bit nicer if it were wrapped up some new API call … maybe
SvORIGTYPE()? Then the delta could just mention that API call and maybe some of the flag-manipulation changes, with most of these other details in some sort of .pod document. This discussion may be of especial interest outside the scope of perldelta.It just seems like a lot of explanation for a changelog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding something like that would be useful, but I don't think it needs to hold up this particular PR. We'd verymuch like to get this change in before 5.35.9 (due in 3 days from now). Extending the XS-author API later would be a fine addition which we can do afterwards, perhaps as part of a wider look at "how we do type-like information".