Skip to content

replace various sv_catpv(sv,"cstr") calls with len counted calls #22652

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: blead
Choose a base branch
from

Conversation

bulk88
Copy link
Contributor

@bulk88 bulk88 commented Oct 10, 2024

-perf reasons, even though I guess most of the code is rarely called since
they are error branches, do it anyways, it is only 1 more assembly op
(2-4 bytes) vs not including len, and skips strlen() overhead

-GDBM_File.xs
-'sv_catpv(sv, ": ");'
gdbm: Define error codes; provide the global $gdbm_errno variable. 8b8b122 10/11/2021 11:58:44 AM

regcomp.c
-'sv_catpv(substitute_parse, ")");'
-"sv_catpv(substitute_parse, SvPVX(this_sequence));" len is known, its an
an SV with contents gen by core, probably from regexp pl code, if it was
passing control codes to console on the regular, I assume that would've
been fixed long ago, therefore we not trying eliminate user's null chars
9d53c45 10/11/2012 11:49:31 PM PATCH: [perl #89774] multi-char fold + its fold in char class

-'const char overflow_msg[] = "Code point'
This was creating uncond a RW array on C stack from a RO global string,
then sometimes passing ptr to C stk alloced array, to sv_catpv()
-"sv_catpv(msg, prefix);" pass len to RO string
73b9584 8/20/2018 8:31:04 PM Move \p{user-defined} to core from utf8_heavy.pl

-mg.c
Revision: b35e4f8 12/14/2013 9:48:00 PM Fix HP-UX $! failure
-"sv_catpv(sv, UNKNOWN_ERRNO_MSG);" is a const literal

-perlio.c
-add comment about the unusual but correct stack array, its length can
be const computed so do it

-pp_ctl.c
46fc3d4 4/22/1997 8:00:00 AM [inseparable changes from match from perl-5.003_97g to perl-5.003_97h] -"sv_catpv(namesv, unixname);" seems to be left over after converting this
area of code from an unsafe "strcat()" design, use len since we know it

TODO: fill description here


  • This set of changes requires a perldelta entry, and it is included.
  • This set of changes requires a perldelta entry, and I need help writing it.
  • This set of changes does not require a perldelta entry.

-perf reasons, even though I guess most of the code is rarely called since
 they are error branches, do it anyways, it is only 1 more assembly op
 (2-4 bytes) vs not including len, and skips strlen() overhead

-GDBM_File.xs
-'sv_catpv(sv, ": ");'
gdbm: Define error codes; provide the global $gdbm_errno variable.
8b8b122 10/11/2021 11:58:44 AM

regcomp.c
-'sv_catpv(substitute_parse, ")");'
-"sv_catpv(substitute_parse, SvPVX(this_sequence));" len is known, its an
  an SV with contents gen by core, probably from regexp pl code, if it was
  passing control codes to console on the regular, I assume that would've
  been fixed long ago, therefore we not trying eliminate user's null chars
9d53c45 10/11/2012 11:49:31 PM
PATCH: [perl #89774] multi-char fold + its fold in char class

-'const char overflow_msg[] = "Code point'
 This was creating uncond a RW array on C stack from a RO global string,
 then sometimes passing ptr to C stk alloced array, to sv_catpv()
-"sv_catpv(msg, prefix);" pass len to RO string
73b9584 8/20/2018 8:31:04 PM
Move \p{user-defined} to core from utf8_heavy.pl

-mg.c
Revision: b35e4f8 12/14/2013 9:48:00 PM
Fix HP-UX $! failure
-"sv_catpv(sv, UNKNOWN_ERRNO_MSG);" is a const literal

-perlio.c
-add comment about the unusual but correct stack array, its length can
 be const computed so do it

-pp_ctl.c
46fc3d4 4/22/1997 8:00:00 AM
[inseparable changes from match from perl-5.003_97g to perl-5.003_97h]
-"sv_catpv(namesv, unixname);" seems to be left over after converting this
 area of code from an unsafe "strcat()" design, use len since we know it
@@ -14314,7 +14314,7 @@ S_handle_user_defined_property(pTHX_
s = e;
}
if (SvCUR(msg) > 0) sv_catpvs(msg, "; ");
sv_catpv(msg, overflow_msg);
sv_catpvs(msg, "Code point too large in \"");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I object to taking a common value and making it into duplicate copies

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think overflow_msg could be changed to a define, and then this code could use the define. Alternatively cant these cases use sv_catpvn() and use sizeof on the constant string (or do i misremember that that works).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code as written, prohibits de-duping identical RO string C literals. "foo" token de-dupes in the linker. static const char foo [] = "foo";does not and is a violation of C if numeric pointer at static symbol foo in A.o, is equal to numeric pointer at static symbol foo in B.o. ISO C committee argues some people use abs pointers of static fns or static char arrays, as GUIDs or "plugin build numbers" in APIs, or session handles. Or separate "my objects" from "foreign objects", based on a callback fn ptr in that malloc object, and a static callback fn, that is identical src code and identical machine code, in 2 different .o files, with 2 copies of machine code and 2 different pointers. And each .o gets a global "destroy" event, chained somehow, then compares callback fn ptr in malloc object to their private static fn ptr symbol, if same, call "my destructor", else pass malloc obj back to the root API for some other dtor.

CC LTO can do some tricks, usually static failed to inline FNs, but moment you do "&" on a static fns, and static const char arrays, %99 chance, & of will be done on a static const char array, and that ptr to static gets passed to another FN, therefore LTO de-duping is not possible.

#define macro, lower case, or upper case token name, or literal "foo" token, is only way out of this in C.

in Makes more sense in the world of closed source, commercial .o files than FOSS world.

For example instead

@@ -10752,7 +10752,8 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
}
first_time = FALSE;

sv_catpv(substitute_parse, SvPVX(this_sequence));
sv_catpvn(substitute_parse,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this done better with sv_catsv?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Leont I am not so sure. This code is effectively using AV's and SV's for their "self cleanup" properties. As far as I could see these are "normal" SVPV's constructed by the regex compiler, and as such use sv_catpv() would just do a bunch of checks we know will be false. This is one of the few cases where using SvPVX() makes sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never know looking at interp code, if sv_catpv(substitute_parse, SvPVX(this_sequence)); is code rot, or a security/exploit fix or a unicode bug.

I have near future plans to add a "OPV" or "PVR" or "CST" or "PVO" or pascal string object/class/api to macro away/automate away all the redundant Newx() SAVEFREEPV() code and K&R "string objects" core is gather over time. Sane src code symbols/identifiers/packages don't exceed 1 console line. console format warnings dont go above 80*3. 128 bytes or 256 bytes on C stack then OOM croak is good enough. Perl outputs that don't accept varlen user input strings can't overflow from accidents.

(84c133a#diff-d481115f10d65d6967e0a678eb02cc6ccd14c19446926320ff697b15caef7ffaR2114)

Least used feature in Perl C, then and right now. but svf256 looks like a security feature against abuse long console strings. But perl already wont SEGV (i hope) with a 5.5 GB file path in a SVPV *..

What happened to Mr Mortal? was he too violent to lead XS clan and finally he was eliminated [unfinished joke]? SAVEFREEPV is trending now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants