Skip to content

replace various sv_catpv(sv,"cstr") calls with len counted calls #22652

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: blead
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion ext/GDBM_File/GDBM_File.pm
Original file line number Diff line number Diff line change
Expand Up @@ -733,7 +733,7 @@ require XSLoader;
);

# This module isn't dual life, so no need for dev version numbers.
$VERSION = '1.24';
$VERSION = '1.25';

our $gdbm_errno;

Expand Down
2 changes: 1 addition & 1 deletion ext/GDBM_File/GDBM_File.xs
Original file line number Diff line number Diff line change
Expand Up @@ -241,7 +241,7 @@ get_gdbm_errno(pTHX_ IV idx, SV *sv)
if (gdbm_check_syserr(gdbm_errno)) {
SV *val = get_sv("!", 0);
if (val) {
sv_catpv(sv, ": ");
sv_catpvs(sv, ": ");
sv_catsv(sv, val);
}
}
Expand Down
2 changes: 1 addition & 1 deletion mg.c
Original file line number Diff line number Diff line change
Expand Up @@ -819,7 +819,7 @@ S_fixup_errno_string(pTHX_ SV* sv)
assert(SvOK(sv));

if(strEQ(SvPVX(sv), "")) {
sv_catpv(sv, UNKNOWN_ERRNO_MSG);
sv_catpvs(sv, UNKNOWN_ERRNO_MSG);
}
}

Expand Down
5 changes: 3 additions & 2 deletions perlio.c
Original file line number Diff line number Diff line change
Expand Up @@ -5466,6 +5466,7 @@ PerlIO_tmpfile_flags(int imode)
f = PerlIO_fdopen(fd, "w+b");
#elif ! defined(OS2)
int fd = -1;
/* Perl_my_mkostemp_cloexec() writes to this buf */
char tempname[] = "/tmp/PerlIO_XXXXXX";
const char * const tmpdir = TAINTING_get ? NULL : PerlEnv_getenv("TMPDIR");
SV * sv = NULL;
Expand All @@ -5474,7 +5475,7 @@ PerlIO_tmpfile_flags(int imode)
if (tmpdir && *tmpdir) {
/* if TMPDIR is set and not empty, we try that first */
sv = newSVpv(tmpdir, 0);
sv_catpv(sv, tempname + 4);
sv_catpvn(sv, tempname + 4, C_ARRAY_LENGTH(tempname)-4);
fd = Perl_my_mkostemp_cloexec(SvPVX(sv), imode | O_VMS_DELETEONCLOSE);
}
if (fd < 0) {
Expand All @@ -5486,7 +5487,7 @@ PerlIO_tmpfile_flags(int imode)
if (fd < 0) {
/* Try cwd */
sv = newSVpvs(".");
sv_catpv(sv, tempname + 4);
sv_catpvn(sv, tempname + 4, C_ARRAY_LENGTH(tempname)-4);
fd = Perl_my_mkostemp_cloexec(SvPVX(sv), imode | O_VMS_DELETEONCLOSE);
}
umask(old_umask);
Expand Down
2 changes: 1 addition & 1 deletion pp_ctl.c
Original file line number Diff line number Diff line change
Expand Up @@ -4960,7 +4960,7 @@ S_require_file(pTHX_ SV *sv)
== NULL)
continue;
sv_setpv(namesv, unixdir);
sv_catpv(namesv, unixname);
sv_catpvn(namesv, unixname, unixlen);
#else
/* The equivalent of
Perl_sv_setpvf(aTHX_ namesv, "%s/%s", dir, name);
Expand Down
23 changes: 14 additions & 9 deletions regcomp.c
Original file line number Diff line number Diff line change
Expand Up @@ -5320,7 +5320,7 @@ S_grok_bslash_N(pTHX_ RExC_state_t *pRExC_state,

substitute_parse = newSVpvs("?:");
sv_catsv(substitute_parse, value_sv);
sv_catpv(substitute_parse, ")");
sv_catpvs(substitute_parse, ")");

/* The value should already be native, so no need to convert on EBCDIC
* platforms.*/
Expand Down Expand Up @@ -10752,7 +10752,8 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
}
first_time = FALSE;

sv_catpv(substitute_parse, SvPVX(this_sequence));
sv_catpvn(substitute_parse,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this done better with sv_catsv?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Leont I am not so sure. This code is effectively using AV's and SV's for their "self cleanup" properties. As far as I could see these are "normal" SVPV's constructed by the regex compiler, and as such use sv_catpv() would just do a bunch of checks we know will be false. This is one of the few cases where using SvPVX() makes sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never know looking at interp code, if sv_catpv(substitute_parse, SvPVX(this_sequence)); is code rot, or a security/exploit fix or a unicode bug.

I have near future plans to add a "OPV" or "PVR" or "CST" or "PVO" or pascal string object/class/api to macro away/automate away all the redundant Newx() SAVEFREEPV() code and K&R "string objects" core is gather over time. Sane src code symbols/identifiers/packages don't exceed 1 console line. console format warnings dont go above 80*3. 128 bytes or 256 bytes on C stack then OOM croak is good enough. Perl outputs that don't accept varlen user input strings can't overflow from accidents.

(84c133a#diff-d481115f10d65d6967e0a678eb02cc6ccd14c19446926320ff697b15caef7ffaR2114)

Least used feature in Perl C, then and right now. but svf256 looks like a security feature against abuse long console strings. But perl already wont SEGV (i hope) with a 5.5 GB file path in a SVPV *..

What happened to Mr Mortal? was he too violent to lead XS clan and finally he was eliminated [unfinished joke]? SAVEFREEPV is trending now.

SvPVX(this_sequence), SvCUR(this_sequence));
}
}
}
Expand Down Expand Up @@ -14222,7 +14223,6 @@ S_handle_user_defined_property(pTHX_

const char * s0 = string; /* Points to first byte in the current line
being parsed in 'string' */
const char overflow_msg[] = "Code point too large in \"";
SV* running_definition = NULL;

PERL_ARGS_ASSERT_HANDLE_USER_DEFINED_PROPERTY;
Expand Down Expand Up @@ -14279,7 +14279,7 @@ S_handle_user_defined_property(pTHX_
s = e;
}
if (SvCUR(msg) > 0) sv_catpvs(msg, "; ");
sv_catpv(msg, overflow_msg);
sv_catpvs(msg, "Code point too large in \"");
Perl_sv_catpvf(aTHX_ msg, "%" UTF8f,
UTF8fARG(is_contents_utf8, s - s0, s0));
sv_catpvs(msg, "\"");
Expand Down Expand Up @@ -14314,7 +14314,7 @@ S_handle_user_defined_property(pTHX_
s = e;
}
if (SvCUR(msg) > 0) sv_catpvs(msg, "; ");
sv_catpv(msg, overflow_msg);
sv_catpvs(msg, "Code point too large in \"");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I object to taking a common value and making it into duplicate copies

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think overflow_msg could be changed to a define, and then this code could use the define. Alternatively cant these cases use sv_catpvn() and use sizeof on the constant string (or do i misremember that that works).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code as written, prohibits de-duping identical RO string C literals. "foo" token de-dupes in the linker. static const char foo [] = "foo";does not and is a violation of C if numeric pointer at static symbol foo in A.o, is equal to numeric pointer at static symbol foo in B.o. ISO C committee argues some people use abs pointers of static fns or static char arrays, as GUIDs or "plugin build numbers" in APIs, or session handles. Or separate "my objects" from "foreign objects", based on a callback fn ptr in that malloc object, and a static callback fn, that is identical src code and identical machine code, in 2 different .o files, with 2 copies of machine code and 2 different pointers. And each .o gets a global "destroy" event, chained somehow, then compares callback fn ptr in malloc object to their private static fn ptr symbol, if same, call "my destructor", else pass malloc obj back to the root API for some other dtor.

CC LTO can do some tricks, usually static failed to inline FNs, but moment you do "&" on a static fns, and static const char arrays, %99 chance, & of will be done on a static const char array, and that ptr to static gets passed to another FN, therefore LTO de-duping is not possible.

#define macro, lower case, or upper case token name, or literal "foo" token, is only way out of this in C.

in Makes more sense in the world of closed source, commercial .o files than FOSS world.

For example instead

Perl_sv_catpvf(aTHX_ msg, "%" UTF8f,
UTF8fARG(is_contents_utf8, s - s0, s0));
sv_catpvs(msg, "\"");
Expand Down Expand Up @@ -15929,12 +15929,17 @@ S_parse_uniprop_string(pTHX_

append_name_to_msg:
{
const char * prefix = (runtime && level == 0) ? " \\p{" : " \"";
const char * suffix = (runtime && level == 0) ? "}" : "\"";
bool is_root = runtime && level == 0;
const char * prefix = is_root ? " \\p{" : " \"";
Size_t prefixl = is_root ? STRLENs(" \\p{") : STRLENs(" \"");
const char * suffix;
Size_t suffixl;

sv_catpv(msg, prefix);
sv_catpvn(msg, prefix, prefixl);
Perl_sv_catpvf(aTHX_ msg, "%" UTF8f, UTF8fARG(is_utf8, name_len, name));
sv_catpv(msg, suffix);
suffix = is_root ? "}" : "\"";
suffixl = is_root ? STRLENs("}") : STRLENs("\"");
sv_catpvn(msg, suffix, suffixl);
}

return NULL;
Expand Down