bpo-39721: Fix constness of members of tok_state struct. #18600
The function PyTokenizer_FromUTF8 from Parser/tokenizer.c had a comment:
/* XXX: constify members. */
This patch addresses that.
In the tok_state struct:
* end and start were non-const but could be made const
* str and input were const but should have been non-const
Changes to support this include:
* decode_str() now returns a char * since it is allocated.
* PyTokenizer_FromString() and PyTokenizer_FromUTF8() each create a
new char * for an allocated string instead of reusing the input
const char *.
* PyTokenizer_Get() and tok_get() now take const char ** arguments.
* Various local vars are const or non-const accordingly.
I was able to remove five casts that cast away constness.
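As a rough illustration of what taking `const char **` arguments means for a caller, here is a standalone sketch. It is not the CPython code: `next_word` is a hypothetical helper invented for this example. It reports token boundaries as read-only pointers into the buffer it scans:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of CPython: finds the next run of
 * non-space characters in buf. The boundaries are reported through
 * const char ** out-parameters, so callers can read the token but
 * cannot modify the buffer through them. Returns the token length,
 * or 0 if no token remains. */
size_t
next_word(const char *buf, const char **p_start, const char **p_end)
{
    assert(buf != NULL && p_start != NULL && p_end != NULL);
    const char *p = buf;
    while (*p != '\0' && isspace((unsigned char)*p)) {
        p++;
    }
    *p_start = p;
    while (*p != '\0' && !isspace((unsigned char)*p)) {
        p++;
    }
    *p_end = p;
    return (size_t)(*p_end - *p_start);
}
```

Because the out-parameters are `const char *`, a caller that tries to write through them gets a compiler diagnostic instead of silently mutating the scanned buffer.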
Codecov Report
@@ Coverage Diff @@
## master #18600 +/- ##
===========================================
- Coverage 82.11% 79.46% -2.66%
===========================================
Files 1956 384 -1572
Lines 589413 169260 -420153
Branches 44458 0 -44458
===========================================
- Hits 484015 134494 -349521
+ Misses 95748 34766 -60982
+ Partials 9650 0 -9650
I can't for the life of me figure out why codecov thinks that the code coverage has gone down. Does anyone know why? Is there some specific file in the results that I should look in?
@@ -60,8 +60,8 @@ struct tok_state {
PyObject *decoding_readline; /* open(...).readline */
PyObject *decoding_buffer;
const char* enc; /* Encoding for the current str. */
const char* str;
const char* input; /* Tokenizer's newline translated copy of the string. */
What's the reason these can't be const?
tok->input can't be const because it's the allocated input, which is then freed in PyTokenizer_Free. tok->str can't be const because it can be returned from decode_str, which returns a non-const string. If I inlined decode_str into PyTokenizer_FromString, the only place that uses it, then I think I could make tok->str const.
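The ownership argument can be sketched in isolation. The struct and function names below are hypothetical and much simpler than CPython's tok_state; the point is only that a pointer the struct allocates and later frees is naturally non-const, while a cursor that merely reads from it can be const:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of the ownership pattern discussed above;
 * the names are illustrative and not CPython's. */
struct mini_tok {
    char *input;        /* owned: allocated here, released in mini_tok_free */
    const char *cur;    /* borrowed: read-only cursor into input */
};

/* Copies src into a freshly allocated buffer owned by the struct,
 * instead of keeping the caller's const pointer.
 * Returns NULL on allocation failure. */
struct mini_tok *
mini_tok_new(const char *src)
{
    assert(src != NULL);
    struct mini_tok *tok = malloc(sizeof(*tok));
    if (tok == NULL) {
        return NULL;
    }
    size_t len = strlen(src);
    tok->input = malloc(len + 1);
    if (tok->input == NULL) {
        free(tok);
        return NULL;
    }
    memcpy(tok->input, src, len + 1);   /* includes the terminating NUL */
    tok->cur = tok->input;
    return tok;
}

/* free() takes a void *, so a non-const owner needs no cast here. */
void
mini_tok_free(struct mini_tok *tok)
{
    if (tok != NULL) {
        free(tok->input);
        free(tok);
    }
}
```

Had `input` been declared `const char *`, the call to `free(tok->input)` would need a cast that discards the qualifier, which is the kind of cast this patch removes.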
decode_str only returns a non-const string in this change, though.
That's correct. I changed decode_str to return char * instead of const char * because it can return the result from translate_newlines. Also, the result from decode_str is assigned to tok->buf, which is non-const.
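To make the return-type point concrete, here is a minimal sketch of a function that hands back a newly allocated, caller-owned string. `normalize_newlines` is a hypothetical stand-in written for this example and does not reproduce the behaviour of CPython's translate_newlines:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a newline-translating helper; this is not
 * CPython's translate_newlines. It returns a newly allocated copy of s
 * with "\r\n" and lone "\r" replaced by "\n". The return type is char *
 * rather than const char * because the caller owns the result and must
 * free() it. Returns NULL on allocation failure. */
char *
normalize_newlines(const char *s)
{
    assert(s != NULL);
    char *out = malloc(strlen(s) + 1);   /* output is never longer than s */
    if (out == NULL) {
        return NULL;
    }
    char *dst = out;
    for (const char *src = s; *src != '\0'; src++) {
        if (*src == '\r') {
            *dst++ = '\n';
            if (src[1] == '\n') {
                src++;                   /* skip the LF of a CRLF pair */
            }
        }
        else {
            *dst++ = *src;
        }
    }
    *dst = '\0';
    return out;
}
```

A caller can store the result in a non-const member and later pass it to `free()` directly, with no cast needed on either side.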
https://bugs.python.org/issue39721