Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-39721: Fix constness of members of tok_state struct. #18600

Merged
merged 1 commit into from
Feb 28, 2020
Merged

bpo-39721: Fix constness of members of tok_state struct. #18600

merged 1 commit into from
Feb 28, 2020

Conversation

petdance
Copy link
Contributor

@petdance petdance commented Feb 22, 2020

The function PyTokenizer_FromUTF8 from Parser/tokenizer.c had a comment:

/* XXX: constify members. */

This patch addresses that.

In the tok_state struct:
* end and start were non-const but could be made const
* str and input were const but should have been non-const

Changes to support this include:
* decode_str() now returns a char * since it is allocated.
* PyTokenizer_FromString() and PyTokenizer_FromUTF8() each creates a
new char * for an allocate string instead of reusing the input
const char *.
* PyTokenizer_Get() and tok_get() now take const char ** arguments.
* Various local vars are const or non-const accordingly.

I was able to remove five casts that cast away constness.

https://bugs.python.org/issue39721

The function PyTokenizer_FromUTF8 from Parser/tokenizer.c had a comment:

    /* XXX: constify members. */

This patch addresses that.

In the tok_state struct:
    * end and start were non-const but could be made const
    * str and input were const but should have been non-const

Changes to support this include:
    * decode_str() now returns a char * since it is allocated.
    * PyTokenizer_FromString() and PyTokenizer_FromUTF8() each creates a
        new char * for an allocate string instead of reusing the input
        const char *.
    * PyTokenizer_Get() and tok_get() now take const char ** arguments.
    * Various local vars are const or non-const accordingly.

I was able to remove five casts that cast away constness.
@codecov
Copy link

codecov bot commented Feb 22, 2020

Codecov Report

Merging #18600 into master will decrease coverage by 2.65%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #18600       +/-   ##
===========================================
- Coverage   82.11%   79.46%    -2.66%     
===========================================
  Files        1956      384     -1572     
  Lines      589413   169260   -420153     
  Branches    44458        0    -44458     
===========================================
- Hits       484015   134494   -349521     
+ Misses      95748    34766    -60982     
+ Partials     9650        0     -9650     
Impacted Files Coverage Δ
Modules/_decimal/libmpdec/umodarith.h 80.76% <0.00%> (-19.24%) ⬇️
Modules/clinic/spwdmodule.c.h 58.33% <0.00%> (-3.21%) ⬇️
Modules/clinic/resource.c.h 78.57% <0.00%> (-2.68%) ⬇️
Modules/clinic/_pickle.c.h 62.40% <0.00%> (-2.33%) ⬇️
Modules/clinic/signalmodule.c.h 71.51% <0.00%> (-2.32%) ⬇️
Python/clinic/marshal.c.h 77.77% <0.00%> (-2.23%) ⬇️
Modules/clinic/_weakref.c.h 77.77% <0.00%> (-2.23%) ⬇️
Modules/clinic/_queuemodule.c.h 78.08% <0.00%> (-2.17%) ⬇️
Modules/_multiprocessing/clinic/posixshmem.c.h 66.66% <0.00%> (-2.09%) ⬇️
Objects/clinic/moduleobject.c.h 78.94% <0.00%> (-2.01%) ⬇️
... and 1839 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a025d4c...0b7a775. Read the comment docs.

@petdance
Copy link
Contributor Author

I can't for the life of me figure out why codecov thinks that the code coverage has gone down. Does anyone know why? Is there some specific file in the results that I should look in?

@@ -60,8 +60,8 @@ struct tok_state {
PyObject *decoding_readline; /* open(...).readline */
PyObject *decoding_buffer;
const char* enc; /* Encoding for the current str. */
const char* str;
const char* input; /* Tokenizer's newline translated copy of the string. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason these can't be const?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tok->input can't be const because it's the allocated input, and then gets freed in PyTokenizer__Free.

tok->str can't be const because it can be returned from decode_str, which returns a non-const string. If I inlined decode_str into PyTokenizer_FromString, the only place that uses it, then I think I could make tok->str const.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decode_str only returns a non-const string in this change, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. I changed decode_str to return char * instead of const char * because it can return the result from translate_newlines. Also, the result from decode_str is assigned to tok->buf which is non-const.

@benjaminp benjaminp merged commit 384f3c5 into python:master Feb 28, 2020
@petdance petdance deleted the bpo-39721 branch February 28, 2020 04:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants