Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-52551: Fix encoding issues in strftime() #125193

Merged
merged 5 commits into from
Oct 17, 2024

Conversation

serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Oct 9, 2024

Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time.

  • Characters not encodable in the current locale are now acceptable in the format string.
  • Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated.
  • Embedded null character no longer terminates the format string.

Closes #52551, #78662 and #124531.

Fix time.strftime(), the strftime() method and formatting of the
datetime classes datetime, date and time.

* Characters not encodable in the current locale are now acceptable in
  the format string.
* Surrogate pairs and sequence of surrogatescape-encoded bytes are no
  longer recombinated.
* Embedded null character no longer terminates the format string.

This fixes also pythongh-78662 and pythongh-124531.
@serhiy-storchaka serhiy-storchaka added needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Oct 9, 2024
@serhiy-storchaka serhiy-storchaka changed the title gh-52551: Fix encoding issues in strftime() gh-52551, gh-78662, gh-124531: Fix encoding issues in strftime() Oct 9, 2024
@serhiy-storchaka serhiy-storchaka changed the title gh-52551, gh-78662, gh-124531: Fix encoding issues in strftime() gh-124531: Fix encoding issues in strftime() Oct 9, 2024
@serhiy-storchaka serhiy-storchaka changed the title gh-124531: Fix encoding issues in strftime() gh-78662: Fix encoding issues in strftime() Oct 9, 2024
@serhiy-storchaka serhiy-storchaka changed the title gh-78662: Fix encoding issues in strftime() gh-52551: Fix encoding issues in strftime() Oct 9, 2024
@serhiy-storchaka
Copy link
Member Author

@vstinner, could you please review this PR?

newformat, timetuple, NULL);
Py_DECREF(newformat);
}
if (0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if (0) is weird. Why not adding a label after _PyUnicodeWriter_Dealloc() and using goto?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would require changing more code. Renaming label Done to Error (there are many goto Done in not changed code) and adding new Done label. The if (0) idiom is weird, but used in many places.

}
if (newformat != NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to exit early: if (newformat == NULL) goto ....

By the way, only the _PyUnicodeWriter_Finish() code path can fail, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would require adding more labels. The flow would look more complicated.

Modules/_datetimemodule.c Show resolved Hide resolved
Modules/_datetimemodule.c Show resolved Hide resolved
Modules/_datetimemodule.c Show resolved Hide resolved
format = PyUnicode_EncodeLocale(format_arg, "surrogateescape");
if (format == NULL)
format_size = PyUnicode_GET_LENGTH(format_arg);
format = PyMem_Malloc((format_size + 1)*sizeof(time_char));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to add an explicit integer overflow check: fail with PyErr_NoMemory() on overflow.

* will be ahead of time...
*/
while (1) {
outbuf = (time_char *)PyMem_Realloc(outbuf, bufsize*sizeof(time_char));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also add an explicit integer overflow check here.

Py_DECREF(unicode);
}

j = i;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should declare the j variable here, and I suggest to rename it to start.

}
format[fmtlen++] = (char)c;
}
if (fmtlen) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is quite big. You might move the code handling format in a sub-function (the whole if block).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be inconvenient, because this code changes several outer variables.: outbuf and bufsize.

@serhiy-storchaka
Copy link
Member Author

I addressed all comments. Perhaps it is now more difficult to review.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I just left minor coding style suggestions that you can ignore.

Thanks for the exhaustive added tests, it's very helpful.

Comment on lines 825 to 826
*outbuf = (time_char *)PyMem_Realloc(*outbuf,
*bufsize*sizeof(time_char));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*outbuf = (time_char *)PyMem_Realloc(*outbuf,
*bufsize*sizeof(time_char));
*outbuf = (time_char *)PyMem_Realloc(*outbuf,
*bufsize*sizeof(time_char));

@@ -776,27 +776,100 @@ the C library strftime function.\n"
#endif

static PyObject *
time_strftime(PyObject *module, PyObject *args)
strftime1(time_char **outbuf, size_t *bufsize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: you might rename the function to time_strftime1() to avoid any conflict with a potential strftime1() name from the C library.

@serhiy-storchaka
Copy link
Member Author

Thank you for your review @vstinner.

@serhiy-storchaka serhiy-storchaka enabled auto-merge (squash) October 17, 2024 15:19
@serhiy-storchaka serhiy-storchaka merged commit ad3eac1 into python:main Oct 17, 2024
37 checks passed
@miss-islington-app
Copy link

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker ad3eac1963a5f195ef9b2c1dbb5e44fa3cce4c72 3.13

@miss-islington-app
Copy link

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker ad3eac1963a5f195ef9b2c1dbb5e44fa3cce4c72 3.12

@serhiy-storchaka serhiy-storchaka deleted the strftime branch October 17, 2024 16:18
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request Oct 17, 2024
…5193)

Fix time.strftime(), the strftime() method and formatting of the
datetime classes datetime, date and time.

* Characters not encodable in the current locale are now acceptable in
  the format string.
* Surrogate pairs and sequence of surrogatescape-encoded bytes are no
  longer recombinated.
* Embedded null character no longer terminates the format string.

This fixes also pythongh-78662 and pythongh-124531.
(cherry picked from commit ad3eac1)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request Oct 17, 2024
Fix time.strftime(), the strftime() method and formatting of the
datetime classes datetime, date and time.

* Characters not encodable in the current locale are now acceptable in
  the format string.
* Surrogate pairs and sequence of surrogatescape-encoded bytes are no
  longer recombinated.
* Embedded null character no longer terminates the format string.

This fixes also pythongh-78662 and pythongh-124531.

(cherry picked from commit ad3eac1)
@bedevere-app
Copy link

bedevere-app bot commented Oct 17, 2024

GH-125657 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Oct 17, 2024
@serhiy-storchaka serhiy-storchaka removed the needs backport to 3.12 bug and security fixes label Oct 17, 2024
@serhiy-storchaka serhiy-storchaka removed their assignment Oct 17, 2024
serhiy-storchaka added a commit that referenced this pull request Oct 17, 2024
…5657)

Fix time.strftime(), the strftime() method and formatting of the
datetime classes datetime, date and time.

* Characters not encodable in the current locale are now acceptable in
  the format string.
* Surrogate pairs and sequence of surrogatescape-encoded bytes are no
  longer recombinated.
* Embedded null character no longer terminates the format string.

This fixes also gh-78662 and gh-124531.

(cherry picked from commit ad3eac1)
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 17, 2024
…5193) (pythonGH-125657)

Fix time.strftime(), the strftime() method and formatting of the
datetime classes datetime, date and time.

* Characters not encodable in the current locale are now acceptable in
  the format string.
* Surrogate pairs and sequence of surrogatescape-encoded bytes are no
  longer recombinated.
* Embedded null character no longer terminates the format string.

This fixes also pythongh-78662 and pythongh-124531.

(cherry picked from commit 08ccbb9)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
(cherry picked from commit ad3eac1)
serhiy-storchaka added a commit that referenced this pull request Oct 17, 2024
…5657) (GH-125661)

Fix time.strftime(), the strftime() method and formatting of the
datetime classes datetime, date and time.

* Characters not encodable in the current locale are now acceptable in
  the format string.
* Surrogate pairs and sequence of surrogatescape-encoded bytes are no
  longer recombinated.
* Embedded null character no longer terminates the format string.

This fixes also gh-78662 and gh-124531.

(cherry picked from commit 08ccbb9)
(cherry picked from commit ad3eac1)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@vstinner
Copy link
Member

Congrats! That's a nice change. Using Unicode instead of bytes for the internal buffer is a good move.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants