-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-29979: Rewrite cgi.parse_multipart to make it consistent with FieldStorage #991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@PierreQuentel, thanks for your PR! By analyzing the history of the files in this pull request, we identified @rhettinger, @orsenthil and @collinw to be potential reviewers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PierreQuentel - thank you for the patch. I've reviewed and have some minor comments/ enhancement requests.
Please address them and we can get this in soon.
Doc/library/cgi.rst
Outdated
field names, each value is a list of values for that field. For non-file | ||
fields, the value is a list of strings. | ||
|
||
This is easy to use but not much good if you are expecting megabytes to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is internally using 'FiedStorage', is this comment still applicable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 7b398f1, I removed the paragraph.
Lib/cgi.py
Outdated
|
||
return partdict | ||
|
||
boundary = pdict['boundary'].decode('ascii') # cf. RFC 2046 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More specific information from RFC 2046
might be desirable.
RFC 2026, Section 5.1
The "multipart" boundary delimiters and header fields are always represented as 7bit US-ASCII.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 7b398f1
that content-type is the raw, unparsed contents of the content-type | ||
header. | ||
|
||
XXX This does not parse nested multipart parts -- use FieldStorage for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that this is using FieldStorage, a nested multipart parsing example in test would be a great addition. Please see if this can be included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@orsenthil
Thanks for the comments.
I have very little information about nested multiparts, I don't think they are used very much. Could we postpone this specific point for a later PR ? For the moment I want to add the support of "application/json" requests, which is more important IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed the updated changes. Yes, the test for multipart parsing could come next (not in this PR).
I will commit this today. Thanks ! @PierreQuentel .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine, thank you !
I have fixed the issues detected by CI checks.
Remove trailing whitespaces
Remove trailing whitespace
bpo-29979's PR python/cpython#991 changed cgi.parse_multipart() to require a 'CONTENT-LENGTH' member in its `pdict` parameter (which is documented as "dictionary containing other parameters of the content type header"). The PR seems to have some issues: * The change seems to conflate `pdict` with an environment dict, probably because the test passes a dict called `env` as `pdict`. * 'CONTENT-LENGTH' isn't a customary parameter of the Content-Type header. `pdict` is intended to pass the multipart boundary [1]. * In a CGI environment the length of the request body is in the CONTENT_LENGTH variable, anyway [2]. Typo? * FieldStorage doesn't require a Content-Length header anyway! It's potionally used to enforce cgi.maxlen, which limits the requests size (because this is the layer for that, right?). #1012 deals with bpo-29979's backward-incompatible change by passing the HTTP `Content-Length` header as the 'CONTENT-LENGTH' member of `pdict` when the header exists. The problem is that in the Twisted context, unlike the CGI context (where the web server is responsible for removing any transfer encodings), that header won't necessarily exist, for example when a chunked transfer encoding is used by the client. This commit does the simplest possible thing: synthesize the content length based on what was actually received. [1]: https://tools.ietf.org/html/rfc7578#section-4.1 [2]: https://tools.ietf.org/html/rfc3875#section-4.1.2
bpo-29979's PR python/cpython#991 changed cgi.parse_multipart() to require a 'CONTENT-LENGTH' member in its `pdict` parameter (which is documented as "dictionary containing other parameters of the content type header"). The PR seems to have some issues: * The change seems to conflate `pdict` with an environment dict, probably because the test passes a dict called `env` as `pdict`. * 'CONTENT-LENGTH' isn't a customary parameter of the Content-Type header. `pdict` is intended to pass the multipart boundary [1]. * In a CGI environment the length of the request body is in the CONTENT_LENGTH variable, anyway [2]. Typo? * FieldStorage doesn't require a Content-Length header anyway! It's potionally used to enforce cgi.maxlen, which limits the requests size (because this is the layer for that, right?). #1012 dealt with bpo-29979's backward-incompatible change by passing the HTTP `Content-Length` header as the 'CONTENT-LENGTH' member of `pdict` when the header exists. When the header isn't given, Request skips parsing the request content. The problem is that in the Twisted context, unlike the CGI context (where the web server is responsible for removing any transfer encodings), that header won't necessarily exist. For example, when a chunked transfer encoding is used by the client. This commit does the simplest possible thing: synthesize the content length based on what was actually received. [1]: https://tools.ietf.org/html/rfc7578#section-4.1 [2]: https://tools.ietf.org/html/rfc3875#section-4.1.2
bpo-29979's PR python/cpython#991 changed cgi.parse_multipart() to require a 'CONTENT-LENGTH' member in its `pdict` parameter (which is documented as "dictionary containing other parameters of the content type header"). The PR seems to have some issues: * The change seems to conflate `pdict` with an environment dict, probably because the test passes a dict called `env` as `pdict`. * 'CONTENT-LENGTH' isn't a customary parameter of the Content-Type header. `pdict` is intended to pass the multipart boundary [1]. * In a CGI environment the length of the request body is in the CONTENT_LENGTH variable, anyway [2]. Typo? * FieldStorage doesn't require a Content-Length header anyway! It's potionally used to enforce cgi.maxlen, which limits the requests size (because this is the layer for that, right?). #1012 dealt with bpo-29979's backward-incompatible change by passing the HTTP `Content-Length` header as the 'CONTENT-LENGTH' member of `pdict` when the header exists. When the header isn't given, Request skips parsing the request content. The problem is that in the Twisted context, unlike the CGI context (where the web server is responsible for removing any transfer encodings), that header won't necessarily exist. For example, when a chunked transfer encoding is used by the client. This commit does the simplest possible thing: synthesize the content length based on what was actually received. [1]: https://tools.ietf.org/html/rfc7578#section-4.1 [2]: https://tools.ietf.org/html/rfc3875#section-4.1.2
The Pull Request simplifies the code of
cgi.parse_multipart()
by reusing theFieldStorage
class, and makes it consistent withFieldStorage
results : the values for non-file fields becomes a list of strings, instead of a list of bytes.