Skip to content

bpo-29979: Rewrite cgi.parse_multipart to make it consistent with FieldStorage #991

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 8, 2017
Merged

bpo-29979: Rewrite cgi.parse_multipart to make it consistent with FieldStorage #991

merged 8 commits into from
May 8, 2017

Conversation

PierreQuentel
Copy link
Contributor

The Pull Request simplifies the code of cgi.parse_multipart() by reusing the FieldStorage class, and makes it consistent with FieldStorage results : the values for non-file fields becomes a list of strings, instead of a list of bytes.

@mention-bot
Copy link

@PierreQuentel, thanks for your PR! By analyzing the history of the files in this pull request, we identified @rhettinger, @orsenthil and @collinw to be potential reviewers.

Copy link
Member

@orsenthil orsenthil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PierreQuentel - thank you for the patch. I've reviewed and have some minor comments/ enhancement requests.

Please address them and we can get this in soon.

field names, each value is a list of values for that field. For non-file
fields, the value is a list of strings.

This is easy to use but not much good if you are expecting megabytes to be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is internally using 'FiedStorage', is this comment still applicable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7b398f1, I removed the paragraph.

Lib/cgi.py Outdated

return partdict

boundary = pdict['boundary'].decode('ascii') # cf. RFC 2046
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More specific information from RFC 2046 might be desirable.

RFC 2026, Section 5.1

The "multipart" boundary delimiters and  header fields are always represented as 7bit US-ASCII.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7b398f1

that content-type is the raw, unparsed contents of the content-type
header.

XXX This does not parse nested multipart parts -- use FieldStorage for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this is using FieldStorage, a nested multipart parsing example in test would be a great addition. Please see if this can be included.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orsenthil
Thanks for the comments.
I have very little information about nested multiparts, I don't think they are used very much. Could we postpone this specific point for a later PR ? For the moment I want to add the support of "application/json" requests, which is more important IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed the updated changes. Yes, the test for multipart parsing could come next (not in this PR).

I will commit this today. Thanks ! @PierreQuentel .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine, thank you !
I have fixed the issues detected by CI checks.

@orsenthil orsenthil self-assigned this May 3, 2017
@orsenthil orsenthil merged commit cc3fa20 into python:master May 8, 2017
twm added a commit to twisted/twisted that referenced this pull request Dec 16, 2019
bpo-29979's PR python/cpython#991 changed
cgi.parse_multipart() to require a 'CONTENT-LENGTH' member in its
`pdict` parameter (which is documented as "dictionary containing other
parameters of the content type header"). The PR seems to have some
issues:

* The change seems to conflate `pdict` with an environment dict,
  probably because the test passes a dict called `env` as `pdict`.
* 'CONTENT-LENGTH' isn't a customary parameter of the Content-Type
  header. `pdict` is intended to pass the multipart boundary [1].
* In a CGI environment the length of the request body is in the
  CONTENT_LENGTH variable, anyway [2]. Typo?
* FieldStorage doesn't require a Content-Length header anyway! It's
  potionally used to enforce cgi.maxlen, which limits the requests size
  (because this is the layer for that, right?).

#1012 deals with bpo-29979's
backward-incompatible change by passing the HTTP `Content-Length` header
as the 'CONTENT-LENGTH' member of `pdict` when the header exists.

The problem is that in the Twisted context, unlike the CGI context
(where the web server is responsible for removing any transfer encodings),
that header won't necessarily exist, for example when a chunked transfer
encoding is used by the client.

This commit does the simplest possible thing: synthesize the content
length based on what was actually received.

[1]: https://tools.ietf.org/html/rfc7578#section-4.1
[2]: https://tools.ietf.org/html/rfc3875#section-4.1.2
twm added a commit to twisted/twisted that referenced this pull request Dec 16, 2019
bpo-29979's PR python/cpython#991 changed
cgi.parse_multipart() to require a 'CONTENT-LENGTH' member in its
`pdict` parameter (which is documented as "dictionary containing other
parameters of the content type header"). The PR seems to have some
issues:

* The change seems to conflate `pdict` with an environment dict,
  probably because the test passes a dict called `env` as `pdict`.
* 'CONTENT-LENGTH' isn't a customary parameter of the Content-Type
  header. `pdict` is intended to pass the multipart boundary [1].
* In a CGI environment the length of the request body is in the
  CONTENT_LENGTH variable, anyway [2]. Typo?
* FieldStorage doesn't require a Content-Length header anyway! It's
  potionally used to enforce cgi.maxlen, which limits the requests size
  (because this is the layer for that, right?).

#1012 dealt with bpo-29979's
backward-incompatible change by passing the HTTP `Content-Length` header
as the 'CONTENT-LENGTH' member of `pdict` when the header exists. When
the header isn't given, Request skips parsing the request content.

The problem is that in the Twisted context, unlike the CGI context
(where the web server is responsible for removing any transfer encodings),
that header won't necessarily exist. For example, when a chunked transfer
encoding is used by the client.

This commit does the simplest possible thing: synthesize the content
length based on what was actually received.

[1]: https://tools.ietf.org/html/rfc7578#section-4.1
[2]: https://tools.ietf.org/html/rfc3875#section-4.1.2
rodrigc pushed a commit to twisted/twisted that referenced this pull request Mar 28, 2020
bpo-29979's PR python/cpython#991 changed
cgi.parse_multipart() to require a 'CONTENT-LENGTH' member in its
`pdict` parameter (which is documented as "dictionary containing other
parameters of the content type header"). The PR seems to have some
issues:

* The change seems to conflate `pdict` with an environment dict,
  probably because the test passes a dict called `env` as `pdict`.
* 'CONTENT-LENGTH' isn't a customary parameter of the Content-Type
  header. `pdict` is intended to pass the multipart boundary [1].
* In a CGI environment the length of the request body is in the
  CONTENT_LENGTH variable, anyway [2]. Typo?
* FieldStorage doesn't require a Content-Length header anyway! It's
  potionally used to enforce cgi.maxlen, which limits the requests size
  (because this is the layer for that, right?).

#1012 dealt with bpo-29979's
backward-incompatible change by passing the HTTP `Content-Length` header
as the 'CONTENT-LENGTH' member of `pdict` when the header exists. When
the header isn't given, Request skips parsing the request content.

The problem is that in the Twisted context, unlike the CGI context
(where the web server is responsible for removing any transfer encodings),
that header won't necessarily exist. For example, when a chunked transfer
encoding is used by the client.

This commit does the simplest possible thing: synthesize the content
length based on what was actually received.

[1]: https://tools.ietf.org/html/rfc7578#section-4.1
[2]: https://tools.ietf.org/html/rfc3875#section-4.1.2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants