->decoded_content should decode application/json, etc [rt.cpan.org #82963] #72

Closed
@oalders

Migrated from rt.cpan.org#82963 (status was 'open')

Requestors:

From mauke@cpan.org on 2013-01-25 22:13:06:

Currently $response->decoded_content will decode the bytes of e.g.
"Content-type: text/json; charset=UTF-8" messages because it knows
"text/*" is ... text.

It would be nice if this could be extended to also decode the text for
content-types such as "application/json; charset=UTF-8",
"application/javascript; charset=ISO-8859-15", etc.

From skaufman@cpan.org on 2013-07-21 18:05:19:

On Fri Jan 25 17:13:06 2013, MAUKE wrote:
> Currently $response->decoded_content will decode the bytes of e.g.
> "Content-type: text/json; charset=UTF-8" messages because it knows
> "text/*" is ... text.
> 
> It would be nice if this could be extended to also decode the text for
> content-types such as "application/json; charset=UTF-8",
> "application/javascript; charset=ISO-8859-15", etc.

Bump, just ran into the same issue after a few hours.
In HTTP::Headers->content_is_text,
shouldn't the presence of a charset in the Content-Type imply that the content is characters, i.e. text?

From swong@cpan.org on 2013-08-13 09:27:23:

Second on this.

When I say decode, I know what I am doing - currently there is no way to force it.

  $response->decoded_content(charset => 'utf-8')

Adding (charset_strict => 1, raise_error => 1) doesn't help.

Better yet, the content type I get is
  Content-Type: application/json; charset=UTF-8
Maybe content_is_text() should return true if a charset is present in the Content-Type header?
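
As a rough sketch of the rule being proposed here (decode whenever a charset is declared, whatever the media type), something like the following could work as a manual workaround. The helper names charset_from_content_type and decode_if_charset are hypothetical and not part of LWP or HTTP::Message; only the core Encode module is used, and the header parsing is deliberately simplistic.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of LWP): pull the charset parameter out of
# a Content-Type header value. Returns undef when no charset is declared.
sub charset_from_content_type {
    my ($content_type) = @_;
    return unless defined $content_type;
    return $content_type =~ /;\s*charset\s*=\s*"?([^\s;"]+)"?/i ? $1 : undef;
}

# Hypothetical helper: decode the body whenever a charset was declared,
# regardless of the media type; otherwise hand back the bytes untouched.
sub decode_if_charset {
    my ($content_type, $bytes) = @_;
    my $charset = charset_from_content_type($content_type);
    return $bytes unless defined $charset;
    # FB_CROAK makes malformed input die instead of being silently replaced.
    return Encode::decode($charset, $bytes, Encode::FB_CROAK);
}

my $bytes = qq({"name":"caf\xC3\xA9"});    # UTF-8 encoded octets
my $text  = decode_if_charset('application/json; charset=UTF-8', $bytes);
my $raw   = decode_if_charset('application/octet-stream', $bytes);
```

With a real response object, the inputs would presumably be $response->header('Content-Type') and $response->decoded_content(charset => 'none'); the HTTP::Message documentation describes the value 'none' as suppressing charset decoding, so that call should still undo any Content-Encoding while leaving the octets alone.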

From bbyrd@cpan.org on 2014-08-27 03:12:50:

Third.  Currently, the code says:

if ($self->content_is_text || (my $is_xml = $self->content_is_xml)) {

Examples where LWP currently breaks include:

application/json
application/yaml
application/x-yaml
application/pdf
application/* (that isn't +xml)

The Content-Type really shouldn't matter.  If the Content-Type is "pork/beans; charset=UTF-8", it should still be decoded.

If the remote agent sent a charset, it's telling us that it encoded the data with that character set.  We shouldn't care if the data inside the onion is text, audio, application-specific, some proprietary format, whatever.

Please remove this 'if' line.  It's a pretty intelligent interface, so it would be a waste of code to have other folks design their own decoding interface just because of this restriction.
