Skip to content
This repository was archived by the owner on Nov 30, 2024. It is now read-only.

Fix invalid byte sequence on EncodedString#split #172

Merged
merged 1 commit into from
Feb 9, 2015

Conversation

bf4
Copy link
Contributor

@bf4 bf4 commented Feb 8, 2015

@bf4 bf4 force-pushed the fix_encoded_string_split_argument_error branch 3 times, most recently from a07480f to d9645c8 Compare February 8, 2015 02:27
@JonRowe
Copy link
Member

JonRowe commented Feb 8, 2015

So this makes me a little uncomfortable given you are assuming that the bad bytes can be scrubbed by converting to UTF-8 and back, if this is an incompatible encoding with UTF-8 then won't this destroy the original encoding?

Also can you remove the whitespace changes from the PR

@bf4
Copy link
Contributor Author

bf4 commented Feb 8, 2015

So, the whitespace changes look sorta terrible in the unified diff, but if you split it or look at it alone, you'll see I'm actually fixing inconsistent whitespace (and some spelling, but that's unrelated to your comment).

I'm going to push a better fix on top of this momentarily.

@bf4
Copy link
Contributor Author

bf4 commented Feb 8, 2015

Weird, that passed in https://travis-ci.org/rspec/rspec-support/builds/49911552

@bf4
Copy link
Contributor Author

bf4 commented Feb 8, 2015

If this looks good I"ll rebase down to one commit.

@bf4 bf4 force-pushed the fix_encoded_string_split_argument_error branch from 52b1f5d to 8260a59 Compare February 8, 2015 06:31
@bf4
Copy link
Contributor Author

bf4 commented Feb 8, 2015

Ok, I think this is it.

@bf4
Copy link
Contributor Author

bf4 commented Feb 8, 2015

I'm not sure what to do about the JRuby failure, if it's expected behavior, or a bug in JRuby. It is introduced in this PR.

@JonRowe
Copy link
Member

JonRowe commented Feb 8, 2015

screen shot 2015-02-08 at 21 35 56

@bf4
Copy link
Contributor Author

bf4 commented Feb 8, 2015

I mean, it's not totally a failure, just a different behavior. The é\x80 results in \xC3\xA9\x80 Where \xCE\xA9 is equivalent to the é. If I remove the egrave, then the failure really is just that JRuby is leaving the \x80 / 128.chr alone for some reason. Still, probably a bug.

@bf4
Copy link
Contributor Author

bf4 commented Feb 8, 2015

LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \
   rvm ruby-1.9.2-p330,ruby-1.9.3-p551,ruby-2.0.0-p598,ruby-2.1.5,ruby-2.2.0,jruby-1.7.18,rbx-2.2.2 do \
   ruby -e 'p [RUBY_VERSION, RUBY_ENGINE, Encoding.default_external, __ENCODING__] << "\x80".force_encoding("US-ASCII").chars.map{|char| char.valid_encoding? ? char : "?" }.join.encode("UTF-8")'

 for version in 1.9 2.0; do \
   export JRUBY_OPTS="-Xcompat.version=${version}" ; \
   LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \
   rvm jruby-1.7.18 do \
   ruby -e 'p [RUBY_VERSION, RUBY_ENGINE, Encoding.default_external, __ENCODING__] << "\x80".force_encoding("US-ASCII").chars.map{|char| char.valid_encoding? ? char : "?" }.join.encode("UTF-8")';
 done
["1.9.2", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "?"]
["1.9.3", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "?"]
["2.0.0", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "?"]
["2.1.5", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "?"]
["2.2.0", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "?"]
["1.9.3", "jruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "\x80"]
["2.1.0", "rbx", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "?"]
["1.9.3", "jruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "\x80"]
["2.0.0", "jruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "\x80"]

@bf4 bf4 force-pushed the fix_encoded_string_split_argument_error branch from 8260a59 to 125634b Compare February 8, 2015 15:59
@bf4
Copy link
Contributor Author

bf4 commented Feb 8, 2015

@JonRowe @myronmarston Are y'all ok with this?

@JonRowe
Copy link
Member

JonRowe commented Feb 8, 2015

So the test does nothing on JRuby? I think it should be pending if thats the case, that way we can have one spec, use the :pending metadata and we'll be alerted when JRuby fixes the issue.

# https://twitter.com/nalsh/status/553413844685438976
#
# For example, given:
# "\x80".force_encoding("Emacs-Mule").encode(:invalid => :replace).bytes.to_a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the rest of this file this line needs one more " " added at the start (so indented by two spaces not 1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

@JonRowe
Copy link
Member

JonRowe commented Feb 8, 2015

Sorry nitpicking over the whitespace again because it should either be all fixed or none :P

@bf4
Copy link
Contributor Author

bf4 commented Feb 8, 2015

@JonRowe Not at all. re: the JRuby spec, you think I should remove the 'JRUBY' commit and add :pending => RSpec::Support::Ruby.jruby? with a comment link to the bug?

@JonRowe
Copy link
Member

JonRowe commented Feb 8, 2015

you think I should remove the 'JRUBY' commit and add :pending => RSpec::Support::Ruby.jruby? with a comment link to the bug?

Yes.

Map string char with invalid encoding to '?'
Format identical string expectation to read easier

Refs:
- rspec/rspec-core#1760
- via rspec#134

Set to pending for JRuby, opened issue
jruby/jruby#2580
that JRuby is the only Ruby that returns "\x80"
in place of "?"
@bf4 bf4 force-pushed the fix_encoded_string_split_argument_error branch from 125634b to 4bd437a Compare February 8, 2015 23:51
@bf4
Copy link
Contributor Author

bf4 commented Feb 8, 2015

Ok, I think I fixed the whitespace and it should be good. At some point though, I start expecting to find I missed something and I'm past that point :)

JonRowe added a commit that referenced this pull request Feb 9, 2015
Fix invalid byte sequence on EncodedString#split
@JonRowe JonRowe merged commit 06c72f9 into rspec:master Feb 9, 2015
JonRowe added a commit that referenced this pull request Feb 9, 2015
[skip ci]
JonRowe added a commit that referenced this pull request Feb 9, 2015
Fix invalid byte sequence on EncodedString#split
JonRowe added a commit that referenced this pull request Feb 9, 2015
[skip ci]
@JonRowe
Copy link
Member

JonRowe commented Feb 9, 2015

Picked to 3-2-maintenance

@bf4
Copy link
Contributor Author

bf4 commented Feb 9, 2015

great! back to rspec-core now

@bf4 bf4 deleted the fix_encoded_string_split_argument_error branch March 19, 2015 15:02
yujinakayama pushed a commit to yujinakayama/rspec-monorepo that referenced this pull request Oct 6, 2021
…oded_string_split_argument_error

Fix invalid byte sequence on EncodedString#split

---
This commit was imported from rspec/rspec-support@846bdbe.
yujinakayama pushed a commit to yujinakayama/rspec-monorepo that referenced this pull request Oct 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants