Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Align the count parameter of splitn with other languages #979

Merged

Conversation

shepmaster
Copy link
Member

@Kimundi
Copy link
Member

Kimundi commented Mar 15, 2015

See also rust-lang/rust#14899

@shepmaster
Copy link
Member Author

/cc @aturon or @alexcrichton . This proposes to change currently-stable methods in a backwards-incompatible manner.

@alexcrichton
Copy link
Member

I'm somewhat ambivalent on this one way or the other. I find the splitn(0, ",") case a little surprising where 0 is the number of returned matches (e.g. when I splitn I expect to get the remainder back at some point) but otherwise I agree that split(2, ",") looks nicer if it's actually yielding two elements.

I would personally vote for just breaking code without jumping through hoops to nudge crates to update.

@aturon aturon self-assigned this Mar 19, 2015
@BurntSushi
Copy link
Member

FWIW, the regex crate is using the behavior being recommended here (n specifies the number of items returned). I wrote it this way because it seems more intuitive to me. However, I would be in favor of changing it to whatever is decided by this RFC.

@aturon
Copy link
Member

aturon commented Mar 24, 2015

I thought I had commented on this RFC a week ago, but my comment seems to be gone!

I definitely agree that the proposed semantics is the more intuitive one -- I've mistakenly assumed these semantics in the past, and seen other people do the same.

@alexcrichton makes a good point about the 0 case; we would probably want to panic (i.e. treat that as a contract violation). But I don't think that's a deal breaker or anything.

My bigger concern is: how can we roll this out in the short time before beta without silently breaking a lot of code? @alexcrichton, I lack your confidence that we can just change the semantics, but maybe you can elaborate on how you see that working out?

Otherwise, we might need to consider some multi-step deprecation (assuming we want to stick with the current names), which would be a bit painful and time consuming.

@iliekturtles
Copy link

I'm in favor of the change. The current functionality loses information: there is no way to preserve the tail without doing a full split+rejoin of the tail. I also find the proposed functionality more natural and have explicitly relied on it a number of times in C# whereas I've had to work around JavaScript's functionality (see below).

In mostly valid Rust (I can't find a join method in the std library), the current functionality makes maintaining the tail difficult:

let a: Vec<_> = "a,b,c"
    .split(',')
    .iter()
    .take(1)
    .chain([a.iter().skip(1).join(',')].iter())
    .collect();
assert_eq!(a, ["a", "b,c"]);

With the proposed functionality the tail is maintained and can easily be trimmed with splitn(n+1).take(n).

let a: Vec<_> = "a,b,c".splitn(2, ',').collect();
assert_eq!(a, ["a", "b,c"]);

let n = 2
let a: Vec<_> = "a,b,c".splitn(n - 1, ',').take(n).collect();
assert_eq!(a, ["a", "b"]);

As a side note JavaScript provides another variant where N is the number of items returned and any additional items are lost.

'a,b,c'.split(',', 2)
["a", "b"]

@alexcrichton
Copy link
Member

@alexcrichton makes a good point about the 0 case; we would probably want to panic (i.e. treat that as a contract violation). But I don't think that's a deal breaker or anything.

If we accept this RFC I'd personally vote to accept it as-is where the integer parameter is the precise number of times the iterator will return Some, and the last time it returns Some will include the entire remainder of the string (which covers the 0 case). I'm a little surprised at the semantics, but they make the most sense to me!

My bigger concern is: how can we roll this out in the short time before beta without silently breaking a lot of code? @alexcrichton, I lack your confidence that we can just change the semantics, but maybe you can elaborate on how you see that working out?

I wouldn't necessarily say I had a grand vision here, mostly it was just that tests would catch this bug and general usage would weed out cases pretty quickly. This is a pretty minor method and I feel it's not worth an elaborate migration plan.

@aturon
Copy link
Member

aturon commented Mar 24, 2015

@alexcrichton Ah, I missed that point about 0. That seems OK.

I'm potentially OK with a direct change, but we would need to advertise it widely.

@steveklabnik
Copy link
Member

I'm vaguly 👍 about this RFC. Consistency with more languages is good. I agree with @alexcrichton that it probably doesn't need an elaborate upgrade path.

@lilyball
Copy link
Contributor

I'm tentatively in favor of this as well. I'm mildly concerned about silent breakage of existing, but I don't have any good ideas for how to fix that without causing more trouble than it's worth, unless we wanted to do something trivial like swap the order of the parameters (thus forcing all clients to update their code, except the clients may not realize the semantics changed).

@Kimundi
Copy link
Member

Kimundi commented Mar 24, 2015

So, it seems there are roughly three way to possibly handle the migration:

  1. Do nothing, just change behavior. This will silently break code, and only be discovered if there are tests that can uncover the difference.
  2. Temporary change signature or method name. This would alert users to the new semantic by causing compilation errors, and the temporary rename could indicated the changed semantic, eg splitn -> splitn_minus_one.
    The issue here is that after some time passed, we would likely want to change the name back, requiring up-to-date users to fix compilation errors again to keep up.
    A variant would be to also keep the old methods around temporary as deprecated, and finally switch them over after some transition time, but that would also incur two breakages for users that want to track master.
  3. Remove the methods from the str, and add a temporary trait, say NewSplitNSemantic that is not in the prelude. User code would break, and can be fixed by importing that trait manually, hopefully alerting the user that the semantic changed in the meantime. After the migration time passed, the methods can be added back to str itself and the trait become deprecated. Done correctly, this only requires users that follow master to fix their code once, instead of twice.

@aturon
Copy link
Member

aturon commented Mar 24, 2015

@Kimundi

Thanks for laying those options out! I think we probably lack the time to do (2) before the beta.

It seems like (3) could be a pretty reasonable option, though. In particular, we could remove the trait altogether during the beta cycle, meaning that by 1.0 we'd have a clean API.

@alexcrichton Thoughts on that migration strategy?

@alexcrichton
Copy link
Member

We've already got quite the hefty amount of deprecated code in the standard library, so I'd still be voting for the cold-turkey approach, but option (3) does indeed sound plausible to me!

@aturon
Copy link
Member

aturon commented Mar 31, 2015

While this is a late-breaking change, there's broad agreement that it removes a potential footgun. We've decided to move forward, and to change the semantics in place (together with a very loud PSA that this is happening as part of the beta release).

Thanks for the RFC!

@aturon
Copy link
Member

aturon commented Mar 31, 2015

Tracking issue

@shepmaster shepmaster deleted the align-splitn-with-other-languages branch April 1, 2015 01:44
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Apr 1, 2015
This commit is an implementation of [RFC 979][rfc] which changes the meaning of
the count parameter to the `splitn` function on strings and slices. The
parameter now means the number of items that are returned from the iterator, not
the number of splits that are made.

[rfc]: rust-lang/rfcs#979

Closes rust-lang#23911
[breaking-change]
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Apr 1, 2015
This commit is an implementation of [RFC 979][rfc] which changes the meaning of
the count parameter to the `splitn` function on strings and slices. The
parameter now means the number of items that are returned from the iterator, not
the number of splits that are made.

[rfc]: rust-lang/rfcs#979

Closes rust-lang#23911
[breaking-change]
@Centril Centril added the A-slice Slice related proposals & ideas label Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-slice Slice related proposals & ideas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants