std: Stabilize the std::str module #19741
| @alexcrichton / @aturon: Have you considered replacing  fn foo<T: AsSlice>(x: T) {
    let x = x.as_slice();
    ...
}
foo("foo");
foo([1, 2, 3]);
Otherwise, to implement this pattern we'd need to make another trait to support it. | 
| @erickt I think that it leads to ambiguities when you just call 
impl ::slice::AsSlice<u8> for str {
    fn as_slice(&self) -> &[u8] { self.as_bytes() }
} | 
| See #19612 (comment) | 
| re:  | 
| @alexcrichton / @aturon: Yeah, we could do 
db.set("foo".as_bytes(), "abc".as_bytes()).unwrap();
db.set("bar".as_bytes(), "def".as_bytes()).unwrap();
db.set("baz".as_bytes(), "ghi".as_bytes()).unwrap();
Or add a wrapper for setting string keys with:
db.set_str("foo", "abc".as_bytes()).unwrap();
db.set_str("bar", "def".as_bytes()).unwrap();
db.set_str("baz", "ghi".as_bytes()).unwrap();
There's a bit of line noise in both approaches. It would be much nicer to have something like 
db.set("foo", "abc").unwrap();
db.set("bar", "def").unwrap();
db.set("baz", "ghi").unwrap();
I could write a trait for my library to do this, but this pattern would then force people wanting to support  | 
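For readers on current Rust, the call shape asked for here can be sketched with the standard `AsRef<[u8]>` trait, which both `str` and `[u8]` implement. `Db` and its `set`/`get` methods below are hypothetical stand-ins for the library under discussion, not its real API, and the sketch uses modern syntax rather than the 2014 traits debated in this thread.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory key-value store standing in for the `db` above.
struct Db {
    map: HashMap<Vec<u8>, Vec<u8>>,
}

impl Db {
    fn new() -> Db {
        Db { map: HashMap::new() }
    }

    /// Accepts anything viewable as bytes: `&str`, `String`, `&[u8]`, `Vec<u8>`.
    fn set<K: AsRef<[u8]>, V: AsRef<[u8]>>(&mut self, key: K, value: V) -> Result<(), String> {
        self.map.insert(key.as_ref().to_vec(), value.as_ref().to_vec());
        Ok(())
    }

    /// Looks a key up by its byte representation.
    fn get<K: AsRef<[u8]>>(&self, key: K) -> Option<&[u8]> {
        let k: &[u8] = key.as_ref();
        self.map.get(k).map(|v| v.as_slice())
    }
}

fn main() {
    let mut db = Db::new();
    // String keys and values, with no `.as_bytes()` at the call site.
    db.set("foo", "abc").unwrap();
    // Raw bytes go through the same method.
    db.set(b"bar", vec![1u8, 2, 3]).unwrap();

    assert_eq!(db.get("foo"), Some(&b"abc"[..]));
    assert_eq!(db.get(b"bar"), Some(&[1u8, 2, 3][..]));
}
```

Because the bound is a standard-library trait, callers with their own byte-like types only need to implement `AsRef<[u8]>` once, rather than one trait per library.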
        
          
src/libcollections/str.rs (Outdated)
@alexcrichton Please implement this as String(self.as_bytes().to_vec()) (you may need to move it to collections/string.rs). Let's avoid degrading the performance of this method.
(or you could use String::from_str(), I think it does the same thing, and doesn't need moving this impl)
Good catch! I'll switch it over.
| I dislike that this leaves the obvious thing to do when converting a string literal to a  I know that it's more consistent and micro-benchmarks etc., but it feels just silly for the most obvious thing to make a full roundtrip through the formatting infrastructure and a redundant check for valid utf-8, in addition to over-allocating. This was a wart when the answer was  | 
| The question of the efficiency of the formatting subsystem is somewhat orthogonal in my mind because  | 
| @alexcrichton Except that this PR is stabilizing this status quo. | 
| Remember that this is deprecating  | 
| It's not orthogonal. You're causing a severe performance regression. Attention to performance is part of API design, and even if it was an implementation issue it is still a stupid regression. | 
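The two conversion paths argued over above can be sketched in present-day Rust as follows. `copy_bytes` and `via_formatting` are hypothetical helper names. The tuple-struct constructor `String(..)` from the review comment is private to the standard library, so this sketch assumes `String::from_utf8_unchecked` as the closest public equivalent of a direct byte copy.

```rust
/// Hypothetical helper: build a `String` by copying the bytes directly,
/// skipping both the formatting machinery and UTF-8 re-validation.
fn copy_bytes(s: &str) -> String {
    // SAFETY: the bytes come from a `&str`, so they are already valid UTF-8.
    unsafe { String::from_utf8_unchecked(s.as_bytes().to_vec()) }
}

/// Hypothetical helper: route the same conversion through `Display` formatting.
fn via_formatting(s: &str) -> String {
    format!("{}", s)
}

fn main() {
    let input = "h\u{e9}llo";
    let direct = copy_bytes(input);
    let formatted = via_formatting(input);

    // Both paths produce the same string; they differ only in the work done.
    assert_eq!(direct, formatted);
    // The direct copy holds exactly as many bytes as the input.
    assert_eq!(direct.len(), input.len());
}
```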
9346257 to 1e730b7
| @alexcrichton: The conflict you're having with 
#![feature(lang_items, macro_rules)]
#![no_std]
#![crate_type = "staticlib"]
extern crate core;
pub unsafe fn replace<T>(dest: *mut T, mut src: T) -> T {                               |
use core::kinds::Sized;
#[lang = "stack_exhausted"] extern fn stack_exhausted() {}
#[lang = "eh_personality"] extern fn eh_personality() {}
#[lang = "panic_fmt"] fn panic_fmt() -> ! { loop {} }
#[unstable = "may merge with other traits"]
pub trait AsSlice<T> for Sized? {
    fn as_slice<'a>(&'a self) -> &'a [T];
}
#[unstable = "trait is unstable"]
impl<T> AsSlice<T> for [T] {
    #[inline(always)]
    fn as_slice<'a>(&'a self) -> &'a [T] { self }
}
impl<'a, T, Sized? U: AsSlice<T>> AsSlice<T> for &'a U {
    #[inline(always)]
    fn as_slice<'a>(&'a self) -> &'a [T] { AsSlice::as_slice(*self) }
}
impl<'a, T, Sized? U: AsSlice<T>> AsSlice<T> for &'a mut U {
    #[inline(always)]
    fn as_slice<'a>(&'a self) -> &'a [T] { AsSlice::as_slice(*self) }
}
impl<'a> AsSlice<u8> for str {
    #[inline(always)]
    fn as_slice<'a>(&'a self) -> &'a [u8] {
        unsafe { core::mem::transmute(self) }
    }
}
@aturon: Will the 
@alexcrichton: I'm very sad to see 
// Trigger a copy for me.
let o = ObjectBuilder::new().insert("foo", ...).unwrap();
// Move the string into the `json::Value` enum with no allocation.
let key = String::from_str("foo");
let o = ObjectBuilder::new().insert(key, ...).unwrap();
Since I'm betting most users are going to use  I could have two APIs again,  | 
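The "copy a `&str`, move a `String`" behaviour described here is what the standard `Into<String>` bound provides in current Rust. In the sketch below, `ObjectBuilder` is a hypothetical, much-simplified stand-in (values are plain `i64`, and `build` replaces the `unwrap` call), not the JSON library's real type.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified builder used only to illustrate the key handling.
struct ObjectBuilder {
    fields: BTreeMap<String, i64>,
}

impl ObjectBuilder {
    fn new() -> ObjectBuilder {
        ObjectBuilder { fields: BTreeMap::new() }
    }

    /// `&str` keys are copied into a new `String`; `String` keys are moved in as-is.
    fn insert<K: Into<String>>(mut self, key: K, value: i64) -> ObjectBuilder {
        self.fields.insert(key.into(), value);
        self
    }

    fn build(self) -> BTreeMap<String, i64> {
        self.fields
    }
}

fn main() {
    let owned = String::from("bar");
    // Remember where the owned key's heap buffer lives.
    let owned_ptr = owned.as_ptr();

    let obj = ObjectBuilder::new()
        .insert("foo", 1) // copies the literal into a fresh String
        .insert(owned, 2) // moves the existing String, no new allocation
        .build();

    assert_eq!(obj.get("foo"), Some(&1));
    // The stored key still points at the original buffer, so it was moved, not copied.
    let (stored_key, _) = obj.get_key_value("bar").unwrap();
    assert_eq!(stored_key.as_ptr(), owned_ptr);
}
```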
| @aturon: This might just be your cast trait with a different name, but this variation on 
trait BorrowFrom<'a, To> {
    fn borrow_from(&'a self) -> To;
}
impl<'a> BorrowFrom<'a, &'a [u8]> for Vec<u8> {
    fn borrow_from(&'a self) -> &'a [u8] {
        self.as_slice()
    }
}
impl<'a> BorrowFrom<'a, &'a str> for String {
    fn borrow_from(&'a self) -> &'a str {
        self.as_slice()
    }
}
impl<'a> BorrowFrom<'a, &'a [u8]> for String {
    fn borrow_from(&'a self) -> &'a [u8] {
        self.as_bytes()
    }
}
impl<'a> BorrowFrom<'a, &'a [u8]> for &'a str {
    fn borrow_from(&'a self) -> &'a [u8] {
        self.as_bytes()
    }
}
impl<'a, T: BorrowFrom<'a, U>, U> BorrowFrom<'a, U> for &'a T {
    fn borrow_from(&'a self) -> U {
        (**self).borrow_from()
    }
}
#[deriving(Show)]
struct Datum<'a> { data: &'a [u8] }
impl<'a> BorrowFrom<'a, Datum<'a>> for String {
    fn borrow_from(&'a self) -> Datum<'a> {
        self.as_slice().borrow_from()
    }
}
impl<'a> BorrowFrom<'a, Datum<'a>> for &'a str {
    fn borrow_from(&'a self) -> Datum<'a> {
        Datum { data: self.as_bytes() }
    }
}
fn foo_slice<'a, T>(t: &'a T) where T: BorrowFrom<'a, &'a [u8]> {
    let datum: &'a [u8] = t.borrow_from();
    println!("datum: {}", datum);
}
fn foo_str<'a, T>(t: &'a T) where T: BorrowFrom<'a, &'a str> {
    let datum: &'a str = t.borrow_from();
    println!("datum: {}", datum);
}
fn foo_custom<'a, T>(t: &'a T) where T: BorrowFrom<'a, Datum<'a>> {
    let datum: Datum<'a> = t.borrow_from();
    println!("datum: {}", datum);
}
fn main() {
    let s = "hello world".to_string();
    foo_slice(&s);
    foo_str(&s);
    foo_custom(&s);
} | 
e71d542 to 66925c4
    | @erickt Yes I didn't rename to  For your use case I know @aturon has also been thinking about a generic set of conversion traits recently to serve a more broad purpose. Having lots of little one-off traits would be unfortunate for all types in the standard library (e.g. why should we not have  | 
| @alexcrichton The discomfort (at least for me) is not so much that this basically changes the idiom from  | 
| 
 Yes, that's right -- for traits whose sole purpose is generic programming over conversions (i.e. providing implicit conversions via overloading), we should be able to replace them with a single set of traits that everyone knows/uses/implements. This should cut down on the problem of people having to know and implement your custom trait to be compatible with your library. | 
| 
 Yep. The trait will be  
 I think that generic conversion traits will serve this role much better, as I mentioned in my previous comment. "Overloading over ownership" is a pattern that's emerging in several APIs (you can see it in the  | 
| @alexcrichton Ok, I've looked this over and it looks good to me -- just a couple of tiny typos. r=me once we've resolved the  (We'll need to discuss methods like  | 
66925c4 to a89f819
0f118c7 to 664004f
664004f to 213a3de
    This commit starts out by consolidating all `str` extension traits into one
`StrExt` trait to be included in the prelude. This means that
`UnicodeStrPrelude`, `StrPrelude`, and `StrAllocating` have all been merged into
one `StrExt` exported by the standard library. Some functionality is currently
duplicated with the `StrExt` present in libcore.
This commit also currently avoids any methods which require any form of pattern
to operate. These functions will be stabilized via a separate RFC.
Next, the stability of methods and structures is as follows:
Stable
* from_utf8_unchecked
* CowString - after moving to std::string
* StrExt::as_bytes
* StrExt::as_ptr
* StrExt::bytes/Bytes - also made a struct instead of a typedef
* StrExt::char_indices/CharIndices - CharOffsets was renamed
* StrExt::chars/Chars
* StrExt::is_empty
* StrExt::len
* StrExt::lines/Lines
* StrExt::lines_any/LinesAny
* StrExt::slice_unchecked
* StrExt::trim
* StrExt::trim_left
* StrExt::trim_right
* StrExt::words/Words - also made a struct instead of a typedef
Unstable
* from_utf8 - the return type was changed to a `Result`, but the error type has
              yet to prove itself
* from_c_str - this function will be handled by the c_str RFC
* FromStr - this trait will have an associated error type eventually
* StrExt::escape_default - needs iterators at least, unsure if it should make
                           the cut
* StrExt::escape_unicode - needs iterators at least, unsure if it should make
                           the cut
* StrExt::slice_chars - this function has yet to prove itself
* StrExt::slice_shift_char - awaiting conventions about slicing and shifting
* StrExt::graphemes/Graphemes - this functionality may only be in libunicode
* StrExt::grapheme_indices/GraphemeIndices - this functionality may only be in
                                             libunicode
* StrExt::width - this functionality may only be in libunicode
* StrExt::utf16_units - this functionality may only be in libunicode
* StrExt::nfd_chars - this functionality may only be in libunicode
* StrExt::nfkd_chars - this functionality may only be in libunicode
* StrExt::nfc_chars - this functionality may only be in libunicode
* StrExt::nfkc_chars - this functionality may only be in libunicode
* StrExt::is_char_boundary - naming is uncertain with container conventions
* StrExt::char_range_at - naming is uncertain with container conventions
* StrExt::char_range_at_reverse - naming is uncertain with container conventions
* StrExt::char_at - naming is uncertain with container conventions
* StrExt::char_at_reverse - naming is uncertain with container conventions
* StrVector::concat - this functionality may be replaced with iterators, but
                      it's not certain at this time
* StrVector::connect - as with concat, may be deprecated in favor of iterators
Deprecated
* StrAllocating and UnicodeStrPrelude have been merged into StrExt
* eq_slice - compiler implementation detail
* from_str - use the inherent parse() method
* is_utf8 - call from_utf8 instead
* replace - call the method instead
* truncate_utf16_at_nul - this is an implementation detail of Windows and does
                          not need to be exposed.
* utf8_char_width - moved to libunicode
* utf16_items - moved to libunicode
* is_utf16 - moved to libunicode
* Utf16Items - moved to libunicode
* Utf16Item - moved to libunicode
* Utf16Encoder - moved to libunicode
* AnyLines - renamed to LinesAny and made a struct
* SendStr - use CowString<'static> instead
* str::raw - all functionality is deprecated
* StrExt::into_string - call to_string() instead
* StrExt::repeat - use iterators instead
* StrExt::char_len - use .chars().count() instead
* StrExt::is_alphanumeric - use .chars().all(..)
* StrExt::is_whitespace - use .chars().all(..)
Pending deprecation -- while slicing syntax is being worked out, these methods
are all #[unstable]
* Str - while currently used for generic programming, this trait will be
        replaced with one of [], deref coercions, or a generic conversion trait.
* StrExt::slice - use slicing syntax instead
* StrExt::slice_to - use slicing syntax instead
* StrExt::slice_from - use slicing syntax instead
* StrExt::lev_distance - deprecated with no replacement
Awaiting stabilization due to patterns and/or matching
* StrExt::contains
* StrExt::contains_char
* StrExt::split
* StrExt::splitn
* StrExt::split_terminator
* StrExt::rsplitn
* StrExt::match_indices
* StrExt::split_str
* StrExt::starts_with
* StrExt::ends_with
* StrExt::trim_chars
* StrExt::trim_left_chars
* StrExt::trim_right_chars
* StrExt::find
* StrExt::rfind
* StrExt::find_str
* StrExt::subslice_offset
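Several of the replacements named in the lists above carry over to current Rust. The sketch below uses modern syntax, which differs from the 2014 API this commit touches, and the sample strings are illustrative. Note that `str::repeat` later returned as a stable method, so the iterator form is shown only to mirror the commit message.

```rust
use std::str;

fn main() {
    // "héllo wörld", written with escapes so the byte count is unambiguous.
    let s = "h\u{e9}llo w\u{f6}rld";

    // char_len -> .chars().count()
    assert_eq!(s.chars().count(), 11);
    // len() counts bytes, and the two accented letters take two bytes each.
    assert_eq!(s.len(), 13);

    // is_alphanumeric / is_whitespace -> .chars().all(..)
    assert!("abc123".chars().all(|c| c.is_alphanumeric()));
    assert!(" \t\n".chars().all(|c| c.is_whitespace()));

    // from_str -> the inherent parse() method
    let n: i32 = "42".parse().unwrap();
    assert_eq!(n, 42);

    // is_utf8 -> from_utf8, which reports failure through a Result
    assert_eq!(str::from_utf8(b"abc"), Ok("abc"));
    assert!(str::from_utf8(&[0xff, 0xfe]).is_err());

    // repeat -> iterators
    let repeated: String = std::iter::repeat("ab").take(3).collect();
    assert_eq!(repeated, "ababab");
}
```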
41482f4 to 8c60c0e
8c60c0e to 2728a39
2728a39 to 082bfde
    | These were breaking. | 