- Start Date: 2014-04-15
- RFC PR #:
- Rust Issue #:
Add `TextReader` and `TextWriter` traits to `std::io`
for Unicode text-oriented streams,
just as `Reader` and `Writer` exist for byte-oriented streams.
The API design of text-oriented streams guarantees well-formed Unicode scalar values (characters),
so that there is no need to deal with e.g. errors caused by invalid UTF-8 in an input byte sequence.
When dealing with a potentially large amount of data,
we prefer doing so incrementally rather than having the data set
and all of its intermediate representations entirely in memory.
This is why `Reader` and `Writer` were added.
Additionally, experience in other programming languages has taught us about
the Unicode sandwich:
when dealing with text, the best practice is to handle Unicode only internally
(in Rust: `char`, `str` and `StrBuf`; as opposed to `u8` and `[u8]`),
and convert to or from bytes at the program’s boundaries, when doing I/O.
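
As a rough illustration of the sandwich, here is a minimal sketch in present-day Rust (not the 2014 dialect used elsewhere in this RFC). The function name `shout` is made up for the example, and UTF-8 is assumed at both boundaries:

```rust
use std::str;

/// Decode at the input boundary, work on text internally,
/// encode again at the output boundary.
/// `shout` is a hypothetical name used only for this sketch.
fn shout(input: &[u8]) -> Result<Vec<u8>, str::Utf8Error> {
    // Boundary: bytes -> text. Invalid UTF-8 is rejected here, once.
    let text = str::from_utf8(input)?;
    // Interior: only Unicode text, no byte-level concerns.
    let upper = text.to_uppercase();
    // Boundary: text -> bytes.
    Ok(upper.into_bytes())
}
```

Everything between the two boundaries can assume well-formed text, which is the property text-oriented streams would carry through incremental I/O.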
Byte-oriented streams are good, but we also need text-oriented streams.
For example, JSON is defined in terms of Unicode code points.
Encoding these code points to UTF-8 for transmission is completely orthogonal
to JSON itself.
Our `serialize::json` module could be based on text streams,
and avoid the redundant UTF-8 validity check
that’s involved when getting a `~str` from a byte stream.
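
To make the orthogonality concrete, the sketch below (present-day Rust, hypothetical function names) runs the same code-point-level logic over input that arrived as UTF-8 and as UTF-16. It is not a JSON parser: as a labeled simplification it only tracks bracket nesting and does not skip brackets inside string literals.

```rust
/// Maximum nesting depth of `[`/`{` in a JSON-like text.
/// Simplification: brackets inside string literals are not skipped.
fn max_depth<I: Iterator<Item = char>>(chars: I) -> usize {
    let mut depth = 0usize;
    let mut max = 0usize;
    for c in chars {
        match c {
            '[' | '{' => {
                depth += 1;
                max = max.max(depth);
            }
            ']' | '}' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    max
}

/// Byte-level entry point for UTF-8 input; `None` on invalid input.
fn depth_from_utf8(bytes: &[u8]) -> Option<usize> {
    std::str::from_utf8(bytes).ok().map(|s| max_depth(s.chars()))
}

/// Byte-level entry point for UTF-16 input; `None` on a lone surrogate.
fn depth_from_utf16(units: &[u16]) -> Option<usize> {
    let decoded: Result<Vec<char>, _> =
        std::char::decode_utf16(units.iter().copied()).collect();
    decoded.ok().map(|chars| max_depth(chars.into_iter()))
}
```

Only the two thin entry points know about encodings; `max_depth` sees code points and nothing else.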
rust-encoding will provide wrappers to "convert" between byte streams and text streams.
For example, one that takes a `Writer`, an encoding, and an error handling behavior,
and provides a `TextWriter`.
Eventually, we could open a file directly in text mode with a given encoding and obtain a text stream.
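
A wrapper of this kind might look roughly like the following sketch. It is written in present-day Rust, with `std::io::Write` standing in for `Writer` and the proposed trait reduced to `write_str`. The names `Latin1Writer` and `OnUnencodable` are invented for the example and are not rust-encoding APIs; ISO-8859-1 is chosen only because its mapping is trivial.

```rust
use std::io::{self, Write};

/// Reduced, hypothetical rendering of the proposed trait (not a std API).
trait TextWriter {
    fn write_str(&mut self, s: &str) -> io::Result<()>;
}

/// Error handling behavior for characters the encoding cannot represent.
enum OnUnencodable {
    Fail,
    Replace(u8),
}

/// Wraps a byte-oriented writer and encodes text as ISO-8859-1.
struct Latin1Writer<W: Write> {
    inner: W,
    on_unencodable: OnUnencodable,
}

impl<W: Write> TextWriter for Latin1Writer<W> {
    fn write_str(&mut self, s: &str) -> io::Result<()> {
        let mut out = Vec::with_capacity(s.len());
        for c in s.chars() {
            let code = c as u32;
            if code <= 0xFF {
                // U+0000..=U+00FF map directly to one byte each.
                out.push(code as u8);
            } else {
                match self.on_unencodable {
                    OnUnencodable::Replace(byte) => out.push(byte),
                    OnUnencodable::Fail => {
                        // Nothing from this call reaches the inner writer.
                        return Err(io::Error::new(
                            io::ErrorKind::InvalidInput,
                            "character not representable in ISO-8859-1",
                        ));
                    }
                }
            }
        }
        self.inner.write_all(&out)
    }
}
```

Callers of `write_str` never see bytes; the encoding and its failure mode are fixed once, when the wrapper is constructed.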
/// A minimal implementation only needs `write_str`.
/// However, a writer that is not based on UTF-8 may prefer
/// to override `write_char` as their "most fundamental" method,
/// and implement `write_str` with:
///
///     fn write_str(&mut self, buf: &str) -> IoResult<()> {
///         for c in buf.chars() {
///             try!(self.write_char(c));
///         }
///         Ok(())
///     }
pub trait TextWriter {
    fn write_str(&mut self, buf: &str) -> IoResult<()>;

    // These are similar to Writer, but based on `write_str` instead of `write`.
    fn write_char(&mut self, c: char) -> IoResult<()> { ... }
    fn write_line(&mut self, s: &str) -> IoResult<()> { ... }
    fn write_uint(&mut self, n: uint) -> IoResult<()> { ... }
    fn write_int(&mut self, n: int) -> IoResult<()> { ... }

    // These are similar to Writer
    fn flush(&mut self) -> IoResult<()> { ... }
    fn by_ref<'a>(&'a mut self) -> RefWriter<'a, Self> { ... }
}

impl<'a, W: TextWriter> TextWriter for RefWriter<'a, W> { ... }
Other than `write_char`, the set of default methods is just an idea.
If and when #7771 is implemented,
`write_str` can have a default implementation based on `write_char`
with `#[requires(one_of(write_str, write_char))]` on the trait.
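
For illustration, the mutual-default arrangement can be sketched in present-day Rust, where both methods may already have defaults written in terms of each other. Nothing in this sketch enforces that an implementor overrides at least one of them, which is presumably the check the attribute above would add; an implementation that overrides neither would recurse forever. `Utf16Sink` and `Utf8Sink` are hypothetical types.

```rust
use std::io;

/// Hypothetical reduced trait: each method defaults to the other.
/// Implementors must override at least one of the two.
trait TextWriter {
    fn write_str(&mut self, s: &str) -> io::Result<()> {
        for c in s.chars() {
            self.write_char(c)?;
        }
        Ok(())
    }

    fn write_char(&mut self, c: char) -> io::Result<()> {
        let mut buf = [0u8; 4];
        self.write_str(c.encode_utf8(&mut buf))
    }
}

/// Not UTF-8 based: `write_char` is the fundamental method.
struct Utf16Sink {
    units: Vec<u16>,
}

impl TextWriter for Utf16Sink {
    fn write_char(&mut self, c: char) -> io::Result<()> {
        let mut buf = [0u16; 2];
        self.units.extend_from_slice(c.encode_utf16(&mut buf));
        Ok(())
    }
}

/// UTF-8 based: `write_str` is the fundamental method.
struct Utf8Sink {
    text: String,
}

impl TextWriter for Utf8Sink {
    fn write_str(&mut self, s: &str) -> io::Result<()> {
        self.text.push_str(s);
        Ok(())
    }
}
```

Each sink implements only the method that is natural for its representation and inherits the other.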
pub trait TextReader {
    // XXX See "Unresolved questions" below.
    fn read(&mut self, buf: &mut StrBuf, max_bytes: uint) -> IoResult<uint>;

    // These are similar to Reader
    fn read_to_end(&mut self) -> IoResult<~str> { ... }
    fn bytes<'r>(&'r mut self) -> Bytes<'r, Self> { ... }
    fn by_ref<'a>(&'a mut self) -> RefReader<'a, Self> { ... }

    // These are similar to Buffer
    fn read_line(&mut self) -> IoResult<~str> { ... }
    fn lines<'r>(&'r mut self) -> Lines<'r, Self> { ... }
    fn read_until<C: CharEq>(&mut self, char: C) -> IoResult<~str> { ... }
    fn read_char(&mut self) -> IoResult<char> { ... }
    fn chars<'r>(&'r mut self) -> Chars<'r, Self> { ... }
}

impl<'a, R: TextReader> TextReader for RefReader<'a, R> { ... }
The set of default methods here is just an idea.
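
One possible reading of the `read` signature, sketched in present-day Rust with `String` standing in for `StrBuf` and `std::io::Read` for `Reader`. The assumptions here are not settled by this RFC: the input is UTF-8, `max_bytes` must be at least 4 so that any single code point fits, invalid input is reported as an error, and a return value of 0 means end of input. `Utf8TextReader` is a hypothetical name.

```rust
use std::io::{self, Read};
use std::str;

/// Hypothetical UTF-8 decoding adapter over a byte-oriented reader.
struct Utf8TextReader<R: Read> {
    inner: R,
    /// Bytes already read but not yet handed out as text,
    /// typically a code point cut off at a chunk boundary.
    pending: Vec<u8>,
}

impl<R: Read> Utf8TextReader<R> {
    fn new(inner: R) -> Self {
        Utf8TextReader { inner, pending: Vec::new() }
    }

    /// Appends at most `max_bytes` bytes of well-formed text to `buf`
    /// and returns how many bytes were appended (0 at end of input).
    fn read(&mut self, buf: &mut String, max_bytes: usize) -> io::Result<usize> {
        if max_bytes < 4 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "max_bytes must be at least 4",
            ));
        }
        loop {
            let room = max_bytes.saturating_sub(self.pending.len());
            let mut chunk = vec![0u8; room];
            let n = self.inner.read(&mut chunk)?;
            self.pending.extend_from_slice(&chunk[..n]);

            // Find the longest well-formed prefix and whether what follows
            // is definitely invalid (as opposed to merely incomplete).
            let (valid_len, invalid) = match str::from_utf8(&self.pending) {
                Ok(s) => (s.len(), false),
                Err(e) => (e.valid_up_to(), e.error_len().is_some()),
            };
            if valid_len > 0 {
                let text = str::from_utf8(&self.pending[..valid_len])
                    .expect("prefix was just validated");
                buf.push_str(text);
                self.pending.drain(..valid_len);
                return Ok(valid_len);
            }
            if invalid {
                return Err(io::Error::new(io::ErrorKind::InvalidData, "invalid UTF-8"));
            }
            if n == 0 {
                return if self.pending.is_empty() {
                    Ok(0)
                } else {
                    Err(io::Error::new(
                        io::ErrorKind::UnexpectedEof,
                        "input ends inside a code point",
                    ))
                };
            }
            // Only an incomplete code point so far: read more bytes.
        }
    }
}
```

The caller only ever observes whole code points in `buf`, even when a multi-byte sequence straddles two underlying reads.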
- Let rust-encoding define `TextReader` and `TextWriter` itself and revisit later.
- We may want `TextReader` to be closer to `std::io::Buffer` (which requires `Reader`) rather than just `Reader`.

Unresolved questions:

- `fn read(&mut self, buf: &mut StrBuf, max_bytes: uint) -> IoResult<uint>;` is proposed as the most fundamental method of `TextReader`. Is this the right design? See discussion in this RFC’s pull request comments.
- Which of these things should have text-oriented equivalents? The `Buffer`, `Seek`, and `Stream` traits, their buffered wrapper implementations, the readers and writers in `std::io::util`.