Avoid allocation for every use of task local standard out #15341

bluss · 2014-07-02T17:29:58Z

Change private items std::io::stdio::{local_stdout, with_task_stdout} to
use RefCell.

Use RefCell to avoid having to replace out and replace back the TLS key
for the task local standard out object. It was noticed replace uses
a small allocation every time a key is set.

This way, with_task_stdout cannot be used nested anymore -- even though
I don't think it ever was, nor should it possible with the public API.

Change private items std::io::stdio::{local_stdout, with_task_stdout} to use RefCell. Use RefCell to avoid having to replace out and replace back the TLS key for the task local standard out object. It was noticed replace uses a small allocation every time a key is set. This way, with_task_stdout cannot be used nested anymore -- even though I don't think it ever was, nor should it possible with the public API.

bluss · 2014-07-02T17:33:18Z

The allocation for every println! is very small -- it's one Box<LocalData>. We discussed in irc why println! was allocating for every call and this was the reason.

Not sure if this is worth fixing, but it looks relatively simple.

alexcrichton · 2014-07-02T17:35:55Z

Before this change, recursive prints while printing would forcefully print to stdout, while with this change it now unconditionally fails the task (with an odd error message). (just stating a fact)

Have you measured a performance difference to see whether this is worth it or not?

bluss · 2014-07-02T17:40:10Z

Ok I think I'm already wrong about recursive prints -- they can happen if you use println! inside a formatting trait -- is that right?

Judging by the comment on with_task_stdout, disallowing nesting should be safer.

I haven't run performance tests yet, my laptop is miserably slow, working on it.

lilyball · 2014-07-02T17:56:41Z

You could allow recursive prints by using try_borrow_mut() instead of borrow_mut(), and temporarily setting a new stdout() if the borrow fails. That would replicate the old behavior.

bluss · 2014-07-02T18:22:24Z

@kballard the old solution composes nested printing much better -- here we basically have to allocat a new stdout for every nested print, but it shouldn't really happen, should it? It could also be without the line buffer.

Performance does increase, printing time reduced by 10% for both a single println! and for a silly test of 10,000 println!. From 47.7 ms/iteration to 42.7 ms/iteration, with this (redirected to file)

extern crate test;

#[deriving(Show)]
enum Instruction {
    Jump(i32),
    Match(bool),
    Text(&'static str),
}

fn gen_instr() -> Vec<Instruction>
{
    range(0i32, 10000).map(|x| match x % 3 {
        0 => Jump(x),
        1 => Match(x % 7 == 0),
        _ => Text("test-text")
    }).collect()
}

#[bench]
fn print_single(b: &mut test::Bencher)
{
    b.iter(|| {
        println!("abc");
    })
}

#[bench]
fn print_all(b: &mut test::Bencher)
{
    let instr = gen_instr();
    b.iter(|| {
        for (i, x) in instr.iter().enumerate() {
            println!("{} = {}", i, x);
        }
    })
}

When with_task_stdout is called inside its own invocation, create a temporary Writer for stdout.

bluss · 2014-07-02T19:29:59Z

Added commit to handle nested invocation again -- simply creating a std::io::stdout(), writing to it, flushing it. It doesn't compose like the old solution, but it's actually a simple solution.

lilyball · 2014-07-02T19:31:16Z

@blake2-ppc The current implementation (which is what I assume you mean by the "old solution") already allocates a new stdout for every level of print nesting. If you do 3 prints in a row all nested within a print, both the current solution and the proposed RefCell + try_borrow would still only allocate again for the first nested print (and the stdout it allocated would remain in TLS for the 2nd and 3rd nested prints to use, and would only be dropped once the outer print finished and it restored the original stdout).

lilyball · 2014-07-02T19:33:40Z

@blake2-ppc Ah I see what you just did. And upon reflection, storing it into TLS until the outer print returns is a bit more complicated. You'd have to just write it into TLS, and then the outer print would have to unconditionally re-write its RefCell back into TLS after calling the closure, in case the nested print changed anything.

I think it's worth trying and seeing what the performance is like. If writing into TLS is cheap, and the expensive part was the allocation of the box, then it should be a win.

lilyball · 2014-07-02T19:35:32Z

Ack, upon re-reading the original issue, the claim is that TLS uses allocation when calling replace(), which is exactly what you're trying to avoid. Well, you could read TLS again and see if the RefCell you got is the same as what you already have, and only write back if it changed.

bluss · 2014-07-02T19:35:34Z

@kballard If we unconditionally write back to the TLS, we end up where we started. Writing to the TLS is cheap, but it is one allocation and that's the one I'm trying to remove.

lilyball · 2014-07-02T19:37:09Z

@blake2-ppc Hmm, I assumed the allocation was necessary because TLS returns a Ref and therefore replace() would have to allocate a brand new box so as to not break any existing references. Except replace() is documented as failing if the value is currently on-loan (if there are any existing Refs for it). So why does replace() need to allocate at all if the key is already present in TLS? Can that be fixed?

bluss · 2014-07-02T19:46:42Z

I think the box is needed to create a homogenous type to store in the TLS map, I'm not at all familiar with the implementation.

I don't think any additional complexity for the nested case is worth it, not more than the two commits in the PR so far.

What needs mentioning is that pub fn set_stdout can now fail, if called during with_task_stdout too.

lilyball · 2014-07-02T19:47:27Z

Looking at TLS, I think this needs an overhaul. There's no actually reason why replace() should ever fail; we can replace Ref with effectively Rc instead, such that replacing a value that's currently borrowed will just "untether" the value from the TLS, but leave the value still accessible through the Ref. And we can rewrite this such that replacing a value that already exists in the map reuses the existing Box (if there are no outstanding Refs) instead of allocating a new one.

lilyball · 2014-07-02T19:50:57Z

@blake2-ppc I'm not thrilled about the idea of set_stdout() failing. The problem with nested prints is the nested print doesn't typically know it's nested (because there's little utility in attempting a nested print), so having a failure mode that only exists when the print is nested causes potential issues for all code that ever does set_stdout() even if it will never actually be used in a nested fashion. And there's plausible reasons why it should work in a nested fashion anyway.

For example, I might have some code that, for whatever reason, wants to log to a file when various methods are called, but it needs to log using other code out of its control that uses println!(). It can accomplish this by using set_stdout() to temporarily redirect stdout to its log file, call the other code, and then restore the old stdout. What's it supposed to do if set_stdout() fails?

bluss · 2014-07-02T19:54:40Z

Thanks for the insightful comments @kballard, I think that leaves this PR explored & closed and we have to go look somewhere else -- TLS.

lilyball · 2014-07-04T05:29:14Z

@blake2-ppc I've submitted PR #15399 that rewrites local_data to, among other things, try and avoid allocation in the replace(None) ... replace(Some) pattern.

Handle nesting in with_task_stdout again

48b8e84

When with_task_stdout is called inside its own invocation, create a temporary Writer for stdout.

bluss closed this Jul 2, 2014

Avoid allocation for every use of task local standard out #15341

Avoid allocation for every use of task local standard out #15341

Uh oh!

Conversation

bluss commented Jul 2, 2014

Uh oh!

bluss commented Jul 2, 2014

Uh oh!

alexcrichton commented Jul 2, 2014

Uh oh!

bluss commented Jul 2, 2014

Uh oh!

lilyball commented Jul 2, 2014

Uh oh!

bluss commented Jul 2, 2014

Uh oh!

bluss commented Jul 2, 2014

Uh oh!

lilyball commented Jul 2, 2014

Uh oh!

lilyball commented Jul 2, 2014

Uh oh!

lilyball commented Jul 2, 2014

Uh oh!

bluss commented Jul 2, 2014

Uh oh!

lilyball commented Jul 2, 2014

Uh oh!

bluss commented Jul 2, 2014

Uh oh!

lilyball commented Jul 2, 2014

Uh oh!

lilyball commented Jul 2, 2014

Uh oh!

bluss commented Jul 2, 2014

Uh oh!

lilyball commented Jul 4, 2014

Uh oh!

Uh oh!