Skip to content

Avoid allocation for every use of task local standard out #15341

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Avoid allocation for every use of task local standard out #15341

wants to merge 2 commits into from

Conversation

bluss
Copy link
Member

@bluss bluss commented Jul 2, 2014

Change private items std::io::stdio::{local_stdout, with_task_stdout} to
use RefCell.

Use RefCell to avoid having to replace out and replace back the TLS key
for the task local standard out object. It was noticed replace uses
a small allocation every time a key is set.

This way, with_task_stdout cannot be used nested anymore -- even though
I don't think it ever was, nor should it possible with the public API.

Change private items std::io::stdio::{local_stdout, with_task_stdout} to
use RefCell.

Use RefCell to avoid having to replace out and replace back the TLS key
for the task local standard out object. It was noticed replace uses
a small allocation every time a key is set.

This way, with_task_stdout cannot be used nested anymore -- even though
I don't think it ever was, nor should it possible with the public API.
@bluss
Copy link
Member Author

bluss commented Jul 2, 2014

The allocation for every println! is very small -- it's one Box<LocalData>. We discussed in irc why println! was allocating for every call and this was the reason.

Not sure if this is worth fixing, but it looks relatively simple.

@alexcrichton
Copy link
Member

Before this change, recursive prints while printing would forcefully print to stdout, while with this change it now unconditionally fails the task (with an odd error message). (just stating a fact)

Have you measured a performance difference to see whether this is worth it or not?

@bluss
Copy link
Member Author

bluss commented Jul 2, 2014

Ok I think I'm already wrong about recursive prints -- they can happen if you use println! inside a formatting trait -- is that right?

Judging by the comment on with_task_stdout, disallowing nesting should be safer.

I haven't run performance tests yet, my laptop is miserably slow, working on it.

@lilyball
Copy link
Contributor

lilyball commented Jul 2, 2014

You could allow recursive prints by using try_borrow_mut() instead of borrow_mut(), and temporarily setting a new stdout() if the borrow fails. That would replicate the old behavior.

@bluss
Copy link
Member Author

bluss commented Jul 2, 2014

@kballard the old solution composes nested printing much better -- here we basically have to allocat a new stdout for every nested print, but it shouldn't really happen, should it? It could also be without the line buffer.

Performance does increase, printing time reduced by 10% for both a single println! and for a silly test of 10,000 println!. From 47.7 ms/iteration to 42.7 ms/iteration, with this (redirected to file)

extern crate test;

#[deriving(Show)]
enum Instruction {
    Jump(i32),
    Match(bool),
    Text(&'static str),
}

fn gen_instr() -> Vec<Instruction>
{
    range(0i32, 10000).map(|x| match x % 3 {
        0 => Jump(x),
        1 => Match(x % 7 == 0),
        _ => Text("test-text")
    }).collect()
}

#[bench]
fn print_single(b: &mut test::Bencher)
{
    b.iter(|| {
        println!("abc");
    })
}

#[bench]
fn print_all(b: &mut test::Bencher)
{
    let instr = gen_instr();
    b.iter(|| {
        for (i, x) in instr.iter().enumerate() {
            println!("{} = {}", i, x);
        }
    })
}

When with_task_stdout is called inside its own invocation, create
a temporary Writer for stdout.
@bluss
Copy link
Member Author

bluss commented Jul 2, 2014

Added commit to handle nested invocation again -- simply creating a std::io::stdout(), writing to it, flushing it. It doesn't compose like the old solution, but it's actually a simple solution.

@lilyball
Copy link
Contributor

lilyball commented Jul 2, 2014

@blake2-ppc The current implementation (which is what I assume you mean by the "old solution") already allocates a new stdout for every level of print nesting. If you do 3 prints in a row all nested within a print, both the current solution and the proposed RefCell + try_borrow would still only allocate again for the first nested print (and the stdout it allocated would remain in TLS for the 2nd and 3rd nested prints to use, and would only be dropped once the outer print finished and it restored the original stdout).

@lilyball
Copy link
Contributor

lilyball commented Jul 2, 2014

@blake2-ppc Ah I see what you just did. And upon reflection, storing it into TLS until the outer print returns is a bit more complicated. You'd have to just write it into TLS, and then the outer print would have to unconditionally re-write its RefCell back into TLS after calling the closure, in case the nested print changed anything.

I think it's worth trying and seeing what the performance is like. If writing into TLS is cheap, and the expensive part was the allocation of the box, then it should be a win.

@lilyball
Copy link
Contributor

lilyball commented Jul 2, 2014

Ack, upon re-reading the original issue, the claim is that TLS uses allocation when calling replace(), which is exactly what you're trying to avoid. Well, you could read TLS again and see if the RefCell you got is the same as what you already have, and only write back if it changed.

@bluss
Copy link
Member Author

bluss commented Jul 2, 2014

@kballard If we unconditionally write back to the TLS, we end up where we started. Writing to the TLS is cheap, but it is one allocation and that's the one I'm trying to remove.

@lilyball
Copy link
Contributor

lilyball commented Jul 2, 2014

@blake2-ppc Hmm, I assumed the allocation was necessary because TLS returns a Ref and therefore replace() would have to allocate a brand new box so as to not break any existing references. Except replace() is documented as failing if the value is currently on-loan (if there are any existing Refs for it). So why does replace() need to allocate at all if the key is already present in TLS? Can that be fixed?

@bluss
Copy link
Member Author

bluss commented Jul 2, 2014

I think the box is needed to create a homogenous type to store in the TLS map, I'm not at all familiar with the implementation.

I don't think any additional complexity for the nested case is worth it, not more than the two commits in the PR so far.

What needs mentioning is that pub fn set_stdout can now fail, if called during with_task_stdout too.

@lilyball
Copy link
Contributor

lilyball commented Jul 2, 2014

Looking at TLS, I think this needs an overhaul. There's no actually reason why replace() should ever fail; we can replace Ref with effectively Rc instead, such that replacing a value that's currently borrowed will just "untether" the value from the TLS, but leave the value still accessible through the Ref. And we can rewrite this such that replacing a value that already exists in the map reuses the existing Box (if there are no outstanding Refs) instead of allocating a new one.

@lilyball
Copy link
Contributor

lilyball commented Jul 2, 2014

@blake2-ppc I'm not thrilled about the idea of set_stdout() failing. The problem with nested prints is the nested print doesn't typically know it's nested (because there's little utility in attempting a nested print), so having a failure mode that only exists when the print is nested causes potential issues for all code that ever does set_stdout() even if it will never actually be used in a nested fashion. And there's plausible reasons why it should work in a nested fashion anyway.

For example, I might have some code that, for whatever reason, wants to log to a file when various methods are called, but it needs to log using other code out of its control that uses println!(). It can accomplish this by using set_stdout() to temporarily redirect stdout to its log file, call the other code, and then restore the old stdout. What's it supposed to do if set_stdout() fails?

@bluss
Copy link
Member Author

bluss commented Jul 2, 2014

Thanks for the insightful comments @kballard, I think that leaves this PR explored & closed and we have to go look somewhere else -- TLS.

@bluss bluss closed this Jul 2, 2014
@lilyball
Copy link
Contributor

lilyball commented Jul 4, 2014

@blake2-ppc I've submitted PR #15399 that rewrites local_data to, among other things, try and avoid allocation in the replace(None) ... replace(Some) pattern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants