-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Avoid allocation for every use of task local standard out #15341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Change private items std::io::stdio::{local_stdout, with_task_stdout} to use RefCell. Use RefCell to avoid having to replace out and replace back the TLS key for the task local standard out object. It was noticed replace uses a small allocation every time a key is set. This way, with_task_stdout cannot be used nested anymore -- even though I don't think it ever was, nor should it possible with the public API.
The allocation for every Not sure if this is worth fixing, but it looks relatively simple. |
Before this change, recursive prints while printing would forcefully print to stdout, while with this change it now unconditionally fails the task (with an odd error message). (just stating a fact) Have you measured a performance difference to see whether this is worth it or not? |
Ok I think I'm already wrong about recursive prints -- they can happen if you use Judging by the comment on I haven't run performance tests yet, my laptop is miserably slow, working on it. |
You could allow recursive prints by using |
@kballard the old solution composes nested printing much better -- here we basically have to allocat a new stdout for every nested print, but it shouldn't really happen, should it? It could also be without the line buffer. Performance does increase, printing time reduced by 10% for both a single
|
When with_task_stdout is called inside its own invocation, create a temporary Writer for stdout.
Added commit to handle nested invocation again -- simply creating a |
@blake2-ppc The current implementation (which is what I assume you mean by the "old solution") already allocates a new stdout for every level of print nesting. If you do 3 prints in a row all nested within a print, both the current solution and the proposed RefCell + try_borrow would still only allocate again for the first nested print (and the stdout it allocated would remain in TLS for the 2nd and 3rd nested prints to use, and would only be dropped once the outer print finished and it restored the original stdout). |
@blake2-ppc Ah I see what you just did. And upon reflection, storing it into TLS until the outer print returns is a bit more complicated. You'd have to just write it into TLS, and then the outer print would have to unconditionally re-write its RefCell back into TLS after calling the closure, in case the nested print changed anything. I think it's worth trying and seeing what the performance is like. If writing into TLS is cheap, and the expensive part was the allocation of the box, then it should be a win. |
Ack, upon re-reading the original issue, the claim is that TLS uses allocation when calling |
@kballard If we unconditionally write back to the TLS, we end up where we started. Writing to the TLS is cheap, but it is one allocation and that's the one I'm trying to remove. |
@blake2-ppc Hmm, I assumed the allocation was necessary because TLS returns a |
I think the box is needed to create a homogenous type to store in the TLS map, I'm not at all familiar with the implementation. I don't think any additional complexity for the nested case is worth it, not more than the two commits in the PR so far. What needs mentioning is that |
Looking at TLS, I think this needs an overhaul. There's no actually reason why |
@blake2-ppc I'm not thrilled about the idea of For example, I might have some code that, for whatever reason, wants to log to a file when various methods are called, but it needs to log using other code out of its control that uses |
Thanks for the insightful comments @kballard, I think that leaves this PR explored & closed and we have to go look somewhere else -- TLS. |
@blake2-ppc I've submitted PR #15399 that rewrites |
Change private items std::io::stdio::{local_stdout, with_task_stdout} to
use RefCell.
Use RefCell to avoid having to replace out and replace back the TLS key
for the task local standard out object. It was noticed replace uses
a small allocation every time a key is set.
This way, with_task_stdout cannot be used nested anymore -- even though
I don't think it ever was, nor should it possible with the public API.