Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] --stacktrace:noinline: up to 3X speedups with typically same stacktraces #13582

Closed

Conversation

timotheecour
Copy link
Member

@timotheecour timotheecour commented Mar 4, 2020

example 1:

see example #13582 (comment) showing 3X speedup compared to default stacktraces, while producing identical stacktraces

example 2:

nim compiling itself. Less drastic speedup than example 1 because it relies less on inlined procs (unlike math heavy code), but still an appreciatable speedup than anyone working on debugging compiler may benefit from.

before PR:

after building nim with --stacktrace:on --opt:speed:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 26.5.s
after building nim with --stacktrace:on:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 52.s
after building nim with -d:release --stacktrace:on:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 18.s
after building nim with -d:danger --stacktrace:on:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 17.7s

after PR:

after building nim with --stacktrace:noinline --opt:speed:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 19.5.s (1.35X faster)
after building nim with --stacktrace:noinline:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 46.9.s (1.09X faster)
after building nim with -d:release --stacktrace:noinline:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 14.7.s (1.24X faster)
after building nim with -d:danger --stacktrace:noinline:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 14.3.s (1.23X faster)

and for comparison (unchanged before/after this PR):
after building nim with -d:danger --stacktrace:off:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 8.5

links

@timotheecour timotheecour changed the title refs #198 {.inline.} procs now do not generate frames with --stacktrace:on, but do with --stacktrace:inline [performance] refs #198 {.inline.} procs now do not generate frames with --stacktrace:on, but do with --stacktrace:inline Mar 4, 2020
@alaviss
Copy link
Collaborator

alaviss commented Mar 4, 2020

But why? Isn't the point of turning stacktrace on is to get stacktraces? And you referenced the wrong issue/PR I think.

If you want speed wouldn't you just push stacktrace off wherever needed?

@timotheecour
Copy link
Member Author

timotheecour commented Mar 4, 2020

And you referenced the wrong issue/PR I think.

thanks! fixed

But why?

makes default debug builds faster (or any build with --stacktrace:on), offering a better compromise between speed and debugging-ability

you loose nothing with this PR:

  • --stacktrace:inline is equivalent to --stacktrace:on before this PR
  • --stacktrace:off is equivalent to --stacktrace:off before this PR
  • --stacktrace:on now honors {.inline.} so they don't slow down performance; inlined procs is rarely what you want to debug as these correspond to, typically, performance sensitive code where inlining matters, including a bunch of mathematics operators l. But you can still debug these via --stacktrace:inline

@timotheecour timotheecour changed the title [performance] refs #198 {.inline.} procs now do not generate frames with --stacktrace:on, but do with --stacktrace:inline [WIP] [performance] refs #198 {.inline.} procs now do not generate frames with --stacktrace:on, but do with --stacktrace:inline Mar 4, 2020
@Clyybber
Copy link
Contributor

Clyybber commented Mar 4, 2020

you loose nothing with this PR:

Sure. We loose sane defaults IMO.

@timotheecour
Copy link
Member Author

timotheecour commented Mar 4, 2020

Sure. We loose sane defaults IMO.

cmon, it's just a matter of flipping the roles of the --stacktrace flags eg this could also be done with very little change from current state of this PR:

--stacktrace:off # never
--stacktrace:on # always
--stacktrace:noinline # only for procs not marked {.inline.}

so your "sane" defaults are preserved that way. I prefer the way I wrote it but willing to change to these if there's consensus

@timotheecour timotheecour changed the title [WIP] [performance] refs #198 {.inline.} procs now do not generate frames with --stacktrace:on, but do with --stacktrace:inline [RFC] [WIP] [performance] refs #198 {.inline.} procs now do not generate frames with --stacktrace:on, but do with --stacktrace:inline Mar 4, 2020
@Clyybber
Copy link
Contributor

Clyybber commented Mar 4, 2020

But this new behaviour is a step back. If you want performance at the cost of stacktraces use {.push: stacktraces: off.}.
No need to introduce surprising and more complex behaviour.

@timotheecour
Copy link
Member Author

timotheecour commented Mar 4, 2020

I've changed the default so that nothing changes except if you pass the new flag --stacktrace:noinline, so there is no "step back"

No need to introduce surprising and more complex behaviour.

there's nothing surprising or complex, this matches how native stacktraces work!

  • keep using the flags as you did and there will be zero difference vs before this PR.
  • or use the new --stacktrace:noinline and you get a nice speedup in lots of cases, whether you use it in combination with -d:danger or for debug builds.

Here's an example: you get a 3X speedup compared to --stacktrace:on, and you get the exact same stacktrace here (but this is common case; the things we're inlining are hotspots, well debugged, and unlikely to be where crashes or anything interesting happens).

import tables, times
var tab: Table[int, int]
proc main() =
  let n = 1000
  for i in 0..<n: tab[i] = i # doAssert i < n-1 # insert this to crash and show stacktrace
let t = cpuTime()
for i in 0..<100000: main()
echo cpuTime() - t

nim c -d:danger -r --stacktrace:off main 0.490725 s
nim c -d:danger -r --stacktrace:on main 2.041365 s
nim c -d:danger -r --stacktrace:noinline main 0.658695 s

likewise, we get a 3X speedup between nim c --opt:speed -r --stacktrace:on main and nim c --opt:speed -r --stacktrace:inline main

as for the stacktraces: insert the doAssert line:

  • with --stacktrace:off you get no stacktrace
  • with --stacktrace:noinline you get the exact same stacktrace as for --stacktrace:on

in the rare cases where the thing that crashed is inside a {.inline.} proc, just re-run with --stacktrace:on

If you want performance at the cost of stacktraces use {.push: stacktraces: off.}.

as I already explained, this is a worse solution: you'll end up having to find where bottlenecks are, modify sources, rerun, and modify sources back, and it'll affect everything inside those push/pop instead of just the inline bits.

It's the exact analog as why we have both -d:danger and -d:release:

  • -d:release is a good compromise between safety and speed
  • --stacktrace:inline is a good compromise between debuggability and speed
    (and these can be combined, eg -d:danger --stacktrace:inline)

@Clyybber
Copy link
Contributor

Clyybber commented Mar 4, 2020

We already have -d:danger.
This makes everything just more complicated, the next issue could then be "allow overwriting --stacktrace:noinline with {.push: stacktrace: on.}".
Its a rabbit hole that just doesn't need opening IMO, when you use the features nim already provides you with to accomplish what you need.

@timotheecour timotheecour changed the title [RFC] [WIP] [performance] refs #198 {.inline.} procs now do not generate frames with --stacktrace:on, but do with --stacktrace:inline [RFC] --stacktrace:inline skips {.inline.} procs, up to 3X speedups with same stacktraces typically Mar 4, 2020
@timotheecour timotheecour changed the title [RFC] --stacktrace:inline skips {.inline.} procs, up to 3X speedups with same stacktraces typically [RFC] --stacktrace:inline skips {.inline.} procs, up to 3X speedups with typically same stacktraces Mar 4, 2020
@timotheecour timotheecour changed the title [RFC] --stacktrace:inline skips {.inline.} procs, up to 3X speedups with typically same stacktraces [RFC] --stacktrace:noinline: up to 3X speedups with typically same stacktraces Mar 4, 2020
@stale
Copy link

stale bot commented Mar 9, 2021

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

@stale stale bot added the stale Staled PR/issues; remove the label after fixing them label Mar 9, 2021
@stale stale bot closed this Apr 8, 2021
@timotheecour timotheecour reopened this Apr 9, 2021
@stale stale bot removed the stale Staled PR/issues; remove the label after fixing them label Apr 9, 2021
@timotheecour timotheecour marked this pull request as draft April 9, 2021 06:39
@Varriount
Copy link
Contributor

So I'm a bit confused about the objections here - aside from what I see to be an internal optimization and a slight CLI change, what is the downside to this?

Yes, one can turn stack traces off or inline functions using a pragma, but one can also write their own stack traces manually and inline functions by hand. To me (at least, currently), this seems like an improvement with little in the way of a downside.

@Araq
Copy link
Member

Araq commented Apr 9, 2021

Advantages:

  • Simple idea and implementation.

Disadvantages:

  • Yet another switch.
  • There are already other ways to speed up strack tracking for production use, based on libunwind or whatever it's called.

Personally I would rather see the effort spend on nlvm, which has the potential to give us really decent debugging support.

@Varriount
Copy link
Contributor

@Araq Regarding "yet another switch", couldn't this just be made the default implementation? I also can't find any mention of libUnwind in the repository. If it isn't currently implemented, then (presumably) an implementation would require a switch anyway.

@stale
Copy link

stale bot commented Apr 27, 2022

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

@stale stale bot added the stale Staled PR/issues; remove the label after fixing them label Apr 27, 2022
@stale stale bot closed this May 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Staled PR/issues; remove the label after fixing them
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants