Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reliable line numbers in code info printing #29893

Merged
merged 8 commits into from
Nov 30, 2018
Merged

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented Nov 1, 2018

fixes #27149

This implements a uniform and flexible printing of line number and inlining information across all of our compiler introspection passes (*see comparison screen shots at the bottom). The goal here is to expose all information which may be necessary to the user, but to also provide a uniform framework for letting the user augment or limit that information. As such, all code_ functions and macros now take a debuginfo argument.

This PR utilizes a flexible callback-based design (loosely based on the LLVM design of the same that has proved very useful to us) so that users can create new designs which render less or more information than the default (and to also set those to the default(*), if they wish). Currently the only modes for :debuginfo that are implemented are :source (showing only line number information) and :none (showing no debug information). Also, all provide :default (user-customizable(*)). In the future, I’m hoping users will implement new modes to suit other specific interests. Such examples, ordered by verbosity, include:

  • none (“just give me the code” mode—implemented here)
  • oneliner (only show information if it fits on one line—e.g. the approximate printer we have now on v1.0, but needs Keno to port this over)
  • source (fully accurate line info printer capable of handling inlining—implemented here)
  • types (like source, but also show specialization information on everything)
  • metadata (like types, but also show all metadata that the compiler has)
  • variables (like metadata, but also represent further information about where everything came from—e.g. what happened to the original variables or fields)

Example snippets below. This function was chosen because it has unusually complicated line information, even at the source level, because of the macro. This is not true of most functions: typically unoptimized code will not contain any inlining bar delineating any statements as belonging to some foreign frame. I’m also hopeful to see if someone—maybe me—can move the bar to the right, while still preserving the accuracy that I've worked hard over the years to ensure.) Note: these were taken using the Solarized Dark theme, which sometimes shifts colors pretty far from their defaults to much more muted tones, so YMMV.

input
screen shot 2018-10-30 at 1 56 44 pm

code_lowered
screen shot 2018-10-30 at 1 55 45 pm

code_warntype
screen shot 2018-10-30 at 1 55 30 pm

code_typed
screen shot 2018-10-30 at 1 55 07 pm

code_llvm
screen shot 2018-10-30 at 1 54 33 pm

code_native
screen shot 2018-10-30 at 1 54 08 pm

* user-customization is handled via an open dictionary Base.IRShow.debuginfo that maps symbol name (such as :default) to a callback annotation printer. Users can alter or add mappings in their .juliarc.jl files and create new packages to create novel new modes of their own!

@Keno
Copy link
Member

Keno commented Nov 1, 2018

This seems nice, but can we leave the default non-verbose? I'm worried about new users trying to use these and getting a mess of line info thrown at them (this is already happening with code_native as implemented right now).

@JeffBezanson
Copy link
Member

This is an absurd bikeshed comment, but the brackets look weird to me. The closing characters should be .

@vtjnash
Copy link
Member Author

vtjnash commented Nov 26, 2018

oneliner mode (only show information if it fits on one line)

This just needs someone to adapt it to the framework here. If it had gotten done by now by someone, I was figuring that we could merge it into this PR. But we can merge that on master separately whenever it's ready. For now, I’m most interested in finally recovering the capability for this code to present information completely and correctly.

I'm worried about new users trying to use these and getting a mess of line info thrown at them

I have great confidence in users abilities—they aren’t stupid. Regardless, I expect that I'll appreciate having this information available by default in bug reports, in much the same way that I want backtraces to be overly information-laden by default.
I'm trying very hard not to comment on the current situation (hence my very slow rate of replies), but stating this opinion as a fact is really not helpful. As it is now, I don't understand most of the line information getting thrown at me. I'm somewhat sure I've come across bugs, but the output is so intentionally inaccurate that I've been unable to open new issues to explain the problems. I think that new users will appreciate having the information shown in a format that attempts to simply and accurately represent the source of each ssavalue. And regardless, I know that some users will already be happy that this PR allows them to eliminate "extraneous" output, even though that wasn't exactly my goal.

Btw, @Keno you mentioned that you'd like the bars on the right. I'm generally for that too, since otherwise the right-hand-size seems to be under-utilized, and I don't yet know what else we would need to put there. Once this general framework is merged, I'm hopeful that we can find some way to get that changed where supported.

@JeffBezanson sure, done. It was a tossup for me whether it was drawing a strict grouping or some more abstract margin line design. Going with the bracket interpretation seems good.

@Keno
Copy link
Member

Keno commented Nov 26, 2018

May I suggest for the purpose of getting this merged, you switch the default to non-verbose, since everybody seems to be in favor of that part of this change, and then we have a separate debate on that part of the change? I don't think coupling the two is necessary.

@Keno
Copy link
Member

Keno commented Nov 26, 2018

And FWIW, I'd be fine if "non-verbose" means "none" here until we re-implement the RHS information.

@vtjnash
Copy link
Member Author

vtjnash commented Nov 26, 2018

Can we just recommend that those folks edit their .juliarc.jl to set their personal preference (and no, not everyone agrees with you)? I think for casual and initial interaction with this infrastructure, we may want to present users with a hint at the wider range of information what we can provide. I expect that most users aren't familiar with ssa-form, and would find the extent of our lowering pretty confusing—where these line numbers may help to anchor them. But then as they dig deeper into their problem and have isolated it to one section of the code, they can start to turn off (or perhaps sometimes increase) the irrelevant verbosity around that snippet.

True, that doesn't really need to block merging, since I can just set my default too, so I've switched that default in my latest commit:

Base.IRShow.debuginfo[:default] = Base.IRShow.debuginfo[:source] # or :none

Copy link
Member

@Keno Keno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok with me.

@StefanKarpinski
Copy link
Member

Should we call this verbosity instead of debuginfo? Or just info? I can't seem to find where the different allowable values are. What does :source do as compared to :default?

@vtjnash
Copy link
Member Author

vtjnash commented Nov 27, 2018

I can't seem to find where the different allowable values are. What does :source do as compared to :default?

Maybe not too visible in the diff, but the currently allowed values are explained in every doc-string for each code_* function and macro (and observations about possible future work to add more values is mentioned in the PR text above). These doc-string additions ended up somewhere near to the lower 3/4th of the github changes view.

Looking at the doc-string, it could also be called comments. But you don't say why those proposals would be better.

What does :source do as compared to :default?

The alias :default can be decided to be set to anything else, without affecting the meaning of :source. It's the representation of "not-specified".

@StefanKarpinski
Copy link
Member

Perhaps I'm missing it but I don't see :source mentioned in any doc string or otherwise documented.

My objection is that the name debuginfo is long, redundant and slightly awkward. We also already use underscores in the names of other code reflection options, so this should be debug_info for consistency. I think that comments or verbose would be better names. I also find the symbol values a bit awkward. Perhaps an integer verbosity level with 0 for no debug comments, 1 for a medium amount (roughly what we used to print), and 2 for all the information that's available.

@vtjnash
Copy link
Member Author

vtjnash commented Nov 27, 2018

In prose, I write the option name without the leading colon.

My objection is that the name debuginfo is long, redundant and slightly awkward. We also already use underscores in the names of other code reflection options

Seriously, you're worried that it's a single character too long? What is it redundant with? If you're going to bike-shed the name, it'd be helpful to at least have a more concrete reason than "comments=2" is "better and less awkward". Most of the options are single words, and avoid the underscore question. But typically across Julia, we
seem to be attempting to avoid them (ref, Underscore Audit. Aside: there's also a great typo on that page: "strive to eliminate all names where underscores are necessary"). I guess I gravitated towards debuginfo from prior exposure (In LLVM, this information is called DebugInfo or DI), although "comments" seems an acceptable compromise to me.

Perhaps an integer verbosity level with 0

We could permit integers as aliases, for ease of typing, but even C has had enums for ages for better readability.

@jekbradbury
Copy link
Contributor

jekbradbury commented Nov 28, 2018

On this branch, if you set Base.IRShow.debuginfo[:default] = Base.IRShow.debuginfo[:none], the debuginfo=:source kwarg doesn't seem to work for me; the reverse is fine for code_typed and code_llvm but not code_native.

@vtjnash
Copy link
Member Author

vtjnash commented Nov 28, 2018

It works fine, but specifies something slightly different. In particular, they can independently specify a choice to filter out available debug information. We may synchronize those options in more cases in the future. But right now, master doesn't give you any choice, so this is at least an improvement.

Copy link
Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I am quite looking forward to this.

stdlib/InteractiveUtils/src/codeview.jl Outdated Show resolved Hide resolved
Use box-drawing characters and indentation to make the output readable more rapidly.
…_native

For consistency of user experience, reduce the variance in our IR printing across formats.
This also now shows inlining and line number information even if the output device might not support color,
which was previously impossible (a regression since v0.6).
Here we make the observation that it's somewhat common to have chains
of methods of one function that recursively handle arguments in some
fashion (for example, map-tuple or +). However, since they all have the
same method name, it's possible to represent these on a single line.
These levels are intended to roughly correspond approximately to compiler options -g0 (none) through -g2 (including local variables).
Currently we do not implement the ability to represent local variable information, so the default is to only show line number information
suitable for identifying the source line of a given statement (but not the types).
Only for Julia IR, for now. Later, this should be threaded into code_llvm and enabled there also.
I chose green for new nodes (like new life), and used yellow for metadata
(because the strong contrast helps to hide it visually, and similar to sticky notes),
and used grey for basic blocks (because they are minor information),
and used cyan/gray/Base.warn_color() for type information
(because that's what we've always used).
@vchuravy vchuravy merged commit b3cf152 into master Nov 30, 2018
@vchuravy vchuravy deleted the jn/codeinfo-lineno branch November 30, 2018 14:24
@StefanKarpinski
Copy link
Member

Is it just me or does this option not have any effect?

julia> function collatz(n)
           k = 0
           while n > 1
               n = iseven(n) ? n >> 1 : 3n + 1
               k += 1
           end
           return k
       end
collatz (generic function with 1 method)

julia> @code_native debuginfo=:none collatz(100)
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ REPL[9]:2 within `collatz'
	xorl	%eax, %eax
; │ @ REPL[9]:3 within `collatz'
; │┌ @ operators.jl:286 within `>'
; ││┌ @ int.jl:49 within `<'
	cmpq	$2, %rdi
; │└└
	jl	L45
	nopl	(%rax,%rax)
; │ @ REPL[9]:4 within `collatz'
; │┌ @ int.jl:448 within `>>' @ int.jl:441
L16:
	movq	%rdi, %rcx
	sarq	%rcx
; │└
	testb	$1, %dil
; │┌ @ int.jl:53 within `+'
	leaq	1(%rdi,%rdi,2), %rdi
; │└
	cmoveq	%rcx, %rdi
; │ @ REPL[9]:5 within `collatz'
; │┌ @ int.jl:53 within `+'
	addq	$1, %rax
; │└
; │ @ REPL[9]:3 within `collatz'
; │┌ @ operators.jl:286 within `>'
; ││┌ @ int.jl:49 within `<'
	cmpq	$2, %rdi
; │└└
	jge	L16
; │ @ REPL[9]:7 within `collatz'
L45:
	retq
	nop
; └

@KristofferC
Copy link
Member

It works for code_llvm but not for code_native, it seems.

@KristofferC
Copy link
Member

code_typed doesnt seem to have any line annotations?

julia> @code_typed collatz(3)
CodeInfo(
1 ─       nothing::Nothing
2 ┄ %2  = φ (#1 => 0, #6 => %18)::Int64
│   %3  = φ (#1 => _2, #6 => %17)::Int64
│   %4  = (Base.slt_int)(1, %3)::Bool
└──       goto #7 if not %4
3 ─ %6  = (Base.checked_srem_int)(%3, 2)::Int64
│   %7  = (%6 === 0)::Bool
│   %8  = (Base.not_int)(%7)::Bool
│   %9  = (Base.not_int)(%8)::Bool
└──       goto #5 if not %9
4 ─ %11 = (Base.ashr_int)(%3, 0x0000000000000001)::Int64
│   %12 = (Base.shl_int)(%3, 0xffffffffffffffff)::Int64
│   %13 = (Base.ifelse)(true, %11, %12)::Int64
└──       goto #6
5 ─ %15 = (Base.mul_int)(3, %3)::Int64
└── %16 = (Base.add_int)(%15, 1)::Int64
6 ┄ %17 = φ (#4 => %13, #5 => %16)::Int64
│   %18 = (Base.add_int)(%2, 1)::Int64
└──       goto #2
7 ─       return %2
) => Int64

@KristofferC
Copy link
Member

Something seems wrong with the lines to the left? They are cut everywhere there are instructions.

screenshot 2018-11-30 at 10 38 56

@KristofferC
Copy link
Member

Wasn't the intention to default to not showing all this line info?

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Nov 30, 2018

@vtjnash does not want to do that even though there does seem to be fairly broad consensus behind it. Personally, my preferred state of affairs would be having three levels of debug info:

  1. none: just show the code
  2. some: show a bit of debug info but not all of it
  3. all: show all available debug info

Where some is the default. As I've stated above, I also feel that the debuginfo is not the best name, and that :source, :none are not especially clear as values for this option. However, I don't get the impression that feedback, discussion or consensus are welcomed here.

@vtjnash
Copy link
Member Author

vtjnash commented Nov 30, 2018

@KristofferC It's intentional: LLVM really does not not want us printing on the same line (so that the output is valid code). We can hack around it as follows, but then it's not a valid file anymore, so on balance it doesn't seem to be worthwhile.

diff --git a/src/disasm.cpp b/src/disasm.cpp
index 418643d568..04f6465059 100644
--- a/src/disasm.cpp
+++ b/src/disasm.cpp
@@ -806,6 +806,7 @@ static void jl_dump_asm_internal(
     // start out with a raw stream, and create formatted stream from it here.
     // LLVM will desctruct the formatted stream, and we keep the raw stream.
     auto ustream = llvm::make_unique<formatted_raw_ostream>(rstream);
+    llvm::formatted_raw_ostream *private_ustream = ustream.get();
     Streamer.reset(TheTarget->createAsmStreamer(Ctx, std::move(ustream), /*asmverbose*/true,
                                                 /*useDwarfDirectory*/ true,
                                                 IP, CE, MAB, /*ShowInst*/ false));
@@ -860,7 +861,8 @@ static void jl_dump_asm_internal(
         uint64_t nextLineAddr = -1;
         DILineInfoTable::iterator di_lineIter = di_lineinfo.begin();
         DILineInfoTable::iterator di_lineEnd = di_lineinfo.end();
-        DILineInfoPrinter dbgctx{"; ", true};
+        DILineInfoPrinter dbgctx{"", true};
+        dbgctx.SetVerbosity(debuginfo);
         if (pass != 0) {
             if (di_ctx && di_lineIter != di_lineEnd) {
                 // Set up the line info
@@ -907,6 +909,7 @@ static void jl_dump_asm_internal(
                 if (symbol)
                     Streamer->EmitLabel(symbol);
             }
+            *private_ustream << dbgctx.inlining_indent("│");
 
             MCInst Inst;
             MCDisassembler::DecodeStatus S;

Wasn't the intention to default to not showing all this line info?

No. The intention of the PR was to improve formatting. If you want to change the defaults, make your own PR.

@maxbennedich
Copy link
Contributor

I came here trying to figure out why debuginfo is not working for code_native. Every time I do code_native, I find myself manually removing the comments, and frankly I'm getting quite tired of it. So I typed up this hacky macro, and I'm sharing it here if anyone else is in the same situation. It uses Suppressor.jl.

using Suppressor

macro no_comments(ex)
    quote
        full_output = split((@capture_out $(esc(ex))), '\n')
        without_comments = filter(s -> !startswith(s, ';'), full_output)
        foreach(println, without_comments)
    end
end

Before:

julia> code_native(x -> sqrt(x), (Float64,); syntax=:intel)
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ REPL[6]:1 within `#17'
; │┌ @ math.jl:492 within `sqrt'
; ││┌ @ REPL[6]:1 within `<'
	push	rax
	vxorps	xmm1, xmm1, xmm1
	vucomisd	xmm1, xmm0
; ││└
	ja	L17
; ││ @ math.jl:493 within `sqrt'
	vsqrtsd	xmm0, xmm0, xmm0
; │└
	pop	rax
	ret
; │┌ @ math.jl:492 within `sqrt'
L17:
	movabs	rax, offset throw_complex_domainerror
	movabs	rdi, 4363115776
	call	rax
	ud2
	nop	dword ptr [rax]
; └└

After:

julia> @no_comments code_native(x -> sqrt(x), (Float64,); syntax=:intel)
	.section	__TEXT,__text,regular,pure_instructions
	push	rax
	vxorps	xmm1, xmm1, xmm1
	vucomisd	xmm1, xmm0
	ja	L17
	vsqrtsd	xmm0, xmm0, xmm0
	pop	rax
	ret
L17:
	movabs	rax, offset throw_complex_domainerror
	movabs	rdi, 4363115776
	call	rax
	ud2
	nop	dword ptr [rax]

¡A blissful sight!

@StefanKarpinski
Copy link
Member

@vtjnash: It's pretty ridiculous that we still cannot suppress all this junk in the output of @code_* macros. There's even debuginfo option but it doesn't do anything. Wah, wah, wah...

@vchuravy
Copy link
Member

This is fixed by #30223

@@ -122,11 +122,11 @@ calls and expand argument types automatically:
```julia-repl
julia> @code_llvm +(1,1)

; Function Attrs: sspreq
define i64 @"julia_+_130862"(i64, i64) #0 {
; @ int.jl:53 within `+'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for printing this as `+' instead of `+`? It messes up syntax highlighting sometimes when pasting output.

@vtjnash

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say just fix it to be paired backticks, which, imo looks better anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[NewOptimizer] (Legacy) IR does not show line number information anymore
8 participants