
Remove potential Go module versions from shortened names #571

Merged: 6 commits merged into google:master on Oct 16, 2020


@zikaeroh (Contributor) commented Oct 9, 2020:

Fixes #515.

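Remove potential module path versions (v2, v3, v4, etc.) from the input string before extracting a shortened name. This makes it easier to tell packages apart when several of them happen to share the same major version: without the change, two different modules at `/v4` are both shortened to a name starting with `v4.`.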
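To make the problem concrete, here is a minimal Go sketch that applies only the pre-PR `goRegExp` quoted in the review below, joining its capture groups the way the quoted `strings.Join(matches[1:], "")` line does. The helper name `shorten` is hypothetical, and the Java and C++ patterns that pprof also tries are omitted. Both inputs are taken from the PR's test cases.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// goRegExp is the pre-PR Go shortening pattern: it drops every leading
// "element/" and keeps whatever follows the last slash.
var goRegExp = regexp.MustCompile(`^(?:[\w\-\.]+\/)+(.+)`)

// shorten (hypothetical name) applies goRegExp and joins its capture groups.
func shorten(name string) string {
	if m := goRegExp.FindStringSubmatch(name); len(m) >= 2 {
		return strings.Join(m[1:], "")
	}
	return name
}

func main() {
	for _, f := range []string{
		"github.com/jackc/pgx/v4.(*Conn).Query",
		"github.com/foo/bar/v4.(*Foo).Bar",
	} {
		// Both results start with "v4.", so the owning module is lost.
		fmt.Println(shorten(f))
	}
}
```

This prints `v4.(*Conn).Query` and `v4.(*Foo).Bar`, so the graph gives no hint which module each node belongs to.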

An example similar to my report:

[image: screenshot of an example graph]

google-cla bot added the cla: yes label on Oct 9, 2020
@codecov-io commented Oct 9, 2020:

Codecov Report

Merging #571 into master will decrease coverage by 0.00%.
The diff coverage is 100.00%.


```diff
@@            Coverage Diff             @@
##           master     #571      +/-   ##
==========================================
- Coverage   67.14%   67.13%   -0.01%
==========================================
  Files          78       78
  Lines       14072    14074       +2
==========================================
  Hits         9449     9449
- Misses       3788     3789       +1
- Partials      835      836       +1
```
| Impacted Files | Coverage Δ |
|---|---|
| internal/graph/graph.go | 28.03% <100.00%> (+0.15%) ⬆️ |
| .../github.com/google/pprof/internal/report/source.go | 80.76% <0.00%> (-0.65%) ⬇️ |
| ...rc/github.com/google/pprof/internal/graph/graph.go | 28.03% <0.00%> (+0.15%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 67992a1...b05cf5f.

@nolanmar511 (Contributor) left a comment:

Thank you for this PR!

```diff
@@ -34,6 +34,8 @@ var (
// Removes package name and method arguments for Go function names.
// See tests for examples.
goRegExp = regexp.MustCompile(`^(?:[\w\-\.]+\/)+(.+)`)
// Checks for a package name that could be a module version.
goVerRegExp = regexp.MustCompile(`^v[2-9]+\.`)
```
@nolanmar511 (Contributor) commented Oct 9, 2020:

I think that a regex like `(?:^(?:[\w\-\.]+\/)+((?:[\w\-\.]+\/v[0-9]+)(?:\.[^.\n]+){2})$)|(?:^(?:[\w\-\.]+\/)+(.+))` could be used for `goRegExp` rather than adding `goVerRegExp`. However, I think @aalexand should decide whether we'd prefer to add code or a more complex regexp.

Otherwise, I think `goVerRegExp` would miss `v14` and `v10`. Perhaps `^v[0-9]+\.` or `^v[2-9][0-9]+\.` would match better?

@zikaeroh (Contributor, Author) commented Oct 9, 2020:

Oh, you're right, I brainfarted on that one. It should have been `^v([2-9]|[1-9][0-9]+)\.`.

@zikaeroh (Contributor, Author):

I switched it to a corrected regex, but if there's some more complicated one that works better it can be used. I know my brain shuts down trying to read that long one... 🙂

@zikaeroh (Contributor, Author):

The key thing is to allow v2, v3, v10, v1234, etc., but not v0 or v1.
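The candidate patterns discussed in this exchange can be compared side by side. This Go sketch tests each one against a few shortened names; the `matches` helper and the `vN.F` inputs are illustrative and not part of pprof. Only the corrected `^v([2-9]|[1-9][0-9]+)\.` accepts v2 and above while rejecting v0 and v1.

```go
package main

import (
	"fmt"
	"regexp"
)

// Candidate "name starts with a module major version" patterns from the thread.
var candidates = []string{
	`^v[2-9]+\.`,              // first draft in the PR
	`^v[0-9]+\.`,              // suggested alternative
	`^v[2-9][0-9]+\.`,         // suggested alternative
	`^v([2-9]|[1-9][0-9]+)\.`, // corrected pattern
}

// Sample shortened names; only v2 and above should be accepted.
var inputs = []string{"v0.F", "v1.F", "v2.F", "v10.F", "v14.F", "v1234.F"}

// matches reports whether pattern matches s. Compiling on every call is
// fine for a demo; real code would compile each pattern once.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	for _, pat := range candidates {
		var accepted []string
		for _, in := range inputs {
			if matches(pat, in) {
				accepted = append(accepted, in)
			}
		}
		fmt.Printf("%-26s accepts %v\n", pat, accepted)
	}
}
```

The first draft misses `v10` and `v14` because `1` is outside `[2-9]`. `^v[0-9]+\.` also accepts `v0` and `v1`, and `^v[2-9][0-9]+\.` rejects single-digit versions as well as anything starting with 1, such as `v10` or `v14`; the alternation in the corrected pattern handles the single-digit and multi-digit cases separately.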

@zikaeroh (Contributor, Author):

`(?:^(?:[\w\-\.]+\/)+((?:[\w\-\.]+\/v(?:[2-9]|[1-9][0-9]+)+)(?:\.[^.\n]+){2})$)|(?:^(?:[\w\-\.]+\/)+(.+))`

That is the long one, with the version restricted as above; it would be a one-line change, and it appears to pass the tests. I'm happy to use it instead.
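One way to see what the single-regexp alternative does is to run it through the same `strings.Join(matches[1:], "")` step. In this sketch (the function name `shortenSingle` is hypothetical), the first alternative keeps a `pkg/vN` element when exactly two dot-separated parts follow it, which covers methods such as `pgx/v4.(*Conn).Query`. Go's `regexp` prefers the first alternative when both could match, so the second alternative only applies otherwise.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// goSingleRegExp is the single-regexp alternative from the thread, restricted
// to v2 and above. Alternative 1 keeps "pkg/vN" when exactly two ".part"
// pieces follow it; alternative 2 is the original goRegExp.
var goSingleRegExp = regexp.MustCompile(`(?:^(?:[\w\-\.]+\/)+((?:[\w\-\.]+\/v(?:[2-9]|[1-9][0-9]+)+)(?:\.[^.\n]+){2})$)|(?:^(?:[\w\-\.]+\/)+(.+))`)

// shortenSingle (hypothetical name) mirrors the strings.Join step.
func shortenSingle(name string) string {
	if m := goSingleRegExp.FindStringSubmatch(name); len(m) >= 2 {
		// Only the alternative that matched sets its group; the other is "".
		return strings.Join(m[1:], "")
	}
	return name
}

func main() {
	for _, f := range []string{
		"github.com/jackc/pgx/v4.(*Conn).Query",
		"github.com/foo/bar/v4.Func",
		"github.com/foo/bar.Func",
	} {
		fmt.Println(shortenSingle(f))
	}
}
```

This yields `pgx/v4.(*Conn).Query`, `v4.Func` and `bar.Func`: a plain function directly in a versioned package has only one dot-separated part after the version, so it falls through to the second alternative and is still shortened to `v4.Func`.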

```diff
@@ -451,6 +451,18 @@ func TestShortenFunctionName(t *testing.T) {
"github.com/blah/blah/vendor/gopkg.in/redis.v3.(*baseClient).(github.com/blah/blah/vendor/gopkg.in/redis.v3.process)-fm",
"redis.v3.(*baseClient).(github.com/blah/blah/vendor/gopkg.in/redis.v3.process)-fm",
},
{
"github.com/jackc/pgx/v4.(*Conn).Query",
```
Contributor:

Would you consider using more abstract test case names, in keeping with the style of the existing tests?

@zikaeroh (Contributor, Author):

Sure thing.

```go
end = idx
}
}
```

Contributor:

Perhaps remove this line break.

@zikaeroh (Contributor, Author):

Sure, if you want. I view the above block as calculating `end`, so that was my thinking.

@nolanmar511 (Contributor) commented:

@aalexand: I did some initial review and wanted to know your thoughts on the approach, specifically whether it makes sense to try to use a single regexp or whether adding an additional function (as is done here) is the right approach.

```diff
@@ -451,6 +451,30 @@ func TestShortenFunctionName(t *testing.T) {
"github.com/blah/blah/vendor/gopkg.in/redis.v3.(*baseClient).(github.com/blah/blah/vendor/gopkg.in/redis.v3.process)-fm",
"redis.v3.(*baseClient).(github.com/blah/blah/vendor/gopkg.in/redis.v3.process)-fm",
},
{
"github.com/foo/bar/v4.(*Foo).Bar",
```
Collaborator:

Curious: why is the version string sometimes a separate subdirectory and sometimes a prefix of the package name? Is this something that the package owners choose? Are the options restricted to these two, or are there more?

Oh, I guess it's a function of how deep below the versioning level the actual symbol is?

@zikaeroh (Contributor, Author):

I think you're referring to the tests; in some of the tests I've added, the "version" isn't a version at all. The only valid versions in paths are `v2`, `v3`, ... `v1234`, etc. So you'd have `github.com/foo/bar`, `github.com/foo/bar/v2`, `github.com/foo/bar/v3`, and so on, then subpackages like `github.com/foo/bar/v3/baz`. Custom domains mean you can have things like `gotest.tools/assert` and `gotest.tools/v3/assert`. But a package can also sit at the level where the version appears, so when `github.com/jackc/pgx` was bumped to `github.com/jackc/pgx/v4`, it's still referred to as `pgx` in the code.

But if it isn't a valid version element, I don't want to naively treat it as one (i.e. `something.com/hello/v123xyz` isn't versioned, but `something.com/hello/v123/xyz` is, because the version is its own element).
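The path layout described here can be sketched as a small heuristic: treat the last path element as a major-version suffix only if the whole element is `v` followed by a number of at least 2 with no leading zero, and in that case report the element before it. `guessPackageName` and `majorVersionElem` are hypothetical names, and the real package name is whatever the package clause declares, so this is only an approximation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// majorVersionElem matches a whole path element that is a valid major-version
// suffix: v2, v3, ..., v10, v1234, but not v0, v1, v02 or v123xyz.
var majorVersionElem = regexp.MustCompile(`^v(?:[2-9]|[1-9][0-9]+)$`)

// guessPackageName (hypothetical) returns the last element of an import path,
// stepping back one element if the last one is a major-version suffix.
func guessPackageName(importPath string) string {
	elems := strings.Split(importPath, "/")
	last := elems[len(elems)-1]
	if len(elems) > 1 && majorVersionElem.MatchString(last) {
		return elems[len(elems)-2]
	}
	return last
}

func main() {
	for _, p := range []string{
		"github.com/jackc/pgx/v4",
		"gotest.tools/v3/assert",
		"github.com/foo/bar/v3/baz",
		"something.com/hello/v123xyz",
		"something.com/hello/v123/xyz",
	} {
		fmt.Printf("%s -> %s\n", p, guessPackageName(p))
	}
}
```

For the five paths above this returns `pgx`, `assert`, `baz`, `v123xyz` and `xyz`; `v123xyz` is kept because it is not a version element on its own.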

```go
return name
}

// The shortened name could start with a module version (like "v2"). Go back one slash.
```
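The comment in this fragment describes the PR's first approach: if the shortened name itself starts with a module version, extend it back by one slash. The following is a hypothetical reconstruction of that idea, not the PR's actual code; this `shortenGoFunc` takes only the full name. It combines the pre-PR `goRegExp` with the corrected version check and uses `FindStringSubmatchIndex` to find where the shortened name begins.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Pre-PR Go shortening pattern, as quoted in the diff.
	goRegExp = regexp.MustCompile(`^(?:[\w\-\.]+\/)+(.+)`)
	// Corrected "starts with a module version" check from the review.
	goVerRegExp = regexp.MustCompile(`^v([2-9]|[1-9][0-9]+)\.`)
)

// shortenGoFunc is a hypothetical sketch: when the shortened name starts
// with "vN.", extend it back by one path element so it reads "pkg/vN....".
func shortenGoFunc(full string) string {
	loc := goRegExp.FindStringSubmatchIndex(full)
	if loc == nil {
		return full
	}
	start, end := loc[2], loc[3] // bounds of capture group 1
	name := full[start:end]
	if !goVerRegExp.MatchString(name) {
		return name
	}
	// full[start-1] is the slash right before name; find the one before it.
	if i := strings.LastIndex(full[:start-1], "/"); i >= 0 {
		return full[i+1 : end]
	}
	return full
}

func main() {
	for _, f := range []string{
		"github.com/jackc/pgx/v4.(*Conn).Query",
		"github.com/foo/bar/v1.Func",
		"github.com/foo/bar.Func",
	} {
		fmt.Println(shortenGoFunc(f))
	}
}
```

With this approach `github.com/jackc/pgx/v4.(*Conn).Query` is shown as `pgx/v4.(*Conn).Query`, keeping the version visible, while names whose prefix is not a valid version, such as `v1.Func`, stay as `goRegExp` shortened them.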
Collaborator:

Keep comments in 80 columns please.

@zikaeroh (Contributor, Author):

Sure.

```diff
-return strings.Join(matches[1:], "")
+name := strings.Join(matches[1:], "")
+if re == goRegExp {
+return shortenGoFunc(f, name)
```
Collaborator:

I feel it might be simpler to first remove the version substring from the name, and then handle it just like before.

@zikaeroh (Contributor, Author):

Not sure if you saw the previous review comments, but if preferred, this can all be removed and replaced with a single regex change.

@zikaeroh (Contributor, Author):

> I feel it might be simpler to first remove the version substring from the name, and then handle it just like before.

I'm not sure how this is possible; the name here is extracted from the regex directly. If we remove the version suffix, you get the empty string.

@zikaeroh (Contributor, Author) commented Oct 16, 2020:

Oh, I see, you mean that if the name matches a version, remove the suffix from the whole path and then try again. It wouldn't distinguish two versions of the same module, but I guess it's no worse than any other name aliasing within the same graph. Would be short; I can do that if preferred.

@zikaeroh (Contributor, Author) commented Oct 16, 2020:

I think I misinterpreted again (sorry!), so I'll wait for clarification.

I think you meant just starting with something like `github.com/jackc/pgx/v4/foo.bar`, then replacing the first instance of `/v4/` with `/`, then running the regex again. Not quite a suffix, but functional enough. This is all heuristics, after all.

Collaborator:

Sorry, `/v[1-9][0-9]*[./]`.

Collaborator:

And not all occurrences but at most one occurrence (assuming there can't be two version substrings in the name).

@zikaeroh (Contributor, Author):

All instances wouldn't work, because it's legal for me to write `github.com/foo/bar/v4/something/v8` or similar. The first instance, I believe, would work as intended.

@zikaeroh (Contributor, Author):

I'll give it a try. It would be `v([2-9]|[1-9][0-9]+)\.`, though, as v0 and v1 don't exist as path elements.

@zikaeroh (Contributor, Author):

Done, and retitled to match the new fix. `regexp` doesn't have a nice "just replace once", so I used the replace-all-with-two-capture-groups method.
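Go's `regexp` package has no replace-first method, so the trick described here anchors the whole string with `^...$` and splits it into two capture groups around the version element; because the anchored pattern can match only once, `ReplaceAllString` with `${1}${2}` removes at most one element. The sketch below assumes a pattern of that shape with a greedy `(.*)` prefix (the exact pattern in the PR may differ; `verAll` and `verOnce` are hypothetical names) and contrasts it with an unanchored pattern, which strips every occurrence, including the legal nested-version case raised above.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Unanchored: ReplaceAllString rewrites every "/vN" element it finds.
	verAll = regexp.MustCompile(`/v(?:[2-9]|[1-9][0-9]+)([./])`)
	// Anchored with two capture groups: the whole string is one match, so at
	// most one "/vN" element is dropped. The greedy (.*) picks the last one.
	verOnce = regexp.MustCompile(`^(.*)/v(?:[2-9]|[1-9][0-9]+)([./].*)$`)
)

func main() {
	const in = "github.com/foo/bar/v4/foo/bar/v4.(*Foo).Bar"
	fmt.Println(verAll.ReplaceAllString(in, "${1}"))
	fmt.Println(verOnce.ReplaceAllString(in, "${1}${2}"))
}
```

The unanchored pattern yields `github.com/foo/bar/foo/bar.(*Foo).Bar`, while the anchored one removes exactly one element; with a greedy prefix that is the last `/v4`, giving `github.com/foo/bar/v4/foo/bar.(*Foo).Bar`.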

@zikaeroh changed the title from "Expand shortened name if package name appears to be a module version" to "Remove potential Go module versions from shortened names" on Oct 16, 2020
@aalexand previously approved these changes on Oct 16, 2020
@zikaeroh (Contributor, Author) commented:

I just realized I didn't add a test for the multi-version case; I can quickly add one if that's alright.

@zikaeroh (Contributor, Author) commented Oct 16, 2020:

Ah, yeah, my replacement was overzealous: `github.com/foo/bar/v4/foo/bar/v4.(*Foo).Bar` is converted to `bar.(*Foo).Bar` when it shouldn't be. How about a method similar to the one @nolanmar511 suggested, with a modified Go regex?

I'm not sure off the top of my head how to construct my regex to capture as little text on the left side as possible.

@zikaeroh (Contributor, Author) commented Oct 16, 2020:

Doh; one character: `(.*?)`.

Sorry to dismiss the review.
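Putting the pieces together, the final approach as described in the thread is: strip at most the first module-version element with a lazy, anchored two-group regexp, then shorten the result with the unchanged `goRegExp`. This Go sketch assumes the version pattern has the form `^(.*?)/v(?:[2-9]|[1-9][0-9]+)([./].*)$`; `shortenGoName` is a hypothetical name, and the merged code may differ in detail.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Assumed version-stripping pattern: an anchored match with two capture
	// groups around a "/vN" element (N >= 2). The lazy (.*?) makes the first
	// such element the one that is removed.
	goVerRegExp = regexp.MustCompile(`^(.*?)/v(?:[2-9]|[1-9][0-9]+)([./].*)$`)
	// Pre-PR Go shortening pattern, used unchanged after the version is gone.
	goRegExp = regexp.MustCompile(`^(?:[\w\-\.]+\/)+(.+)`)
)

// shortenGoName (hypothetical) strips at most one module-version element,
// then shortens the result the same way as before the PR.
func shortenGoName(f string) string {
	f = goVerRegExp.ReplaceAllString(f, "${1}${2}")
	if m := goRegExp.FindStringSubmatch(f); len(m) >= 2 {
		return strings.Join(m[1:], "")
	}
	return f
}

func main() {
	for _, f := range []string{
		"github.com/jackc/pgx/v4.(*Conn).Query",
		"gotest.tools/v3/assert.Equal",
		"github.com/foo/bar/v123xyz.Func",
		"github.com/foo/bar/v4/foo/bar/v4.(*Foo).Bar",
	} {
		fmt.Println(shortenGoName(f))
	}
}
```

This prints `pgx.(*Conn).Query`, `assert.Equal`, `v123xyz.Func` and `v4.(*Foo).Bar`. With a greedy `(.*)` the last `/v4` in the final input would be removed instead, which is how the `bar.(*Foo).Bar` result reported above came about.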

@aalexand merged commit 8ef5528 into google:master on Oct 16, 2020
giordano added a commit to JuliaPackaging/Yggdrasil that referenced this pull request Nov 13, 2020
* Update pprof to latest revision

Bump from 20191205061153 => 20201109224723

My personal interest is to pull in google/pprof#564, which adds support for displaying names with `"` in them, which julia functions sometimes have (e.g. `var"#foo#23"`)

Includes:
- google/pprof#564
- google/pprof#575
- google/pprof#574
- google/pprof#571
- google/pprof#572
- google/pprof#570
- google/pprof#562
- google/pprof#561
- google/pprof#565
- google/pprof#560
- google/pprof#563
- google/pprof#557
- google/pprof#554
- google/pprof#552
- google/pprof#545
- google/pprof#549
- google/pprof#547
- google/pprof#541
- google/pprof#534
- google/pprof#542
- google/pprof#535
- google/pprof#531
- google/pprof#530
- google/pprof#528
- google/pprof#522
- google/pprof#525
- google/pprof#527
- google/pprof#519
- google/pprof#520
- google/pprof#517
- google/pprof#518
- google/pprof#514
- google/pprof#513
- google/pprof#510
- google/pprof#508
- google/pprof#506
- google/pprof#509
- google/pprof#504

* Update P/pprof/build_tarballs.jl - use a real version number

Co-authored-by: Mosè Giordano <giordano@users.noreply.github.com>

* Remove now unused `timestamp`

* [pprof] Use `GitSource`

Co-authored-by: Mosè Giordano <giordano@users.noreply.github.com>
gmarin13 pushed a commit to gmarin13/pprof that referenced this pull request Dec 17, 2020
* Expand shortened name if package name appears to be a module version

* Correct version regexp, use more generic names in tests, remove an empty line

* 80 character columns

* Remove first matching Go module version from path

* Test and fix multi-version case

Co-authored-by: Maggie Nolan <nolanmar@google.com>

Successfully merging this pull request may close these issues.

Go module version suffix shown in graphs instead of package name
4 participants