
Reduce cyclomatic complexity threshold #61

Closed
wants to merge 4 commits

Conversation

dnephin
Contributor

@dnephin dnephin commented May 9, 2017

Reduced from 19 to 16.

Collaborator

@vdemeester vdemeester left a comment


One nit, otherwise LGTM 😍

@@ -12,7 +12,9 @@ import (
"time"

"github.com/Sirupsen/logrus"
"github.com/docker/cli/cli/compose/loader"
Collaborator


Hmm, although I like the fact that we re-use this code, I would have extracted it from there (because now the container package depends on the compose package). But this is a nit 👼

Contributor Author


I agree, I was thinking about that too. Currently it outputs a compose type, so it would possibly require a larger change to create some shared type and then translate it to a compose type later.

I think it's inevitable that there is going to be sharing between compose and other cli components. I think we use service functions in convert right now.

@aaronlehmann
Contributor

I've found gocyclo to not be very useful in the past. Personally, I don't like mechanically enforcing these limits.

@dnephin
Contributor Author

dnephin commented May 10, 2017

I find gocyclo to be an extremely valuable code metric. Most functions with high cyclomatic complexity are not very readable and would benefit from refactoring into smaller functions. The only exception I've found to that rule is large switch/case statements where each case is a single function call.

For those rare cases where there is value in a longer function we can whitelist the function.
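For context on the metric under discussion: gocyclo scores each function as one plus one point per branch point (if, for, non-default case, &&, ||). A minimal sketch (the function below is a hypothetical illustration, not code from this PR):

```go
package main

import "fmt"

// classify has a cyclomatic complexity of 4:
// 1 (base) + 1 (if) + 2 (non-default case clauses).
// Each additional branch point would add one more.
func classify(n int) string {
	if n < 0 {
		return "negative"
	}
	switch n {
	case 0:
		return "zero"
	case 1:
		return "one"
	}
	return "many"
}

func main() {
	fmt.Println(classify(-1), classify(0), classify(2)) // prints: negative zero many
}
```

A threshold of 16 would flag any single function with sixteen or more such branch points.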

@aaronlehmann
Contributor

Yes, the false positives with long switch statements caused problems for me in the past. Do you think gocyclo adds enough value to justify needing to whitelist each one of these?

I agree there may be high correlation between an unreadable function and one where gocyclo returns a high score, but if a function is unreadable, we shouldn't need a metric to realize that during the code review process.

@dnephin
Contributor Author

dnephin commented May 10, 2017

Do you think gocyclo adds enough value to justify needing to whitelist each one of these?

Yes I do, I think it's extremely valuable. No new PRs should really add to the whitelist. If a change puts a function over the threshold, that's a good signal that the function needs to be refactored. Adding to the whitelist should only be allowed in very rare situations. Existing code is whitelisted for now, but these whitelist comments should really be treated as // TODO: fixme comments. I've addressed some of these already, but I didn't want to put too much into a single PR. I will continue to work on removing the need for the current whitelisted functions, and I hope that other contributors will help.

we shouldn't need a metric to realize that during the code review process.

Unfortunately that doesn't seem to be the case. We've had the same PR review policy for a long time, and yet we have many functions that are way too complex.

I think one reason for that might be that when you look at a PR you only really see a diff. The diff might only show you that you're adding a few lines, maybe 1 or 2 new conditions to a function. But over time 4 or 5 of those PRs result in an overloaded function. That's harder for a human to see.

Readable is also very subjective. I'm sure most functions are readable if you study them for long enough. It's nice to have a more objective number.

@thaJeztah
Member

I think one reason for that might be that when you look at a PR you only really see a diff. The diff might only show you that you're adding a few lines, maybe 1 or 2 new conditions to a function. But over time 4 or 5 of those PRs result in an overloaded function. That's harder for a human to see.

I agree that's a plausible explanation why complexity gets into the source code 👍

@aaronlehmann
Contributor

I think one reason for that might be that when you look at a PR you only really see a diff. The diff might only show you that you're adding a few lines, maybe 1 or 2 new conditions to a function. But over time 4 or 5 of those PRs result in an overloaded function. That's harder for a human to see.

This was another problem I ran into with gocyclo in the past. Occasionally there would be a bug fix that needed a conditional to be added which put the function over the complexity threshold. Then a 3-line fix would become a big refactor, and it was much harder to have confidence in the fix and backport it.

I'm still very much against mechanically enforcing this. I think it's fine to emit a list of potential refactoring candidates for human review. But failing CI because a function has too many conditionals is going too far.

@dnephin
Contributor Author

dnephin commented May 11, 2017

I think what you see as a problem I see as a benefit :)

Occasionally there would be a bug fix that needed a conditional to be added which put the function over the complexity threshold.

If the complexity of that function wasn't already at the threshold maybe that bug would have been discovered earlier, because bugs are more noticeable in less complex code. So instead of requiring a critical fix after a release it would have already been fixed before the release!

Then a 3-line fix would become a big refactor, and it was much harder to have confidence in the fix and backport it.

If you aren't confident in a change it's a sign it doesn't have adequate testing. I don't think a few extra lines in a diff should significantly impact confidence in a change.

But failing CI because a function has too many conditionals is going too far.

Warnings are not enough. It should be up to the contributor who is adding overly complex code to fix it, not up to someone else to come by later on and clean things up.

We already fail CI for lots of reasons including test failures, gofmt/golint/vet failures. Why would gocyclo be any different? They all signal the same thing: problems with the code.

There are sometimes going to be exceptional circumstances that require a test to be disabled temporarily, or a complex function to be whitelisted. That's fine, we have a mechanism to make those temporary exceptions. But the standard should be to enforce good quality contributions.

@stevvooe
Contributor

I would urge extreme caution in enabling gocyclo as a requirement for submissions. The refactoring required to satisfy it isn't always in the best interest of readability, often forcing code to be arbitrarily broken down. Even worse, such changes could introduce new bugs in working code.

And, from what I can tell, cyclomatic complexity has largely been debunked as an indicator of defects:

Thus, reducing the cyclomatic complexity of code is not proven to reduce the number of errors or bugs in that code.

Indeed, we do run several checks as part of CI, but we should be cautious in ensuring that such checks provide value and don't contribute to a higher defect rate. golint, go vet, gofmt are worth their cost, but gocyclo, not so much.

@codecov-io

codecov-io commented May 23, 2017

Codecov Report

Merging #61 into master will increase coverage by 0.16%.
The diff coverage is 50.28%.

@@            Coverage Diff             @@
##           master      #61      +/-   ##
==========================================
+ Coverage   46.14%   46.31%   +0.16%     
==========================================
  Files         193      193              
  Lines       16073    16076       +3     
==========================================
+ Hits         7417     7445      +28     
+ Misses       8269     8241      -28     
- Partials      387      390       +3

@vdemeester vdemeester mentioned this pull request May 26, 2017
@thaJeztah
Member

@stevvooe @aaronlehmann I see the discussion here is mainly about making the cyclomatic complexity check a requirement. Separate from that discussion, do the changes in this PR look good to you?

@stevvooe
Contributor

@thaJeztah No, this PR is proposing changes to satisfy the constraints that are laid out by the cyclo check. Because the checks performed by gocyclo are roughly junk science, changes performed in pursuit of satisfying gocyclo should be considered invalid.

Specifically, in this case, this PR is actually exporting a function, ParseVolume, that was previously confined to a particular part of the code. Given the sensitivity of this particular functionality, we need to take care that this ParseVolume is used in the correct context. In particular, it is disconcerting that the command line now has a dependency on compose for volume parsing. It is also worrisome that volumeSplitN has been completely deleted, which, if I recall correctly, has some magic to deal with certain edge cases.

@dnephin
Contributor Author

dnephin commented May 30, 2017

this PR is proposing changes to satisfy the constraints that are laid out by the cyclo check.

These are changes I would have made anyway. Complex functions are code debt.

ParseVolume, that was previously confined to a particular part of the code

It was always our intent to have a single piece of code for parsing volumes from the client. This was just a good opportunity to accomplish that goal.

It is also worrisome that volumeSplitN has been completely deleted, which, if I recall correctly, has some magic to deal with certain edge cases

Any special cases that apply to the flag would also apply to the Compose file parsing. The logic in both cases should be identical, and it was a bug that they were not using the same code. Unfortunately the test cases for volumeSplitN() tested a bunch of cases that are not actually valid volume specs, so I wasn't able to preserve the test (although I did run it to confirm that relevant cases did still pass with the new code).

@stevvooe
Contributor

@dnephin How does this PR propose merging the ParseVolume functionality?

Unfortunately the test cases for volumeSplitN() tested a bunch of cases that are not actually valid volume specs, so I wasn't able to preserve the test

This is what has me concerned. The definition of a "valid volume spec" is what is supported by volumeSplitN.

@cpuguy83
Collaborator

Unfortunately the test cases for volumeSplitN() tested a bunch of cases that are not actually valid volume specs

Can you point one out as example? A quick look and I'm not really finding anything right now.

@dnephin
Contributor Author

dnephin commented May 30, 2017

The definition of a "valid volume spec" is what is supported by volumeSplitN.

I can see why someone would make that assumption, but I don't believe that's the case. The official parsing of a volume spec is handled by the daemon, and depends on the platform.

The volumeSplitN() code (as the name suggests) was a partial parsing of the platform-agnostic volume spec to determine whether the volume was of type bind or type volume. It never handled options for either type.

The test cases included cases like '{/foo:/bar:ro, 2, []string{/foo, /bar:ro}}' which is not a valid parsing of a volume on any platform. (cc @cpuguy83 this is the main one that stands out now. Other cases were more difficult to reproduce because they were incomplete specs. I'll try again to preserve more of these tests by removing default values from the parsed spec)

I understand your concern. This parsing is error prone and has lots of strange special cases. ParseVolume() has its own test cases which I believe covers all of those cases.

I'll see about restoring more of the previous test cases.

Edit: other cases are things like mode rW, which is parsed to read_only: false. There's no way to restore that case, but these values are case insensitive, and also ignored by the code which called volumeSplitN().
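The bind-versus-volume distinction described above can be sketched roughly like this. This is a hypothetical simplification under stated assumptions, not the real volumeSplitN: it only counts colon-separated parts, and ignores edge cases the real code handled (such as Windows drive-letter paths).

```go
package main

import (
	"fmt"
	"strings"
)

// isBindSpec is a deliberately reduced sketch: a spec with two or more
// colon-separated parts (e.g. "/foo:/bar:ro") names a source and was
// treated as a bind, while a single path ("/data") is an anonymous
// volume. Windows paths like `c:\data` would break this naive split.
func isBindSpec(spec string) bool {
	return len(strings.SplitN(spec, ":", 3)) >= 2
}

func main() {
	fmt.Println(isBindSpec("/foo:/bar:ro")) // true: has a source
	fmt.Println(isBindSpec("/data"))        // false: anonymous volume
}
```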

@cpuguy83
Collaborator

@dnephin The issue here is that it's not parsing a volume; it's deciding how to apply the spec, e.g. whether it is a volume or a bind.
The example you gave is the exact thing that would be used with docker run -v /foo:/bar:ro

@dnephin
Contributor Author

dnephin commented May 30, 2017

Ok, I've found a much better way of preserving the test cases. Instead of trying to recreate the output of volumeSplitN() (which was completely ignored except for the length of the array) I'm setting the expected value based on the length of the array.

It appears that the term "bind" here is not exactly the same as in other places. Any volume with a source was being added as a "bind". I've updated the caller to check parsed.Source != "" instead of parsed.Type == "bind".

I've restored the old test cases, and they pass without any modification to the test cases.
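The caller check described here can be sketched as follows. The struct and field names are assumptions for illustration, not the real compose types:

```go
package main

import "fmt"

// parsedVolume is an illustrative stand-in for a parsed volume spec;
// the field names are hypothetical, not the real compose API.
type parsedVolume struct {
	Type   string // "volume" or "bind"
	Source string // empty for anonymous volumes
	Target string
}

// goesInBinds sketches the updated check: a spec belongs in Binds when
// it names a source, whether that source is a host path (type "bind")
// or a named volume (type "volume").
func goesInBinds(v parsedVolume) bool {
	return v.Source != ""
}

func main() {
	named := parsedVolume{Type: "volume", Source: "mydata", Target: "/data"}
	anon := parsedVolume{Type: "volume", Source: "", Target: "/scratch"}
	fmt.Println(goesInBinds(named), goesInBinds(anon)) // prints: true false
}
```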

@dnephin
Contributor Author

dnephin commented May 30, 2017

The example you gave is the exact thing that would be used with docker run -v /foo:/bar:ro

It's only used to check the length of the array. It was effectively the same as returning []string{"", ""}. I found a way to preserve the old test cases.

@stevvooe
Contributor

I've updated the caller to check parsed.Source != "" instead of parsed.Type == "bind".

This check doesn't seem quite right, either. The specs are passed in Binds if they are either a named volume mount or a bind mount. The type "bind" mounts are used to declare mounts that are mounted from the host filesystem. In that regard, there is no relation between the type "bind" and what goes into the Binds field, because the Binds field accepts non-bind mounts.

I guess they would be of type "volume" or "bind" with a defined source.

@dnephin
Contributor Author

dnephin commented May 30, 2017

Yes, that sounds right, and the code should reflect that now.

@dnephin
Contributor Author

dnephin commented Jun 5, 2017

I've moved the volume spec parsing change to #152 so that change can be discussed in isolation. That change is no longer part of this PR.

@aaronlehmann
Contributor

I still think enforcing a more strict gocyclo threshold is going in the wrong direction. I think we should remove the cyclomatic complexity check completely.

@dnephin
Contributor Author

dnephin commented Jun 6, 2017

Yes, I understand that. I still think that this check is valuable. 16 is not really "strict", it's actually well above what is considered an acceptable level of cyclomatic complexity (10).

To summarize my arguments for enforcing this check:

gocyclo linting ensures that no function is "more complex" than our threshold unless we allow it explicitly by whitelisting it. Over time this means that we have smaller functions, and small code changes don't continually increase the size of functions without anyone noticing.

It is true that cyclomatic complexity is not a perfect measure of complexity, but it is the best available proxy, and we have a mechanism for whitelisting exceptions.

I'm not really seeing the downside to enforcing this lint check. I'm going to try to summarize some of the arguments to better understand your concerns.

  • the false positives with long switch statements caused problems for me in the past (addressed by the whitelist, also can be converted to maps, which is arguably more readable).
  • we shouldn't need a metric to realize [a function is unreadable] during the code review process. (maybe, although there is plenty of code that is overly complex that has made it through code review, so maybe not. Also maybe PRs make it difficult to see.)
  • Occasionally there would be a bug fix that needed a conditional to be added which put the function over the complexity threshold. Then a 3-line fix would become a big refactor, and it was much harder to have confidence in the fix and backport it. (At some point we need to break up that function. If the bug fix isn't the right time to do so we can whitelist the function and open an issue to fix it later).
  • The refactoring required to satisfy it isn't always in the best interest of readability (Again, if we truly believe that a function must remain over the limit, we have the ability to whitelist it. These cases are extremely rare. It is almost always better to split things into smaller functions.)
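The switch-to-map conversion mentioned in the first bullet can look like this (a hypothetical example, not code from this PR):

```go
package main

import "fmt"

// kindOfSwitch: each non-default case clause adds a point of cyclomatic
// complexity, so a long dispatch switch can trip the linter even though
// it reads fine.
func kindOfSwitch(ext string) string {
	switch ext {
	case ".go":
		return "Go source"
	case ".md":
		return "Markdown"
	case ".yml", ".yaml":
		return "YAML"
	default:
		return "unknown"
	}
}

// kinds is the table-driven alternative: complexity stays constant no
// matter how many entries are added.
var kinds = map[string]string{
	".go":   "Go source",
	".md":   "Markdown",
	".yml":  "YAML",
	".yaml": "YAML",
}

func kindOfMap(ext string) string {
	if k, ok := kinds[ext]; ok {
		return k
	}
	return "unknown"
}

func main() {
	fmt.Println(kindOfMap(".go"), kindOfMap(".txt")) // prints: Go source unknown
}
```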

I believe many of these concerns can be addressed by whitelisting a function. I think we should at the very least try this with a more reasonable limit (16). If we find after a few months that the hassle of adding a comment to some functions outweighs the value, we can always increase the limit, or even disable it.

@dnephin dnephin force-pushed the more-strict-lint branch 3 times, most recently from 8b73299 to c91b3ab Compare June 13, 2017 19:11
@dnephin dnephin force-pushed the more-strict-lint branch 4 times, most recently from d3f89d7 to f68d98e Compare June 21, 2017 15:48
@dnephin dnephin force-pushed the more-strict-lint branch 2 times, most recently from 9e701f6 to f291f7a Compare June 29, 2017 21:54
@dnephin dnephin force-pushed the more-strict-lint branch 2 times, most recently from 5a57da9 to afecf91 Compare July 11, 2017 18:45
Signed-off-by: Daniel Nephin <dnephin@docker.com>
Signed-off-by: Daniel Nephin <dnephin@docker.com>
Signed-off-by: Daniel Nephin <dnephin@docker.com>
Improve test coverage
Reduce cyclo to 16

Signed-off-by: Daniel Nephin <dnephin@docker.com>
@dnephin dnephin closed this Sep 26, 2017
@dnephin dnephin deleted the more-strict-lint branch September 26, 2017 16:34

8 participants