Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CSV format for get_info and allow including any violation types #110

Merged
merged 8 commits into from
Oct 13, 2023

Conversation

oboxodo
Copy link
Contributor

@oboxodo oboxodo commented Sep 11, 2023

This PR refactors code from get_info to easily allow other violation types besides privacy and dependency. On top of that, it allows outputting per-pack data in CSV format.

That being said, the default output remains being exactly the same as before.

I think the CSV format is very valuable because it can easily be used for getting periodic progress reports and even transform it to other formats via external tools (JSON, markdown table, etc). I'm personally using it to generate a markdown table from it.

Example using default "detail" format and picking dependency and architecture violations:

$ bin/packs get_info -t dependency,architecture packs/foo packs/bar packs/baz
There are 58 total inbound dependency violations
There are 20 total inbound architecture violations
There are 171 total outbound dependency violations
There are 149 total outbound architecture violations

=========== Info about: packs/bar
Owned by: No one
Size: 47 ruby files
Public API: packs/bar/app/public
There are 17 inbound dependency violations
There are 0 inbound architecture violations
There are 117 outbound dependency violations
There are 117 outbound architecture violations

=========== Info about: packs/baz
Owned by: No one
Size: 18 ruby files
Public API: packs/baz/app/public
There are 31 inbound dependency violations
There are 10 inbound architecture violations
There are 50 outbound dependency violations
There are 32 outbound architecture violations

=========== Info about: packs/foo
Owned by: No one
Size: 9 ruby files
Public API: packs/foo/app/public
There are 10 inbound dependency violations
There are 10 inbound architecture violations
There are 4 outbound dependency violations
There are 0 outbound architecture violations

Same data, but using CSV format:

$ bin/packs get_info -f csv -t dependency,architecture packs/foo packs/bar packs/baz
Pack name,Owned by,Size,Public API,Inbound dependency violations,Inbound architecture violations,Outbound dependency violations,Outbound architecture violations
packs/bar,No one,47,packs/bar/app/public,17,0,117,117
packs/baz,No one,18,packs/baz/app/public,31,10,50,32
packs/foo,No one,9,packs/foo/app/public,10,10,4,0

Copy link
Contributor

@professor professor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making this PR.

I apologize for the delay in getting back to you on it. I would love to see this PR merged into the code base. I have one stylistic concern. Other than that, this looks great.

puts "There are #{all_inbound.select(&:privacy?).sum { |v| v.files.count }} total inbound privacy violations"
puts "There are #{all_inbound.select(&:dependency?).sum { |v| v.files.count }} total inbound dependency violations"
puts "There are #{all_outbound.select(&:privacy?).sum { |v| v.files.count }} total outbound privacy violations"
puts "There are #{all_outbound.select(&:dependency?).sum { |v| v.files.count }} total outbound dependency violations"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the original code with the 4 puts easier to read and understand. I understand that there's duplication. I would prefer a small amount of duplication over an abstraction layer that makes it harder for new-to-the-code-base developer to understand.

If you agree, that would mean that we don't need these above:

      violations = {
        inbound: {},
        outbound: {}
      }
      directions = violations.keys
      dir_x_types = directions.product(types)

And we could simply the printing below.

Otherwise, I'm happy to merge this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

The problem with that is that then we can't easily support new violation types. Or even allow the user to select which ones they want from the CLI.

In my case, I'm currently interested in architecture violations and not dependency nor privacy for the time being.

An intermediate solution would be to have duplicate lines one for inbound and one for outbound, but still loop through a list of violation types. Would you be ok with that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about now, @professor? I removed the directions and dir_x_types variables and the looping over directions. I can't really remove the looping over types as that's one of the main purposes of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update to the PR.

When I look at the original code, I would find something like [1] easier to read than [2]

# [1]
puts "There are #{all_inbound.select(&:privacy?).sum { |v| v.files.count }} total inbound privacy violations" if types.include?(:privacy)
puts "There are #{all_inbound.select(&:dependency?).sum { |v| v.files.count }} total inbound dependency violations" if types.include?(:dependency)
puts "There are #{all_inbound.select(&: architecture?).sum { |v| v.files.count }} total inbound dependency violations" if types.include?(: architecture)
# [2]
types.each do |type|
  key = ['inbound', type, 'violations'].join('_').to_sym
  row[key] = (violations[:inbound][pack.name] || []).select { _1.type.to_sym == type }.sum { |v| v.files.count }
end
...
types.each do |type|
    key = ['inbound', type, 'violations'].join('_').to_sym
    puts "There are #{row[key]} inbound #{type} violations"
end

[1] will allow the user to select which types they want displayed. Maybe I'm missing something else with the change. Thanks again @oboxodo for improving the code with this feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, @professor. Here are my thoughts:

Yes, [1] is more readable than [2]. But [1] is hardcoded to only support privacy, dependency, and architecture. IIRC I've seen visibility checks too in packwerk and then that code would need to be changed in order to support that (or any other future check FWIW). [2] has the benefit of not needing to be updated because it supports any checks that could come in the future (as long as they support the same API as the current ones which seems to be the goal).

On top of that, you're showing [2] as two separate loops but those loops are there for different reasons. The goal of the first loop is to collect the violations regardless of how they're going to be reported. I introduced that row hash as an intermediate format-independent way to collect the data, then there's separate code to interpret that row and convert it into either the csv format or the detail (original) format.

I've actually thought a follow-up PR could make this separation more explicit and then the current method would be left only with the pure data-collection code and then delegate outputting to different methods depending on the selected format.

I'd really like to avoid hardcoding anything related to the violation types. Thanks to this, this was the only required change in order to support architecture violations. Adding visibility or any other future extension should be as simple as that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the record... all of that reasoning is in line with the work I did on my first PR to this project in #104 where I also removed some hardcoded call to dependency? with a more dynamic check, allowing support for both privacy and architecture.

@oboxodo oboxodo force-pushed the support_csv_format_for_get_info branch from 670ecfb to bab0310 Compare October 12, 2023 16:00
Including the date as part of the report makes it easy to store the output as snapshots over time to compare progress.
This is to satisfy some feedback I got on the PR.
@oboxodo oboxodo force-pushed the support_csv_format_for_get_info branch from bab0310 to 3431f36 Compare October 12, 2023 16:01
Copy link
Contributor

@professor professor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the code!

@professor professor merged commit 43d7a91 into rubyatscale:main Oct 13, 2023
7 checks passed
@oboxodo oboxodo deleted the support_csv_format_for_get_info branch October 13, 2023 20:34
@oboxodo
Copy link
Contributor Author

oboxodo commented Oct 13, 2023

Thanks for merging! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants