Support CSV format for `get_info` and allow including any violation types #110

oboxodo · 2023-09-11T02:39:52Z

This PR refactors code from get_info to easily allow other violation types besides privacy and dependency. On top of that, it allows outputting per-pack data in CSV format.

That being said, the default output remains being exactly the same as before.

I think the CSV format is very valuable because it can easily be used for getting periodic progress reports and even transform it to other formats via external tools (JSON, markdown table, etc). I'm personally using it to generate a markdown table from it.

Example using default "detail" format and picking dependency and architecture violations:

$ bin/packs get_info -t dependency,architecture packs/foo packs/bar packs/baz
There are 58 total inbound dependency violations
There are 20 total inbound architecture violations
There are 171 total outbound dependency violations
There are 149 total outbound architecture violations

=========== Info about: packs/bar
Owned by: No one
Size: 47 ruby files
Public API: packs/bar/app/public
There are 17 inbound dependency violations
There are 0 inbound architecture violations
There are 117 outbound dependency violations
There are 117 outbound architecture violations

=========== Info about: packs/baz
Owned by: No one
Size: 18 ruby files
Public API: packs/baz/app/public
There are 31 inbound dependency violations
There are 10 inbound architecture violations
There are 50 outbound dependency violations
There are 32 outbound architecture violations

=========== Info about: packs/foo
Owned by: No one
Size: 9 ruby files
Public API: packs/foo/app/public
There are 10 inbound dependency violations
There are 10 inbound architecture violations
There are 4 outbound dependency violations
There are 0 outbound architecture violations

Same data, but using CSV format:

$ bin/packs get_info -f csv -t dependency,architecture packs/foo packs/bar packs/baz
Pack name,Owned by,Size,Public API,Inbound dependency violations,Inbound architecture violations,Outbound dependency violations,Outbound architecture violations
packs/bar,No one,47,packs/bar/app/public,17,0,117,117
packs/baz,No one,18,packs/baz/app/public,31,10,50,32
packs/foo,No one,9,packs/foo/app/public,10,10,4,0

professor

Thank you for making this PR.

I apologize for the delay in getting back to you on it. I would love to see this PR merged into the code base. I have one stylistic concern. Other than that, this looks great.

professor · 2023-10-11T18:32:55Z

lib/packs/private.rb

-      puts "There are #{all_inbound.select(&:privacy?).sum { |v| v.files.count }} total inbound privacy violations"
-      puts "There are #{all_inbound.select(&:dependency?).sum { |v| v.files.count }} total inbound dependency violations"
-      puts "There are #{all_outbound.select(&:privacy?).sum { |v| v.files.count }} total outbound privacy violations"
-      puts "There are #{all_outbound.select(&:dependency?).sum { |v| v.files.count }} total outbound dependency violations"


I find the original code with the 4 puts easier to read and understand. I understand that there's duplication. I would prefer a small amount of duplication over an abstraction layer that makes it harder for new-to-the-code-base developer to understand.

If you agree, that would mean that we don't need these above:

violations = { inbound: {}, outbound: {} } directions = violations.keys dir_x_types = directions.product(types)

And we could simply the printing below.

Otherwise, I'm happy to merge this PR.

Thanks for the review!

The problem with that is that then we can't easily support new violation types. Or even allow the user to select which ones they want from the CLI.

In my case, I'm currently interested in architecture violations and not dependency nor privacy for the time being.

An intermediate solution would be to have duplicate lines one for inbound and one for outbound, but still loop through a list of violation types. Would you be ok with that?

How about now, @professor? I removed the directions and dir_x_types variables and the looping over directions. I can't really remove the looping over types as that's one of the main purposes of this PR.

Thanks for the update to the PR.

When I look at the original code, I would find something like [1] easier to read than [2]

# [1] puts "There are #{all_inbound.select(&:privacy?).sum { |v| v.files.count }} total inbound privacy violations" if types.include?(:privacy) puts "There are #{all_inbound.select(&:dependency?).sum { |v| v.files.count }} total inbound dependency violations" if types.include?(:dependency) puts "There are #{all_inbound.select(&: architecture?).sum { |v| v.files.count }} total inbound dependency violations" if types.include?(: architecture)

# [2] types.each do |type| key = ['inbound', type, 'violations'].join('_').to_sym row[key] = (violations[:inbound][pack.name] || []).select { _1.type.to_sym == type }.sum { |v| v.files.count } end ... types.each do |type| key = ['inbound', type, 'violations'].join('_').to_sym puts "There are #{row[key]} inbound #{type} violations" end

[1] will allow the user to select which types they want displayed. Maybe I'm missing something else with the change. Thanks again @oboxodo for improving the code with this feature.

Thanks for the feedback, @professor. Here are my thoughts:

Yes, [1] is more readable than [2]. But [1] is hardcoded to only support privacy, dependency, and architecture. IIRC I've seen visibility checks too in packwerk and then that code would need to be changed in order to support that (or any other future check FWIW). [2] has the benefit of not needing to be updated because it supports any checks that could come in the future (as long as they support the same API as the current ones which seems to be the goal).

On top of that, you're showing [2] as two separate loops but those loops are there for different reasons. The goal of the first loop is to collect the violations regardless of how they're going to be reported. I introduced that row hash as an intermediate format-independent way to collect the data, then there's separate code to interpret that row and convert it into either the csv format or the detail (original) format.

I've actually thought a follow-up PR could make this separation more explicit and then the current method would be left only with the pure data-collection code and then delegate outputting to different methods depending on the selected format.

I'd really like to avoid hardcoding anything related to the violation types. Thanks to this, this was the only required change in order to support architecture violations. Adding visibility or any other future extension should be as simple as that.

Just for the record... all of that reasoning is in line with the work I did on my first PR to this project in #104 where I also removed some hardcoded call to dependency? with a more dynamic check, allowing support for both privacy and architecture.

…ange)

Including the date as part of the report makes it easy to store the output as snapshots over time to compare progress.

This is to satisfy some feedback I got on the PR.

professor

Thanks for the code!

oboxodo · 2023-10-13T20:51:31Z

Thanks for merging! 🎉

professor reviewed Oct 11, 2023

View reviewed changes

oboxodo added 6 commits October 12, 2023 12:45

Refactor to separate stats-collection from output (no visible change)

33be05e

Use data instead of methods to distinguish violation type

7d6ca58

Use hashes and arrays allow easy support for new types (no visible ch…

490dd31

…ange)

Add support for reporting packs info in CSV format

b876ba6

Include architecture violations on get_info

8c874fe

Allow specifying what types of violations to output on get_info

7c24364

oboxodo force-pushed the support_csv_format_for_get_info branch from 670ecfb to bab0310 Compare October 12, 2023 16:00

oboxodo added 2 commits October 12, 2023 13:01

Allow including a Date in the get_info report (useful for snapshots)

9f419a5

Including the date as part of the report makes it easy to store the output as snapshots over time to compare progress.

Prefer some more code readability over DRYness

3431f36

This is to satisfy some feedback I got on the PR.

oboxodo force-pushed the support_csv_format_for_get_info branch from bab0310 to 3431f36 Compare October 12, 2023 16:01

professor approved these changes Oct 13, 2023

View reviewed changes

professor merged commit 43d7a91 into rubyatscale:main Oct 13, 2023
7 checks passed

oboxodo deleted the support_csv_format_for_get_info branch October 13, 2023 20:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support CSV format for `get_info` and allow including any violation types #110

Support CSV format for `get_info` and allow including any violation types #110

oboxodo commented Sep 11, 2023 •

edited

Loading

professor left a comment

professor Oct 11, 2023

oboxodo Oct 11, 2023

oboxodo Oct 12, 2023

professor Oct 13, 2023

oboxodo Oct 13, 2023

oboxodo Oct 13, 2023

professor left a comment

oboxodo commented Oct 13, 2023

Support CSV format for get_info and allow including any violation types #110

Support CSV format for get_info and allow including any violation types #110

Conversation

oboxodo commented Sep 11, 2023 • edited Loading

professor left a comment

Choose a reason for hiding this comment

professor Oct 11, 2023

Choose a reason for hiding this comment

oboxodo Oct 11, 2023

Choose a reason for hiding this comment

oboxodo Oct 12, 2023

Choose a reason for hiding this comment

professor Oct 13, 2023

Choose a reason for hiding this comment

oboxodo Oct 13, 2023

Choose a reason for hiding this comment

oboxodo Oct 13, 2023

Choose a reason for hiding this comment

professor left a comment

Choose a reason for hiding this comment

oboxodo commented Oct 13, 2023

Support CSV format for `get_info` and allow including any violation types #110

Support CSV format for `get_info` and allow including any violation types #110

oboxodo commented Sep 11, 2023 •

edited

Loading