Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support teams, team repositories, and team members #165

Merged
merged 18 commits into from
Jun 16, 2022

Conversation

japborst
Copy link
Contributor

@japborst japborst commented Jun 5, 2022

My first go and first steampipe contribution, so bear with me 😄

Changes
Added 3 new tables:

  • github_team
  • github_team_members
  • github_team_repositories

Considerations
I feel there's quite some duplicate code already present in the repo on how the list results are handled, but I feel that's outside the scope of this PR to fix.

Open questions
For some reason, the following works

SELECT
  t.id          AS team_id,
  t.name        AS team_name,
  t.privacy     AS team_privacy,
  t.description AS team_description,
  tr.name       AS repo_name,
  tr.fork       AS repo_is_fork,
  tr.private    AS repo_is_private,
  tr.archived   AS repo_is_archived,
  tr.language   AS repo_primary_language
FROM github.github_team AS t
  INNER JOIN github.github_team_repository AS tr
             ON t.organization = tr.organization
               AND t.slug = tr.slug
WHERE t.organization = 'my_org'

but this doesn't, even though it should work and I don't see clear difference between github_team_repository and github_team_member.

SELECT
  tm.login      AS user_login,
  t.id          AS team_id,
  t.name        AS team_name,
  t.privacy     AS team_privacy,
  t.description AS team_description
FROM github.github_team AS t
    INNER JOIN github.github_team_member AS tm
               ON t.organization = tm.organization
                 AND t.slug = tm.slug
WHERE t.organization = 'my_org'
>> [HV000] ERROR: rpc error: code = Internal desc = 'List' call for table 'github_team_member' is missing 1 required qual: column:'slug' operator: =

github/utils.go Outdated Show resolved Hide resolved
@cbruno10 cbruno10 requested a review from misraved June 6, 2022 01:49
docs/tables/github_team.md Outdated Show resolved Hide resolved
@misraved
Copy link
Contributor

misraved commented Jun 7, 2022

Thanks @japborst for using Steampipe 👍. The tables and all the documents around it look great 😄 .

There are a few points which we would like to address before releasing them:

  • Can we add github_team_member and github_team_repository tables information in the github_team table? By using joins with github_user and github_repository tables I believe we can retrieve similar information, instead of having two separate tables. This is similar to how we have implemented the github_organization and github_my_organization tables.
  • As far as the open questions are concerned, I am also getting similar results. I am trying to dig in to figure out the possible reason for the failure. I will add more information once I have a conclusive reason for the failure.

Please let us know how you feel about the above approach, looking forward to your thoughts.

//// HYDRATE FUNCTIONS

func tableGitHubTeamGet(ctx context.Context, d *plugin.QueryData, h *plugin.HydrateData) (interface{}, error) {
var org string
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below, the code to support both cases is pretty ugly. Suggestions are welcome.

@japborst
Copy link
Contributor Author

japborst commented Jun 7, 2022

@misraved

Can we add github_team_member and github_team_repository tables information in the github_team table?

I would prefer to not do that. Namely, if we collect to an array (repository names, member logins), that doesn't give us any additional information. I guess one could flatten that array and join with the user table, but I'm not sure what we're really gaining then, as joining is definitely more convenient.

@cbruno10
Copy link
Contributor

cbruno10 commented Jun 7, 2022

@japborst

I would prefer to not do that. Namely, if we collect to an array (repository names, member logins), that doesn't give us any additional information. I guess one could flatten that array and join with the user table, but I'm not sure what we're really gaining then, as joining is definitely more convenient.

I took a closer look at the information available in the GitHub API around team memberships + team repository relationships, and it seems like there's some additional information available for each relationship that GitHub does not return from a regular list call that would be useful to include in a separate table representing the relationship.

For instance:

In some of the tables that @misraved had mentioned, like the github_organization and github_repository tables, we currently make separate hydrate calls to get organization members or repository collaborators (even though the Terraform provider represents them as separate resources), in an attempt to reduce the number of tables.

But, notably for organization members, we don't include each member's url, role, or state information anywhere in the table's columns, as we'd need to make an API call per member to the https://docs.github.com/en/rest/orgs/members#get-organization-membership-for-a-user API, which may be too expensive to do in the github_organization table.

So it seems like actually having separate tables for relationships can be beneficial, as there's information about the relationship that you need to make individual API calls for, e.g., one API call per organization member to get its state and role information. These types of calls don't seem great to include in the parent table, like github_team or github_organization.

We could follow what Terraform does in general for these types of tables, which would mean the two additional tables in this PR would be named correctly. However, @japborst, if we were to proceed in this way, one suggestion I'd have is to reduce the columns down in both. For instance, in the github_team_member table, I'd propose that the columns should only include fields from the responses in https://docs.github.com/en/rest/teams/members#list-team-members and https://docs.github.com/en/rest/teams/members#get-team-membership-for-a-user, but not necessarily the full set of columns from the github_team and github_user tables, as there are additional columns in those tables that we don't get from the API responses.

@johnsmyth @japborst Interested in your thoughts on the proposal above (it came out a lot longer than I thought it'd be). Thanks!

@japborst
Copy link
Contributor Author

japborst commented Jun 8, 2022

Thanks, @cbruno10! Agree to your points. Especially role is quite relevant.

  • I do see that the repo permissions are not here yet, where actually that's super useful information. Will see if I can add easily.
  • I've updated the github_team_member column definition to narrow the scope, as you mentioned.
  • The github_my_team and github_team data is the same, as it uses the same underlying API, so no changes in columns there
  • Same goes for the github_team_repository and github_repository columns

Copy link
Contributor

@cbruno10 cbruno10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@japborst I'm also looking into why your second query is failing you mentioned in your original pull request comment, and what's a solution/workaround to allow those queries to work properly. I'll add another comment once I have more findings.

In the meantime, I've added some more suggestions, and if you have any questions please let me know, thanks!

github/table_github_user.go Outdated Show resolved Hide resolved
github/table_github_team_member.go Outdated Show resolved Hide resolved
github/table_github_team_member.go Outdated Show resolved Hide resolved
github/table_github_team_repository.go Show resolved Hide resolved
github/table_github_team_repository.go Show resolved Hide resolved
@japborst
Copy link
Contributor Author

@cbruno10 @misraved I've addressed all the comments.

@cbruno10
Copy link
Contributor

cbruno10 commented Jun 15, 2022

Hey @japborst , I was looking at how we could get your second example working (joining github_team and github_team_member), as there's a current limitation from Steampipe's side around query planning joins on tables with multiple required key quals (which github_team and github_team_member both have).

I believe that adding a ParentHydrate to github_team, with a user's organizations acting as the parent hydrate, should help solve this issue and make the table a bit easier to use.

So now if a user runs select * from github_team, the following sequence occurs:

  • tableGitHubMyOrganizationList is called first, listing all organizations the user is a member of
  • For each organization, tableGitHubTeamList is called, listing all teams in the organization the user has visibility to
  • If any columns are included in the query that rely on the hydrate tableGitHubTeamGet, tableGitHubTeamGet will be called for each team

If a user runs select * from github_team where organization = 'myorg' and slug = 'team-slug', only the tableGitHubTeamGet hydrate function will be called (same as before).

The github_my_team table is unaffected by these proposed changes.

Because users can only list teams they're either a part of or have visibility to (which I believe requires the user to at least be a member of an organization they're attempting to list teams for), having the github_team table first list all organizations that user is a member of should give an accurate list of teams they have visibility to across GitHub.

I've implemented and tested these changes in https://github.com/turbot/steampipe-plugin-github/tree/add-parent-hydrate-team, and they appear to be working well so far (your original query now runs ok as well). In that branch, I've also updated a few other items:

  • I updated the table docs for github_my_team and github_team to reflect the relationship between the two tables
  • I also bumpedgo-github to v45, like you had done in the audit logs PR, as I needed another response property for the Team item
  • I removed all pb aliases for the "github.com/turbot/steampipe-plugin-sdk/v3/grpc/proto" import, as we needed this when we were first developing the GitHub plugin earlier due to different package versions, but now they're no longer necessary (most tables do not use this alias)

Can you please have a look at my proposal above along with the branch, and let me know if this is aligned with your original intentions for this PR, and if you have any other feedback/suggestions? Thanks!

cbruno10 and others added 2 commits June 15, 2022 22:46
…o github_team and github_myteam tables, use parent item in github_team get function
@japborst
Copy link
Contributor Author

Nice @cbruno10! That works. I've pushed some final changes.

Good to go from my end ✅

@cbruno10 cbruno10 merged commit 684095b into turbot:main Jun 16, 2022
@cbruno10
Copy link
Contributor

Thanks so much @japborst for all of the hard work and great discussions!

@japborst japborst deleted the japborst/github-teams branch June 17, 2022 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants