
[Proposal] Use an infrastructure-as-code solution to manage the open-telemetry GitHub organization #1596

Open
@jaronoff97

Description

Hello all. After the past week's challenges around repository maintenance, some operator contributors suggested using GitHub's official Terraform provider to provision and manage the SIG GitHub groups, repositories, branch protection rules, and other settings that the community repo is currently used to manage entirely by hand. The TC could then simply approve and merge PRs that change the governance of the open-telemetry GitHub organization, instead of making a slew of manual changes.

Benefits

  • Organization-as-code for OpenTelemetry would make SIGs and repos self-describing, so their configuration is entirely clear to maintainers
  • TC can automatically ping code owners when repository maintenance is needed
  • Maintainers that need repository maintenance can make the changes themselves
  • TC members who are not familiar with the GitHub UI can focus solely on the requested change
  • (I'm sure there are more; these are just the ones I thought of initially)

Risks

  • Migrating to this new strategy will take some time (this can be done piecemeal by importing resources into Terraform as needed)
  • Operational overhead
    • We will need someone to set up this terraform code including CI/CD
    • I think we should have the expertise within the organization to make this happen
  • If any of the repository information isn't meant to be public, this may be a challenge
    • I'm not sure what information couldn't be made public
  • Secret management will still need to be manual
    • This is unchanged from today, and we rarely need to update or manage secrets as it is
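Because the migration can proceed piecemeal, existing settings can be pulled into Terraform state one resource at a time with terraform import. The Python sketch below generates those commands for a batch of existing org members and repositories. It assumes the integrations/github provider's import ID formats (org:username for github_membership, the bare repository name for github_repository), which should be checked against the provider documentation. The usernames and repository names in the example are placeholders.

```python
"""Generate `terraform import` commands for existing GitHub org resources.

Assumption: the org is managed with the integrations/github Terraform provider,
whose import IDs are "<org>:<username>" for github_membership and the bare
repository name for github_repository. Verify against the provider docs.
"""
import re
import shlex


def tf_name(raw: str) -> str:
    """Turn a GitHub login or repo name into a valid Terraform resource name."""
    # Terraform identifiers allow letters, digits, "_" and "-" only.
    name = re.sub(r"[^A-Za-z0-9_-]", "_", raw)
    # Identifiers must not start with a digit.
    return f"_{name}" if name[0].isdigit() else name


def import_commands(org: str, members: list[str], repos: list[str]) -> list[str]:
    """Return one `terraform import ADDRESS ID` command per existing resource."""
    cmds = []
    for user in members:
        address = f"github_membership.{tf_name(user)}"
        cmds.append(shlex.join(["terraform", "import", address, f"{org}:{user}"]))
    for repo in repos:
        address = f"github_repository.{tf_name(repo)}"
        cmds.append(shlex.join(["terraform", "import", address, repo]))
    return cmds


if __name__ == "__main__":
    # Placeholder users and repos, for illustration only.
    for cmd in import_commands("open-telemetry", ["alice", "bob-1"], ["community", ".github"]):
        print(cmd)
```

Each command only succeeds once a matching resource block exists in the configuration. Running terraform plan afterwards shows any remaining differences between the imported state and the code.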

Design

  • Phase 1 (team membership)
    • The community repo would contain a new folder that holds the Terraform configuration
    • All members of the org are imported via the Membership resource
    • CI/CD automatically applies the configuration, so the organization's actual state always stays in sync with the code
    • At this point, all membership requests can be made solely via Terraform
  • Phase 2 (SIGs)
    • An OpenTelemetrySIG Terraform module is created to manage the membership and repositories of a SIG
      • A module is used here to abstract the logical grouping of a SIG within the OpenTelemetry community
    • Each SIG's groups can be imported as teams and team members [doc]
  • Phase 3 (Repositories)
    • SIGs can initially opt in to importing their repository settings into the SIG configuration; some SIGs with special use cases may choose to opt out at first
    • After import, the terraform state show command displays the repository's current settings, which helps fill in the required Terraform fields
  • Phase 4 (Branch rules)
    • Branch protection rules are added to the SIG module
    • Existing protections are imported; otherwise, the OTel-specified defaults are applied
  • Phase 5 (All SIGs are moved)
    • At this point, we should have the ability to manage most of the capabilities that are requested for repo management
    • All SIGs and their repositories are imported into Terraform, with any additional required features added before this step
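To make the OpenTelemetrySIG module idea from Phases 2 to 4 more concrete, the Python sketch below emits Terraform's JSON configuration syntax (a *.tf.json file) for one SIG. It covers a team, its memberships, its repositories, and a branch protection rule on each repository. It illustrates the grouping only and is not the proposed HCL module. The SIG data, the "sig-<name>" team naming, and the single-approval review rule are hypothetical. The resource and argument names assume the integrations/github provider (github_team, github_team_membership, github_repository, github_branch_protection).

```python
"""Emit Terraform JSON (*.tf.json) describing one SIG's team, members, and repos.

Hypothetical illustration: the SIG data, the "sig-<name>" team naming, and the
single-approval review rule are placeholders; resource and argument names
assume the integrations/github Terraform provider.
"""
import json
import re
from dataclasses import dataclass, field


@dataclass
class Sig:
    name: str
    maintainers: list[str]  # logins given the team "maintainer" role
    members: list[str] = field(default_factory=list)  # logins given the "member" role
    repos: list[str] = field(default_factory=list)
    default_branch: str = "main"


def ident(raw: str) -> str:
    """Convert a GitHub name into a valid Terraform identifier."""
    name = re.sub(r"[^A-Za-z0-9_-]", "_", raw)
    return f"_{name}" if name[0].isdigit() else name


def sig_config(sig: Sig) -> dict:
    """Build the JSON-syntax equivalent of a per-SIG Terraform module."""
    team = f"sig_{ident(sig.name)}"
    roles = {user: "maintainer" for user in sig.maintainers}
    for user in sig.members:
        roles.setdefault(user, "member")  # maintainers keep their higher role

    memberships = {
        f"{team}_{ident(user)}": {
            "team_id": f"${{github_team.{team}.id}}",  # Terraform reference to the team
            "username": user,
            "role": role,
        }
        for user, role in roles.items()
    }
    repos, protections = {}, {}
    for repo in sig.repos:
        key = ident(repo)
        repos[key] = {"name": repo, "visibility": "public"}
        protections[key] = {
            "repository_id": f"${{github_repository.{key}.node_id}}",
            "pattern": sig.default_branch,
            "required_pull_request_reviews": [{"required_approving_review_count": 1}],
        }

    resources = {
        "github_team": {team: {"name": f"sig-{sig.name}", "privacy": "closed"}},
        "github_team_membership": memberships,
        "github_repository": repos,
        "github_branch_protection": protections,
    }
    # Drop resource types with no instances so the JSON stays minimal.
    return {"resource": {kind: body for kind, body in resources.items() if body}}


if __name__ == "__main__":
    example = Sig(name="example", maintainers=["alice"], members=["bob"], repos=["example-repo"])
    print(json.dumps(sig_config(example), indent=2))
```

Terraform loads *.tf.json files alongside regular .tf files, so redirecting the output into the configuration folder is enough for terraform plan to pick it up. A real module would more likely be written directly in HCL and instantiated once per SIG, for example with for_each.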

Please let me know if there are any steps missing from this list.
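Phase 1's CI/CD step, which keeps the organization in sync with the code, could be wired up roughly as in the Python sketch below. This is an illustration under stated assumptions, not a chosen design. It assumes GitHub Actions-style event names (pull_request, push, schedule), a default branch named main, and a job where Terraform is already installed and initialized with a token for the provider. Pull requests and scheduled runs only plan, and the scheduled run surfaces drift caused by manual UI changes. Merges to main apply the saved plan.

```python
"""Sketch of the Phase 1 CI/CD loop: plan on PRs and schedules, apply after merge.

Assumptions: Terraform is installed and initialized in the job, the GitHub
provider reads its token from the environment, and the event names mirror
GitHub Actions triggers. With -detailed-exitcode, `terraform plan` exits
0 = no changes, 1 = error, 2 = changes present.
"""
import subprocess

PLAN = ["terraform", "plan", "-input=false", "-detailed-exitcode", "-out=tfplan"]
APPLY = ["terraform", "apply", "-input=false", "tfplan"]


def ci_commands(event: str, branch: str) -> list[list[str]]:
    """Choose the Terraform commands to run for a CI trigger."""
    if event in ("pull_request", "schedule"):
        # Reviewers (or a scheduled drift check) see planned changes; nothing is applied.
        return [PLAN]
    if event == "push" and branch == "main":
        # After merge, apply exactly the plan computed in this job.
        return [PLAN, APPLY]
    return []


def run(commands: list[list[str]]) -> bool:
    """Run commands in order; a plan exit code of 2 (changes present) is not a failure."""
    for argv in commands:
        code = subprocess.run(argv).returncode
        ok = code in (0, 2) if "plan" in argv else code == 0
        if not ok:
            return False
    return True
```

Applying the saved plan file, rather than re-planning at apply time, means the change that gets applied is exactly the one that was planned in that job.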

Alternatives Considered

  • Do nothing
    • We stick with what we have, and we get more kerfuffles and more time spent making repo changes in an opaque manner
  • Pulumi
    • Entirely valid solution for this
    • Has a github provider
    • I chose Terraform purely because I'm most comfortable with it; we should go with whatever tool we feel is right, and I don't have opinions one way or the other
  • Terraform Cloud / Pulumi Cloud / Spacelift
    • These cost money, and IMO we should use their OSS stacks instead. The main benefit of the hosted offerings is a UI for approvals, but we don't really need that, since GitHub already provides PR approvals out of the box

Overall, I think using Terraform to manage the OpenTelemetry GitHub state would allow much faster and more reliable management for the TC, and I'm excited to hear the rest of the community's thoughts.

Metadata

Assignees

No one assigned

Labels

area/project-infra: Non-GitHub project infra (DockerHub, etc.)
