New component: GitHub Actions Logs Receiver #32505

Closed
1 of 2 tasks
reakaleek opened this issue Apr 18, 2024 · 8 comments

reakaleek commented Apr 18, 2024

The purpose and use-cases of the new component

Our primary use case is to collect logs from multiple GitHub repositories in our organization and store them in a single place.

The GitHub Actions Log Receiver would receive workflow_run GitHub events through a webhook.

When a workflow run completes, the receiver downloads the run's log files and converts them into log records that are passed to the logs consumer.
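As a rough illustration of that flow (not the receiver's actual Go code), the sketch below filters incoming webhook deliveries down to completed workflow_run events and builds the GitHub REST URL from which the run's log archive can be downloaded. The function name `logs_url_for_event` is hypothetical; the payload fields (`action`, `workflow_run.id`, `repository.full_name`) and the `/actions/runs/{run_id}/logs` endpoint come from GitHub's public webhook and REST API documentation.

```python
import json
from typing import Optional

GITHUB_API = "https://api.github.com"


def logs_url_for_event(event_name: str, body: bytes) -> Optional[str]:
    """Return the log-archive URL for a completed workflow_run event, else None.

    event_name is the value of the X-GitHub-Event request header.
    """
    if event_name != "workflow_run":
        return None
    payload = json.loads(body)
    # workflow_run events also fire for "requested" and "in_progress";
    # logs are only complete once the run has finished.
    if payload.get("action") != "completed":
        return None
    repo = payload["repository"]["full_name"]  # e.g. "owner/repo"
    run_id = payload["workflow_run"]["id"]
    return f"{GITHUB_API}/repos/{repo}/actions/runs/{run_id}/logs"
```

A GET request to the returned URL, authenticated with a token that has the actions: read permission, redirects to a zip archive of the run's logs.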

Downloading logs requires the actions: read permission, so you must provide either a GitHub PAT or GitHub App credentials. The token or app needs the appropriate permissions for any webhook event it receives.

Optionally, you can validate the incoming payload using a webhook secret.
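For context, GitHub signs each webhook delivery with an HMAC-SHA256 of the raw request body, keyed by the webhook secret, and sends it in the X-Hub-Signature-256 header as `sha256=<hex digest>`. A minimal sketch of the check follows; the function name `is_valid_signature` is an assumption for illustration and is not taken from the proposed receiver.

```python
import hashlib
import hmac


def is_valid_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Check the X-Hub-Signature-256 header against the raw request body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(
        expected.encode("utf-8"), signature_header.encode("utf-8")
    )
```

The check has to run on the raw bytes of the body, before any JSON parsing, because re-serializing the payload can change its byte representation.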

To allow the collector to receive webhook events, specify the collector's webhook endpoint in a GitHub App or in a repository webhook.

Example configuration for the component

      githubactionslog:
        github_auth: # required
          app_id: "${env:GITHUB_APP_ID}"
          installation_id: "${env:GITHUB_INSTALLATION_ID}"
          private_key: "${env:GITHUB_APP_PRIVATE_KEY}"
          # or only a PAT
          token: "${env:GITHUB_TOKEN}"
        batch_size: 10000 # consume logs in batches
        webhook_secret: "${env:WEBHOOK_SECRET}" # optional
        endpoint: localhost:19419
        path: /events
        health_check_path: /health

Telemetry data types supported

logs

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.

Code Owner(s)

No response

Sponsor (optional)

No response

Additional context

We are aware of, and were heavily inspired by, the existing contribution in #27460, but we decided to create our own receiver.

However, if desired, I am willing to dismiss this proposal and contribute to the existing PR. cc @krzko

We have a working implementation in https://github.com/reakaleek/opentelemetry-collector-contrib/tree/feature/githubactionslogsreceiver/receiver/githubactionslogreceiver and are using it to store logs in Elasticsearch.

Because we were aware of the existing receiver, we copied the functions to generate trace and span IDs, as well as service names, to be able to correlate traces with logs.
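The correlation works because the IDs are derived deterministically from run metadata, so two independent receivers produce the same trace and span IDs for the same run. The sketch below shows the general idea only. The hash inputs (`run_id`, `run_attempt`, `job_name`) and the choice of SHA-256 are assumptions made for illustration, since the thread does not show the copied functions.

```python
import hashlib


def trace_id(run_id: int, run_attempt: int) -> str:
    """Derive a 16-byte trace ID (32 hex chars) from run metadata."""
    digest = hashlib.sha256(f"{run_id}-{run_attempt}".encode("utf-8")).digest()
    return digest[:16].hex()


def span_id(run_id: int, run_attempt: int, job_name: str) -> str:
    """Derive an 8-byte span ID (16 hex chars) for one job of a run."""
    key = f"{run_id}-{run_attempt}-{job_name}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return digest[:8].hex()
```

Any receiver that hashes the same fields in the same way arrives at identical IDs, which is why the logs receiver has to reuse the trace receiver's exact functions rather than define its own.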

@reakaleek added the needs triage and Sponsor Needed labels Apr 18, 2024

krzko commented Apr 22, 2024

Nice! As a heads up, @Elfo404 has added log support to the GitHub Actions Receiver, without the need for a new component. See https://github.com/grafana/opentelemetry-collector-contrib/tree/feat-add-githubactionseventreceiver.

Worth collaborating perhaps.

Just going through the PR process with the GHA Receiver component.


Elfo404 commented Apr 22, 2024

Thanks for the heads up, @krzko.
Yep, the idea is to create a follow-up PR that adds support for logs once the GitHub Actions trace receiver is merged. We have been using that version of the receiver for a while now without issues, but I'm happy to collaborate. As a rule of thumb I'd rather have a single component instead of two, but I'm happy to chat about it.

Author

reakaleek commented Apr 22, 2024

Oh, nice!

Good to know that this already exists. Thank you for sharing this @krzko

@Elfo404 A single receiver makes sense. I briefly looked at the code, and at first glance our implementations look very similar.

In our case, we had to handle an edge case involving a workflow with a large number of logs.
This caused high memory usage of up to 10 GB, partly due to the number of attributes we were adding.

Adding a configurable batching mechanism, where logs are consumed in batches, kept memory usage under 1 GB for the same workflow!
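The batching idea can be sketched as a generator that hands log lines downstream in fixed-size chunks, so only one batch is held in memory at a time instead of the whole run. This is an illustrative Python sketch, not the receiver's Go implementation. The name `in_batches` is hypothetical, and `batch_size` mirrors the option of the same name in the example configuration.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def in_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size items from lines."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(lines)
    while True:
        # islice pulls only the next batch_size items, so a lazily read
        # log file is never fully materialized.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded batch would be converted to log records and passed to the consumer before the next one is read, which bounds peak memory by the batch size rather than by the size of the run's logs.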

Also, we had to handle cases where the job and step name in the entity object would differ from the folder or file names in the zip.
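The thread does not say how the mismatch was resolved. One possible approach, sketched below, is to avoid comparing names at all and instead parse the step number out of each archive entry, then match it against the step number reported by the API. This assumes the archive uses a `<job folder>/<step number>_<step name>.txt` layout, which is an assumption here and not confirmed in this issue. `StepLog` and `parse_entry` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class StepLog(NamedTuple):
    job_dir: str      # folder name inside the zip (may differ from the job name)
    step_number: int  # numeric prefix of the file name
    step_name: str    # remainder of the file name, without ".txt"


# Assumed layout: "<job folder>/<step number>_<step name>.txt"
_STEP_FILE = re.compile(r"^(?P<job>[^/]+)/(?P<num>\d+)_(?P<name>.+)\.txt$")


def parse_entry(path: str) -> Optional[StepLog]:
    """Parse a zip entry path into its parts, or return None if it does not fit."""
    match = _STEP_FILE.match(path)
    if match is None:
        return None
    return StepLog(match["job"], int(match["num"]), match["name"])
```

Matching on the numeric prefix sidesteps differences such as replaced or truncated characters in names, at the cost of relying on the assumed archive layout.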

I'd be happy to contribute and collaborate with you when the main PR is merged!

@crobert-1
Member

Thanks for the proposal! From a quick read-through, I agree that it would make the most sense to add this as part of the proposed (and accepted) GitHub Actions receiver. 👍

@crobert-1 added the Accepted Component and enhancement labels and removed the Sponsor Needed, needs triage, and Accepted Component labels Apr 23, 2024
Contributor

adrielp commented Apr 29, 2024

OpenTelemetry already has a component for WebHook events. We have actually been using the WebHook receiver for quite some time to build out DORA metrics, etc. from GitHub Log events. The only thing this receiver doesn't do is webhook payload validation. Would it be better to support payload validation within the WebHook receiver?

edit: I misread the first portion and realized the functionality needed here is downloading the pipeline logs, not receiving the events themselves. My bad.

Contributor

github-actions bot commented Jul 1, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Jul 1, 2024
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions bot closed this as not planned Aug 30, 2024
Contributor

Pinging code owners for receiver/github: @adrielp @andrzej-stencel @crobert-1 @TylerHelmuth. See Adding Labels via Comments if you do not have permissions to add labels yourself.
