Add documentation on the new features added today #8

Open: wants to merge 1 commit into base: main

33 changes: 29 additions & 4 deletions README.md
@@ -52,6 +52,23 @@ Finally you can add a scrape configuration to Prometheus gathering the metrics:

In Rust's production environment the container is running on ECS Fargate.

## Testing ingestion locally

The `simulate-heroku.py` script can help you test your configuration locally by
parsing the output of `heroku logs --tail` and submitting it to vector.dev in
the proper encoding. The recommended local workflow is to start the container,
then gather a sample of the production logs with:

```
heroku logs --tail --app crates-io > /dev/shm/crates-io-logs
```

...and every time you want to test, submit the logs to the local instance:

```
cat /dev/shm/crates-io-logs | ./simulate-heroku.py "http://drain:${PASSWORD_DRAIN}@localhost/drain?app_name=crates-io"
```
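
For reference, the "proper encoding" is presumably the one Heroku Logplex uses
for HTTPS drains: syslog messages framed with a length prefix. The following
Python sketch shows how a single line of `heroku logs` output could be
converted. It is not the code of `simulate-heroku.py`: the `frame_line` helper,
the fixed `host` hostname and the priority value are assumptions made for
illustration.

```python
import re

# Shape of a `heroku logs` line: "<timestamp> <source>[<process>]: <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+) (?P<source>[\w-]+)\[(?P<proc>[^\]]+)\]: (?P<msg>.*)$")


def frame_line(line, priority=190):
    """Convert one `heroku logs` line into a length-prefixed syslog frame.

    Returns None when the line does not look like a log line. The priority
    and the "host" hostname are placeholders (assumptions), not values taken
    from the real script.
    """
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    syslog = (
        f"<{priority}>1 {match['ts']} host "
        f"{match['source']} {match['proc']} - {match['msg']}"
    )
    payload = syslog.encode("utf-8")
    # Octet-counting framing: byte length of the message, a space, the message.
    return str(len(payload)).encode("ascii") + b" " + payload
```

Frames built this way could be concatenated and sent as the body of a POST
request to the drain URL. Logplex itself uses the
`Content-Type: application/logplex-1` header for such requests.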

## Rationale and requirements

To ensure smooth service operations the crates.io team needs to gather the
@@ -70,6 +87,8 @@ following metrics:
* **Heroku Postgres** metrics, such as the load average, the IOPS or the cache hit
ratio.

* **Heroku Router** metrics, such as the number of load balancer errors.

The Rust Infrastructure team maintains a centralized monitoring solution based
on Prometheus, but unfortunately that makes integration with applications
running on Heroku hard.
@@ -81,10 +100,10 @@ gathering instance-level metrics impossible with a centralized Prometheus
server. Heroku Postgres metrics aren't easier to gather either, as Heroku only
exposes them by periodically writing a line in the application logs.

Gathering Heroku Postgres metrics requires extracting them to the logs, so we
decided to scrape service-level metrics directly with Prometheus, and create a
single container (this!) to extract both Heroku Postgres and instance-level
metrics from the logs.
Gathering Heroku Postgres and Heroku Router metrics requires extracting them
from the logs, so we decided to scrape service-level metrics directly with
Prometheus, and create a single container (this!) to extract both Heroku
Postgres and instance-level metrics from the logs.

## Design

@@ -141,6 +160,12 @@ Heroku itself][heroku-postgres-metrics]. A Lua transform parses each line and
extracts the samples from it. There is no hardcoded list of metrics to extract,
so everything Heroku provides is exported.
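
The idea of exporting every sample without a hardcoded list can be illustrated
with a short Python sketch. The real transform is written in Lua and is not
shown here: the `extract_samples` helper is hypothetical, and the sketch
assumes the metrics appear in the log line as `sample#name=value` pairs with an
optional unit suffix.

```python
import re

# Assumed shape of a sample: "sample#<name>=<number><optional unit>".
SAMPLE_RE = re.compile(r"sample#([\w-]+)=(-?\d+(?:\.\d+)?)")


def extract_samples(line):
    """Return every numeric sample found in a log line as {name: value}.

    Unit suffixes such as "bytes" or "kB" are dropped, and no metric name is
    hardcoded, so new samples are picked up automatically.
    """
    return {name: float(value) for name, value in SAMPLE_RE.findall(line)}
```

A line containing `sample#db_size=4469950648bytes sample#load-avg-1m=0.07`
would then yield two entries, `db_size` and `load-avg-1m`.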

### Heroku Router metrics

Heroku Router metrics are extracted by parsing the logs emitted by Heroku
itself for each request. Multiple transforms then parse each log message and
increment the right metrics based on the outcome of that request.
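
As a rough illustration of that flow, the Python sketch below parses a router
log message and increments a counter depending on the outcome. It is not the
actual configuration, which uses Vector transforms. The `parse_logfmt` and
`record_request` helpers and the counter labels are hypothetical, and the
sketch assumes router messages are `key=value` pairs where errors carry
`at=error` and a `code` field.

```python
import re
from collections import Counter

# key=value pairs, where the value is either double-quoted or a bare token.
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S*))')


def parse_logfmt(message):
    """Parse a key=value log message into a dict of strings."""
    return {key: quoted or bare for key, quoted, bare in FIELD_RE.findall(message)}


def record_request(counters, message):
    """Increment the counter matching the outcome of one routed request."""
    fields = parse_logfmt(message)
    if fields.get("at") == "error":
        # Router-level failure: count it by its error code.
        counters[("error", fields.get("code", "unknown"))] += 1
    else:
        # Request reached the application: count it by HTTP status.
        counters[("response", fields.get("status", "unknown"))] += 1
```

Calling `record_request` once per router message with a shared `Counter`
accumulates the totals that a metrics endpoint could then expose.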

[nginx]: https://nginx.org/
[Vector]: https://vector.dev/
[Heroku Logplex]: https://devcenter.heroku.com/articles/logplex