This repository was archived by the owner on Dec 3, 2025. It is now read-only.

@acr92 acr92 commented Nov 26, 2025

As part of an ongoing issue in the GPT repository, I need to debug why we suddenly stop processing events. I've tried PostgreSQL logs, the pg_stat_* tables, tokio console, LLM analysis, and reading a fuckton of docs myself, and I still cannot figure it out.

Some time ago I created a PR in the library we use for worker processing (leo91000#314) that adds OpenTelemetry support. The traces it produces should give me a lot of useful information.

Ideally I would never need to fork the repo, but the maintainer seems a bit inactive. What I long for is something like resque for Rust. This seemed to be the best alternative I could find. I liked that it uses only PostgreSQL, which makes it fit well within our existing infrastructure, and that it is based on a popular Node.js library and reuses its foundational SQL, so that part has already been quite well tested and verified. A lot of projects in the Rust world just seem like they're made by one person in their spare time.

This PR adds support for OpenTelemetry-based tracing, with automatic
links between job creation and processing.
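
To illustrate the idea of linking creation and processing, here is a minimal std-only sketch, not the code in this PR. It assumes the producer's span context is stored alongside the job as a W3C `traceparent` string and turned into a span link when the job is picked up. Every name in it (`TraceContext`, `Job`, `ProcessingSpan`, `enqueue`, `start_processing`) is hypothetical; a real implementation would go through the OpenTelemetry crates instead.

```rust
/// Hypothetical span context captured when a job is created.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TraceContext {
    trace_id: u128,
    span_id: u64,
    sampled: bool,
}

impl TraceContext {
    /// Encode as a W3C `traceparent` value: version-traceid-spanid-flags.
    fn to_traceparent(&self) -> String {
        format!(
            "00-{:032x}-{:016x}-{:02x}",
            self.trace_id, self.span_id, self.sampled as u8
        )
    }

    /// Parse a version-00 `traceparent` value; returns None if malformed.
    /// Lenient about hex case, which the W3C spec requires to be lowercase.
    fn from_traceparent(s: &str) -> Option<Self> {
        let parts: Vec<&str> = s.split('-').collect();
        if parts.len() != 4 || parts[0] != "00" {
            return None;
        }
        let (trace, span, flags) = (parts[1], parts[2], parts[3]);
        let all_hex = |p: &str| p.bytes().all(|b| b.is_ascii_hexdigit());
        if trace.len() != 32 || span.len() != 16 || flags.len() != 2 {
            return None;
        }
        if !(all_hex(trace) && all_hex(span) && all_hex(flags)) {
            return None;
        }
        let trace_id = u128::from_str_radix(trace, 16).ok()?;
        let span_id = u64::from_str_radix(span, 16).ok()?;
        let flags = u8::from_str_radix(flags, 16).ok()?;
        // All-zero ids are invalid per the W3C Trace Context spec.
        if trace_id == 0 || span_id == 0 {
            return None;
        }
        Some(Self { trace_id, span_id, sampled: flags & 1 == 1 })
    }
}

/// Hypothetical job row: the task name plus the serialized creation context.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Job {
    task: String,
    traceparent: Option<String>,
}

/// Hypothetical processing span: just a name and its links.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ProcessingSpan {
    name: String,
    links: Vec<TraceContext>,
}

/// Producer side: store the current span context with the job.
fn enqueue(task: &str, current: &TraceContext) -> Job {
    Job {
        task: task.to_string(),
        traceparent: Some(current.to_traceparent()),
    }
}

/// Consumer side: link the processing span back to the creation context.
/// A missing or malformed traceparent simply yields a span with no links.
fn start_processing(job: &Job) -> ProcessingSpan {
    let links = job
        .traceparent
        .as_deref()
        .and_then(TraceContext::from_traceparent)
        .into_iter()
        .collect();
    ProcessingSpan {
        name: format!("process {}", job.task),
        links,
    }
}
```

In this sketch the worker uses a link rather than a parent-child relationship, so a job that runs long after it was enqueued gets its own span while staying navigable from the span that created it.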

It also adds instrumentation to most database functions. The exception is
get_job: it is called so often that instrumenting it felt excessive.

I've tried to follow the related OTel semantic conventions, which seem
to be these:

https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans/

But I'm open to using other conventions and/or changing which tags are
exposed.
@acr92 acr92 closed this Dec 3, 2025