buildbot.ngi.nixos.org is down #347

Closed
wegank opened this issue Sep 2, 2024 · 5 comments
Labels
bug: Something isn't working; infra: Work on Ngipkgs itself, and related infrastructure; maintenance: Cleanup, refactoring, improving discoverability, tending to continuous integration

Comments

Member

wegank commented Sep 2, 2024

https://buildbot.ngi.nixos.org returns a 502 error when deployed from any commit later than #324, and I can no longer restart action makemake #51 as a temporary workaround. As a result, CI checks are hanging on the latest PRs.

Since I don't have access to the server, it would be nice if one of @Erethon @fricklerhandwerk @Janik-Haag @lorenzleutgeb could have a look and see what the error is, and if it persists after a reboot.

wegank added the bug, maintenance, and infra labels Sep 2, 2024
wegank pinned this issue Sep 2, 2024
Collaborator

Erethon commented Sep 2, 2024

https://buildbot.ngi.nixos.org/ should be back online now. The master process failed to start with the following error: https://gist.github.com/Erethon/e19d1a5bde98421bdc9c93138b94a372. As a workaround, I followed these instructions to revert #344 and 36b9cf9, and also pinned buildbot-nix to 9086472a5f46982cae7047565102685e901c62bc [1]. These changes have only been applied to the checked-out repo on makemake, as a way to figure out what's going on before opening a PR in ngipkgs, so re-running the deployment action for makemake now will break buildbot again.

I can't follow up on the pydantic error right now as I'm not at home, but can look at it towards the end of the week.

[1]: It seemed like a safe, known-good version from before any pydantic changes in buildbot.

Collaborator

Erethon commented Sep 11, 2024

I spent some time looking at this yesterday and today. The pydantic error linked previously was our smoking gun. Buildbot-nix uses a JSON file as a cache for projects and for some reason, our version of that cache wasn't properly populated (it was missing installation_id). Deleting /var/lib/buildbot/github-project-cache-v1.json fixed the issue and allowed us to switch to the latest buildbot-nix version.
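
For anyone hitting the same thing, a rough sketch of how the cache could be inspected before deleting it. Only the file path and the `installation_id` key come from the comment above; the layout of the file (a JSON list of objects, or a JSON object whose values are objects) is an assumption, and the helper names are made up for illustration.

```python
import json
from pathlib import Path

# Path taken from the comment above; the file layout is an assumption.
CACHE_PATH = Path("/var/lib/buildbot/github-project-cache-v1.json")


def entries_missing_key(cache, key="installation_id"):
    """Return the entries (dicts) in `cache` that lack `key`.

    Assumes the cache is either a JSON list of objects or a JSON object
    whose values are objects; the real buildbot-nix layout may differ.
    """
    entries = cache.values() if isinstance(cache, dict) else cache
    return [e for e in entries if isinstance(e, dict) and key not in e]


def check_cache(path=CACHE_PATH):
    """Load the cache file and list entries without installation_id."""
    if not path.exists():
        return []  # no cache file on disk, so nothing to check
    return entries_missing_key(json.loads(path.read_text()))
```

If `check_cache()` returns a non-empty list, the cache is probably in the state described above, and deleting the file is the fix that worked here.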

Erethon closed this as completed Sep 11, 2024
Collaborator

Erethon commented Sep 11, 2024

Documented upstream for completeness: nix-community/buildbot-nix#270

Member Author

wegank commented Oct 27, 2024

Reopening, as #380 seems to have broken https://buildbot.ngi.nixos.org/ again with the same 502 error. I've restarted action makemake #92 as a temporary workaround.

wegank reopened this Oct 27, 2024
Member Author

wegank commented Oct 31, 2024

The issue is gone...

wegank closed this as completed Oct 31, 2024
wegank unpinned this issue Oct 31, 2024