Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing referential integrity leads to database inconsistencies #30446

Open
proxity opened this issue Apr 12, 2024 · 3 comments
Open

Missing referential integrity leads to database inconsistencies #30446

proxity opened this issue Apr 12, 2024 · 3 comments

Comments

@proxity
Copy link

proxity commented Apr 12, 2024

Description

We have pushed a high amount of maven packages and files into a well configured Gitea instance. We were also deleting release versions on Gitea and then pushing them again. All of this was done using a script. Esp. when doing parallel uploads with up to 16 CPUs it was possible to create invalid records in the database.
Here's how I verified it in our PostgreSQL v12.17:

Fatal inconsistencies leading to error 500

-- Package Files with non-existing Blob
select count(1) from package_file where not exists (select 1 from package_blob where id=package_file.blob_id);

Here we had one entry which made it impossible to use the package file. Therefore, we had to delete the defective record.

Non-fatal inconsistencies

-- Package Versions without a Package
select count(1) from package_version where not exists (select 1 from package where id=package_version.package_id);

-- Package Files without a Version
select count(1) from package_file where not exists (select 1 from package_version where id=package_file.version_id);

-- Blobs without a File
select count(1) from package_blob where not exists (select 1 from package_file where blob_id=package_blob.id);

Query 1 returned 0 which is fine. But query 2 returned a high number of orphaned files. And query 3 also returned a considerable number of orphaned blobs.
The good news is, since Gitea is iterating through the hierarchy from top to bottom, that orphaned records waste disk space, but do not show up as errors in the UI. We will delete them, including the data which is connected with the affected blobs.

Diagnosis

As I found out, XORM which is in use by gitea, doesn't support foreign keys yet. This leads to this kind of DB inconsistencies, since there's no referential integrity.

In package_file the version_id is a BIGINT NOT NULL.
What should be added is REFERENCES package_version, that would guarantee that no invalid value is possible.
In addition, ON DELETE CASCADE would automatically delete all package_files which package_version is deleted.

As it happens during concurrent uploads, I guess that the transaction handling is not perfect at the moment. Please fix the transactions if foreign keys cannot be implemented now.

Gitea Version

1.21.7

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

We are running the official docker image docker.io/gitea/gitea:1.21.7 in a K8S cluster.

Database

PostgreSQL

@tlusser
Copy link

tlusser commented Apr 15, 2024

Is it also possible to create a check also to the gitea doctor command to cleanup and show any issues there?

@lunny
Copy link
Member

lunny commented Apr 15, 2024

related to #30171 and should be fixed by #30335

@proxity
Copy link
Author

proxity commented Apr 15, 2024

@tlusser that's a good idea. gitea doctor check did not find any of my reported problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants