Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make UUID columns unique in sqla #2323

Merged
merged 2 commits into from
Feb 8, 2019

Conversation

ltalirz
Copy link
Member

@ltalirz ltalirz commented Dec 7, 2018

Fixes #2267

In django, all uuid columns are now unique. This does the same for sqlalchemy.

  • make uuid columns in sqlalchemy unique
  • checks for non-unique entries in the DB
  • generalize fixes that are in place for Node to other tables

@ltalirz
Copy link
Member Author

ltalirz commented Dec 7, 2018

@sphuber unexpectedly, the tests started passing after bringing the branch up to date with provenance_redesign, so I'm making the PR here already.
For the node migration I guess you added the fix of what to do when duplicate uuids were encountered.
The test for duplicate uuids is already generalized to other tables, the fix isn't yet.

Let me know how best to proceed, feel free to push to my fork if you like.

muhrin
muhrin previously approved these changes Dec 14, 2018
@sphuber
Copy link
Contributor

sphuber commented Dec 14, 2018

Let's wait with merging this. I would like to have a look next week to see that we properly generalize the fixes I originally developed for just Node

@sphuber sphuber force-pushed the uuid-unique-sqla branch 4 times, most recently from f61dead to a8df59c Compare January 18, 2019 10:01
@coveralls
Copy link

coveralls commented Jan 18, 2019

Coverage Status

Coverage increased (+0.04%) to 69.755% when pulling c034eb6 on ltalirz:uuid-unique-sqla into dcdfbf2 on aiidateam:provenance_redesign.

@sphuber
Copy link
Contributor

sphuber commented Jan 18, 2019

I have squashed and rebased @ltalirz initial work. On top of that I have added a commit that should address the bit that was missing: generalizing the deduplication code from working just on DbNode to the other tables. While working on this, I realized that the tests break for SqlAlchemy. They do so for the migration tests, the framework for which was recently added by @giovannipizzi . I have tried to debug it but haven't solved it yet. The problem seems to be that during the tests, at the end when it wants to revert the database back to head it hangs. Looking at the activity in postgres there are various connection open at that time and another select query seems to be active. It is a simple select on the db_dbgroup table which seems to block the database reset. I am not sure why that select blocks. But it might come from the TestGroupRenamingMigration test because if we comment that out, everything is fine. To be continued...

Copy link
Member

@giovannipizzi giovannipizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand correctly that this is fixing old migrations and not adding new ones, right?

from aiida.common.exceptions import IntegrityError

query = text(
'SELECT s.id, s.uuid FROM (SELECT *, COUNT(*) OVER(PARTITION BY uuid) AS c FROM {}) AS s WHERE c > 1'.format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct, but a (possibly faster) way to get the information without subqueries and aggregate functions is just using group by, something like

SELECT uuid, COUNT(uuid) from {} GROUP BY uuid HAVING COUNT(uuid) > 1

(that gives a different result, but in the end you just use the result to know if there are duplicates, not which one they are).

Just for info anyway (and possibly you need the aggregate when actually deduplicating the uuid).

@ltalirz
Copy link
Member Author

ltalirz commented Feb 7, 2019

@giovannipizzi the latest status on this PR was (if I remember correctly) that we were waiting for you to merge in certain changes into provenance_redesign before proceeding with this PR.
I guess this has happened now? should I check and resolve the conflicts?

@giovannipizzi
Copy link
Member

yes yes sorry my question was for @sphuber - I did the merges and @sphuber already merged these in and added the tests

@sphuber
Copy link
Contributor

sphuber commented Feb 7, 2019

@ltalirz I am taking care of it, this is still dependent on another PR that needs merging.

ltalirz and others added 2 commits February 8, 2019 08:12
This adds the uniqueness constraint on the UUID fields of the models:

 * Computer
 * Comment
 * Group
 * Node
 * Workflow

A check is performed before the migration is applied, to see whether the
affected tables actually contain any entries with duplicate UUIDs. In
this case a warning is emitted and the migration is stopped. The
infrastructure to deduplicate, that is present all ready for nodes, has
not yet been generalized to arbitrary tables.
The original deduplication code was implemented only for the `DbNode`
table but has now been extended to `DbComputer` and `DbGroup`. Note that
this has not been done for `DbComment`, because it does not yet have an
entity loader. If `load_comment` were to be implemented, support for
deduplication can be added to the `deduplicate_uuids` function.
@sphuber sphuber merged commit ba4b3e7 into aiidateam:provenance_redesign Feb 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants