Description
@infotroph commented on Tue Jan 16 2018
[Task LIst, added 3-14-2018:]
- Clean up EBI data (https://www.betydb.org)
- Clean up TERRA-MEPP data (https://terra-mepp.illinois.edu/bety)
[TERRA-REF data is OK]
select distinct cultivar_id, cultivar, scientificname, species_id from traits_and_yields_view where cultivar_id=10;
cultivar_id | cultivar | scientificname | species_id
-------------+--------------+------------------+------------
10 | Cave-In-Rock | Acer rubrum | 30
10 | Cave-In-Rock | Panicum virgatum | 938
(2 rows)
Looks like the weird one is id 49920:
select distinct id,cultivar_id, cultivar, scientificname, species_id from traits_and_yields_view where cultivar_id=10 and species_id!=938;
id | cultivar_id | cultivar | scientificname | species_id
-------+-------------+--------------+----------------+------------
49920 | 10 | Cave-In-Rock | Acer rubrum | 30
@infotroph commented on Tue Jan 16 2018
Deleted some less useful comments. Here's a query that should help identify the mismatched records. Note that "correct" just means "as reported by the cultivars table".
SELECT traits_and_yields_view.id,
cultivar_id,
species_id AS traitview_species_id,
traits_and_yields_view.scientificname AS traitview_scientificname,
specie_id AS correct_species_id,
species.scientificname AS correct_scientificname
FROM traits_and_yields_view
JOIN cultivars ON cultivars.id = traits_and_yields_view.cultivar_id
JOIN species ON cultivars.specie_id = species.id
WHERE traits_and_yields_view.species_id != cultivars.specie_id;
First 10 of 618 results on my machine:
id | cultivar_id | traitview_species_id | traitview_scientificname | correct_species_id | correct_scientificname
-------+-------------+----------------------+--------------------------------------------------+--------------------+--------------------------------------------------
22941 | 324 | 2079 | Salix eriocephala | 1243 | Salix
22945 | 339 | 2871 | Salix miyabeana | 1243 | Salix
22946 | 327 | 2079 | Salix eriocephala | 1243 | Salix
22953 | 324 | 2079 | Salix eriocephala | 1243 | Salix
22957 | 339 | 2871 | Salix miyabeana | 1243 | Salix
22958 | 327 | 2079 | Salix eriocephala | 1243 | Salix
22965 | 324 | 2079 | Salix eriocephala | 1243 | Salix
22969 | 339 | 2871 | Salix miyabeana | 1243 | Salix
22970 | 327 | 2079 | Salix eriocephala | 1243 | Salix
22977 | 324 | 2079 | Salix eriocephala | 1243 | Salix
@dlebauer commented on Tue Jan 16 2018
@gsrohde did you ever implement or draft a constraint that required [id, specie_id] in the cultivars table to match the [cultivar_id, specie_id] in the traits table?
@gsrohde commented on Tue Jan 16 2018
I don't recall if I ever drafted a constraint for this, though I noted the need for one here: https://www.overleaf.com/2086241dwjyrd#/5297403/ (section 3.26 notes the need for this constraint in the traits table, and section 3.31 notes it for the yields table). Also, there is a Redmine issue noting the problem here: https://ebi-forecast.igb.illinois.edu/redmine/issues/2389, and Github issues #248 and #253 mention the need for species-cultivar consistency in the trait and yields tables (respectively) among the many other constraints to be implemented.
If I did ever draft a constraint, it's likely that the task of cleaning up existing data stood in the way of implementing it.
@gsrohde commented on Mon Mar 12 2018
@dlebauer I remember now. There is a constraint to this effect (a trigger function, actually). I implemented it when I added constraints to associate cultivars with sites. But existing bad data is "grandfathered in".
You can see the constraint in effect if you try to do the following update:
update traits set specie_id = 30 where id = 49920;
(The specie_id
in this row already is 30, but nevertheless, this returns an error, even though nothing would actually change (except maybe updated_at
) if the update succeeded.)
Updating to the proper value should succeed:
update traits set specie_id = 938 where id = 49920;
So the issue is a matter of cleaning up existing data.