Data Quality Flags
This document describes how iDigBio identifies known data quality issues in ingested specimen data and represents them in the iDigBio Search API. During ingestion, iDigBio often encounters data that are missing, inconsistent, factually incorrect, or out of compliance with metadata standards and controlled vocabularies. To facilitate indexing, iDigBio corrects these data and flags each correction in the Search API. For example, taxonomic names are added from the GBIF Backbone Taxonomy, and common misspellings are replaced (e.g. "Flordia" is replaced with "Florida").
The following summary shows how often each flag has been assigned to records in iDigBio:
http://search.idigbio.org/v2/summary/top/records?top_fields=[%22flags%22]&count=1000
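The summary URL above can also be built programmatically. The following is a minimal sketch, not an official client: the function names `flag_summary_url` and `fetch_flag_summary` are made up for illustration, and the only endpoint and parameters used are the ones shown in the URL above. The structure of the JSON response is not described here, so the sketch returns it unparsed beyond JSON decoding.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the summary URL shown above.
SUMMARY_URL = "http://search.idigbio.org/v2/summary/top/records"


def flag_summary_url(count=1000):
    """Build the summary URL; top_fields is sent as a JSON-encoded list."""
    params = {"top_fields": json.dumps(["flags"]), "count": count}
    # urlencode percent-encodes the brackets and quotes of the JSON value.
    return SUMMARY_URL + "?" + urlencode(params)


def fetch_flag_summary(count=1000):
    """Fetch the summary and return the decoded JSON (hypothetical helper)."""
    with urlopen(flag_summary_url(count), timeout=30) as response:
        return json.load(response)
```

Calling `flag_summary_url()` yields a URL equivalent to the one above, with the brackets percent-encoded as well as the quotes.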
General guidelines for flag names:
- a flag named with "added" means the field was empty in the provided data and iDigBio added a value to help fully populate the record. This enhances searching and discovery.
- a flag named with "replaced" means the field contained data from the provider and iDigBio attempted to make it more consistent by replacing the value. Note that the original data values are always available in the raw data. The replaced values are designed to enhance searching and discovery.
- a flag named with "truncated" means part of a record was removed. This can happen when a field name contains unsupported characters, such as dots (periods, ".").
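The naming guidelines above make it possible to group flags by what iDigBio did to the record. The sketch below assumes the keyword always appears as the final underscore-separated part of the flag name, which holds for the flags listed in this document; `flag_action` is a hypothetical helper, not part of any iDigBio library.

```python
# Keywords covered by the naming guidelines above.
GUIDELINE_ACTIONS = ("added", "replaced", "truncated")


def flag_action(flag):
    """Return 'added', 'replaced', or 'truncated' for a flag name, else 'other'."""
    for action in GUIDELINE_ACTIONS:
        # Assumption: the keyword is the last underscore-separated part.
        if flag.endswith("_" + action):
            return action
    return "other"


for name in ("dwc_class_added", "dwc_country_replaced", "geopoint_bounds"):
    print(name, "->", flag_action(name))
```

Flags that describe a problem rather than a change, such as `geopoint_bounds`, fall into the `other` group.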
The table below describes the flags that might be added to records in iDigBio:
Flag | Definition |
---|---|
datecollected_bounds | Date Collected out of bounds (not between 1500-01-02 and the date of indexing). Date Collected is generally composed from dwc:year, dwc:month, and dwc:day, or taken from dwc:eventDate. |
dwc_acceptednameusageid_added | Accepted Name Usage ID (dwc:acceptedNameUsageID) added where none was provided. |
dwc_basisofrecord_invalid | Darwin Core Basis of Record (dwc:basisOfRecord) missing or not a value from controlled vocabulary. |
dwc_basisofrecord_paleo_conflict | Darwin Core Basis of Record (dwc:basisOfRecord) is not FossilSpecimen but the record contains paleo context terms. |
dwc_basisofrecord_removed | Darwin Core Basis of Record (dwc:basisOfRecord) removed because of invalid value. |
dwc_class_added | Darwin Core Class (dwc:class) added where none was provided. |
dwc_class_replaced | Darwin Core Class (dwc:class) replaced with a standardized value from GBIF Backbone Taxonomy. |
dwc_continent_added | Darwin Core Continent (dwc:continent) added where none was provided. |
dwc_continent_replaced | Darwin Core Continent (dwc:continent) replaced with a standardized value. |
dwc_country_added | Darwin Core Country (dwc:country) added where none was provided. |
dwc_country_replaced | Darwin Core Country (dwc:country) replaced with a standardized value from Getty Thesaurus of Geographic Names. |
dwc_datasetid_added | Darwin Core Dataset ID (dwc:datasetID) added where none was provided. |
dwc_datasetid_replaced | Darwin Core Dataset ID (dwc:datasetID) replaced with value from ? TBD |
dwc_family_added | Darwin Core Family (dwc:family) added where none was provided. |
dwc_family_replaced | Darwin Core Family (dwc:family) replaced with a standardized value from GBIF Backbone Taxonomy. |
dwc_genus_added | Darwin Core Genus (dwc:genus) added where none was provided. |
dwc_genus_replaced | Darwin Core Genus (dwc:genus) replaced with a standardized value from GBIF Backbone Taxonomy. |
dwc_infraspecificepithet_added | Darwin Core Infraspecific Epithet (dwc:infraspecificEpithet) added where none was provided. |
dwc_infraspecificepithet_replaced | Darwin Core Infraspecific Epithet (dwc:infraspecificEpithet) replaced with a standardized value from GBIF Backbone Taxonomy. |
dwc_kingdom_added | Darwin Core Kingdom (dwc:kingdom) added where none was provided. |
dwc_kingdom_replaced | Darwin Core Kingdom (dwc:kingdom) replaced with a standardized value from GBIF Backbone Taxonomy. |
dwc_kingdom_suspect | Darwin Core Kingdom (dwc:kingdom) not replaced with a standardized value from GBIF Backbone Taxonomy due to insufficient confidence level. |
dwc_multimedia_added | TBD |
dwc_order_added | Darwin Core Order (dwc:order) added where none was provided. |
dwc_order_replaced | Darwin Core Order (dwc:order) replaced with a standardized value from GBIF Backbone Taxonomy. |
dwc_originalnameusageid_added | Darwin Core Original Name Usage ID (dwc:originalNameUsageID) added where none was provided. |
dwc_parentnameusageid_added | Darwin Core Parent Name Usage ID (dwc:parentNameUsageID) added where none was provided. |
dwc_phylum_added | Darwin Core Phylum (dwc:phylum) added where none was provided. |
dwc_phylum_replaced | Darwin Core Phylum (dwc:phylum) replaced with a standardized value from GBIF Backbone Taxonomy. |
dwc_scientificnameauthorship_added | Darwin Core Scientific Name Authorship (dwc:scientificNameAuthorship) added where none was provided. |
dwc_specificepithet_added | Darwin Core Specific Epithet (dwc:specificEpithet) added where none was provided. |
dwc_specificepithet_replaced | Darwin Core Specific Epithet (dwc:specificEpithet) replaced with a standardized value from GBIF Backbone Taxonomy. |
dwc_stateprovince_replaced | Darwin Core State or Province (dwc:stateProvince) replaced with a standardized value. |
dwc_taxonid_added | Darwin Core Taxon ID (dwc:taxonID) added where none was provided. |
dwc_taxonid_replaced | Darwin Core Taxon ID (dwc:taxonID) replaced with a standardized value from GBIF Backbone Taxonomy. |
dwc_taxonomicstatus_added | Darwin Core Taxonomic Status (dwc:taxonomicStatus) added where none was provided. |
dwc_taxonomicstatus_replaced | Darwin Core Taxonomic Status (dwc:taxonomicStatus) replaced with a standardized value from GBIF Backbone Taxonomy. |
dwc_taxonrank_added | Darwin Core Taxon Rank (dwc:taxonRank) added where none was provided. |
dwc_taxonrank_invalid | The supplied Darwin Core Taxon Rank (dwc:taxonRank) is not contained in controlled vocabulary (Taxonomic Rank GBIF Vocabulary). |
dwc_taxonrank_removed | Darwin Core Taxon Rank (dwc:taxonRank) removed because it is not contained in controlled vocabulary (Taxonomic Rank GBIF Vocabulary). |
dwc_taxonrank_replaced | Darwin Core Taxon Rank (dwc:taxonRank) replaced with a standardized value from GBIF Backbone Taxonomy. |
dwc_taxonremarks_added | Darwin Core Taxon Remarks (dwc:taxonRemarks) added where none was provided. |
dwc_taxonremarks_replaced | Darwin Core Taxon Remarks (dwc:taxonRemarks) replaced with a standardized value from GBIF Backbone Taxonomy. |
gbif_canonicalname_added | GBIF Canonical Name added from GBIF Backbone Taxonomy. |
gbif_genericname_added | GBIF Generic Name added from GBIF Backbone Taxonomy. |
gbif_reference_added | GBIF Reference added from GBIF Backbone Taxonomy. |
gbif_taxon_corrected | A match in GBIF Backbone Taxonomy was found. Inverse of taxon_match_failed flag. |
gbif_vernacularname_added | GBIF Vernacular Name (common name) added. |
geopoint_0_coord | Geographic Coordinate contains literal '0' values. |
geopoint_bounds | Geographic Coordinate out of bounds (valid range is -90 to 90 for latitude, -180 to 180 for longitude). |
geopoint_datum_error | Geographic Coordinate Datum (dwc:geodeticDatum) is Unknown or coordinate cannot be converted to WGS84. |
geopoint_datum_missing | Geographic Coordinate is missing Geodetic Datum (dwc:geodeticDatum) (Assumed to be WGS84). |
geopoint_low_precision | Geographic Coordinate contains a Low Precision value. |
geopoint_pre_flip | Geographic Coordinate latitude and longitude replaced with swapped values. Prior to examining other factors, the magnitude of latitude was determined to be greater than 180, and the longitude was less than 90. |
geopoint_similar_coord | Geographic Coordinate latitude and longitude are similar (+/- lat == +/- lon) and likely have a data entry issue. |
idigbio_isocountrycode_added | iDigBio ISO 3166-1 alpha-3 Country Code added. |
idigbio_obis_extendedmeasurementorfact_truncated | Record truncated due to problematic field name. |
idigbio_chrono_chronometricage_truncated | Record truncated due to problematic field name. |
rev_geocode_both_sign | Geographic Coordinate Latitude and Longitude negated to place point in correct country. |
rev_geocode_corrected | Geographic Coordinate placed within stated country by reverse geocoding process. |
rev_geocode_eez | Geographic Coordinate is outside the land boundaries of the stated country but falls inside the country's exclusive economic zone water boundary (approx. 200 nautical miles from shore) based on the reverse geocoding process. |
rev_geocode_eez_corrected | The reverse geocoding process was able to find a coordinate operation that placed the point within the stated country's exclusive economic zone. |
rev_geocode_failure | Geographic Coordinate could not be reverse geocoded to a particular country. |
rev_geocode_flip | Geographic Coordinate Latitude and Longitude replaced with swapped values to place point in stated country by reverse geocoding process. |
rev_geocode_flip_both_sign | Geographic Coordinate Latitude and Longitude replaced with both swapped and negated values to place point in stated country by reverse geocoding process. |
rev_geocode_flip_lat_sign | Geographic Coordinate Latitude and Longitude replaced with swapped values, Latitude negated, to place point in stated country by reverse geocoding process. |
rev_geocode_flip_lon_sign | Geographic Coordinate Latitude and Longitude replaced with swapped values, Longitude negated, to place it in stated country by reverse geocoding process. |
rev_geocode_lat_sign | Geographic Coordinate Latitude negated to place point in stated country by reverse geocoding process. |
rev_geocode_lon_sign | Geographic Coordinate had its Longitude negated to place it in stated country. |
rev_geocode_mismatch | Geographic Coordinate did not reverse geocode to stated country. |
scientificname_added | Scientific Name (dwc:scientificName) added where none was provided with the value constructed by concatenation of stated genus and species. |
taxon_match_failed | Unable to match a taxon in GBIF Backbone Taxonomy. Inverse of gbif_taxon_corrected flag. |
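To make a few of the coordinate definitions in the table concrete, the sketch below applies the stated conditions for geopoint_bounds, geopoint_0_coord, and geopoint_similar_coord to a latitude/longitude pair. It is an illustration of the definitions only, not iDigBio's actual ingestion code. `coordinate_flags` is a hypothetical function, and two readings are assumptions: that a literal 0 in either value triggers geopoint_0_coord, and that "similar" means equal magnitudes.

```python
from __future__ import annotations


def coordinate_flags(lat: float, lon: float) -> list[str]:
    """Return the subset of three geopoint flags whose definition matches."""
    flags = []
    # geopoint_bounds: valid range is -90 to 90 lat, -180 to 180 lon.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        flags.append("geopoint_bounds")
    # geopoint_0_coord: assumed here to mean either value is exactly 0.
    if lat == 0 or lon == 0:
        flags.append("geopoint_0_coord")
    # geopoint_similar_coord: +/- lat == +/- lon, read as equal magnitudes.
    if abs(lat) == abs(lon):
        flags.append("geopoint_similar_coord")
    return flags


print(coordinate_flags(12.5, -12.5))
print(coordinate_flags(95.0, 10.0))
```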
Searching records for the flag scientificname_added:
{
"flags":"scientificname_added"
}
http://search.idigbio.org/v2/search/records?rq={%22flags%22:%22scientificname_added%22}
Searching the records of a single recordset that are flagged with scientificname_added:
{
"flags":"scientificname_added",
"recordset":"c38b867b-05f3-4733-802e-d8d2d3324f84"
}
http://search.idigbio.org/v2/search/records?rq={%22flags%22:%22scientificname_added%22,%22recordset%22:%22c38b867b-05f3-4733-802e-d8d2d3324f84%22}
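Both search URLs above follow the same pattern: the query object is JSON-encoded and passed as the `rq` parameter. The sketch below builds such URLs from a Python dict. `build_search_url` is a hypothetical helper name; the endpoint, the `rq` parameter, and the `flags` and `recordset` keys are taken from the examples above.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the search URLs shown above.
SEARCH_URL = "http://search.idigbio.org/v2/search/records"


def build_search_url(record_query):
    """Encode a record query dict as the rq parameter of a search URL."""
    # Compact separators keep the encoded JSON free of extra spaces.
    rq = json.dumps(record_query, separators=(",", ":"))
    return SEARCH_URL + "?" + urlencode({"rq": rq})


# All records flagged with scientificname_added.
print(build_search_url({"flags": "scientificname_added"}))

# The same flag, restricted to one recordset (UUID from the example above).
print(build_search_url({
    "flags": "scientificname_added",
    "recordset": "c38b867b-05f3-4733-802e-d8d2d3324f84",
}))
```

The generated URLs percent-encode braces, colons, and commas in addition to quotes, which is equivalent to the hand-written URLs above.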