Make data_object_type and data_category required #2346

Open
@bmeluch

Problem

DataObject encompasses all kinds of records from different parts of the data model (raw data, processed data, workflow parameters). data_object_type and data_category are, respectively, the narrow and broad ways of identifying the nature/purpose of a given record.

Neither of those slots is required, which means that some records (especially from older projects) may have no value for one or both of them. As a result, any query searching for a type of data (e.g. "give me all of the raw proteomics data") can miss records.
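To make the failure mode concrete, here is a minimal sketch of how an equality filter silently skips records with an unpopulated slot. The record ids and the `TYPE_A`/`TYPE_B` values are placeholders for illustration, not real NMDC identifiers or FileTypeEnum values, and `filter_by_type` is a hypothetical helper rather than an existing API.

```python
# Placeholder records; ids and type values are illustrative only.
records = [
    {"id": "example:1", "data_object_type": "TYPE_A"},
    {"id": "example:2"},  # older record: slot was never populated
    {"id": "example:3", "data_object_type": "TYPE_B"},
]


def filter_by_type(records, wanted_type):
    """Return ids of records whose data_object_type equals wanted_type."""
    return [r["id"] for r in records if r.get("data_object_type") == wanted_type]


# Even if example:2 is really TYPE_A data, the filter cannot know that.
matched = filter_by_type(records, "TYPE_A")

# Records the filter can never match, whatever type is requested.
unclassified = [r["id"] for r in records if not r.get("data_object_type")]
```

Any record in `unclassified` is invisible to every type-based query, which is the gap that making the slot required would close.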

The most important potential impact is on the bulk download on the data portal. In other contexts it just makes queries/filters more complicated than necessary (see NOM notebooks).

Actions

  • Identify records that will need these slots backfilled
  • Backfill those records with a migrator or changesheets, depending on how many records are affected (open question)
  • Make data_object_type required in the schema
  • Make data_category required in the schema
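The first action could start from something like the sketch below, which scans DataObject records (assumed here to be plain dicts, e.g. exported from the database) and reports which of the two slots each one is missing. The function names, the sample ids, and the treatment of `"NA"` and the empty string as missing are assumptions for illustration; adjust them to however missing values are actually stored.

```python
from collections import Counter

# The two slots this issue proposes to make required.
REQUIRED_SLOTS = ("data_object_type", "data_category")

# Assumption: these values all count as "not populated".
EMPTY_VALUES = (None, "", "NA")


def find_missing_slots(records):
    """Map record id -> list of required slots that have no usable value."""
    missing = {}
    for record in records:
        absent = [s for s in REQUIRED_SLOTS if record.get(s) in EMPTY_VALUES]
        if absent:
            missing[record.get("id")] = absent
    return missing


def count_missing(missing):
    """Count how many records lack each slot, to help size the backfill."""
    return Counter(slot for slots in missing.values() for slot in slots)


# Placeholder records; ids and slot values are illustrative only.
sample = [
    {"id": "example:1", "data_object_type": "TYPE_A", "data_category": "CAT_X"},
    {"id": "example:2", "data_object_type": "TYPE_A"},
    {"id": "example:3", "data_category": "NA"},
]

missing = find_missing_slots(sample)
totals = count_missing(missing)
```

The per-slot totals give a rough sense of scale, which may help in choosing between a migrator and changesheets for the backfill.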

For reference

https://nmdc-group.slack.com/archives/CFVH4DYGH/p1739989643444869

data_object_type is a slot on DataObject with a range of FileTypeEnum

data_category is also on DataObject and has a range of DataCategoryEnum
