
Files Specifications

Iosif Spartalis edited this page Oct 8, 2021 · 9 revisions

MIP "Frictionless data" table-schema

To describe the schema of a CSV file, we use a modified version of the Frictionless Data Table Schema specification, which is designed to be expressible in JSON. Specifically, we extended the Field object with an additional property named MIPType, which can take the following values:

  • text
  • nominal
  • numerical
  • integer
  • date

These are the main data types used in the HBP-MIP platform.

The JSON schema for the modified Table Schema object can be found HERE.
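As a rough illustration, a descriptor using the extended Field object might look like the Python sketch below. The field names (subject_id, diagnosis, mmse_total) are hypothetical, and placing MIPType directly on each field object, next to the standard Frictionless "name" and "type" properties, is an assumption based on the description above; the linked schema remains the authoritative definition. invalid_mip_types is a hypothetical helper that flags fields whose MIPType is missing or not one of the five listed values.

```python
import json

# Allowed MIPType values, as listed on this page.
MIP_TYPES = {"text", "nominal", "numerical", "integer", "date"}

# Hypothetical descriptor for a three-column CSV. "name" and "type" are
# standard Frictionless Field properties; "MIPType" is the MIP extension.
# Putting MIPType directly on each field object is an assumption.
schema = {
    "fields": [
        {"name": "subject_id", "type": "string", "MIPType": "text"},
        {"name": "diagnosis", "type": "string", "MIPType": "nominal"},
        {"name": "mmse_total", "type": "integer", "MIPType": "integer"},
    ]
}


def invalid_mip_types(descriptor):
    """Return the names of fields whose MIPType is missing or not allowed."""
    return [
        field.get("name")
        for field in descriptor.get("fields", [])
        if field.get("MIPType") not in MIP_TYPES
    ]


print(json.dumps(schema, indent=2))
print(invalid_mip_types(schema))  # → []
```

Keeping the standard "type" alongside MIPType means generic Frictionless tooling can still read the descriptor, while MIP-specific code only needs to look at the extra property.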

CDE Dictionary Excel file

CDE Dictionary Excel columns

The spreadsheet MUST have the following columns, in this order:

  1. mipname: The name of the variable.
  2. mip_code: The variable’s code.
  3. mip_type: The variable’s type.
  4. mip_values: The variable’s values, given either as an enumeration or as a range. For an enumeration, provide each value’s code and label as a pair in curly braces. Example (for ADNI category): {"AD","Alzheimer’s Disease"},{"MCI","Mild Cognitive Impairment"},{"CN","Cognitively Normal"}. For a range, state the minimum and maximum values separated by '-'. Example (for MMSE Total scores): 0-30.
  5. unit: The variable’s measurement unit.
  6. description: The variable’s description.
  7. comments: Comments about the variable’s nature.
  8. conceptPath: The variable’s concept path. Example (for ApoE4): /root/genetic/polymorphism/apoe4.
  9. variable_lookup: A comma-separated list of alternative names for the variable.
  10. enum_lookup: A comma-separated list of alternative enumerations, each in the form {code: actual value}. Example: {"F": "Femme"},{"M": "Homme"}.
  11. domain: The main scientific domain category that the variable belongs to.
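The enumeration and range formats described for mip_values (and for the values column of the Data Catalogue file) are regular enough to parse mechanically. The following is a minimal Python sketch of such a parser. It assumes codes and labels are wrapped in double quotes and contain no double quotes themselves, and that range bounds are plain numbers; parse_values is a hypothetical helper, not part of the MIP tooling.

```python
import re

# One {"code","label"} pair, as in the enumeration example above.
ENUM_PAIR = re.compile(r'\{\s*"([^"]*)"\s*,\s*"([^"]*)"\s*\}')
# "min-max", allowing an optional leading minus sign and decimals on each bound.
RANGE = re.compile(r'^\s*(-?\d+(?:\.\d+)?)\s*-\s*(-?\d+(?:\.\d+)?)\s*$')


def parse_values(cell):
    """Parse a mip_values / values cell.

    Returns ("enum", {code: label}), ("range", (min, max)) or ("empty", None).
    Raises ValueError for anything else.
    """
    cell = (cell or "").strip()
    if not cell:
        return ("empty", None)
    match = RANGE.match(cell)
    if match:
        return ("range", (float(match.group(1)), float(match.group(2))))
    pairs = ENUM_PAIR.findall(cell)
    # Reject cells with text left over once every {"code","label"} pair
    # and the separating commas are removed.
    leftover = ENUM_PAIR.sub("", cell).replace(",", "").strip()
    if pairs and not leftover:
        return ("enum", dict(pairs))
    raise ValueError(f"Unrecognised values cell: {cell!r}")


print(parse_values("0-30"))  # → ('range', (0.0, 30.0))
print(parse_values('{"MCI","Mild Cognitive Impairment"},{"CN","Cognitively Normal"}'))
```

Returning a tagged tuple keeps the two formats distinguishable downstream; a validator could, for example, check that nominal variables come with an enumeration and numerical ones with a range, although that pairing is an assumption rather than a rule stated here.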

Data Catalogue Excel file

Necessary Conventions

!NOTE: the file name should follow the format "pathology name"_cdes_"version number".xlsx (e.g. demencia_cdes_v1.xlsx).

!NOTE: all the columns described below should be present in the uploaded file.

!NOTE: a template with the appropriate file name and columns is available here. To avoid naming inconsistencies, we encourage users to start from the provided template and fill in their data.
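Going by the example demencia_cdes_v1.xlsx, a file name can be checked with a regular expression before upload. This Python sketch assumes the version is written as a lowercase "v" followed by digits (optionally dotted, e.g. v1.2) and that the pathology name consists of letters, digits and underscores; neither rule is stated explicitly on this page, and parse_filename is a hypothetical helper.

```python
import re

# Pattern inferred from the example name demencia_cdes_v1.xlsx:
# <pathology>_cdes_<version>.xlsx. The "v<digits>" version form and the
# allowed pathology characters are assumptions, not an official rule.
FILENAME = re.compile(r"^(?P<pathology>\w+?)_cdes_(?P<version>v\d+(?:\.\d+)*)\.xlsx$")


def parse_filename(filename):
    """Return (pathology, version), or None if the name does not match."""
    match = FILENAME.match(filename)
    if match is None:
        return None
    return match.group("pathology"), match.group("version")


print(parse_filename("demencia_cdes_v1.xlsx"))  # → ('demencia', 'v1')
print(parse_filename("demencia-cdes-v1.xlsx"))  # → None
```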

DC Excel columns

The spreadsheet MUST have the following columns, in this order:

  1. csvFile: The name of the dataset (CSV) file that contains the variable.
  2. name: The name of the variable.
  3. code: The variable’s code.
  4. type: The variable’s type.
  5. values: The variable’s values, given either as an enumeration or as a range. For an enumeration, provide each value’s code and label as a pair in curly braces. Example (for ADNI category): {"AD","Alzheimer’s Disease"},{"MCI","Mild Cognitive Impairment"},{"CN","Cognitively Normal"}. For a range, state the minimum and maximum values separated by '-'. Example (for MMSE Total scores): 0-30.
  6. unit: The variable’s measurement unit.
  7. canBeNull: Whether the variable may be null: Y (yes) or N (no).
  8. description: The variable’s description.
  9. comments: Comments about the variable’s nature.
  10. conceptPath: The variable’s concept path. Example (for ApoE4): /root/genetic/polymorphism/apoe4.
  11. methodology: The methodology the variable originates from. Example (for rs10498633_T): lren-nmm-volumes.
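Because both spreadsheets must list their columns in an exact order, a header check is easy to automate. The Python sketch below compares a header row (for example, the first row read with a library such as openpyxl) against the required column lists copied from this page, and checks that a canBeNull cell holds Y or N. The helper names are hypothetical, and ignoring surrounding whitespace in cells is an assumption about how lenient a validator should be.

```python
# Required column lists, in order, copied from this page.
CDE_COLUMNS = ["mipname", "mip_code", "mip_type", "mip_values", "unit",
               "description", "comments", "conceptPath", "variable_lookup",
               "enum_lookup", "domain"]
DC_COLUMNS = ["csvFile", "name", "code", "type", "values", "unit",
              "canBeNull", "description", "comments", "conceptPath",
              "methodology"]


def check_header(header, expected=DC_COLUMNS):
    """Return a list of problems with a header row; an empty list means it matches."""
    # Empty spreadsheet cells often come back as None; treat them as empty names.
    header = ["" if h is None else str(h).strip() for h in header]
    expected = list(expected)
    problems = []
    missing = [c for c in expected if c not in header]
    extra = [h for h in header if h not in expected]
    duplicates = sorted({h for h in header if header.count(h) > 1})
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if duplicates:
        problems.append(f"duplicate columns: {duplicates}")
    # Only report ordering once every column is present exactly once.
    if not problems and header != expected:
        problems.append("columns are present but not in the required order")
    return problems


def check_can_be_null(value):
    """Return True if a canBeNull cell holds Y or N (surrounding spaces ignored)."""
    return str(value).strip() in {"Y", "N"}


print(check_header(DC_COLUMNS))                # → []
print(check_header(CDE_COLUMNS, CDE_COLUMNS))  # → []
print(check_header(DC_COLUMNS[:-1]))           # → ["missing columns: ['methodology']"]
```

Reporting every problem at once, rather than stopping at the first, lets a user fix a template in a single pass.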