JSON Main Sections
The NMDataParser JSON configuration syntax provides a set of keywords that specify different strategies for reading data from one or several Excel sheets and for combining the Excel structures (sheets, rows, columns, blocks of cells and cells) into the eNanoMapper data model. The JSON configuration file consists of several major sections, which are the elements on the first level of the JSON document:
"TEMPLATE_INFO": {
  ...
},
"DATA_ACCESS": {
  ...
},
"SUBSTANCE_RECORD": {
  ...
},
"PROTOCOL_APPLICATIONS": [
  ...
]
TEMPLATE_INFO is the first section and is used for technical (administrative) purposes only. It contains attributes such as NAME, VERSION and TYPE.
The DATA_ACCESS section defines the basic access to the spreadsheet template information, i.e. how the data for the substance records (nanomaterials) is iterated.
The SUBSTANCE_RECORD section defines the data locations of the basic fields (components) of a substance record from the data model.
The last major section on the first JSON level, PROTOCOL_APPLICATIONS, is an array of Protocol Application configurations. It configures how to read the ProtocolApplication objects that are included in the SubstanceRecord object defined in the previous section.
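As a rough illustration of the layout described above, the following Python sketch reports which of the four major sections are present on the first JSON level and whether each has the expected JSON type (object or array). It is not part of NMDataParser; the helper name `check_major_sections` is hypothetical, and whether a missing section is actually an error depends on the iteration mode, so the result should be read as a report rather than a validation verdict.

```python
import json

# Section names come from the text above; the expected JSON types
# (object vs. array) follow the skeleton shown there.
MAJOR_SECTIONS = {
    "TEMPLATE_INFO": (dict, "object"),
    "DATA_ACCESS": (dict, "object"),
    "SUBSTANCE_RECORD": (dict, "object"),
    "PROTOCOL_APPLICATIONS": (list, "array"),
}


def check_major_sections(config_text):
    """Return a list of findings about the first JSON level (hypothetical helper)."""
    config = json.loads(config_text)
    findings = []
    for name, (py_type, json_type) in MAJOR_SECTIONS.items():
        if name not in config:
            findings.append(f"missing section: {name}")
        elif not isinstance(config[name], py_type):
            findings.append(f"{name} should be a JSON {json_type}")
    return findings
```

Calling `check_major_sections(open("config.json").read())` on a configuration file (the file name is only an example) returns an empty list when all four sections are present with the expected types.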
Example configuration with test data illustrating the major JSON sections:
{
  "TEMPLATE_INFO": {
    "NAME": "Test1",
    "VERSION": "version 1",
    "TYPE": 1,
    "ALLOW_QUALIFIER_IN_VALUE_CELL": 1
  },
  "DATA_ACCESS": {
    "ITERATION": "ROW_SINGLE",
    "SHEET_INDEX": 1,
    "START_ROW": 2,
    "END_ROW": 4,
    "START_HEADER_ROW": 1,
    "END_HEADER_ROW": 1,
    "ALLOW_EMPTY": true,
    "RECOGNITION": "BY_INDEX",
    "VARIABLES": {
      "var1": {
        "ITERATION": "ABSOLUTE_LOCATION",
        "COLUMN_INDEX": "C",
        "ROW_INDEX": 10
      },
      "var2": {
        "COLUMN_INDEX": "B"
      },
      "topCategory": {
        "ITERATION": "ABSOLUTE_LOCATION",
        "COLUMN_INDEX": "E",
        "ROW_INDEX": 10
      }
    }
  },
  "SUBSTANCE_RECORD": {
    "SUBSTANCE_NAME": {
      "COLUMN_INDEX": "D"
    },
    "SUBSTANCE_TYPE": "NPO_1317",
    "OWNER_UUID": {
      "COLUMN_INDEX": "C"
    },
    "OWNER_NAME": "test-owner-name",
    "SUBSTANCE_UUID": {
      "COLUMN_INDEX": "D"
    },
    "REFERENCE_SUBSTANCE_UUID": {
      "ITERATION": "ABSOLUTE_LOCATION",
      "COLUMN_INDEX": "D",
      "ROW_INDEX": 10
    },
    "PUBLIC_NAME": {
      "COLUMN_INDEX": "A"
    },
    "ID_SUBSTANCE": 123456,
    "EXTERNAL_IDENTIFIERS": [
      {
        "TYPE": "ID1",
        "ID": {
          "COLUMN_INDEX": "F"
        }
      },
      {
        "TYPE": "ID2",
        "ID": {
          "COLUMN_INDEX": "G"
        }
      }
    ],
    "COMPOSITION": [
      {
        "CONTENT": {
          "COLUMN_INDEX": "T"
        },
        "FORMAT": {
          "COLUMN_INDEX": "U"
        },
        "FORMULA": {
          "COLUMN_INDEX": "V"
        },
        "SMILES": {
          "COLUMN_INDEX": "W"
        },
        "INCHI": {
          "COLUMN_INDEX": "X"
        },
        "INCHI_KEY": {
          "COLUMN_INDEX": "Y"
        }
      }
    ]
  },
  "PROTOCOL_APPLICATIONS": [
    {
      "PROTOCOL_APPLICATION_UUID": {
        "COLUMN_INDEX": "H"
      },
      "INVESTIGATION_UUID": "test-investigation-uuid",
      "ASSAY_UUID": "test-assay-uuid",
      "PROTOCOL_ENDPOINT": "test-protocol-endpoint",
      "PROTOCOL_TOP_CATEGORY": {
        "ITERATION": "VARIABLE",
        "VARIABLE_KEY": "topCategory"
      },
      "PROTOCOL_CATEGORY_CODE": "test-category-code",
      "CITATION_TITLE": {
        "COLUMN_INDEX": "I"
      },
      "CITATION_YEAR": {
        "COLUMN_INDEX": "J"
      },
      "CITATION_OWNER": {
        "COLUMN_INDEX": "K"
      },
      "INTERPRETATION_RESULT": {
        "COLUMN_INDEX": "L"
      },
      "INTERPRETATION_CRITERIA": {
        "COLUMN_INDEX": "M"
      },
      "PROTOCOL_GUIDELINE": {
        "guideline1": {
          "COLUMN_INDEX": "N"
        },
        "guideline2": {
          "COLUMN_INDEX": "O"
        }
      },
      "PARAMETERS": {
        "par1": {
          "COLUMN_INDEX": "P"
        },
        "par2": {
          "COLUMN_INDEX": "Q"
        }
      },
      "EFFECTS": [
        {
          "ENDPOINT": "Size",
          "ENDPOINT_TYPE": "Average",
          "VALUE": {
            "COLUMN_INDEX": "R"
          },
          "CONDITIONS": {
            "cond1": "cond1-val"
          }
        },
        {
          "ENDPOINT": "Eff1",
          "VALUE": {
            "COLUMN_INDEX": "AD"
          },
          "ERR_VALUE": {
            "COLUMN_INDEX": "AE"
          },
          "CONDITIONS": {
            "cond11": {
              "COLUMN_INDEX": "AF"
            },
            "cond12": {
              "COLUMN_INDEX": "AG"
            }
          }
        },
        {
          "ENDPOINT": "Eff2",
          "VALUE": {
            "COLUMN_INDEX": "AH"
          },
          "ERR_VALUE": {
            "COLUMN_INDEX": "AE"
          },
          "CONDITIONS": {
            "cond21": {
              "COLUMN_INDEX": "AI"
            }
          }
        }
      ]
    }
  ]
}
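The example addresses columns by letter, including two-letter references such as "AD" and "AH". The Python sketch below shows one way to list every COLUMN_INDEX used in a configuration and to convert the letters to 1-based column numbers. It assumes the standard Excel letter scheme (A = 1, Z = 26, AA = 27); how NMDataParser resolves these references internally is not described here, and both function names are hypothetical.

```python
def column_letters_to_number(letters):
    """Convert Excel-style column letters to a 1-based column number."""
    number = 0
    for ch in letters.strip().upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column letter: {ch!r}")
        # Treat the letters as digits of a base-26 number without a zero digit.
        number = number * 26 + (ord(ch) - ord("A") + 1)
    if number == 0:
        raise ValueError("empty column reference")
    return number


def iter_column_refs(node):
    """Yield every COLUMN_INDEX value found in a parsed configuration."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "COLUMN_INDEX":
                yield value
            else:
                yield from iter_column_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_column_refs(item)
```

Applied to a configuration loaded with `json.load`, `sorted(set(iter_column_refs(config)), key=column_letters_to_number)` gives the referenced columns in sheet order, which can help when checking a configuration against a template.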
PARALLEL_SHEETS is an array of sections, each similar to DATA_ACCESS, that defines the simultaneous reading of several sheets from one Excel file together with the primary sheet. Data access for each additional (secondary) sheet is configured analogously to DATA_ACCESS. PARALLEL_SHEETS is intended to be used in iteration mode ROW_SINGLE.
"PARALLEL_SHEETS": [
  {
    "ITERATION": "ROW_SINGLE",
    "SHEET_INDEX": 2,
    "START_ROW": 2,
    "START_HEADER_ROW": 1,
    "END_HEADER_ROW": 1,
    "ALLOW_EMPTY": true,
    "RECOGNITION": "BY_INDEX"
  }
],
The parallel sheets are expected to be perfectly synchronized: for each parallel sheet, the n-th row counted from START_ROW is assigned to the n-th Substance Record imported. Different parallel sheets may use different starting rows (attribute START_ROW).
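The row synchronization can be sketched in Python as follows. This is only an illustration of the rule stated above, not the parser's code: the function name `synchronized_rows` is hypothetical, and it assumes that the row at START_ROW itself counts as the first row, so the n-th record is read from row START_ROW + n - 1 in every sheet.

```python
def synchronized_rows(n_records, primary_start_row, parallel_start_rows):
    """Return, per record, the primary row and the matching row in each parallel sheet.

    Assumption: the row at START_ROW is the first data row of a sheet.
    """
    table = []
    for n in range(1, n_records + 1):
        primary = primary_start_row + n - 1
        parallel = [start + n - 1 for start in parallel_start_rows]
        table.append((primary, parallel))
    return table


# Three records, primary sheet starting at row 2, two parallel sheets
# starting at rows 2 and 5 respectively.
for primary_row, parallel_rows in synchronized_rows(3, 2, [2, 5]):
    print(primary_row, parallel_rows)
```

Under this assumption the third record is read from row 4 of the primary sheet, row 4 of the first parallel sheet and row 7 of the second.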
SUBSTANCE_RECORD_MAP is a specialized section used only for iteration mode SUBSTANCE_RECORD_MAP.
"SUBSTANCE_RECORD_MAP": {
  "MAP_ELEMENT": "SUBSTANCE_NAME",
  "SUBSTANCE_NAME": ["C1", "C2", "C3", "C4"]
},
In this case the SUBSTANCE_RECORD section is not used.