The ADAT files included in this repository give existing and prospective SomaLogic customers, as well as anyone curious about the SomaScan data deliverable, example data files for preparing analyses before receiving SomaScan data. Data in these files are not intended for biological analysis or to provide any metrics for SomaScan data in general.
- `example_data.adat`
- `example_data_v4.1_plasma.adat`
- `example_data_v5.0_plasma.adat`
The example ADAT files in this repository can be retrieved in one of two ways:
- Cloning the repository to your local machine
- Using `wget` to retrieve individual ADAT files from the repository; see examples below
# Retrieve just the 5k (v4.0) ADAT
wget https://github.com/SomaLogic/SomaLogic-Data/raw/main/example_data.adat
# Retrieve just the 7k (v4.1) ADAT
wget https://github.com/SomaLogic/SomaLogic-Data/raw/main/example_data_v4.1_plasma.adat
# Retrieve just the 11k (v5.0) ADAT
wget https://github.com/SomaLogic/SomaLogic-Data/raw/main/example_data_v5.0_plasma.adat
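Where `wget` is not available, the same files could be fetched from Python. The following is a minimal sketch using only the standard library; the helper names `adat_url` and `fetch_adat` are hypothetical, and the URLs are simply those of the `wget` commands above.

```python
from urllib.request import urlretrieve

# Base URL and file names taken from the wget commands above.
BASE_URL = "https://github.com/SomaLogic/SomaLogic-Data/raw/main"
EXAMPLE_FILES = {
    "v4.0": "example_data.adat",
    "v4.1": "example_data_v4.1_plasma.adat",
    "v5.0": "example_data_v5.0_plasma.adat",
}


def adat_url(version):
    """Return the download URL of the example ADAT for a menu version."""
    return f"{BASE_URL}/{EXAMPLE_FILES[version]}"


def fetch_adat(version, dest_dir="."):
    """Download one example ADAT into dest_dir and return the local path."""
    local_path = f"{dest_dir}/{EXAMPLE_FILES[version]}"
    urlretrieve(adat_url(version), local_path)
    return local_path
```

Calling `fetch_adat("v4.1")` would save `example_data_v4.1_plasma.adat` in the current directory; nothing is downloaded until the function is called.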
commercial name | menu version | size | example file |
---|---|---|---|
5k | v4.0 | 5284 | example_data.adat |
7k | v4.1 | 7596 | example_data_v4.1_plasma.adat |
11k | v5.0 | 11083 | example_data_v5.0_plasma.adat |
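The table above can double as a lookup when checking which menu a file belongs to. The sketch below assumes that `size` is the number of SOMAmer reagent columns in the ADAT, which the table does not state explicitly; `identify_menu` is a hypothetical helper name.

```python
# Sizes copied from the table above; the meaning of "size" as the
# analyte (column) count is an assumption.
MENU_BY_SIZE = {
    5284: ("5k", "v4.0"),
    7596: ("7k", "v4.1"),
    11083: ("11k", "v5.0"),
}


def identify_menu(n_analytes):
    """Return (commercial name, menu version), or None for an unknown size."""
    return MENU_BY_SIZE.get(n_analytes)
```

Returning `None` for an unrecognized count keeps the helper safe for study files whose analyte set differs from these three examples.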
The ADAT file format is a SomaLogic-specific, tab-delimited text file designed to store SomaScan study data. This format is intended to be flexible and self-describing. The fields in this example file may differ from the fields in the *.adat file for your study. However, all *.adat files are composed of four main sections arranged in the following order:
- HEADER: Study-level information about the SomaScan experiment and how the data was processed.
- COL_DATA: Field names and types associated with the SOMAmer reagents (columns).
- ROW_DATA: Field names and types associated with sample information (rows).
- TABLE_BEGIN: The experimental data, organized into a data matrix of SOMAmer reagents (columns) by samples (rows). SomaScan measurements are in relative fluorescence units (RFU). The data block directly above the measurement matrix describes the SOMAmer reagents, and the data block to its left contains sample-specific (e.g. clinical) information.
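To make the four-section layout concrete, here is a minimal sketch that groups the lines of an ADAT-like text under the section marker that precedes them. It assumes each marker sits on its own line, optionally prefixed with `^`, and the toy text is hypothetical rather than taken from the example files. For real work, the maintained readers listed at the end of this document are the better choice.

```python
SECTION_NAMES = ("HEADER", "COL_DATA", "ROW_DATA", "TABLE_BEGIN")


def split_adat_sections(text):
    """Group tab-split lines under the section marker that precedes them."""
    sections = {name: [] for name in SECTION_NAMES}
    current = None
    for line in text.splitlines():
        fields = line.split("\t")
        # Assumption: a marker is the first field, optionally written "^NAME".
        marker = fields[0].strip().lstrip("^")
        if marker in SECTION_NAMES:
            current = marker
        elif current is not None and line.strip():
            sections[current].append(fields)
    return sections


# Hypothetical toy text: one analyte, one sample.
toy_adat = "\n".join([
    "^HEADER",
    "!Title\tToy study",
    "^COL_DATA",
    "!Name\tSeqId\tTarget",
    "^ROW_DATA",
    "!Name\tSampleId\tSampleType",
    "^TABLE_BEGIN",
    "\t\tSeqId\t2182-54",             # analyte annotations sit above the matrix
    "\t\tTarget\tC4b",
    "SampleId\tSampleType\t\t",       # sample field names sit to the left
    "2031\tSample\t\t1234.5",         # sample fields, then the RFU value
])

sections = split_adat_sections(toy_adat)
```

Lines that appear before the first marker are ignored, and everything after TABLE_BEGIN, including the annotation block above the matrix, is kept together for a later parsing step.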
The first file, `example_data.adat`, contains a SomaScan v4.0 study from a set of
human samples. The RFU measurements and other identifiers
have been altered to protect personally identifiable information (PII)
while retaining as much of the underlying biological signal as possible.
There are 192 total EDTA-plasma samples from two (2) 96-well plate runs,
broken down by the following types:
- 170 clinical samples
- 10 calibrators (replicate controls for combining data across runs)
- 6 QC samples (replicate controls used to assess run quality)
- 6 Buffer samples (no protein controls)
The second file, `example_data_v4.1_plasma.adat`, contains a SomaScan v4.1
study from the same set of human samples.
RFU measurements have been altered to protect PII in this file as well.
There are 163 EDTA-plasma samples from four (4) 96-well plate runs, which
include the following:
- 163 clinical samples
- 20 calibrators
- 12 QC samples
- 12 Buffer samples
The third file, `example_data_v5.0_plasma.adat`, contains a SomaScan v5.0
study from the same set of human samples.
RFU measurements have been altered to protect PII in this file as well.
There are 163 EDTA-plasma samples from twelve (12) 96-well plate runs, which
include the following:
- 163 clinical samples
- 60 calibrators
- 36 QC samples
- 36 Buffer samples
The standard data normalization procedure for EDTA-plasma samples was applied to all three (3) datasets.
In a standard SomaLogic ADAT, the section of information that
sits directly above the measurement data (RFU data matrix) is
the column metadata, which contains detailed information
and annotations about the analytes, `SeqIds`, and their targets.
See the section below for further information about available
fields and their descriptions.
Information describing the analytes is found above the data matrix in a standard SomaLogic ADAT. This information may consist of any or all of the following:
Field | Description | Example |
---|---|---|
SeqId | SomaLogic sequence identifier | 2182-54_1 |
SeqIdVersion | Version of SOMAmer sequence | 2 |
SomaId | Target identifier, of the form SLnnnnnn (8 characters in length) | SL000318 |
TargetFullName | Target name curated for consistency with UniProt name | Complement C4b |
Target | SomaLogic Target Name | C4b |
UniProt | UniProt identifier(s) | P0C0L4 P0C0L5 |
EntrezGeneID | Entrez Gene Identifier(s) | 720 721 |
EntrezGeneSymbol | Entrez Gene Symbol names | C4A C4B |
Organism | Protein Source Organism | Human |
Units | Relative Fluorescence Units | RFU |
Type | SOMAmer target type | Protein |
Dilution | Dilution mix assignment | 0.01% |
PlateScale_Reference | PlateScale reference value | 1378.85 |
CalReference | Calibration sample reference value | 1378.85 |
medNormRef_ReferenceRFU | Median normalization reference value | 490.342 |
Cal_V4_<YY>_<SSS>_<PPP> | Calibration scale factor (for given Year_Study_Plate) | 0.64 |
ColCheck | QC acceptance criteria across all plates/sets | PASS |
QcReference_<LLLLL> | QC sample reference value (for given QC lot) | PASS |
CalQcRatio_V4_<YY>_<SSS>_<PPP> | Post calibration median QC ratio to reference (for given Year_Study_Plate) | 1.04 |
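Several of these fields can hold more than one identifier per analyte. The sketch below tidies a single analyte record, assuming multiple identifiers are separated by whitespace as in the example cells above, and checks `SomaId` against the `SLnnnnnn` form. The record reuses the example values from the table, and `tidy_analyte` is a hypothetical helper name.

```python
import re

# "SLnnnnnn": the letters SL followed by six digits (8 characters).
SOMA_ID_PATTERN = re.compile(r"^SL\d{6}$")
MULTI_VALUE_FIELDS = ("UniProt", "EntrezGeneID", "EntrezGeneSymbol")


def tidy_analyte(record):
    """Copy one analyte's annotations, splitting multi-valued fields."""
    tidy = dict(record)
    for field in MULTI_VALUE_FIELDS:
        # Assumption: identifiers within a field are whitespace-separated.
        tidy[field] = record.get(field, "").split()
    tidy["valid_soma_id"] = bool(SOMA_ID_PATTERN.match(record.get("SomaId", "")))
    return tidy


# Example values copied from the table above.
c4b = {
    "SeqId": "2182-54_1",
    "SomaId": "SL000318",
    "Target": "C4b",
    "UniProt": "P0C0L4 P0C0L5",
    "EntrezGeneID": "720 721",
    "EntrezGeneSymbol": "C4A C4B",
    "Dilution": "0.01%",
}

tidy = tidy_analyte(c4b)
```

Splitting into lists makes it straightforward to join analytes to external annotation sources keyed on a single UniProt or Entrez identifier.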
Information describing the samples is typically found to the left of the data matrix in a standard SomaLogic ADAT. This information may consist of clinical information provided by the client, or run-specific diagnostic information included for assay quality control. Below are some examples of what may be present in this section:
Field | Description | Examples |
---|---|---|
PlateId | Plate identifier | V4-18-004_001, V4-18-004_002 |
ScannerID | Scanner used to analyze slide | SG12064173, SG14374437 |
PlatePosition | Location on 96 well plate (A1-H12) | A1, H12 |
SlideId | Agilent slide barcode | 258495800001 |
Subarray | Agilent subarray (1 – 8) | 1, 8 |
SampleId | 1st form is Subject Identifier, 2nd form (calibrators, buffers) | 2031 |
SampleType | 1st form for clinical samples (Sample), 2nd form as above | Sample, QC, Calibrator, Buffer |
PercentDilution | Highest concentration of the SOMAmer dilution groups | 20 |
SampleMatrix | Sample matrix | Plasma-PPT |
Barcode | 1D Barcode of aliquot | S622225 |
Barcode2d | 2D Barcode of aliquot | 9876543210 |
SampleNotes | Assay team sample observation | Cloudy, Low sample volume, Reddish |
SampleDescription | Supplemental sample information | Plasma QC 1 |
AssayNotes | Assay team run observation | Beads aspirated, Leak/Hole, Smear |
TimePoint | Sample time point | Baseline |
ExtIdentifier | Primary key for Subarray | EXID40000000032037 |
SsfExtId | Primary key for sample | EID102733 |
SampleGroup | Sample group | A, B |
SiteId | Collection site | SomaLogic |
TubeUniqueID | Unique tube identifier | 2031 |
CLI | Cohort definition identifier | CLI6006F001 |
HybControlNormScale | Hybridization control scale factor | 0.948304 |
RowCheck | Normalization acceptance criteria for all row scale factors | PASS, FLAG |
NormScale_0_5 | Median signal normalization scale factor (0.5% mix) | 1.02718 |
NormScale_0_005 | Median signal normalization scale factor (0.005% mix) | 1.119754 |
NormScale_20 | Median signal normalization scale factor (20% mix) | 0.996148 |
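A common first step with the sample annotations is to tally rows by `SampleType` and keep only the clinical samples that passed the row check. The sketch below assumes the sample fields have already been read into one dictionary per row; apart from `SampleId` 2031, the rows are hypothetical and not taken from the example files.

```python
from collections import Counter


def count_sample_types(rows):
    """Tally rows by their SampleType value."""
    return Counter(row["SampleType"] for row in rows)


def clinical_passing(rows):
    """Keep clinical samples (SampleType "Sample") whose RowCheck is PASS."""
    return [
        row for row in rows
        if row["SampleType"] == "Sample" and row["RowCheck"] == "PASS"
    ]


# Hypothetical rows using the field names and values from the table above.
rows = [
    {"SampleId": "2031", "SampleType": "Sample", "RowCheck": "PASS"},
    {"SampleId": "2032", "SampleType": "Sample", "RowCheck": "FLAG"},
    {"SampleId": "QC-1", "SampleType": "QC", "RowCheck": "PASS"},
    {"SampleId": "CAL-1", "SampleType": "Calibrator", "RowCheck": "PASS"},
    {"SampleId": "BUF-1", "SampleType": "Buffer", "RowCheck": "PASS"},
]

counts = count_sample_types(rows)
kept = clinical_passing(rows)
```

Whether flagged samples should be dropped or only reviewed depends on the study; the filter here is one possible convention, not a SomaLogic recommendation.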
- R package: SomaDataIO
- Python module: Canopy
- Digital Tools: DataDelve Statistics
SomaLogic-Data was developed by the Bioinformatics Dept. at SomaLogic Operating Co., Inc.