This schema defines the format of tabular data, the relationships between columns, rows, headers, and metadata, and the binding of tables to charts.
Overview | Motivation | Schema | Table Features
Tables can be more complex than a two-dimensional array of data. There are structural relationships that are important for understanding how data is related in order to facilitate human consumption. By retaining this information, it allows users to build these relationships in one application and carry them over to another, without having to redo this work. Once those relationships are defined, they can drive how tables can be bound to charts.
To make FactSet data agnostic by standardizing the format of tabular data and broadly integrating the standard across FactSet's services.
| File | Description | Note |
|---|---|---|
| Package.proto | A package of tables and charts | |
| MetadataItem.proto | A metadata item | |
| Table Messages | ||
| Table.proto | A table | |
| TableDefinition.proto | The definition of a table | |
| TableData.proto | The data of a table | |
| SeriesDefinition.proto | The definition of a series | A series is how a column of data is represented |
| SeriesData.proto | The data of a series | A column of data is stored in a columnar format |
| SeriesFormat.proto | The format of a series | Makes tables look human readable |
| DataType.proto | The type of data in a series | |
| Wrappers.proto | Wrappers around data types | |
| Metadata Messages | ||
| MetadataCollection.proto | A collection of metadata | All metadata for a table |
| MetadataLocations.proto | Where metadata items are located | |
| ListOfMetadata.proto | A list of metadata item ids | |
| Reference.proto | References a table, column, or row |
The following sections will walk you through the various features of the table schema and how they work.
Note: All JSON properties are serialized using camel casing even though the protobuf schema uses snake casing due to protobuf's JSON Mapping.
Message field names are mapped to lowerCamelCase and become JSON object keys.
Figure 1 is a simple two-dimensional array of data that you might retrieve from a relational database. It has columns and rows, and all data is denormalized. This is the most basic representation of a table, and is typically what travels through services that process tabular data.
This same table may also be rendered by grouping columns and/or creating a parent/child relationship between columns. By grouping these columns into one column, like in Figure 2, it reduces the visual complexity of repetitive information and makes it easier to consume for a human.
In order to make a group, the SeriesDefinition.next_sibling_id and/or the SeriesDefinition.parent_id property is specified. Using these two properties, a tree of columns can be specified. Columns in the same tree are in the same group.
In order to create a group level, the SeriesDefinition.parent_id property is specified. A parent is at a level one higher than its child. For example, the values in the region column are one level higher than the values in the continent1 column and the continent2 column. The values in the continent1 and continent2 column are at the same level because they both have the same parent column.
Note: Any columns can be grouped, as long as all columns in a group are either a dimension or not a dimension.
Figure 3: Grouped columns with parent/child relationships
{
"definition": {
"columns": [
{
"id": "c_function",
"name": "function",
"description": "Function",
"type": "STRING",
"isDimension": true,
"nextSiblingId": "c_region" // In the same group as c_region, c_continent1, and c_continent2
},
{
"id": "c_region",
"name": "region",
"description": "Region",
"type": "STRING",
"isDimension": true,
"nextSiblingId": "" // In the same group as c_function, c_continent1, and c_continent2
},
{
"id": "c_continent1",
"name": "continent1",
"description": "Continent 1",
"type": "STRING",
"isDimension": true,
"parentId": "c_region" // In the same group as c_function, c_region, and c_continent2
},
{
"id": "c_continent2",
"name": "continent2",
"description": "Continent 2",
"type": "STRING",
"isDimension": true,
"parentId": "c_region" // In the same group as c_function, c_region, and c_continent1
}
]
}
}
Headers can be added to further describe the columns of data. Headers data is stored in a separate table that is rotated 90 degrees where the rows in the headers table are linked to the columns in the main table by using the SeriesDefinition.header_id property.
The first four columns, function, region, continent1, and continent2, play a special role. These columns are known as dimensions, specified by the SeriesDefinition.is_dimension property, because they describe the data.
Dimensions are not required, however. Simple datasets, like Figure 1, can also be described using this schema without specifying any dimensions, making it flexible enough to accommodate simpler use cases where dimension information is not specified.
{
"definition": {
"columns": [
{
"id": "c_continent1",
"name": "continent1",
"description": "Continent 1",
"type": "STRING",
"isDimension": true, // A dimenion if true, otherwise not a dimension
"parentId": "c_region"
}
]
}
}
It is possible to transpose the dimensions in the main table with the headers. Figure 8 demonstrates how the table in Figure 4 would be transposed. Notice that all information is still grouped in the same way. No information was lost. We do see some new column names though, namely year, category, and portfolioType. The function, region, continent1, and continent2 dimension columns still exist, but they are now columns in the headers table instead of columns in the main table. If you were to transpose this table once more, you would get Figure 4 again.
Note: In order to transpose a table, all non-dimension columns must have the same
DataTypein theSeriesDefintion.typeproperty.
Figure 9: Transposed table without headers
When transferring data over the wire, the smaller it is the better, which is why this schema also supports compressing the data. By using the SeriesData.ranges property, each item specifies the uncompressed index (as the key) and the length (as the value) to repeat the data.
Note: The index is referring to the uncompressed index because it requires less memory to compress and decompress.
Figure 10 and Figure 11 illustrate the difference between uncompressed and compressed data.
"c_continent1": {
"stringArray": {
"values": [
"null", // All of these "null" values are unnecessarily repeated
"null",
"Americas",
"Asia Pacific",
"Europe",
"Middle East and Africa",
"null",
"null",
"null",
"null",
"null",
"null",
"null",
"null"
]
}
}
"c_continent1": {
"ranges": {
{ 0, 2 },
{ 6, 7 }
},
"stringArray": {
"values": [
"null", // All of these "null" values are compressed
"Americas",
"Asia Pacific",
"Europe",
"Middle East and Africa",
"null"
],
}
}
The SeriesFormat message is used to facilitate rendering a human readable table. It is intended as a strong suggestion on how to render the data. By storing this information along with the data, applications can immediately render the table intelligently without having knowledge of the data.
{
"definition": {
"columns": [
{
"id": "c_f0",
"name": "fund0",
"description": "Fund 0",
"type": "DOUBLE",
"format": {
"format": "{0:0.00}",
"nullFormat": "--",
"halign": "RIGHT"
}
}
]
}
}
The SeriesFormat.format property is used to render a raw value into a human readable string. Refer to Formatting Types for what strings can be used.
The SeriesFormat.null_format property to is used to render a null value with a special string, like -- or @NA.
The protobuf schema does not allow null values in a repeated, so in order to denote a null value, a specific value is used. Most values are the max value for that DataType unless there is a more appropriate value, like "NaN" for doubles and floats.
| Type | Null Value |
|---|---|
| double | "NaN" |
| Duration | "3652500.00:00:00.9999999" |
| float | "NaN" |
| int32 | 2147483647 |
| int64 | 9223372036854775807 |
| string | "null" |
| Timestamp | "12/31/9999 11:59:59 PM" |
Use the SeriesFormat.halign and SeriesFormat.valign properties to specify how the data in a series should be aligned within the table cell.
In order for tabular data to mean something, often metadata needs to be attributed to the entire table, or to a particular column, row, or cell. This is useful in providing functionality beyond the raw data. For instance, a URL might be attributed to one of the numbers that links to research notes about that number, or settings might be attributed to one of the columns that documents what engine calculated that data and how that column was calculated.
You can also reference data from another table by using the MetadataItem.refValue property, including a whole table, or a specific row or column.
In order to add metadata, you have to
- Add a
MetadataItemto theMetadataCollection.itemsproperty - Create a reference to that
MetadataItemin theMetadataCollection.locationsproperty
Metadata can be attributed to the entire table by referencing a MetadataItem in the MetadataLocations.table property.
Figure 13: Table level metadata
{
"data": {
"metadata": {
"items": {
"m_h": { // The id of the metadata item
"name": "header",
"stringValue": "Risk Analysis"
}
},
"locations": {
"table": [ // The location is the entire table
"m_h" // The id of the metadata item
]
}
}
}
}
Metadata can be attributed to an entire row by referencing a MetadataItem in the MetadataLocations.rows property.
Metadata can be attributed to an entire column by referencing a MetadataItem in the MetadataLocations.columns property.
{
"data": {
"metadata": {
"items": {
"m_foobar": { // The id of the metadata item
"name": "foobar",
"stringValue": "Foo Bar"
}
},
"locations": {
"rows": { // The location is an entire row,
// (or "columns" to target an entire column)
"r_0": { // The id of the row
// (or column)
"ids": [
"m_foobar" // The id of the metadata item
]
}
}
}
}
}
}
Metadata can be attributed to the cells in an entire row by referencing a MetadataItem that references another table's rows or columns in the MetadataLocations.rows property.
Metadata can be attributed to the cells in an entire column by referencing a MetadataItem that references another table's rows or columns in the MetadataLocations.columns property.
Figure 15: Cells level metadata
{
"data": {
"metadata": {
"items": {
"m_regionUrl": {
"name": "regionUrl",
"refValue": {
"tableId": "main",
"columnId": "c_regionUrl" // If the reference is a column
// or row, it indicates cell level metadata
}
}
},
"locations": {
"columns": {
"c_region": {
"ids": [
"m_regionUrl"
]
}
}
}
}
}
}
The Reference message allows us to specify a relationship between metadata and data that exists elsewhere, either in this table or another table.
Reference.table_id should always be specified.
If Reference.column_id is specified, then the data exists in a column, which means that it must match the same number of rows as there are in this table.
If Reference.row_id is specified, then the data exists in a row, which means that it must match the same number of columns as there are in this table.
{
"data": {
"metadata": {
"items": {
"m_regionUrl": {
"name": "regionUrl",
"refValue": { // This object is a Reference
"tableId": "main", // References a table's id
"columnId": "c_regionUrl" // References a column's id
}
}
}
]
}
}
Hidden
Since a column of data can represent metadata, it it likely the case that that data should not be rendered directly, but rather in an interactive way, like a tooltip, or a hyperlink. In this scenario, you can make a column hidden by setting the SeriesDefinition.is_hidden property to true.
Note: Do not use this property to indicate that a row is expanded or collapsed. Consider adding a metadata column to indicate that.
{
"definition": {
"columns": [
{
"id": "c_regionUrl",
"name": "regionUrl",
"description": "Region Url",
"type": "STRING",
"isDimension": true,
"isHidden": true // Column will not be rendered because it is hidden
}
]
}
}







