
egor2 format specification

Pavel N. Krivitsky edited this page May 11, 2019 · 2 revisions

This page holds the current detailed specification of the relational egor's internal representation and is the place to discuss it.

Table of Contents

Note that this table does not update automatically. Rather, you need to go to https://tableofcontents.herokuapp.com/, paste the contents of the page after the table into it, then replace the table with the output. (You might also need to delete the first four spaces on each line.)

egor data structure

Base class

An egor object is a subclass of list with certain special-purpose elements and additional attributes.

egor list elements

Ego table (egos)

This table is a srvyr object containing information about how the egos were selected. The columns of egos contain ego attributes.

Special columns

.egoID integer: a unique identifier for each ego. Always the last column.

Alter table (alts)

This table is a tibble containing the alter data, with columns containing alter attributes or attributes of the ego-alter relation.

Special columns

.egoID integer: an identifier of the ego that had nominated that alter. Joins with egos$.egoID. Always the penultimate column.

.altID integer: a unique (within a given .egoID) identifier for each alter. Always the last column.
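As an illustration of how alts joins with egos through .egoID, here is a minimal base R sketch. It rests on several assumptions: plain data.frames stand in for the srvyr object and the tibble, and the attribute columns (age, sex) and all values are made up.

```r
# Toy stand-ins for the egos and alts tables (plain data.frames,
# not a srvyr object or a tibble). "age" and "sex" are hypothetical.
egos <- data.frame(age = c(34L, 51L), .egoID = 1:2)
alts <- data.frame(
  sex    = c("f", "m", "f"),
  .egoID = c(1L, 1L, 2L),
  .altID = c(1L, 2L, 1L)
)

# Attach ego attributes to each alter by joining on .egoID.
alts_with_ego <- merge(alts, egos, by = ".egoID")
```

Note that merge() moves the key column to the front, so the result is a derived table for analysis rather than a valid alts table under this specification.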

Alter-alter ties table (aaties)

This table is a tibble containing the alter-alter tie data, with columns containing attributes of the alter-alter relation.

Special columns

.egoID integer: an identifier of the ego that had nominated both alters. Joins with egos$.egoID and alts$.egoID. Always the third-to-last column.

.srcID, .tgtID integer: identifiers of the two alters whose relation is being stored. Each joins with alts$.altID. Always the last two columns, in that order.
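The two-part key can be sketched in base R as follows. This is an illustration only: plain data.frames stand in for the tibbles, and the name and weight columns and their values are hypothetical. Because .altID is unique only within an ego, the lookup has to match on .egoID as well.

```r
# Toy alts and aaties tables; "name" and "weight" are made-up attributes.
alts <- data.frame(
  name   = c("Ann", "Bob", "Cat", "Dan"),
  .egoID = c(1L, 1L, 1L, 2L),
  .altID = c(1L, 2L, 3L, 1L)
)
aaties <- data.frame(
  weight = c(2, 1),
  .egoID = c(1L, 1L),
  .srcID = c(1L, 2L),
  .tgtID = c(2L, 3L)
)

# Look up the source alter of each tie:
# (.egoID, .srcID) in aaties matches (.egoID, .altID) in alts.
src <- merge(aaties, alts,
             by.x = c(".egoID", ".srcID"),
             by.y = c(".egoID", ".altID"))

# Same lookup for the target alter.
tgt <- merge(aaties, alts,
             by.x = c(".egoID", ".tgtID"),
             by.y = c(".egoID", ".altID"))
```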

Alter design (alt.design)

A list containing information about how the data about alts were collected. Currently, this includes:

  • max (required): Maximum number of alters an ego was allowed to nominate. Set to +Inf if no limit.
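Putting the four list elements together, the overall shape might look like the sketch below. It is not the package's constructor: plain data.frames stand in for the srvyr object and the tibbles, the attribute columns are made up, and the order of the list elements is an assumption.

```r
# A minimal sketch of the structure described above, in base R.
egor_sketch <- structure(
  list(
    # Ego table: .egoID is the last column.
    egos = data.frame(age = c(34L, 51L), .egoID = 1:2),
    # Alter table: .egoID and .altID are the last two columns.
    alts = data.frame(
      sex    = c("f", "m", "f"),
      .egoID = c(1L, 1L, 2L),
      .altID = c(1L, 2L, 1L)
    ),
    # Alter-alter ties: .egoID, .srcID, .tgtID are the last three columns.
    aaties = data.frame(
      weight = 1,
      .egoID = 1L,
      .srcID = 1L,
      .tgtID = 2L
    ),
    # Alter design: no limit on the number of nominations.
    alt.design = list(max = Inf)
  ),
  class = c("egor", "list")
)
```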

Invariants

Since the special columns are meant to be keys for joining the tables, accessors and modifiers must preserve certain invariants:

  • No two egos rows may have the same .egoID: any operations that duplicate ego rows must also create new ego IDs.
  • No two alts rows may have the same (.egoID, .altID) combination: any operations that duplicate ego rows must also create new ego IDs and copy their alters.
  • No two aaties rows may have the same (.egoID, .srcID, .tgtID) combination: any operations that duplicate ego or alter rows must also create new ego IDs and copy their alters.
  • Special columns must always be the last columns in their respective tables. Transformation and subsetting methods must resist attempts to remove or reorder them.
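One way to make these invariants concrete is a small validity check. The function below is a hypothetical helper, not part of egor, and it assumes the three tables are data.frame-like; the toy data are made up.

```r
# Hypothetical helper (not part of egor): TRUE if the key invariants hold.
check_egor_invariants <- function(egos, alts, aaties) {
  last_cols <- function(x, n) utils::tail(names(x), n)
  all(
    # Keys are unique within each table.
    !anyDuplicated(egos$.egoID),
    !anyDuplicated(alts[c(".egoID", ".altID")]),
    !anyDuplicated(aaties[c(".egoID", ".srcID", ".tgtID")]),
    # Special columns come last, in the documented order.
    identical(last_cols(egos, 1), ".egoID"),
    identical(last_cols(alts, 2), c(".egoID", ".altID")),
    identical(last_cols(aaties, 3), c(".egoID", ".srcID", ".tgtID"))
  )
}

egos   <- data.frame(age = c(34L, 51L), .egoID = 1:2)
alts   <- data.frame(sex = c("f", "m", "f"),
                     .egoID = c(1L, 1L, 2L), .altID = c(1L, 2L, 1L))
aaties <- data.frame(weight = 1, .egoID = 1L, .srcID = 1L, .tgtID = 2L)

ok <- check_egor_invariants(egos, alts, aaties)

# Duplicating an ego row without issuing a new .egoID breaks the first invariant.
bad <- check_egor_invariants(egos[c(1, 1, 2), ], alts, aaties)
```

Here ok is TRUE and bad is FALSE, which is the situation the first invariant is meant to rule out.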

Reserved column names

In general, the end-user should not have persistent data columns whose names begin with a dot (.). This helps ensure that data columns do not accidentally mask variables under non-standard evaluation, as used by subset.egor(). When using placeholder or ephemeral variables, the user should also be aware that the following names are reserved for egor's use:

  • .egoID, .altID, .srcID, .tgtID, .egoRow, .altRow, .srcRow, .tgtRow
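A check for such names could be sketched as below. The helper check_user_columns() is hypothetical and not part of egor; it only separates exact collisions with the reserved names from other dot-prefixed names.

```r
# Reserved names, as listed above.
reserved <- c(".egoID", ".altID", ".srcID", ".tgtID",
              ".egoRow", ".altRow", ".srcRow", ".tgtRow")

# Hypothetical helper (not part of egor): classify problematic column names.
check_user_columns <- function(col_names) {
  list(
    # Names that collide with egor's reserved names.
    reserved = intersect(col_names, reserved),
    # Other dot-prefixed names, discouraged for persistent data columns.
    dotted   = setdiff(col_names[startsWith(col_names, ".")], reserved)
  )
}

res <- check_user_columns(c("age", ".tmp", ".altRow"))
```

For this input, res$reserved is ".altRow" and res$dotted is ".tmp".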

Current questions

  1. Should alter design be an egor list element or an attribute?
  2. To implement tidygraph-style semantics, should the currently activated attribute be a list element or an attribute?
  3. Are the invariants too strict?
  4. Should the user be able to manually specify the .egoID, .altID, etc., and should they be allowed to be characters as well?