Description
Term description
This stems from this question on Stack Overflow. Generalising it, suppose that each node i has some set A[i] of properties (I am avoiding "attributes", since we use that term elsewhere). We wish to specify a dyadic predictor that, in pseudocode, can be represented as x[i,j] = length(intersect(A[i], A[j])) (the number of properties i and j have in common) or x[i,j] = length(intersect(A[i], A[j])) > 0 (whether i and j have any properties in common).
Some examples:
- A[i] is the set of languages i speaks, and we wish to use an indicator of whether i and j speak at least one common language as a predictor of their interaction. (This is from Stack Overflow.)
- A[i] is a list of i's hobbies, and we wish to use the number of hobbies i and j have in common to predict acquaintance.
- A[i] is a list of places i visited over the course of a day (e.g., from a contact diary), and we wish to use the number of common areas visited by i and j to predict whether they had a contact.
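To make the two set-based predictors concrete, here is a minimal sketch in plain Python (not ergm code) that builds the count matrix and the indicator matrix from a list of property sets. The function names and the language data are hypothetical, and leaving the diagonal at zero is an assumption (self-ties are ignored). A matrix like this is the kind of dyadic covariate one would then hand to a term such as edgecov().

```python
from itertools import combinations


def shared_property_counts(props):
    """Return a symmetric matrix x with x[i][j] = |A[i] & A[j]| for i != j.

    The diagonal is left at 0 (assumption: self-ties are not of interest).
    """
    sets = [set(p) for p in props]
    n = len(sets)
    x = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Size of the intersection, mirrored to keep the matrix symmetric.
        x[i][j] = x[j][i] = len(sets[i] & sets[j])
    return x


def any_shared_property(props):
    """Return the 0/1 indicator matrix: 1 if i and j share any property."""
    return [[int(c > 0) for c in row] for row in shared_property_counts(props)]


# Hypothetical data: languages spoken by four actors.
languages = [
    {"English", "Polish", "Spanish"},
    {"English", "Spanish"},
    {"Polish"},
    {"Japanese"},
]

counts = shared_property_counts(languages)
indicator = any_shared_property(languages)
```

This loops over all pairs, which is fine for small networks; for large sparse data, indexing nodes by property first would likely avoid most of the empty intersections.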
This seems like something that can be useful in a variety of circumstances.
A further generalisation of this concept is to make A[i] a mapping that maps property k to some value (e.g., proficiency in a language) so that, e.g., x[i,j] = max[k](min(A[i][k], A[j][k])) (or some other "interacting" and "combining" functions in place of min() and max[k](), respectively). In the language example, this predictor represents the proficiency of the less-proficient actor in the two actors' best common language (where "best common language" is the language in which the less-proficient actor has the highest proficiency).
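The mapping version can be sketched the same way, with the "interacting" and "combining" functions passed in as arguments. This is a plain-Python illustration under stated assumptions: the name best_common_value, the proficiency scores, and the choice to return a default of 0 when two actors share no property are all hypothetical, not part of any existing API.

```python
def best_common_value(a_i, a_j, interact=min, combine=max, default=0):
    """Combine interact(A[i][k], A[j][k]) over the properties k both nodes have.

    With the defaults this is max over k of min(A[i][k], A[j][k]).
    `default` is returned when there is no shared property (an assumption).
    """
    shared = a_i.keys() & a_j.keys()
    if not shared:
        return default
    return combine(interact(a_i[k], a_j[k]) for k in shared)


# Hypothetical proficiency scores (higher is better).
proficiency = [
    {"English": 5, "Polish": 3},
    {"English": 2, "Polish": 4},
    {"Spanish": 5},
]

n = len(proficiency)
# Dyadic covariate matrix; the diagonal is left at 0.
x = [
    [0 if i == j else best_common_value(proficiency[i], proficiency[j])
     for j in range(n)]
    for i in range(n)
]

# Swapping the combining function, e.g. summing over shared languages.
total = best_common_value(proficiency[0], proficiency[1], combine=sum)
```

For actors 0 and 1, the less-proficient actor scores 2 in English and 3 in Polish, so the best common language is Polish and the predictor is 3.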
In all cases, this would be a dyad-independent term, so in principle representable with edgecov().
Questions
- How broadly useful would this be? I suspect @CarterButts and @mbojan might have some applications I hadn't thought about.
- Would the generalisation to a mapping be useful? What "interacting" and "combining" functions would be useful?
- What would be an efficient way to implement these?
- What kind of a user interface (required data format and syntax) would we want for this term?