Description
Generalized Distributional TV
Overview
This is a proposal for a new TV type that encompasses probabilistic,
fuzzy, distributional TV types and more. In short it is a
distributional TV that may work over any domain, not just
boolean/fuzzy, equipped with counts to quantify uncertainty.
Disclaimer: things are presented under the angle of Kolmogorov
probability theory, mostly because I feel more comfortable with it.
Omega
Omega is a probability space representing the subjective universe. We
learn about Omega via random variables. A random variable X: Omega->E
is a measurable function from domain Omega to co-domain E.
Any atom in the AtomSpace associated with a TV represents a random
variable. The TV (or GDTV as defined below) describes the distribution
over the co-domain of that random variable, and the atom defines the
name (or the structure) of that random variable and its relationships
to other random variables, i.e. other atoms.
When talking about the co-domain of an atom A we mean the co-domain
of the random variable associated with the TV of that atom.
Definition
We introduce a new TV type called Generalized Distributional Truth
Value, or GDTV for short. It could also be called merely
Distributional Value, or DV for short.
Unconditional GDTV
An unconditional GDTV of atom A is a pair <D,N> where
- D is a probability distribution over E, the co-domain of A.
- N is a count, a non-negative real number.
In practice, in the discrete case, the probability measure will be
described as a mass function over a partition of E summing up to 1,
following the format
<{k1:p1, ..., kn:pn}, N>
where k1 to kn are the partition blocks (often singletons or
intervals), and p1 to pn are probabilities summing up to 1.
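As a minimal sketch (in Python, with hypothetical names, since the proposal doesn't prescribe any implementation), an unconditional GDTV is just a mass function plus a count:

```python
# Minimal sketch of an unconditional GDTV <{k1:p1, ..., kn:pn}, N>:
# a probability mass function over partition blocks of the co-domain,
# plus a non-negative count. All names here are hypothetical.

class GDTV:
    def __init__(self, dist, count):
        # dist: mapping from partition block (any hashable) to probability
        # count: non-negative real quantifying the amount of evidence
        if abs(sum(dist.values()) - 1.0) > 1e-9:
            raise ValueError("probabilities must sum up to 1")
        if count < 0:
            raise ValueError("count must be non-negative")
        self.dist = dict(dist)
        self.count = count

# e.g. the imperfect boolean sensor <{1:0.99, 0:0.01}, 1M>
sensor_b = GDTV({1: 0.99, 0: 0.01}, 1_000_000)
```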
Conditional GDTV
More generally a GDTV is a conditional probability distribution. In
the discrete case, a GDTV over 2 atoms A and B is defined as a map
from a partition of the co-domain of A to an unconditional GDTV of B, as
defined above. The whole thing forms a conditional probability matrix,
equipped with a count on each row. The format may look like
{ka1: <{kb1:p11, ..., kbm:p1m}, N1>,
...
kan: <{kb1:pn1, ..., kbm:pnm}, Nn>}
where ka1 to kan are partition blocks over the co-domain of A, and kb1
to kbm are partition blocks over the co-domain of B.
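A conditional GDTV can likewise be sketched as a map from partition blocks to (distribution, count) rows; the block names below are placeholders, not part of any existing API:

```python
# Sketch of a conditional GDTV: each partition block of A's co-domain
# maps to an unconditional GDTV of B, i.e. a (distribution, count)
# pair. Block names ka1/ka2/kb1/kb2 are hypothetical placeholders.
cond_gdtv = {
    "ka1": ({"kb1": 0.3, "kb2": 0.7}, 10),
    "ka2": ({"kb1": 0.1, "kb2": 0.9}, 20),
}

def check_rows(gdtv):
    # every row's distribution sums to 1 and carries its own count
    for dist, count in gdtv.values():
        assert abs(sum(dist.values()) - 1.0) < 1e-9
        assert count >= 0
    return True
```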
Examples
Unconditional GDTV
Perfect boolean sensor
Let's assume our window to the universe is a perfect and timeless
boolean sensor A. Since the sensor is boolean the co-domain E of the
associated random variable is {0,1}. One may represent an active
sensor as follows
Evaluation <{1:1}, +inf>
Predicate "sensor"
Concept "A"
meaning that the random variable with name/structure
Evaluation
Predicate "sensor"
Concept "A"
has probability
P(X=1)=1 with count +inf.
The first 1 in {1:1} indicates the sensor's value, the second
indicates its probability. There is a bit of notational abuse because
the first 1 in fact represents the singleton {1}.
Likewise an inactive sensor would be represented as
Evaluation <{0:1}, +inf>
Predicate "sensor"
Concept "A"
The count is +inf as we assume that the sensor is perfect, i.e. based
on infinite evidence.
Imperfect boolean sensor
Let's assume that sensor B is imperfect and is faulty one percent of
the time based on a million observations. One may represent such a
random variable as follows
Evaluation <{1:0.99, 0:0.01}, 1M>
Predicate "sensor"
Concept "B"
Perfect fuzzy sensor
Let's assume that perfect sensor C can take any value within [0,1] and
is true with degree 0.9786 with absolute certainty.
Evaluation <{0.9786:1}, +inf>
Predicate "sensor"
Concept "C"
Imperfect boolean sensor with infinite certainty
Let's assume that imperfect boolean sensor D has probability 0.9786 of
being active, and we're absolutely sure of that probability.
Evaluation <{1:0.9786, 0:0.0214}, +inf>
Predicate "sensor"
Concept "D"
Imperfect fuzzy sensor
Let's assume that imperfect sensor E can take any value within [0,1],
and background knowledge suggests (after testing this sensor a
thousand times) that we may safely discretize it into 4 bins.
Evaluation <{[0,0.25):0.002,
[0.25,0.5):0.008,
[0.5,0.75):0.2,
[0.75,1]:0.7}, 1K>
Predicate "sensor"
Concept "E"
where [x,y) represents the left-closed right-open interval from x to y.
Imperfect non-boolean, non-fuzzy sensors
What if we want to describe a random variable with a co-domain other
than [0,1]? Then we can no longer use predicates, but we can still use
schemata!
Say we want to represent the distribution of heights in a population
of 10K individuals. We may describe it as follows
ExecutionOutput <{[0cm,100cm):0.2,
[100cm,160cm):0.3,
[160cm,240cm):0.5}, 10K>
Schema "in-cm"
Concept "height"
We may actually represent the intervals using atoms. For instance
[0cm,100cm) might be described as
LeftClosedRightOpenIntervalLink
QuantityLink
NumberNode 0
UnitNode "cm"
QuantityLink
NumberNode 100
UnitNode "cm"
or equivalently
QuantityLink
LeftClosedRightOpenIntervalLink
NumberNode 0
NumberNode 100
UnitNode "cm"
Note: QuantityLink and UnitNode are currently undefined types and used
for expository purposes.
Conditional GDTV
SubSet
Assume we have 2 crisp concepts A and B. The exact probability
distribution of P(B|A), with P(B=0|A=0)=0.3, P(B=1|A=0)=0.7,
P(B=0|A=1)=0.1 and P(B=1|A=1)=0.9, may be represented as
SubSet {0:<{0:0.3, 1:0.7}, +inf>,
1:<{0:0.1, 1:0.9}, +inf>}
Concept "A"
Concept "B"
ExtensionalImplication over schemata
Assume we have 2 schemata over individuals, one that outputs the age,
another that outputs the height. There are 3 individuals, I1, I2 and I3
with ages 10, 50 and 100 yo respectively, and with heights 120, 172
and 169 cm respectively.
The schema "age" is defined as follows
Execution <{1:1}, +inf>
Schema "age"
Concept "I1"
Number 10
Execution <{1:1}, +inf>
Schema "age"
Concept "I2"
Number 50
Execution <{1:1}, +inf>
Schema "age"
Concept "I3"
Number 100
One may also use the equivalent representation
ExecutionOutput <{10:1}, +inf>
Schema "age"
Concept "I1"
ExecutionOutput <{50:1}, +inf>
Schema "age"
Concept "I2"
ExecutionOutput <{100:1}, +inf>
Schema "age"
Concept "I3"
The schema "height" is defined as follows
Execution <{1:1}, +inf>
Schema "height"
Concept "I1"
Number 120
Execution <{1:1}, +inf>
Schema "height"
Concept "I2"
Number 172
Execution <{1:1}, +inf>
Schema "height"
Concept "I3"
Number 169
One may also use the equivalent representation
ExecutionOutput <{120:1}, +inf>
Schema "height"
Concept "I1"
ExecutionOutput <{172:1}, +inf>
Schema "height"
Concept "I2"
ExecutionOutput <{169:1}, +inf>
Schema "height"
Concept "I3"
The conditional distributional probabilities of the height given the
age, assuming 2 partition blocks for age (0 to 40, and 40 to 130) and 2
partition blocks for height (0 to 150, and 150 to 200), may be represented as
ExtensionalImplication {[0,40):<{[0,150):1}, 1>,
[40,130):<{[150,200):1}, 2>}
Schema "age"
Schema "height"
meaning that based on 1 individual, being below 40 implies being under
150cm, and based on 2 individuals, being 40 or more implies being
150cm or more.
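The rows above can be recomputed mechanically from the three individuals; the script below is only a sketch of the extensional counting, with bin edges taken from the example:

```python
# Recompute the conditional GDTV of height given age from the three
# individuals of the example. Age bins: [0,40) and [40,130);
# height bins: [0,150) and [150,200).

ages    = {"I1": 10,  "I2": 50,  "I3": 100}
heights = {"I1": 120, "I2": 172, "I3": 169}

def bin_of(value, edges):
    # return the half-open interval [lo, hi) containing value
    for lo, hi in edges:
        if lo <= value < hi:
            return (lo, hi)
    raise ValueError(value)

age_bins    = [(0, 40), (40, 130)]
height_bins = [(0, 150), (150, 200)]

rows = {}
for name in ages:
    ka = bin_of(ages[name], age_bins)
    kb = bin_of(heights[name], height_bins)
    dist, count = rows.setdefault(ka, ({}, 0))
    dist[kb] = dist.get(kb, 0) + 1
    rows[ka] = (dist, count + 1)

# normalize each row into probabilities, keeping the row's count
cond = {ka: ({kb: n / count for kb, n in dist.items()}, count)
        for ka, (dist, count) in rows.items()}
```

The resulting `cond` carries one row per age block, each with its own count, mirroring the ExtensionalImplication above.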
Note: we may want to introduce a new link type for schemata instead of
using ExtensionalImplication. At least there is a small rationale for
using ExtensionalImplication rather than SubSet, as a predicate is a
kind of schema.
Relationship with existing TVs
Probabilistic simple TV
A probabilistic simple TV <s, N>
is represented by the following GDTV
<{1:s}, N>
Fuzzy simple TV
A fuzzy simple TV <s, N>
is represented by the following GDTV
<{s:1}, N>
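These two embeddings can be written down directly; the helper names are hypothetical. Note the document abbreviates the probabilistic case as <{1:s}, N>, leaving the complementary block 0:1-s implicit; the sketch makes it explicit:

```python
# Embedding existing simple TVs into GDTVs, as described above.
# Helper names are hypothetical, for illustration only.

def prob_stv_to_gdtv(s, n):
    # probabilistic simple TV <s, N> -> <{1:s, 0:1-s}, N>
    # (abbreviated as <{1:s}, N> in the text)
    return ({1: s, 0: 1 - s}, n)

def fuzzy_stv_to_gdtv(s, n):
    # fuzzy simple TV <s, N> -> <{s:1}, N>
    return ({s: 1}, n)
```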
Distributional TV
As we've seen a GDTV is essentially a DTV + a count, with its
distribution over any co-domains.
Generic TV
I've never used Generic TVs but I recall having a discussion with
Linas about why he introduced them, and one of the reasons was to
store the mutual entropy of pairs of random variables. With GDTV this
is possible, see Section "Other operators".
Uncertainty
The uncertainty of a GDTV is represented by its count, or counts if it
is a conditional GDTV. Counts are non-negative real numbers. Although
integers will be used most of the time, having real numbers allows
counts derived from inferences not to be unnecessarily truncated into
integers (this may or may not turn out to be useful). The count
reflects a second order distribution over the distributions over the
co-domain, and should be used to estimate the counts of inferred
conclusions. We could start experimenting with Dirichlet distributions
as second order distributions.
Of course a confidence number can be used as well, however I believe
the count has a clearer semantics and should be the default.
I suspect we want to drop the lookahead K as defined in the indefinite
truth value and use a plain Dirichlet prior instead (corresponding to
K tending to +inf). The PLN book claims that using small Ks helps to
decrease the premise/conclusion interval expansion rate. Since the PLN
book doesn't back it up with data I challenge that claim. Ultimately
experiments will have to be carried out to determine what is best.
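One possible reading of the count as a second order distribution is as total Dirichlet evidence; the symmetric prior alpha below is an assumption for illustration, not something prescribed by the proposal:

```python
# Sketch: interpret <{k1:p1, ..., kn:pn}, N> as the mean of a
# Dirichlet with pseudo-counts N*pi + alpha, where alpha is an
# assumed symmetric prior (a modeling choice, not part of the text).

def dirichlet_mean(dist, count, alpha=1.0):
    # posterior mean after folding in a symmetric Dirichlet prior
    k = len(dist)
    total = count + alpha * k
    return {key: (count * p + alpha) / total for key, p in dist.items()}

# with a large count the mean stays close to the observed frequencies
m = dirichlet_mean({1: 0.99, 0: 0.01}, 1_000_000)
```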
ExtensionalImplication over non-predicate
As we've seen in the example, one may define ExtensionalImplication
not just over predicates but also over schemata. One may push this
even further and define extensional implication over almost any atom
type, using Omega as the underlying probability space, even though the
elements of the partitions taken into consideration are undefined.
For instance let's define
Evaluation <{0:0.5, 1:0.5}, 200>
P
A
Evaluation <{0:0.5, 1:0.5}, 200>
Q
B
That is P(A) (resp. Q(B)) is true half of the time, based on 200
observations each. However we may also know that they are extremely
correlated and when P(A) is true Q(B) has a 99% chance of being
true. One may represent that with
ExtensionalImplication {0:<{0:0.99, 1:0.01}, 100>,
                        1:<{0:0.01, 1:0.99}, 100>}
Evaluation
P
A
Evaluation
Q
B
Note: in the PLN book this is conceptually handled by defining
associated predicates of these evaluations over the domain of
contexts. However I don't think it is necessary as conditional
probabilities can be directly defined based on the probability space
Omega.
Again instead of using ExtensionalImplication we might want to define
a new link type.
Relationships between random variables (atoms and GDTVs)
As we know the AtomSpace is not just a way to represent random
variables, but also how they relate to each other. These relations allow
us to create abstract structures, not necessarily true (most of them
define concepts or predicates with null probabilities in Omega), but
useful to reason with and ultimately infer new knowledge about Omega
itself.
Examples
Funny people at the party
Let's build abstract sets of people, at-the-party and funny, and see
how they relate to each other.
Let's define the members of concept "at-the-party"
Member <{1:1}, +inf>
Concept "John"
Concept "at-the-party"
Member <{1:1}, +inf>
Concept "Mary"
Concept "at-the-party"
Member <{1:1}, +inf>
Concept "Claude"
Concept "at-the-party"
Member <{0:1}, +inf>
Concept "David"
Concept "at-the-party"
saying that we live in a universe where John, Mary and Claude are at
the party.
Let's define the members of concept "funny"
Member <{1:1}, +inf>
Concept "John"
Concept "funny"
Member <{1:1}, +inf>
Concept "Mary"
Concept "funny"
Member <{0:1}, +inf>
Concept "Claude"
Concept "funny"
Member <{1:1}, +inf>
Concept "David"
Concept "funny"
saying that we live in a universe where John, Mary and David are
funny.
The concepts "at-the-party" and "funny" are very likely not meaningful
subsets of Omega. We can give them null TVs because the set of people
at a party is in no way a description of the possible world we're in.
But that doesn't matter; what matters is the abstract relationships we
can build between them. So we may have
Concept "funny" <{0:1}, +inf>
Concept "at-the-party" <{0:1}, +inf>
meaning that "funny" and "at-the-party" are almost surely false in our
subjective universe (neither John, nor Mary, etc, are possible
worlds). However we can still calculate the following
SubSet <{0:<{1: 1}, 1>,
         1:<{0: 0.3333, 1:0.6666}, 3>}, 4>
Concept "at-the-party"
Concept "funny"
saying that based on one evidence (David), people not at the party are
funny, and based on 3 evidences, people at the party are funny with
probability 2/3.
The formula used to calculate the latter is obviously
(funny(John) + funny(Mary) + funny(Claude))
/
(at-the-party(John) + at-the-party(Mary) + at-the-party(Claude))
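This calculation can be replayed in a few lines; the sketch below just counts over the crisp member lists given above:

```python
# Recompute SubSet "at-the-party" -> "funny" from the crisp members.
at_party = {"John": 1, "Mary": 1, "Claude": 1, "David": 0}
funny    = {"John": 1, "Mary": 1, "Claude": 0, "David": 1}

rows = {}
for person, a in at_party.items():
    dist, count = rows.setdefault(a, ({}, 0))
    f = funny[person]
    dist[f] = dist.get(f, 0) + 1
    rows[a] = (dist, count + 1)

# normalize each row into probabilities, keeping the row's count
subset = {a: ({f: n / count for f, n in dist.items()}, count)
          for a, (dist, count) in rows.items()}
# row 0: the single person not at the party (David) is funny
# row 1: 2 of the 3 people at the party are funny
```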
Let's now assume that the knowledge of these people being at the party
or funny is uncertain, so we have for instance
Member <{1:0.6, 0:0.4}, 1K>
Concept "John"
Concept "funny"
etc.
Then the calculation above would differ because funny(John), etc, are
in fact random variables, thus their sum turns into a convolution
product (assuming they are independent).
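The convolution in question can be sketched by enumerating value pairs; the two 0.6/0.4 memberships below follow the uncertain-membership example:

```python
# Sum of two independent discrete random variables: convolve their
# mass functions by enumerating all value pairs.
def convolve(d1, d2):
    out = {}
    for v1, p1 in d1.items():
        for v2, p2 in d2.items():
            out[v1 + v2] = out.get(v1 + v2, 0) + p1 * p2
    return out

# e.g. summing two uncertain memberships <{1:0.6, 0:0.4}, 1K>:
s = convolve({1: 0.6, 0: 0.4}, {1: 0.6, 0: 0.4})
# s maps 2 -> 0.36, 1 -> 0.48, 0 -> 0.16 (up to float rounding)
```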
Explicitly defining Omega
Above we defined abstract concepts like at-the-party and funny, and we
assumed that these concepts were almost surely false, but we may well
assume the opposite.
Let's assume the following Atom in the AtomSpace
Concept "A" <{1:1}, +inf>
meaning that the random variable with name/structure
Concept "A"
is almost surely true.
Now let's add the following
Member <{1:1}, +inf>
Number 1
Concept "A"
...
Member <{1:1}, +inf>
Number 10
Concept "A"
As well as
Member <{0:1}, +inf>
Number 11
Concept "A"
...
Member <{0:1}, +inf>
<anything-but-numbers-from-1-to-10>
Concept "A"
I'm using this trick so I don't have to deal with universal
quantifiers. Basically what it says is that A contains the numbers
from 1 to 10 and nothing else. But given that A is almost surely true,
this allows us to actually define Omega (or at least the subjective
part that matters to us)!
I'm not sure how useful that is, but I thought I'd mention it. In
practice we don't know what Omega is; it is open-ended, as is our
universe.
Fuzzy Operators
And, Or and Not links can be seen as fuzzy operators over random
variables with co-domain [0,1]. For instance in
And <GDTV>
Evaluation <GDTV-1>
P
A
Evaluation <GDTV-2>
Q
B
GDTV corresponds to the distribution of the random variable defined as
the minimum of (Evaluation P A) and (Evaluation Q B). If for
instance
GDTV-1 = <{0.7:1}, +inf>
GDTV-2 = <{0.5:1}, +inf>
then
GDTV = <{0.5:1}, +inf>
If these variables have independent distributions then, for each
resulting key min(h, k), the corresponding probabilities are
multiplied and summed. For instance
And
Evaluation <{0.2:0.1, 0.4:0.4, 0.6:0.3, 0.8:0.2}, +inf>
P
A
Evaluation <{0.1:0.2, 0.2:0.3, 0.4:0.3, 0.6:0.2}, +inf>
Q
B
will have as resulting GDTV
<{0.1:0.1*0.2 + 0.4*0.2 + ...,
  0.2:0.1*0.3 + 0.4*0.3 + ...,
  ...}, +inf>
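The enumeration above can be checked with a short script; min_dist is a hypothetical helper, not an existing API:

```python
# Distribution of min(X, Y) for two independent fuzzy variables,
# using the two distributions of the And example above.
def min_dist(d1, d2):
    out = {}
    for h, ph in d1.items():
        for k, pk in d2.items():
            m = min(h, k)
            out[m] = out.get(m, 0) + ph * pk
    return out

d = min_dist({0.2: 0.1, 0.4: 0.4, 0.6: 0.3, 0.8: 0.2},
             {0.1: 0.2, 0.2: 0.3, 0.4: 0.3, 0.6: 0.2})
# d[0.1] is 0.2 (up to rounding): min is 0.1 exactly when the
# second variable takes value 0.1, whatever the first one does
```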
Other Operators
Operators over any random variables' co-domain can be used. For
instance
Plus <GDTV-1 + GDTV-2>
ExecutionOutput <GDTV-1>
Schema "S1"
Concept "A1"
ExecutionOutput <GDTV-2>
Schema "S2"
Concept "A2"
where <GDTV-1 + GDTV-2> corresponds to the TV of the random variable
which is the sum of the random variables of GDTV-1 and GDTV-2. For
instance assuming
GDTV-1 = <{10:0.9, 0:0.1}, +inf>
GDTV-2 = <{32:0.6, 0:0.4}, +inf>
and are independent then
GDTV-1 + GDTV-2 = <{42:0.54, 32:0.06, 10:0.36, 0:0.04}, +inf>
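The same pairwise enumeration verifies the Plus example; convolve is a hypothetical helper:

```python
# Verify the Plus example: distribution of the sum of two
# independent random variables, by enumerating value pairs.
def convolve(d1, d2):
    out = {}
    for v1, p1 in d1.items():
        for v2, p2 in d2.items():
            out[v1 + v2] = out.get(v1 + v2, 0) + p1 * p2
    return out

gdtv1 = {10: 0.9, 0: 0.1}
gdtv2 = {32: 0.6, 0: 0.4}
total = convolve(gdtv1, gdtv2)
# total maps 42 -> 0.54, 32 -> 0.06, 10 -> 0.36, 0 -> 0.04
# (up to float rounding), matching the example
```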
Operators like the List constructor are very useful and can be used to
build random vectors, such as random pairs of variables, to store the
necessary information to compute their mutual entropies, as shown
below.
Bayesian network
GDTV allows us to represent Bayesian networks. Consider the BN in
https://en.wikipedia.org/wiki/Bayesian_network#Example
S = Sprinkler
R = Rain
G = Grass wet
Each random variable is boolean; however, not all are conditionally
independent: for instance G depends on S and R combined. GDTV allows
us to represent these dependencies.
Extensional implication between Rain and Sprinkler:
ExtImplication {0:<{1:0.4, 0:0.6}, +inf>,
                1:<{1:0.01, 0:0.99}, +inf>}
R
S
Extensional implication between (Sprinkler, Rain) and Grass. Note the
use of List to build a random pair of variables:
ExtImplication {(0, 0):<{1:0.0, 0:1.0}, +inf>,
                (0, 1):<{1:0.8, 0:0.2}, +inf>,
                (1, 0):<{1:0.9, 0:0.1}, +inf>,
                (1, 1):<{1:0.99, 0:0.01}, +inf>}
List
S
R
G
where (0, 0) to (1, 1) enumerate all possible boolean pairs, that is
the possible values of the co-domain of (List S R). One may see how
that information is sufficient to compute the mutual information
between S and R.
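Combined with a prior on R (the linked Wikipedia example uses P(R=1)=0.2), those two tables suffice to answer queries; the sketch below reproduces that page's worked query P(R=1|G=1):

```python
# Infer P(R=1 | G=1) in the sprinkler network, using the two
# conditional tables above plus the prior P(R=1)=0.2 taken from
# the linked Wikipedia example.
p_r = {1: 0.2, 0: 0.8}
p_s_given_r = {0: {1: 0.4, 0: 0.6}, 1: {1: 0.01, 0: 0.99}}
p_g_given_sr = {(0, 0): {1: 0.0, 0: 1.0},
                (0, 1): {1: 0.8, 0: 0.2},
                (1, 0): {1: 0.9, 0: 0.1},
                (1, 1): {1: 0.99, 0: 0.01}}

def p_g1_and_r(r):
    # joint P(G=1, R=r) = sum_s P(R=r) P(S=s|R=r) P(G=1|S=s,R=r)
    return sum(p_r[r] * p_s_given_r[r][s] * p_g_given_sr[(s, r)][1]
               for s in (0, 1))

p_r1_given_g1 = p_g1_and_r(1) / (p_g1_and_r(0) + p_g1_and_r(1))
# about 0.3577, matching the worked example on the Wikipedia page
```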
Probabilistic Programming Language
Likewise one can represent the conditional distribution of a program's
output, or of program subfunctions, using GDTV. In that case a GDTV
query using the BC chainer would amount to a PPL variable query. One
may probably design PLN rules to perform the kind of Monte-Carlo
simulations used by PPLs to answer queries.
Mixed extensional-intensional relationships
In the PLN book, mixed implication is the disjunction of extensional
and intensional implication (likewise for inheritance). The definition
of intensional implication using GDTV naturally follows as it is
merely an extensional implication in pattern space. The disjunction
between two conditional GDTVs however is not trivial. In the PLN book
the way to go about that would be to turn the conditional GDTV into a
simple TV according to the formula of Section 2.4.1.1
s = sum_x f(P(x), Q(x)) / sum_x P(x)
where f is min for instance. Then whether the simple TV is fuzzy
<{s:1}, N>
or probabilistic
<{1:s, 0:1-s}, N>
the disjunctions between these is trivially defined.
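The formula with f = min is easy to compute directly; the membership degrees P and Q below are made-up placeholders over a tiny domain:

```python
# s = sum_x f(P(x), Q(x)) / sum_x P(x), with f = min, over
# hypothetical membership degrees on a three-element domain.
P = {"x1": 0.8, "x2": 0.5, "x3": 0.0}
Q = {"x1": 0.6, "x2": 0.9, "x3": 0.3}

s = sum(min(P[x], Q[x]) for x in P) / sum(P.values())
# min terms are 0.6, 0.5 and 0.0, so s = 1.1 / 1.3
```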
There are other ways to go about it, as there are many ways to turn a
conditional GDTV into an unconditional one, and then disjunction is
trivially defined as well. I'm personally not sure what is the right
way, and ultimately this may require experimentation.