
Generalized Distributional TV proposal #833

Open
ngeiswei opened this issue Jul 19, 2016 · 9 comments

@ngeiswei
Member

Generalized Distributional TV

Overview

This is a proposal for a new TV type that encompasses probabilistic,
fuzzy, distributional TV types and more. In short it is a
distributional TV that may work over any domain, not just
boolean/fuzzy, equipped with counts to quantify uncertainty.

Disclaimer: things are presented from the angle of Kolmogorov
probability theory, mostly because I feel more comfortable with it.

Omega

Omega is a probability space representing the subjective universe. We
learn about Omega via random variables. A random variable X: Omega->E
is a measurable function from domain Omega to co-domain E.

Any atom in the AtomSpace associated with a TV represents a random
variable. The TV (or GDTV as defined below) describes the distribution
over the co-domain of that random variable, and the atom defines the
name (or the structure) of that random variable and its relationships
to other random variables, i.e. other atoms.

When talking about the co-domain of an atom A we mean the co-domain of
the associated random variable of the TV of that atom.

Definition

We introduce a new TV type called Generalized Distributional Truth
Value, or GDTV for short. It could also be called merely
Distributional Value, or DV for short.

Unconditional GDTV

An unconditional GDTV of atom A is a pair <D,N> where

  1. D is a probability distribution over E, the co-domain of A.
  2. N is a count, a non-negative real number.

In practice, in the discrete case, the probability measure will be
described as a mass function over a partition of E summing up to 1,
following the format

<{k1:p1, ..., kn:pn}, N>

where k1 to kn are the partition blocks (often singletons or
intervals), and p1 to pn are probabilities summing up to 1.
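
As an illustration of this format, here is a minimal Python sketch (the names are hypothetical, not OpenCog API):

```python
# Hypothetical sketch of an unconditional GDTV <{k1:p1, ..., kn:pn}, N>;
# an illustration of the format above, not OpenCog code.

def make_gdtv(dist, count):
    """dist maps partition blocks (any hashable keys) to probabilities."""
    assert abs(sum(dist.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
    assert count >= 0, "count must be a non-negative real"
    return (dict(dist), count)

# A perfect active boolean sensor: P(X=1)=1 with an infinite count.
active = make_gdtv({1: 1.0}, float("inf"))

# An imperfect sensor: faulty 1% of the time over a million observations.
noisy = make_gdtv({1: 0.99, 0: 0.01}, 1_000_000)
```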

Conditional GDTV

More generally a GDTV is a conditional probability distribution. In
the discrete case, a GDTV over 2 atoms A and B is defined as a map
from a partition of the co-domain of A to an unconditional GDTV of B, as
defined above. The whole thing forms a conditional probability matrix,
equipped with a count on each row. The format may look like

{ka1: <{kb1:p11, ..., kbm:p1m}, N1>,
 ...
 kan: <{kb1:pn1, ..., kbm:pnm}, Nn>}

where ka1 to kan are partition blocks over the co-domain of A, and kb1
to kbm are partition blocks over the co-domain of B.
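
A conditional GDTV can be sketched the same way, as a map from partition blocks of A's co-domain to unconditional GDTVs over B's co-domain (again a hypothetical Python illustration, not OpenCog API):

```python
# Hypothetical sketch of a conditional GDTV: each row maps a partition block
# of A's co-domain to an unconditional (distribution, count) GDTV over B.

def make_conditional_gdtv(rows):
    for block, (dist, count) in rows.items():
        assert abs(sum(dist.values()) - 1.0) < 1e-9, "each row must sum to 1"
        assert count >= 0, "each row carries a non-negative count"
    return dict(rows)

cond = make_conditional_gdtv({
    "ka1": ({"kb1": 0.3, "kb2": 0.7}, 10),
    "ka2": ({"kb1": 0.1, "kb2": 0.9}, 5),
})
```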

Examples

Unconditional GDTV

Perfect boolean sensor

Let's assume our window to the universe is a perfect and timeless
boolean sensor A. Since the sensor is boolean the co-domain E of the
associated random variable is {0,1}. One may represent an active
sensor as follows

Evaluation <{1:1}, +inf>
  Predicate "sensor"
  Concept "A"

meaning that the random variable with name/structure

Evaluation
  Predicate "sensor"
  Concept "A"

has probability

P(X=1)=1 with count +inf.

The first 1 in {1:1} indicates the sensor's value, the second
indicates its probability. There is a bit of notational abuse because
the first 1 represents in fact a singleton {1}.

Likewise an inactive sensor would be represented as

Evaluation <{0:1}, +inf>
  Predicate "sensor"
  Concept "A"

The count is +inf as we assume that the sensor is perfect, i.e. based
on infinite evidence.

Imperfect boolean sensor

Let's assume that sensor B is imperfect and is faulty one percent of
the time based on a million observations. One may represent such a
random variable as follows

Evaluation <{1:0.99, 0:0.01}, 1M>
  Predicate "sensor"
  Concept "B"

Perfect fuzzy sensor

Let's assume that perfect sensor C can take any value within [0,1] and
is true with degree 0.9786 with absolute certainty.

Evaluation <{0.9786:1}, +inf>
  Predicate "sensor"
  Concept "C"

Imperfect boolean sensor with infinite certainty

Let's assume that imperfect boolean sensor D has probability 0.9786 of
being active, and we're absolutely sure of that probability.

Evaluation <{1:0.9786, 0:0.0214}, +inf>
  Predicate "sensor"
  Concept "D"

Imperfect fuzzy sensor

Let's assume that imperfect sensor E can take any value within [0,1],
and background knowledge suggests (after testing this sensor a
thousand times) that we may safely discretize it into 4 bins.

Evaluation <{[0,0.25):0.002,
             [0.25,0.5):0.008,
             [0.5,0.75):0.2,
             [0.75,1]:0.7}, 1K>
  Predicate "sensor"
  Concept "E"

where [x,y) represents the left-closed right-open interval from x to y.

Imperfect non-boolean, non-fuzzy sensors

What if we want to describe a random variable with a co-domain other
than [0,1]? Then we can no longer use predicates, but we can still use
schemata!

Say we want to represent the distribution of heights in a population
of 10K individuals. We may describe it as follows

ExecutionOutput <{[0cm,100cm):0.2,
                  [100cm,160cm):0.3,
                  [160cm,240cm):0.5}, 10K>
  Schema "in-cm"
  Concept "height"

We may actually represent the intervals using atoms. For instance
[0cm,100cm) might be described as

LeftClosedRightOpenIntervalLink
  QuantityLink
    NumberNode 0
    UnitNode "cm"
  QuantityLink
    NumberNode 100
    UnitNode "cm"

or equivalently

QuantityLink
  LeftClosedRightOpenIntervalLink
    NumberNode 0
    NumberNode 100
  UnitNode "cm"

Note: QuantityLink and UnitNode are currently undefined types, used
here for expository purposes.

Conditional GDTV

SubSet

Assume we have 2 crisp concepts A and B. The exact probability
distribution of P(B|A), with P(B=0|A=0)=0.3, P(B=1|A=0)=0.7,
P(B=0|A=1)=0.1 and P(B=1|A=1)=0.9, may be represented as

SubSet {0:<{0:0.3, 1:0.7}, +inf>,
        1:<{0:0.1, 1:0.9}, +inf>}
  Concept "A"
  Concept "B"

ExtensionalImplication over schemata

Assume we have 2 schemata over individuals, one that outputs the age,
another that outputs the height. There are 3 individuals, I1, I2 and I3
with ages 10, 50 and 100 yo respectively, and with heights 120, 172
and 169 cm respectively.

The schema "age" is defined as follows

Execution <{1:1}, +inf>
  Schema "age"
  Concept "I1"
  Number 10
Execution <{1:1}, +inf>
  Schema "age"
  Concept "I2"
  Number 50
Execution <{1:1}, +inf>
  Schema "age"
  Concept "I3"
  Number 100

One may also use the equivalent representation

ExecutionOutput <{10:1}, +inf>
  Schema "age"
  Concept "I1"
ExecutionOutput <{50:1}, +inf>
  Schema "age"
  Concept "I2"
ExecutionOutput <{100:1}, +inf>
  Schema "age"
  Concept "I3"

The schema "height" is defined as follows

Execution <{1:1}, +inf>
  Schema "height"
  Concept "I1"
  Number 120
Execution <{1:1}, +inf>
  Schema "height"
  Concept "I2"
  Number 172
Execution <{1:1}, +inf>
  Schema "height"
  Concept "I3"
  Number 169

One may also use the equivalent representation

ExecutionOutput <{120:1}, +inf>
  Schema "height"
  Concept "I1"
ExecutionOutput <{172:1}, +inf>
  Schema "height"
  Concept "I2"
ExecutionOutput <{169:1}, +inf>
  Schema "height"
  Concept "I3"

The conditional distributional probabilities of the height given the
age, assuming 2 partition blocks for age (0 to 40, and 40 to 130) and
2 for height (0 to 150, and 150 to 200), may be represented as

ExtensionalImplication {[0,40):<{[0,150):1}, 1>,
                        [40,130):<{[150,200):1}, 2>}
  Schema "age"
  Schema "height"

meaning that based on 1 individual, being below 40 implies being under
150cm, and based on 2 individuals, being 40 or more implies being
150cm or more.

Note: we may want to introduce a new link type for schemata instead of
using ExtensionalImplication. At least there is a small rationale for
using ExtensionalImplication rather than SubSet, as a predicate is a
kind of schema.

Relationship with existing TVs

Probabilistic simple TV

A probabilistic simple TV <s, N> is represented by the following GDTV

<{1:s, 0:1-s}, N>

Fuzzy simple TV

A fuzzy simple TV <s, N> is represented by the following GDTV

<{s:1}, N>
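
Both mappings can be sketched in Python (function names are made up for illustration, not OpenCog API):

```python
# Illustrative conversions from simple TVs <s, N> to GDTVs.

def prob_simple_tv_to_gdtv(s, n):
    # probabilistic: strength s is P(X=1), the remaining mass sits on 0
    return ({1: s, 0: 1.0 - s}, n)

def fuzzy_simple_tv_to_gdtv(s, n):
    # fuzzy: the degree s itself is the (certain) outcome
    return ({s: 1.0}, n)
```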

Distributional TV

As we've seen, a GDTV is essentially a DTV plus a count, with its
distribution allowed over any co-domain.

Generic TV

I've never used Generic TV, but I recall having a discussion with
Linas as to why he introduced them; one of the reasons was to store
the mutual entropy of pairs of random variables. With GDTV this is
possible, see Section "Other operators".

Uncertainty

The uncertainty of a GDTV is represented by its count, or counts if it
is a conditional GDTV. Counts are non-negative real numbers. Although
integers will be used most of the time, having real numbers allows
counts derived from inferences not to be unnecessarily truncated into
integers (this may or may not turn out to be useful). The count
reflects a second order distribution over the distributions over the
co-domain, and should be used to estimate the counts of inferred
conclusions. We could start experimenting with Dirichlet distributions
as second order distributions.
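
A minimal sketch of that Dirichlet experiment, assuming the count N is used as the total concentration of the second order distribution (a modeling assumption for illustration, not part of the proposal):

```python
import random

# Assumption: treat the count N as the total Dirichlet concentration,
# i.e. alpha_k = N * p_k for each partition block k.
def sample_first_order(dist, count, rng=random):
    """Sample a plausible first-order distribution from Dirichlet(N * p_k)."""
    gammas = {k: rng.gammavariate(count * p, 1.0) for k, p in dist.items()}
    total = sum(gammas.values())
    return {k: g / total for k, g in gammas.items()}

# Small counts give noisy samples; large counts concentrate near dist.
sample = sample_first_order({1: 0.99, 0: 0.01}, 1_000_000)
```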

Of course a confidence number can be used as well, however I believe
the count has a clearer semantics and should be the default.

I suspect we want to drop the lookahead K as defined in the indefinite
truth value and use a plain Dirichlet prior instead (corresponding to
K tending to +inf). The PLN book claims that using small Ks helps to
decrease the premise/conclusion interval expansion rate. Since the PLN
book doesn't back this up with data, I challenge that claim. Ultimately
experiments will have to be carried out to determine what is best.

ExtensionalImplication over non-predicates

As we've seen in the examples, one may define ExtensionalImplication
not just over predicates but also over schemata. One may push this even
further and define extensional implication over almost any atom type,
using Omega as the underlying probability space, even though the
elements of the partitions taken into consideration are undefined.

For instance let's define

Evaluation <{0:0.5, 1:0.5}, 200>
  P
  A

Evaluation <{0:0.5, 1:0.5}, 200>
  Q
  B

That is, P(A) (resp. Q(B)) is true half of the time, based on 200
observations each. However we may also know that they are extremely
correlated, and that when P(A) is true Q(B) has a 99% chance of being
true. One may represent that with

ExtensionalImplication {0:<{0:0.99, 1:0.01}, 100>,
                        1:<{0:0.01, 1:0.99}, 100>}
  Evaluation
    P
    A
  Evaluation
    Q
    B

Note: in the PLN book this is conceptually handled by defining
associated predicates of these evaluations over the domain of
contexts. However I don't think it is necessary as conditional
probabilities can be directly defined based on the probability space
Omega.

Again instead of using ExtensionalImplication we might want to define
a new link type.

Relationships between random variables (atoms and GDTVs)

As we know, the AtomSpace is not just a way to represent random
variables, but also how they relate to each other. These relations
allow us to create abstract structures, not necessarily true (most of
them define concepts or predicates with null probabilities in Omega),
but useful to reason with and ultimately infer new knowledge about
Omega itself.

Examples

Funny people at the party

Let's build abstract concepts for people at a party and funny people,
and see how they relate to each other.

Let's define the members of concept "at-the-party"

Member <{1:1}, +inf>
  Concept "John"
  Concept "at-the-party"

Member <{1:1}, +inf>
  Concept "Mary"
  Concept "at-the-party"

Member <{1:1}, +inf>
  Concept "Claude"
  Concept "at-the-party"

Member <{0:1}, +inf>
  Concept "David"
  Concept "at-the-party"

saying that we live in a universe where John, Mary and Claude are at
the party.

Let's define the members of concept "funny"

Member <{1:1}, +inf>
  Concept "John"
  Concept "funny"

Member <{1:1}, +inf>
  Concept "Mary"
  Concept "funny"

Member <{0:1}, +inf>
  Concept "Claude"
  Concept "funny"

Member <{1:1}, +inf>
  Concept "David"
  Concept "funny"

saying that we live in a universe where John, Mary and David are
funny.

The concepts "at-the-party" or "funny" are very likely not meaningful
subsets of Omega. We can give them null TVs because the set of people
at a party is in no way a description of the possible world we're in.
But that doesn't matter; what matters is the abstract relationships we
can build between them. So we may have

Concept "funny" <{0:1}, +inf>
Concept "at-the-party" <{0:1}, +inf>

meaning that "funny" and "at-the-party" are almost surely false in our
subjective universe (neither John, nor Mary, etc., are possible
worlds). However we can still calculate the following

SubSet <{0:<{1: 1}, 1>, 
         1:<{0: 0.3333, 1:0.6666}, 3>}, 4>
  Concept "at-the-party"
  Concept "funny"

saying that, based on one observation (David), people not at the party
are funny, and based on 3 observations, people at the party are funny
with probability 2/3.

The formula used to calculate the latter row is obviously

(funny(John) + funny(Mary) + funny(Claude))
/
(at-the-party(John) + at-the-party(Mary) + at-the-party(Claude))
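
That counting can be checked numerically; here is a sketch with the membership tables transcribed from the Member links above (plain Python for illustration, not OpenCog code):

```python
# Crisp membership tables transcribed from the Member links above.
at_party = {"John": 1, "Mary": 1, "Claude": 1, "David": 0}
funny    = {"John": 1, "Mary": 1, "Claude": 0, "David": 1}

def subset_row(cond):
    """Row of the SubSet conditional GDTV for at-the-party == cond."""
    people = [p for p in at_party if at_party[p] == cond]
    n = len(people)
    ones = sum(funny[p] for p in people)
    return ({0: (n - ones) / n, 1: ones / n}, n)

dist, count = subset_row(1)   # people at the party
# 2 of the 3 party-goers are funny, so dist[1] is 2/3 with count 3
```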

Let's now assume that the knowledge of these people being at the party
or funny is uncertain, so we have for instance

Member <{1:0.6, 0:0.4}, 1K>
  Concept "John"
  Concept "funny"

etc.

Then the calculation above would differ because funny(John), etc., are
in fact random variables, so their sum turns into a convolution
product (assuming they are independent).

Explicitly defining Omega

Above we defined abstract concepts like at-the-party and funny, and we
assumed that these concepts were almost surely false, but we may well
assume the opposite.

Let's assume the following Atom in the AtomSpace

Concept "A" <{1:1}, +inf>

meaning that the random variable with name/structure

Concept "A"

is almost surely true.

Now let's add the following

Member <{1:1}, +inf>
  Number 1
  Concept "A"
...
Member <{1:1}, +inf>
  Number 10
  Concept "A"

As well as

Member <{0:1}, +inf>
  Number 11
  Concept "A"
...
Member <{0:1}, +inf>
  <anything-but-numbers-from-1-to-10>
  Concept "A"

I'm using this trick so I don't have to deal with universal
quantifiers. Basically what it says is that A contains the numbers
from 1 to 10 and nothing else. But given that A is almost surely true,
this allows us to actually define Omega (or at least the subjective
part that matters to us)!

I'm not sure how useful that is, but I thought I'd mention it. In
practice we don't know what Omega is; it is open-ended, as is our
universe.

Fuzzy Operators

And, Or and Not links can be seen as fuzzy operators over random
variables with co-domain [0,1]. For instance in

And <GDTV>
  Evaluation <GDTV-1>
    P
    A
  Evaluation <GDTV-2>
    Q
    B

GDTV corresponds to the distribution of the random variable defined as
the minimum of (Evaluation P A) and (Evaluation Q B). If for
instance

GDTV-1 = <{0.7:1}, +inf>
GDTV-2 = <{0.5:1}, +inf>

then

GDTV = <{0.5:1}, +inf>

If these variables have independent distributions, then for each
resulting key min(h, k) the probabilities of the contributing pairs
are multiplied and summed. For instance

And
  Evaluation <{0.2:0.1, 0.4:0.4, 0.6:0.3, 0.8:0.2}, +inf>
    P
    A
  Evaluation <{0.1:0.2, 0.2:0.3, 0.4:0.3, 0.6:0.2}, +inf>
    Q
    B

will have as resulting GDTV

<{0.1:0.1*0.2 + 0.4*0.2 + ...,
  0.2:0.1*0.3 + 0.4*0.3 + ...,
  ...}, +inf>
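
The min-combination can be sketched as follows (illustrative Python, not OpenCog code):

```python
from collections import defaultdict

def fuzzy_and(da, db):
    """Distribution of min(X, Y) for independent fuzzy X ~ da, Y ~ db."""
    out = defaultdict(float)
    for h, ph in da.items():
        for k, pk in db.items():
            out[min(h, k)] += ph * pk
    return dict(out)

p = {0.2: 0.1, 0.4: 0.4, 0.6: 0.3, 0.8: 0.2}
q = {0.1: 0.2, 0.2: 0.3, 0.4: 0.3, 0.6: 0.2}
r = fuzzy_and(p, q)
# every value of P paired with Q's 0.1 gives min 0.1, so r[0.1] is 0.2
```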

Other Operators

Operators over any random variables' co-domain can be used. For
instance

Plus <GDTV-1 + GDTV-2>
  ExecutionOutput <GDTV-1>
    Schema "S1"
    Concept "A1"
  ExecutionOutput <GDTV-2>
    Schema "S2"
    Concept "A2"

where <GDTV-1 + GDTV-2> corresponds to the TV of the random variable
which is the sum of the random variables of GDTV-1 and GDTV-2. For
instance assuming

GDTV-1 = <{10:0.9, 0:0.1}, +inf>
GDTV-2 = <{32:0.6, 0:0.4}, +inf>

and are independent then

GDTV-1 + GDTV-2 = <{42:0.54, 32:0.06, 10:0.36, 0:0.04}, +inf>
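
This sum-of-independent-variables convolution can be sketched and checked numerically (illustrative Python):

```python
from collections import defaultdict

def plus(da, db):
    """Distribution of X + Y for independent discrete X ~ da, Y ~ db."""
    out = defaultdict(float)
    for x, px in da.items():
        for y, py in db.items():
            out[x + y] += px * py
    return dict(out)

g1 = {10: 0.9, 0: 0.1}
g2 = {32: 0.6, 0: 0.4}
total = plus(g1, g2)
# total is {42: 0.54, 10: 0.36, 32: 0.06, 0: 0.04}, up to rounding
```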

Operators like List construction are very useful and can be used to
build random vectors, such as random pairs of variables, to store the
information needed to compute their mutual entropies, as shown
below.

Bayesian network

GDTV allows us to represent Bayesian networks. Consider the BN in
https://en.wikipedia.org/wiki/Bayesian_network#Example

S = Sprinkler
R = Rain
G = Grass wet

Each random variable is boolean; however, not all are conditionally
independent, for instance G depends on S and R combined. GDTV allows
us to represent these dependencies.

Extensional implication between Rain and Sprinkler:

ExtImplication <{0:{1:0.4, 0:0.6},
                 1:{1:0.01, 0:0.99}}, +inf>
  R
  S

Extensional implication between (Sprinkler, Rain) and Grass. Note the
use of List to build a random pair of variables:

ExtImplication <{(0, 0):{1:0.0, 0:1.0}, 
                 (0, 1):{1:0.8, 0:0.2},
                 (1, 0):{1:0.9, 0:0.1},
                 (1, 1):{1:0.99, 0:0.01}}, +inf>
  List
    S
    R
  G

where (0, 0) to (1, 1) enumerate all possible boolean pairs, that is
the possible values of the co-domain of (List S R). One may see how
that information is sufficient to compute the mutual information
between S and R.
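
As a sanity check, these rows can be combined with a prior on R to recover marginals; the prior P(R=1)=0.2 below is taken from the Wikipedia example, not stated above (illustrative Python):

```python
# Assumed prior from the Wikipedia sprinkler example: P(R=1) = 0.2.
p_r = {0: 0.8, 1: 0.2}
# Rows transcribed from the two ExtImplication links above.
p_s_given_r = {0: {1: 0.4, 0: 0.6}, 1: {1: 0.01, 0: 0.99}}
p_g1_given_sr = {(0, 0): 0.0, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.99}

# Marginalize over R and S to get P(G=1).
p_g1 = sum(p_r[r] * p_s_given_r[r][s] * p_g1_given_sr[(s, r)]
           for r in (0, 1) for s in (0, 1))
# matches the textbook value P(G=1) = 0.44838
```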

Probabilistic Programming Language

Likewise one can represent the conditional distribution of a program's
output, or program subfunctions using GDTV. In that case a GDTV query
using the BC chainer would amount to a PPL variable query. One may
probably design PLN rules to perform the kind of Monte-Carlo
simulations used by PPL to answer queries.

Mixed extensional-intensional relationships

In the PLN book, mixed implication is the disjunction of extensional
and intensional implication (likewise for inheritance). The definition
of intensional implication using GDTV naturally follows, as it is
merely an extensional implication in pattern space. The disjunction
between two conditional GDTVs however is not trivial. In the PLN book
the way to go about that would be to turn the conditional GDTV into a
simple TV according to the formula of Section 2.4.1.1

s = sum_x f(P(x), Q(x)) / sum_x P(x)

where f is min for instance. Then whether the simple TV is fuzzy

<{s:1}, N>

or probabilistic

<{1:s, 0:1-s}, N>

the disjunctions between these is trivially defined.

There are other ways to go about it, as there are many ways to turn a
conditional GDTV into an unconditional one, and then disjunction is
trivially defined as well. I'm personally not sure what the right way
is, and ultimately this may require experimentation.
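
For concreteness, the Section 2.4.1.1-style collapse formula can be sketched as follows (illustrative Python; P and Q are fuzzy membership maps over a shared domain):

```python
# Sketch of s = sum_x f(P(x), Q(x)) / sum_x P(x) with f = min by default;
# P and Q map domain elements to membership degrees (illustrative values).
def collapse_strength(P, Q, f=min):
    xs = set(P) | set(Q)
    num = sum(f(P.get(x, 0.0), Q.get(x, 0.0)) for x in xs)
    den = sum(P.values())
    return num / den

s = collapse_strength({"a": 0.8, "b": 0.2}, {"a": 0.4, "b": 0.2})
# min gives 0.4 + 0.2 = 0.6 over a denominator of 1.0, so s is 0.6
```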

@linas
Member

linas commented Aug 13, 2016

I strongly suggest that instead of inventing a new TV, the work on ProtoAtom be finished instead. I think the theory for ProtoAtom is mostly done, see http://wiki.opencog.org/wikihome/index.php/ProtoAtom and also the status in bug #513.

The ProtoAtom is intended to be the ultimate do-all, be-all replacement for TVs and any other generic values to be associated with atoms, as attributes. Of course, this won't happen if people hate the API to it, or find it unsatisfying, and so a review of the wiki page and of bug #513 is strongly encouraged.

@ngeiswei
Member Author

ngeiswei commented Sep 2, 2016

Agreed. I suppose then the current AtomSpace class will be replaced by an AtomSpaceTVAV class or such, inheriting from an AtomSpace class that has no TV or AV, just a generic ProtoAtom instead.

Anyway, I don't intend to look into this for now, I prefer to focus on the Backward Chainer, but once we start getting deeper into PLN reasoning, this will have to be carefully addressed.

@linas
Member

linas commented Sep 2, 2016

Re backward-chaining, please review the blog post about multiverses, and see if you can steal ideas from it. It is not necessary, but it could be useful. Some day, in the more distant future, if it's done right, it could even be important, but for right now, it's something to keep in mind and try to apply.

blog post: http://blog.opencog.org/2016/08/31/many-worlds-reasoning-about-reasoning/

@linas
Member

linas commented Mar 7, 2017

FYI, the protoatom work #513 is mostly done; some extra polish and some utilities may be needed but the core works. It works like so: for any (key,atom) pair, you can associate a (mutable) array of floats (FloatValue), array of strings (StringValue), or array of these arrays (ListValue). The current, existing TV's all inherit from FloatValue.

The key can be any atom, although PredicateNode is recommended. The current TruthValues are associated with PredicateNode "*-TruthValueKey-*"

The triple (key, atom, value) is called a "Valuation" in the C++ code; it's kind-of-ish like an EvaluationLink, and it's kind-of-ish like the concept of "valuation" used in model theory books. Since the Valuations are not atoms, they are not indexed in the AtomSpace, and the pattern matcher cannot be used to find them (in the same way that the pattern matcher cannot be used to search for TVs or AVs).

The valuations are saved/restored from the SQL tables. If you build the Distributional TV from these values, it will be handled automatically. New kinds of values can be added too; they use the same atom_types.script and classserver() mechanism as atoms.

@ngeiswei
Member Author

ngeiswei commented Mar 7, 2017

Cool! I haven't looked at how it has affected the C++ TV code; I suppose it's rather profound. It feels like all the C++ TV code could now be replaced by scheme code. Not suggesting to do that, but it shows how much more flexibility is at our disposal. Well done!

@linas
Member

linas commented Mar 10, 2017

Yes, I guess "all the C++ TV could now be replaced by scheme code". For now, I did keep the TruthValuePtr inside of atoms, since I figured that maybe this makes some difference to performance ... except that ... it probably doesn't, in any practical use.

It would be nice to have some sort of PLN performance suite, to see what changes, if any, affect performance. The benchmark measures this, but it's not clear how much certain subsystems matter.

@linas
Member

linas commented Jun 4, 2017

FYI, I recently added some scheme utilities to compute distributions aka "bin counting". See #1248. The resulting distributions are currently not attached to any atom, or otherwise integrated into the atomspace; I only use it to print TSV files to create graphs. https://github.com/linas/atomspace/blob/master/opencog/analysis/bin-count.scm

@linas
Member

linas commented Oct 16, 2018

I suspect you have not noticed/realized this, but the code in https://github.com/linas/atomspace/blob/master/opencog/matrix implements generalized distributional TV's and an extensive framework for working with them, and manipulating them in various ways.

The actual representational format is completely different from what is sketched above. The biggest feature is this: unlike the above, one does NOT have to define new atom types, new truth value types, new ways of representing things. Instead, one writes some access routines that fish out the GDTV's from whatever locations they might be currently stored at. Once these are written, all of the rest of the compute infrastructure comes "for free": you can compute marginals and conditionals and ... well, piles of stuff, anything that I found useful, so far.

The lesson that I learned is that this way of treating GDTV's works really really well: it was a trial by fire, of actually having to use it and make it work. It works. Now is the time to re-write it in C++, (or R, or something) because the existing scheme API is .. not that fast, and maybe hard to understand, if you are used to existing statistics packages like R (or SciPy). (I don't know how hard a port to SciPy would be. Python/cython is f**'ed up, on the inside, R is much cleaner.)

@ngeiswei
Member Author

Thanks for the reference @linas, we'll have a look at it.
