
Generalized Distributional TV proposal #833

Open
ngeiswei opened this issue Jul 19, 2016 · 9 comments

@ngeiswei
Member

Generalized Distributional TV

Overview

This is a proposal for a new TV type that encompasses probabilistic,
fuzzy, distributional TV types and more. In short it is a
distributional TV that may work over any domain, not just
boolean/fuzzy, equipped with counts to quantify uncertainty.

Disclaimer: things are presented from the angle of Kolmogorov
probability theory, mostly because I feel more comfortable with it.

Omega

Omega is a probability space representing the subjective universe. We
learn about Omega via random variables. A random variable X: Omega->E
is a measurable function from domain Omega to co-domain E.

Any atom in the AtomSpace associated with a TV represents a random
variable. The TV (or GDTV as defined below) describes the distribution
over the co-domain of that random variable, and the atom defines the
name (or the structure) of that random variable and its relationships
to other random variables, i.e. other atoms.

When talking about the co-domain of an atom A we mean the co-domain of
the associated random variable of the TV of that atom.

Definition

We introduce a new TV type called Generalized Distributional Truth
Value, or GDTV for short. It could also be called merely
Distributional Value, or DV for short.

Unconditional GDTV

An unconditional GDTV of atom A is a pair <D,N> where

  1. D is a probability distribution over E, the co-domain of A.
  2. N is a count, a non-negative real number.

In practice, in the discrete case, the probability measure will be
described as a mass function over a partition of E summing up to 1,
following the format

<{k1:p1, ..., kn:pn}, N>

where k1 to kn are the partition blocks (often singletons or
intervals), and p1 to pn are probabilities summing up to 1.
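
As an illustration of this format, here is a minimal Python sketch (the names are hypothetical, not OpenCog API):

```python
# Hypothetical sketch of an unconditional GDTV <{k1:p1, ..., kn:pn}, N>;
# an illustration of the format above, not OpenCog code.

def make_gdtv(dist, count):
    """dist maps partition blocks (any hashable keys) to probabilities."""
    assert abs(sum(dist.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
    assert count >= 0, "count must be a non-negative real"
    return (dict(dist), count)

# A perfect active boolean sensor: P(X=1)=1 with an infinite count.
active = make_gdtv({1: 1.0}, float("inf"))

# An imperfect sensor: faulty 1% of the time over a million observations.
noisy = make_gdtv({1: 0.99, 0: 0.01}, 1_000_000)
```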

Conditional GDTV

More generally a GDTV is a conditional probability distribution. In
the discrete case, a GDTV over 2 atoms A and B is defined as a map
from a partition of the co-domain of A to an unconditional GDTV of B, as
defined above. The whole thing forms a conditional probability matrix,
equipped with a count on each row. The format may look like

{ka1: <{kb1:p11, ..., kbm:p1m}, N1>,
 ...
 kan: <{kb1:pn1, ..., kbm:pnm}, Nn>}

where ka1 to kan are partition blocks over the co-domain of A, and kb1
to kbm are partition blocks over the co-domain of B.
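
A conditional GDTV can be sketched the same way, as a map from partition blocks of A's co-domain to unconditional GDTVs over B's co-domain (again a hypothetical Python illustration, not OpenCog API):

```python
# Hypothetical sketch of a conditional GDTV: each row maps a partition block
# of A's co-domain to an unconditional (distribution, count) GDTV over B.

def make_conditional_gdtv(rows):
    for block, (dist, count) in rows.items():
        assert abs(sum(dist.values()) - 1.0) < 1e-9, "each row must sum to 1"
        assert count >= 0, "each row carries a non-negative count"
    return dict(rows)

cond = make_conditional_gdtv({
    "ka1": ({"kb1": 0.3, "kb2": 0.7}, 10),
    "ka2": ({"kb1": 0.1, "kb2": 0.9}, 5),
})
```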

Examples

Unconditional GDTV

Perfect boolean sensor

Let's assume our window to the universe is a perfect and timeless
boolean sensor A. Since the sensor is boolean the co-domain E of the
associated random variable is {0,1}. One may represent an active
sensor as follows

Evaluation <{1:1}, +inf>
  Predicate "sensor"
  Concept "A"

meaning that the random variable with name/structure

Evaluation
  Predicate "sensor"
  Concept "A"

has probability

P(X=1)=1 with count +inf.

The first 1 in {1:1} indicates the sensor's value, the second
indicates its probability. There is a bit of notational abuse because
the first 1 represents in fact a singleton {1}.

Likewise an inactive sensor would be represented as

Evaluation <{0:1}, +inf>
  Predicate "sensor"
  Concept "A"

The count is +inf as we assume that the sensor is perfect, i.e. based
on infinite evidence.

Imperfect boolean sensor

Let's assume that sensor B is imperfect and is faulty one percent of
the time based on a million observations. One may represent such a
random variable as follows

Evaluation <{1:0.99, 0:0.01}, 1M>
  Predicate "sensor"
  Concept "B"

Perfect fuzzy sensor

Let's assume that perfect sensor C can take any value within [0,1] and
is true with degree 0.9786 with absolute certainty.

Evaluation <{0.9786:1}, +inf>
  Predicate "sensor"
  Concept "C"

Imperfect boolean sensor with infinite certainty

Let's assume that imperfect boolean sensor D has probability 0.9786 of
being active, and we're absolutely sure of that probability.

Evaluation <{1:0.9786, 0:0.0214}, +inf>
  Predicate "sensor"
  Concept "D"

Imperfect fuzzy sensor

Let's assume that imperfect sensor E can take any value within [0,1],
and background knowledge suggests (after testing this sensor a
thousand times) that we may safely discretize it into 4 bins.

Evaluation <{[0,0.25):0.002,
             [0.25,0.5):0.008,
             [0.5,0.75):0.2,
             [0.75,1]:0.7}, 1K>
  Predicate "sensor"
  Concept "E"

where [x,y) represents the left-closed right-open interval from x to y.

Imperfect non-boolean, non-fuzzy sensors

What if we want to describe a random variable with a co-domain other
than [0,1]? Then we can no longer use predicates, but we can still use
schemata!

Say we want to represent the distribution of heights in a population
of 10K individuals. We may describe it as follows

ExecutionOutput <{[0cm,100cm):0.2,
                  [100cm,160cm):0.3,
                  [160cm,240cm):0.5}, 10K>
  Schema "in-cm"
  Concept "height"

We may actually represent the intervals using atoms. For instance
[0cm,100cm) might be described as

LeftClosedRightOpenIntervalLink
  QuantityLink
    NumberNode 0
    UnitNode "cm"
  QuantityLink
    NumberNode 100
    UnitNode "cm"

or equivalently

QuantityLink
  LeftClosedRightOpenIntervalLink
    NumberNode 0
    NumberNode 100
  UnitNode "cm"

Note: QuantityLink and UnitNode are currently undefined types, used
here for expository purposes.

Conditional GDTV

SubSet

Assume we have 2 crisp concepts A and B. The exact probability
distribution of P(B|A), with P(B=0|A=0)=0.3, P(B=1|A=0)=0.7,
P(B=0|A=1)=0.1 and P(B=1|A=1)=0.9, may be represented as

SubSet {0:<{0:0.3, 1:0.7}, +inf>,
        1:<{0:0.1, 1:0.9}, +inf>}
  Concept "A"
  Concept "B"

ExtensionalImplication over schemata

Assume we have 2 schemata over individuals, one that outputs the age,
another that outputs the height. There are 3 individuals, I1, I2 and I3
with ages 10, 50 and 100 yo respectively, and with heights 120, 172
and 169 cm respectively.

The schema "age" is defined as follows

Execution <{1:1}, +inf>
  Schema "age"
  Concept "I1"
  Number 10
Execution <{1:1}, +inf>
  Schema "age"
  Concept "I2"
  Number 50
Execution <{1:1}, +inf>
  Schema "age"
  Concept "I3"
  Number 100

One may also use the equivalent representation

ExecutionOutput <{10:1}, +inf>
  Schema "age"
  Concept "I1"
ExecutionOutput <{50:1}, +inf>
  Schema "age"
  Concept "I2"
ExecutionOutput <{100:1}, +inf>
  Schema "age"
  Concept "I3"

The schema "height" is defined as follows

Execution <{1:1}, +inf>
  Schema "height"
  Concept "I1"
  Number 120
Execution <{1:1}, +inf>
  Schema "height"
  Concept "I2"
  Number 172
Execution <{1:1}, +inf>
  Schema "height"
  Concept "I3"
  Number 169

One may also use the equivalent representation

ExecutionOutput <{120:1}, +inf>
  Schema "height"
  Concept "I1"
ExecutionOutput <{172:1}, +inf>
  Schema "height"
  Concept "I2"
ExecutionOutput <{169:1}, +inf>
  Schema "height"
  Concept "I3"

The conditional distributional probabilities of the height given the
age, assuming 2 partition blocks for age (0 to 40, and 40 to 130) and
2 for height (0 to 150, and 150 to 200), may be represented as

ExtensionalImplication {[0,40):<{[0,150):1}, 1>,
                        [40,130):<{[150,200):1}, 2>}
  Schema "age"
  Schema "height"

meaning that based on 1 individual, being below 40 implies being under
150cm, and based on 2 individuals, being 40 or more implies being
150cm or more.

Note: we may want to introduce a new link type for schemata instead of
using ExtensionalImplication. At least there is a small rationale for
using ExtensionalImplication rather than SubSet, as a predicate is a
kind of schema.

Relationship with existing TVs

Probabilistic simple TV

A probabilistic simple TV <s, N> is represented by the following GDTV

<{1:s, 0:1-s}, N>

Fuzzy simple TV

A fuzzy simple TV <s, N> is represented by the following GDTV

<{s:1}, N>
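
Both mappings can be sketched in Python (function names are made up for illustration, not OpenCog API):

```python
# Illustrative conversions from simple TVs <s, N> to GDTVs.

def prob_simple_tv_to_gdtv(s, n):
    # probabilistic: strength s is P(X=1), the remaining mass sits on 0
    return ({1: s, 0: 1.0 - s}, n)

def fuzzy_simple_tv_to_gdtv(s, n):
    # fuzzy: the degree s itself is the (certain) outcome
    return ({s: 1.0}, n)
```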

Distributional TV

As we've seen, a GDTV is essentially a DTV plus a count, with its
distribution allowed over any co-domain.

Generic TV

I've never used Generic TV, but I recall having a discussion with
Linas as to why he introduced them; one of the reasons was to store
the mutual entropy of pairs of random variables. With GDTV this is
possible, see Section "Other operators".

Uncertainty

The uncertainty of a GDTV is represented by its count, or counts if it
is a conditional GDTV. Counts are non-negative real numbers. Although
integers will be used most of the time, having real numbers allows
counts derived from inferences not to be unnecessarily truncated into
integers (this may or may not turn out to be useful). The count
reflects a second order distribution over the distributions over the
co-domain, and should be used to estimate the counts of inferred
conclusions. We could start experimenting with Dirichlet distributions
as second order distributions.
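
A minimal sketch of that Dirichlet experiment, assuming the count N is used as the total concentration of the second order distribution (a modeling assumption for illustration, not part of the proposal):

```python
import random

# Assumption: treat the count N as the total Dirichlet concentration,
# i.e. alpha_k = N * p_k for each partition block k.
def sample_first_order(dist, count, rng=random):
    """Sample a plausible first-order distribution from Dirichlet(N * p_k)."""
    gammas = {k: rng.gammavariate(count * p, 1.0) for k, p in dist.items()}
    total = sum(gammas.values())
    return {k: g / total for k, g in gammas.items()}

# Small counts give noisy samples; large counts concentrate near dist.
sample = sample_first_order({1: 0.99, 0: 0.01}, 1_000_000)
```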

Of course a confidence number can be used as well, however I believe
the count has a clearer semantics and should be the default.

I suspect we want to drop the lookahead K as defined in the indefinite
truth value and use a plain Dirichlet prior instead (corresponding to
K tending to +inf). The PLN book claims that using small Ks helps to
decrease the premise/conclusion interval expansion rate. Since the PLN
book doesn't back this up with data, I challenge that claim. Ultimately
experiments will have to be carried out to determine what is best.

ExtensionalImplication over non-predicates

As we've seen in the examples, one may define ExtensionalImplication
not just over predicates but also over schemata. One may push this even
further and define extensional implication over almost any atom type,
using Omega as the underlying probability space, even though the
elements of the partitions taken into consideration are undefined.

For instance let's define

Evaluation <{0:0.5, 1:0.5}, 200>
  P
  A

Evaluation <{0:0.5, 1:0.5}, 200>
  Q
  B

That is, P(A) (resp. Q(B)) is true half of the time, based on 200
observations each. However we may also know that they are extremely
correlated, and that when P(A) is true Q(B) has a 99% chance of being
true. One may represent that with

ExtensionalImplication {0:<{0:0.99, 1:0.01}, 100>,
                        1:<{0:0.01, 1:0.99}, 100>}
  Evaluation
    P
    A
  Evaluation
    Q
    B

Note: in the PLN book this is conceptually handled by defining
associated predicates of these evaluations over the domain of
contexts. However I don't think it is necessary as conditional
probabilities can be directly defined based on the probability space
Omega.

Again instead of using ExtensionalImplication we might want to define
a new link type.

Relationships between random variables (atoms and GDTVs)

As we know, the AtomSpace is not just a way to represent random
variables, but also how they relate to each other. These relations
allow us to create abstract structures, not necessarily true (most of
them define concepts or predicates with null probabilities in Omega),
but useful to reason with and ultimately infer new knowledge about
Omega itself.

Examples

Funny people at the party

Let's build abstract concepts for people at a party and funny people,
and see how they relate to each other.

Let's define the members of concept "at-the-party"

Member <{1:1}, +inf>
  Concept "John"
  Concept "at-the-party"

Member <{1:1}, +inf>
  Concept "Mary"
  Concept "at-the-party"

Member <{1:1}, +inf>
  Concept "Claude"
  Concept "at-the-party"

Member <{0:1}, +inf>
  Concept "David"
  Concept "at-the-party"

saying that we live in a universe where John, Mary and Claude are at
the party.

Let's define the members of concept "funny"

Member <{1:1}, +inf>
  Concept "John"
  Concept "funny"

Member <{1:1}, +inf>
  Concept "Mary"
  Concept "funny"

Member <{0:1}, +inf>
  Concept "Claude"
  Concept "funny"

Member <{1:1}, +inf>
  Concept "David"
  Concept "funny"

saying that we live in a universe where John, Mary and David are
funny.

The concepts "at-the-party" or "funny" are very likely not meaningful
subsets of Omega. We can give them null TVs because the set of people
at a party is in no way a description of the possible world we're in.
But that doesn't matter; what matters is the abstract relationships we
can build between them. So we may have

Concept "funny" <{0:1}, +inf>
Concept "at-the-party" <{0:1}, +inf>

meaning that "funny" and "at-the-party" are almost surely false in our
subjective universe (neither John, nor Mary, etc., are possible
worlds). However we can still calculate the following

SubSet <{0:<{1: 1}, 1>, 
         1:<{0: 0.3333, 1:0.6666}, 3>}, 4>
  Concept "at-the-party"
  Concept "funny"

saying that, based on one observation (David), people not at the party
are funny, and based on 3 observations, people at the party are funny
with probability 2/3.

The formula used to calculate the latter row is obviously

(funny(John) + funny(Mary) + funny(Claude))
/
(at-the-party(John) + at-the-party(Mary) + at-the-party(Claude))
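
That counting can be checked numerically; here is a sketch with the membership tables transcribed from the Member links above (plain Python for illustration, not OpenCog code):

```python
# Crisp membership tables transcribed from the Member links above.
at_party = {"John": 1, "Mary": 1, "Claude": 1, "David": 0}
funny    = {"John": 1, "Mary": 1, "Claude": 0, "David": 1}

def subset_row(cond):
    """Row of the SubSet conditional GDTV for at-the-party == cond."""
    people = [p for p in at_party if at_party[p] == cond]
    n = len(people)
    ones = sum(funny[p] for p in people)
    return ({0: (n - ones) / n, 1: ones / n}, n)

dist, count = subset_row(1)   # people at the party
# 2 of the 3 party-goers are funny, so dist[1] is 2/3 with count 3
```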

Let's now assume that the knowledge of these people being at the party
or funny is uncertain, so we have for instance

Member <{1:0.6, 0:0.4}, 1K>
  Concept "John"
  Concept "funny"

etc.

Then the calculation above would differ because funny(John), etc., are
in fact random variables, so their sum turns into a convolution
product (assuming they are independent).

Explicitly defining Omega

Above we defined abstract concepts like at-the-party and funny, and we
assumed that these concepts were almost surely false, but we may well
assume the opposite.

Let's assume the following Atom in the AtomSpace

Concept "A" <{1:1}, +inf>

meaning that the random variable with name/structure

Concept "A"

is almost surely true.

Now let's add the following

Member <{1:1}, +inf>
  Number 1
  Concept "A"
...
Member <{1:1}, +inf>
  Number 10
  Concept "A"

As well as

Member <{0:1}, +inf>
  Number 11
  Concept "A"
...
Member <{0:1}, +inf>
  <anything-but-numbers-from-1-to-10>
  Concept "A"

I'm using this trick so I don't have to deal with universal
quantifiers. Basically what it says is that A contains the numbers
from 1 to 10 and nothing else. But given that A is almost surely true,
this allows us to actually define Omega (or at least the subjective
part that matters to us)!

I'm not sure how useful that is, but I thought I'd mention it. In
practice we don't know what Omega is; it is open-ended, as is our
universe.

Fuzzy Operators

And, Or and Not links can be seen as fuzzy operators over random
variables with co-domain [0,1]. For instance in

And <GDTV>
  Evaluation <GDTV-1>
    P
    A
  Evaluation <GDTV-2>
    Q
    B

GDTV corresponds to the distribution of the random variable defined as
the minimum of (Evaluation P A) and (Evaluation Q B). If for
instance

GDTV-1 = <{0.7:1}, +inf>
GDTV-2 = <{0.5:1}, +inf>

then

GDTV = <{0.5:1}, +inf>

If these variables have independent distributions, then for each
resulting key min(h, k) the probabilities of the contributing pairs
are multiplied and summed. For instance

And
  Evaluation <{0.2:0.1, 0.4:0.4, 0.6:0.3, 0.8:0.2}, +inf>
    P
    A
  Evaluation <{0.1:0.2, 0.2:0.3, 0.4:0.3, 0.6:0.2}, +inf>
    Q
    B

will have as resulting GDTV

<{0.1:0.1*0.2 + 0.4*0.2 + ...,
  0.2:0.1*0.3 + 0.4*0.3 + ...,
  ...}, +inf>
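
The min-combination can be sketched as follows (illustrative Python, not OpenCog code):

```python
from collections import defaultdict

def fuzzy_and(da, db):
    """Distribution of min(X, Y) for independent fuzzy X ~ da, Y ~ db."""
    out = defaultdict(float)
    for h, ph in da.items():
        for k, pk in db.items():
            out[min(h, k)] += ph * pk
    return dict(out)

p = {0.2: 0.1, 0.4: 0.4, 0.6: 0.3, 0.8: 0.2}
q = {0.1: 0.2, 0.2: 0.3, 0.4: 0.3, 0.6: 0.2}
r = fuzzy_and(p, q)
# every value of P paired with Q's 0.1 gives min 0.1, so r[0.1] is 0.2
```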

Other Operators

Operators over any random variables' co-domain can be used. For
instance

Plus <GDTV-1 + GDTV-2>
  ExecutionOutput <GDTV-1>
    Schema "S1"
    Concept "A1"
  ExecutionOutput <GDTV-2>
    Schema "S2"
    Concept "A2"

where <GDTV-1 + GDTV-2> corresponds to the TV of the random variable
which is the sum of the random variables of GDTV-1 and GDTV-2. For
instance assuming

GDTV-1 = <{10:0.9, 0:0.1}, +inf>
GDTV-2 = <{32:0.6, 0:0.4}, +inf>

and are independent then

GDTV-1 + GDTV-2 = <{42:0.54, 32:0.06, 10:0.36, 0:0.04}, +inf>
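
This sum-of-independent-variables convolution can be sketched and checked numerically (illustrative Python):

```python
from collections import defaultdict

def plus(da, db):
    """Distribution of X + Y for independent discrete X ~ da, Y ~ db."""
    out = defaultdict(float)
    for x, px in da.items():
        for y, py in db.items():
            out[x + y] += px * py
    return dict(out)

g1 = {10: 0.9, 0: 0.1}
g2 = {32: 0.6, 0: 0.4}
total = plus(g1, g2)
# total is {42: 0.54, 10: 0.36, 32: 0.06, 0: 0.04}, up to rounding
```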

Operators like List construction are very useful and can be used to
build random vectors, such as random pairs of variables, to store the
information needed to compute their mutual entropies, as shown
below.

Bayesian network

GDTV allows us to represent Bayesian networks. Consider the BN in
https://en.wikipedia.org/wiki/Bayesian_network#Example

S = Sprinkler
R = Rain
G = Grass wet

Each random variable is boolean; however, not all are conditionally
independent, for instance G depends on S and R combined. GDTV allows
us to represent these dependencies.

Extensional implication between Rain and Sprinkler:

ExtImplication <{0:{1:0.4, 0:0.6},
                 1:{1:0.01, 0:0.99}}, +inf>
  R
  S

Extensional implication between (Sprinkler, Rain) and Grass. Note the
use of List to build a random pair of variables:

ExtImplication <{(0, 0):{1:0.0, 0:1.0}, 
                 (0, 1):{1:0.8, 0:0.2},
                 (1, 0):{1:0.9, 0:0.1},
                 (1, 1):{1:0.99, 0:0.01}}, +inf>
  List
    S
    R
  G

where (0, 0) to (1, 1) enumerate all possible boolean pairs, that is
the possible values of the co-domain of (List S R). One may see how
that information is sufficient to compute the mutual information
between S and R.
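
As a sanity check, these rows can be combined with a prior on R to recover marginals; the prior P(R=1)=0.2 below is taken from the Wikipedia example, not stated above (illustrative Python):

```python
# Assumed prior from the Wikipedia sprinkler example: P(R=1) = 0.2.
p_r = {0: 0.8, 1: 0.2}
# Rows transcribed from the two ExtImplication links above.
p_s_given_r = {0: {1: 0.4, 0: 0.6}, 1: {1: 0.01, 0: 0.99}}
p_g1_given_sr = {(0, 0): 0.0, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.99}

# Marginalize over R and S to get P(G=1).
p_g1 = sum(p_r[r] * p_s_given_r[r][s] * p_g1_given_sr[(s, r)]
           for r in (0, 1) for s in (0, 1))
# matches the textbook value P(G=1) = 0.44838
```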

Probabilistic Programming Language

Likewise one can represent the conditional distribution of a program's
output, or program subfunctions using GDTV. In that case a GDTV query
using the BC chainer would amount to a PPL variable query. One may
probably design PLN rules to perform the kind of Monte-Carlo
simulations used by PPL to answer queries.

Mixed extensional-intensional relationships

In the PLN book, mixed implication is the disjunction of extensional
and intensional implication (likewise for inheritance). The definition
of intensional implication using GDTV naturally follows, as it is
merely an extensional implication in pattern space. The disjunction
between two conditional GDTVs however is not trivial. In the PLN book
the way to go about that would be to turn the conditional GDTV into a
simple TV according to the formula of Section 2.4.1.1

s = sum_x f(P(x), Q(x)) / sum_x P(x)

where f is min for instance. Then whether the simple TV is fuzzy

<{s:1}, N>

or probabilistic

<{1:s, 0:1-s}, N>

the disjunctions between these is trivially defined.

There are other ways to go about it, as there are many ways to turn a
conditional GDTV into an unconditional one, and then disjunction is
trivially defined as well. I'm personally not sure what the right way
is, and ultimately this may require experimentation.
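
For concreteness, the Section 2.4.1.1-style collapse formula can be sketched as follows (illustrative Python; P and Q are fuzzy membership maps over a shared domain):

```python
# Sketch of s = sum_x f(P(x), Q(x)) / sum_x P(x) with f = min by default;
# P and Q map domain elements to membership degrees (illustrative values).
def collapse_strength(P, Q, f=min):
    xs = set(P) | set(Q)
    num = sum(f(P.get(x, 0.0), Q.get(x, 0.0)) for x in xs)
    den = sum(P.values())
    return num / den

s = collapse_strength({"a": 0.8, "b": 0.2}, {"a": 0.4, "b": 0.2})
# min gives 0.4 + 0.2 = 0.6 over a denominator of 1.0, so s is 0.6
```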

@linas
Member

linas commented Aug 13, 2016

I strongly suggest that instead of inventing a new TV, the work on ProtoAtom be finished instead. I think the theory for ProtoAtom is mostly done, see http://wiki.opencog.org/wikihome/index.php/ProtoAtom and also the status in bug #513.

The ProtoAtom is intended to be the ultimate do-all, be-all replacement for TVs and any other generic values to be associated with atoms, as attributes. Of course, this won't happen if people hate the API to it, or find it unsatisfying, and so a review of the wiki page and of bug #513 is strongly encouraged.

@ngeiswei
Member Author

ngeiswei commented Sep 2, 2016

Agreed. I suppose then the current AtomSpace class will be replaced by an AtomSpaceTVAV class or such, inheriting from an AtomSpace class that has no TV or AV, just a generic ProtoAtom instead.

Anyway, I don't intend to look into this for now, I prefer to focus on the Backward Chainer, but once we start getting deeper into PLN reasoning, this will have to be carefully addressed.

@linas
Member

linas commented Sep 2, 2016

Re backward-chaining, please review the blog post about multiverses, and see if you can steal ideas from it. It is not necessary, but it could be useful. Some day, in the more distant future, if it's done right, it could even be important, but for right now, it's something to keep in mind and try to apply.

blog post: http://blog.opencog.org/2016/08/31/many-worlds-reasoning-about-reasoning/

@linas
Member

linas commented Mar 7, 2017

FYI, the protoatom work #513 is mostly done; some extra polish and some utilities may be needed but the core works. It works like so: for any (key,atom) pair, you can associate a (mutable) array of floats (FloatValue), array of strings (StringValue), or array of these arrays (ListValue). The current, existing TV's all inherit from FloatValue.

The key can be any atom, although PredicateNode is recommended. The current TruthValues are associated with PredicateNode "*-TruthValueKey-*"

The triple (key, atom, value) is called a "Valuation" in the C++ code; it's kind-of-ish like an EvaluationLink, and it's kind-of-ish like the concept of "valuation" used in model theory books. Since the Valuations are not atoms, they are not indexed in the AtomSpace, and the pattern matcher cannot be used to find them (in the same way that the pattern matcher cannot be used to search for TVs or AVs).

The valuations are saved/restored from the SQL tables. If you build the Distributional TV from these values, it will be handled automatically. New kinds of values can be added too; they use the same atom_types.script and classserver() mechanism as atoms.

@ngeiswei
Member Author

ngeiswei commented Mar 7, 2017

Cool! I haven't looked at how it has affected the C++ TV code; I suppose it's rather profound. It feels like all the C++ TV code could now be replaced by scheme code. Not suggesting to do that, but it shows how much more flexibility is at our disposal. Well done!

@linas
Member

linas commented Mar 10, 2017

Yes, I guess "all the C++ TV could now be replaced by scheme code". For now, I did keep the TruthValuePtr inside of atoms, since I figured that maybe this makes some difference to performance ... except that ... it probably doesn't, in any practical use.

It would be nice to have some sort of PLN performance suite, to see what changes, if any, affect performance. The benchmark measures this, but it's not clear how much certain subsystems matter.

@linas
Member

linas commented Jun 4, 2017

FYI, I recently added some scheme utilities to compute distributions aka "bin counting". See #1248. The resulting distributions are currently not attached to any atom, or otherwise integrated into the atomspace; I only use it to print TSV files to create graphs. https://github.com/linas/atomspace/blob/master/opencog/analysis/bin-count.scm

@linas
Member

linas commented Oct 16, 2018

I suspect you have not noticed/realized this, but the code in https://github.com/linas/atomspace/blob/master/opencog/matrix implements generalized distributional TV's and an extensive framework for working with them, and manipulating them in various ways.

The actual representational format is completely different from what is sketched above. The biggest feature is this: unlike the above, one does NOT have to define new atom types, new truth value types, new ways of representing things. Instead, one writes some access routines that fish out the GDTV's from whatever locations they might be currently stored at. Once these are written, all of the rest of the compute infrastructure comes "for free": you can compute marginals and conditionals and ... well, piles of stuff, anything that I found useful, so far.

The lesson that I learned is that this way of treating GDTV's works really really well: it was a trial by fire, of actually having to use it and make it work. It works. Now is the time to re-write it in C++, (or R, or something) because the existing scheme API is .. not that fast, and maybe hard to understand, if you are used to existing statistics packages like R (or SciPy). (I don't know how hard a port to SciPy would be. Python/cython is f**'ed up, on the inside, R is much cleaner.)

@ngeiswei
Member Author

Thanks for the reference @linas, we'll have a look at it.
