-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalized Distributional TV proposal #833
Comments
I strongly suggest that instead of inventing a new TV, that the work on ProtoAtom be finished, instead. I think the theory for ProtoAtom is mostly done, see http://wiki.opencog.org/wikihome/index.php/ProtoAtom and also the status in bug #513. The ProtoAtom is intended to be the ultimate do-all, be-all replacement for TV's and any other generic values to be associated with atoms, as attributes. Of course, this won't happen, if people hat the API to it, or find it unsatisfying, and so a review of the wiki page, and of the bug #513 is strongly encouraged. |
Agreed. I suppose then the current AtomSpace class will be replaced by a AtomSpaceTVAV class or such that inherits an AtomSpace class that has no TV or AV and just a generic ProtoAtom instead. Anyway, I don't intend to look into this for now, I prefer to focus on the Backward Chainer, but once we start getting deeper into PLN reasoning, this will have to be carefully addressed. |
Re backwards-chainng, plese review the blog post about multiverses, and see if you can steal ideas from it. It is not necessary, but it could be useful. Some day, in the more distant future, if it's done right, it could even be important, but for right now, its something to keep in mind, and try to apply. blog post: http://blog.opencog.org/2016/08/31/many-worlds-reasoning-about-reasoning/ |
FYI, the protoatom work #513 is mostly done; some extra polish and some utilities may be needed but the core works. It works like so: for any (key,atom) pair, you can associate a (mutable) array of floats (FloatValue), array of strings (StringValue), or array of these arrays (ListValue). The current, existing TV's all inherit from FloatValue. The The triple (key, atom, value) is called a "Valuation" in the C++ code; its kind-of-ish like an EvaluationLink, and its kind-of-ish like the concept of "valuation" used in model theory books. Since the Valutions are not atoms, they are not indexed in the AtomSpace, and the pattern matcher cannot be used to find them. (in the same way that the pattern matcher cannot be used to search for TV's or AV's) The valuations are saved/restored from the SQL tables, If you build the Distributional TV from these values, it will be handled automatically. New kinds of values can be added too; they use the same |
Cool! I haven't looked at how it has affected the C++ TV code, I suppose it's rather profound. It feels that all the C++ TV could now be replaced by scheme code. Not suggesting to do that, but it shows how much more flexibility is at our disposal. Well done! |
Yes, I guess that the "all the C++ TV could now be replaced by scheme code". For now, I did keep the TruthValuePtr inside of atoms, since I figured that maybe this makes some difference to performance ... except that ... it probably doesn't, in any practical use. It would be nice to have some sort of PLN performance suite, to see what changes, if any, affect performance. The benchmark measures this, but its not clear how much certain subsystems matter. |
FYI, I recently added some scheme utilities to compute distributions aka "bin counting". See #1248. The resulting distributions are currently not attached to any atom, or otherwise integrated into the atomspace; I only use it to print TSV files to create graphs. https://github.com/linas/atomspace/blob/master/opencog/analysis/bin-count.scm |
I suspect you have not noticed/realized this, but the code in https://github.com/linas/atomspace/blob/master/opencog/matrix implements generalized distributional TV's and an extensive framework for working with them, and manipulating them in various ways. The actual representational format is completely different from what is sketched above. The biggest feature is this: unlike the above, one does NOT have to define new atom types, new truth value types, new ways of representing things. Instead, one writes some access routines that fish out the GDTV's from whatever locations they might be currently stored at. Once these are written, all of the rest of the compute infrastructure comes "for free": you can compute marginals and conditionals and ... well, piles of stuff, anything that I found useful, so far. The lesson that I learned is that this way of treating GDTV's works really really well: it was a trial by fire, of actually having to use it and make it work. It works. Now is the time to re-write it in C++, (or R, or something) because the existing scheme API is .. not that fast, and maybe hard to understand, if you are used to existing statistics packages like R (or SciPy). (I don't know how hard a port to SciPy would be. Python/cython is f**'ed up, on the inside, R is much cleaner.) |
Thanks for the reference @linas, we'll have a look at it. |
Generalized Distributional TV
Overview
This is a proposal for a new TV type that encompasses probabilistic,
fuzzy, distributional TV types and more. In short it is a
distributional TV that may work over any domain, not just
boolean/fuzzy, equipped with counts to quantify uncertainty.
Disclaimer: things are presented under the angle of Kolmogorov
probability theory, mostly cause I feel more comfortable with it.
Omega
Omega is a probability space representing the subjective universe. We
learn about Omega via random variables. A random variable X: Omega->E
is a measurable function from domain Omega to co-domain E.
Any atom in the AtomSpace associated with a TV represents a random
variable. The TV (or GDTV as defined below) describes the distribution
over the co-domain of that random variable, and the atom defines the
name (or the structure) of that random variable and its relationships
to other random variables, i.e. other atoms.
When talking about the co-domain of an atom A we mean the co-domain of
the associated random variable of the TV of that atom.
Definition
We introduce a new TV type called Generalized Distributional Truth
Value, or GDTV for short. It could also be called merely
Distributional Value, or DV for short.
Unconditional GDTV
An unconditional GDTV of atom A is a pair <D,N> where
In practice, discrete case, the probability measure will be described
as a mass function over a partition of E summing up to 1, following
the format
where k1 to kn are the partition blocks (often singletons or
intervals), and p1 to pn are probabilities summing up to 1.
Conditional GDTV
More generally a GDTV is a conditional probability distribution. In
the discrete case, a GDTV over 2 atoms A and B is defined as a map
from a partition of the co-domain of A to a unconditional GDTV of B, as
defined above. The whole thing forms a conditional probability matrix,
equipped with a count on each row. The format may look like
where ka1 to kan are partition blocks over the co-domain of A, and kb1
to kbm are partition blocks over the co-domain of B.
Examples
Unconditional GDTV
Perfect boolean sensor
Let's assume our window to the universe is a perfect and timeless
boolean sensor A. Since the sensor is boolean the co-domain E of the
associated random variable is {0,1}. One may represent an active
sensor as follows
meaning that the random variable with name/structure
has probability
P(X=1)=1 with count +inf.
The first 1 in {1:1} indicates the sensor's value, the second
indicates its probability. There is a bit of notational abuse because
the first 1 represents in fact a singleton {1}.
Likewise an inactive sensor would be represented as
The count is +inf as we assume that the sensor is perfect, i.e. based
on infinite evidences.
Imperfect boolean sensor
Let's assume that sensor B is imperfect and is faulty one percent of
the time based on a million observations. One may represent such a
random variable as follows
Perfect fuzzy sensor
Let's assume that perfect sensor C can take any value within [0,1] and
is true with degree 0.9786 with absolute certainty.
Imperfect boolean sensor with infinite certainty
Let's assume that imperfect boolean sensor D has probability 0.9786 of
being active, and we're absolutely sure of that probability.
Imperfect fuzzy sensor
Let's assume that imperfect sensor E can take any value within [0,1],
and background knowledge suggests (after testing this sensor a
thousand of times) we may safely discretize it in 4 bins.
where [x,y) represent the left-closed right-open interval from x to y.
Imperfect non-boolean, non-fuzzy sensors
What if we want to describe random variable with co-domain other than
[0,1]? Then we can no longer use predicates, but we can still use
schemata!
Say we want to represent the distribution of heights in a population
of 10K individuals. We may describe it as follows
We may actually represent the intervals using atoms. For instance
[0cm,100cm) might be described as
or equivalently
Note: QuantityLink and UnitNode are currently undefined types and used
for expository purpose.
Conditional GDTV
SubSet
Assume we have 2 crisp concepts A and B. The exact probability
distribution of P(B|A), with P(B=0|A=0)=0.3 and P(B=1|A=0)=0.7,
P(B=0|A=1)=0.1 and P(B=1|A=1), may be represented as
ExtensionalImplication over schemata
Assume we have 2 schemata over individuals, one that outputs the age,
another that outputs the height. There are 3 individuals, I1, I2 and I3
with ages 10, 50 and 100 yo respectively, and with heights 120, 172
and 169 cm respectively.
The schema "age" is defined as follows
One may also use the equivalent representation
The schema "height" is defined as follows
One may also use the equivalent representation
The conditional distributional probabilities of the height given the
age, assuming 2 partitions for age (0 to 40, and 40 to 130) and 2
partitions for height (0 to 150, and 150 to 200) may be represented as
meaning that based on 1 individual, being below 40 implies being under
150cm, and based on 2 individuals, being 40 or more implies being
150cm or more.
Note: we may want to introduce a new link type for schemata instead of
using ExtensionalImplication. At least there is a small rational for
using ExtensionalImplication instead of SubSet as a predicate is a
kind of schema.
Relationship with existing TVs
Probabilistic simple TV
A probabilistic simple TV
<s, N>
is represented by the following GDTVFuzzy simple TV
A fuzzy simple TV
<s, N>
is represented by the following GDTVDistributional TV
As we've seen a GDTV is essentially a DTV + a count, with its
distribution over any co-domains.
Generic TV
I've never used Generic TV but I recall having a discussing with Linas
as to why he introduced them and one of the reasons was to store the
mutual entropy of pairs of random variables. With GDTV this is
possible, see Section "Other operators".
Uncertainty
The uncertainty of a GDTV is represented by its count, or counts if it
is a conditional GDTV. Counts are real non negative numbers. Although
integers will be used most of the time, having real numbers allow
counts derived from inferences to not be unnecessarily truncated into
integers (this may or not turn out to be useful). The count reflects a
second order distribution over the distributions over the co-domain,
and should be used to estimate the counts of inferred conclusions. We
could start experimenting with Dirichlet distributions as second order
distributions.
Of course a confidence number can be used as well, however I believe
the count has a clearer semantics and should be the default.
I suspect we want to drop the lookahead K as defined in the indefinite
truth value and use plain Dirichlet prior instead (corresponding to K
tends to +inf). The PLN books claims that using small Ks helps to
decrease the premise/conclusion interval expansion rate. Since the PLN
book doesn't back it up with data I challenge that claim. Ultimately
experiments will have to be carried out to determine what is best.
ExtensionalImplication over non-predicate
As we've seen in the example one may define ExtensionalImplication not
just over predicates but also schemata. One may push this even further
and define extensional implication over almost any atom type, using
Omega as underlying probability space, even though the elements of the
partitions taken into consideration are undefined.
For instance let's define
That is P(A) (resp. Q(B)) is true half of the time, based on 200
observations each. However we may also know that they are extremely
correlated and when P(A) is true P(B) has 99% change of being
true. One may represent that with
Note: in the PLN book this is conceptually handled by defining
associated predicates of these evaluations over the domain of
contexts. However I don't think it is necessary as conditional
probabilities can be directly defined based on the probability space
Omega.
Again instead of using ExtensionalImplication we might want to define
a new link type.
Relationships between random variables (atoms and GDTVs)
As we know the AtomSpace is not just a way to represent random
variables, but how they relate to each others. These relations allow
us to create abstract structures, not necessarily true (most of them
define concepts or predicates with null probabilities in Omega), but
useful to reason with and ultimately infer new knowledge about Omega
itself.
Examples
Funny people at the party
Let's build abstract sets of people at a party, funny, and how they
relate to each others.
Let's define the members of concept "at-the-party"
saying that we live in a universe where John, Mary and Clause are at
the party.
Let's define the members of concept "funny"
saying that we live in a universe where John, Mary and David are
funny.
The concepts "at-the-party" or "funny" are very likely not meaningful
subsets of Omega. We can give them null TVs because the set of people
a party is in no way a description of the possible world we're in. But
that doesn't matter, what matters is the abstract relationships we can
build between them. So we may have
meaning that "funny" and "at-the-party' are almost surely false in our
subjective universe (neither John, nor Mary, etc, are possible
worlds). However we can still calculate the following
saying that based on one evidence (David), people not at the party are
funny, and based on 3 evidences, people at the party are funny with
probability 2/3.
The formula used is to calculate the later is obviously
Let's now assume that the knowledge of these people being at the party
or funny is uncertain, so we have for instance
etc.
Then the calculation above would differ because funny(John), etc, in
fact random variables, thus their sum turns into a convolution product
(assuming they are independent).
Explicitly defining Omega
Above we defined abstract concepts like at-the-party and funny, and we
assumed that these concepts were almost surely false, but we may well
assume the opposite.
Let's assume the following Atom in the AtomSpace
meaning that the random variable with name/structure
is almost surely true.
Now let's add the following
As well as
I'm using this trick so I don't
have to deal with universal quantifiers. Basically what it says is
that A contains the numbers from 1 to 10 and nothing else. But given
that A almost surely true, this allows us to actually define Omega (or
at least the subjective part that matters to us)!
I'm not sure how useful that is that but I thought I'd mention it. In
practice we don't know what Omega is, it is open-ended, as is our
universe.
Fuzzy Operators
And, Or and Not links can be seen as fuzzy operators over random
variables with co-domains [0,1]. For instance in
GDTV corresponds to the distribution of the random variable defined as
a minimum between (Evaluation P A) and (Evaluation Q B). If for
instance
then
If these variables have independent distributions then the resulting
probabilities for each resulting key min(h, k) are multiplied and
added, etc. For instance
will have as resulting GDTV
Other Operators
Operators over any random variables' co-domain can be used. For
instance
where <DGTV-1 + DGTV-2> corresponds to the TV of the random variable
which is the sum of the random variables of GDTV-1 and GDTV-2. For
instance assuming
and are independent then
Operators like List constructions are very useful and can be used to
build random vectors, such as random pairs of variables to store the
necessary information to compute their mutual entropies, as shown
below.
Bayesian network
GDTV allows us to represent Bayesian networks. Consider the BN in
https://en.wikipedia.org/wiki/Bayesian_network#Example
Each random variables are boolean, however not all are conditionally
independent, for instance G depends on S and R combined. GDTV allows
us to represent these dependencies.
Extensional implication between Rain and Sprinkler:
Extensional implication between (Sprinkler, Rain) and Grass. Note the
use of List to build a random pair of variables:
Where (0, 0) to (1, 1) enumerate all possible boolean pairs, that is
the possible values of the co-domain of (List S R). One may see how
that information is sufficient to compute the mutual information
between S and R.
Probabilistic Programming Language
Likewise one can represent the conditional distribution of a program's
output, or program subfunctions using GDTV. In that case a GDTV query
using the BC chainer would amount to a PPL variable query. One may
probably design PLN rules to perform the kind of Monte-Carlo
simulations used by PPL to answer queries.
Mixed extensional-intensional relationships
In the PLN book, mixed implication is the disjunction of extensional
and intensional implication (likewise for inheritance). The definition
of intensional implication using GDTV naturally follows as it is
merely an extensional implication in pattern space. The disjunction
between two conditional GDTVs however is not trivial. In the PLN book
the way to go about that would be to turn the conditional DGTV into a
simple TV according to the formula Section 2.4.1.1
where f is min for instance. Then whether the simple TV is fuzzy
or probabilistic
the disjunctions between these is trivially defined.
There other ways to go about, as there was many ways to turn a
conditional GDTV into a unconditional one, and then disjunction is
trivially defined as well. I'm personally not sure what is the right
way, and ultimately this might may require experimentations.
The text was updated successfully, but these errors were encountered: