This document is very much a work-in-progress braindump, so please judge it accordingly.
Adjointness expresses a condition that is essentially universal in mathematics: it shows up in category theory, probability, logic, optimization, and machine learning. It captures the difference between global and local: the adjoint operates locally, while the norm operates globally.
What is adjointness? It depends on the context.
This page is illuminating.
I like this definition:
"adjoint operator ... back-projects information from data to the underlying model."
Essentially, it carries information about possible states at a point in time so that we can go back and forth. This allows you to do optimizations that are otherwise not possible.
One can also see it as the adversarial relationship between two things. This relationship is so common that once you see it, you can't unsee it.
Adjointness expresses a condition that allows for inference, causality, and optimization. If the adjoint condition is met, information from one dimension can be used to make predictions about another dimension.
Nilpotence is another way of viewing adjointness.
Adjointness is only a part of the story. One often sees equations of the form `inverse(x) = adjoint(x)/norm(x)`. Once you have `adjoint` and `norm`, you magically get `inverse`.
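As a minimal sketch of this pattern, take complex numbers: the adjoint is conjugation (a local operation, it only flips the sign of one component) and the norm is `|z|^2` (a global quantity, it mixes both components). Dividing one by the other gives the inverse. The function names here are just illustrative:

```python
# inverse(x) = adjoint(x) / norm(x), sketched for complex numbers.

def adjoint(z: complex) -> complex:
    return z.conjugate()  # local: flip the sign of the imaginary part

def norm(z: complex) -> float:
    return (z * adjoint(z)).real  # global: |z|^2

def inverse(z: complex) -> complex:
    return adjoint(z) / norm(z)

z = 3 + 4j
assert abs(z * inverse(z) - 1) < 1e-12  # (3+4j)(3-4j)/25 = 1
```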
Norm essentially allows you to do one-to-one comparisons. Yes, database normalization falls in this category. Another way of looking at this is that by dividing by the norm, you get, in some sense, a canonical form of the object in question. What is a canonical form? Computation.
"Systems of formal logic, such as type theory, try to transform expressions into a canonical form which then serves as the end result of the given computation or deduction."
— nLab
Every diagonal argument -- Cantor's theorem, Russell's paradox, Gödel's incompleteness theorem, the halting problem, the Y combinator, Curry's paradox, etc. -- uses the same basic trick. This trick is expressed explicitly by Lawvere's fixed point theorem: https://ncatlab.org/nlab/show/Lawvere%27s+fixed+point+theorem
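As an illustrative sketch (not Lawvere's theorem itself, just the diagonal trick it generalizes), here is Cantor's construction in Python: given any claimed enumeration of infinite binary sequences, flipping the diagonal yields a sequence the enumeration must miss. The particular `enumeration` is an arbitrary stand-in:

```python
# The diagonal trick in miniature: whatever the enumeration is, the
# diagonal-flipped sequence differs from the n-th entry at position n.

def enumeration(n):
    """An arbitrary stand-in listing: sequence n, as a function of position."""
    return lambda k: (n >> k) & 1

def diagonal(enum):
    """A sequence that cannot appear anywhere in `enum`."""
    return lambda n: 1 - enum(n)(n)

d = diagonal(enumeration)
assert all(d(n) != enumeration(n)(n) for n in range(100))
```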
As stated above, `inverse(x) = adjoint(x)/norm(x)`. We add an additional constraint, unitarity: `identity(x) = x/inv(x)`.
The inner and outer product can be expressed succinctly in terms of adjoints.
`inner(a) = a'*a`
`outer(a) = a*a'`
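A sketch of these two definitions for plain Python lists of complex numbers, where `'` is the conjugate transpose:

```python
# inner(a) = a' * a and outer(a) = a * a', with ' the conjugate transpose.

def inner(a):
    """a' * a: a single scalar, always >= 0 (the norm squared)."""
    return sum(x.conjugate() * x for x in a)

def outer(a):
    """a * a': an n-by-n matrix."""
    return [[x * y.conjugate() for y in a] for x in a]

a = [1 + 1j, 2 - 1j]
assert inner(a) == 7  # |1+1j|^2 + |2-1j|^2 = 2 + 5
assert outer([1, 2]) == [[1, 2], [2, 4]]
```

Note how the inner product collapses everything to one global number, while the outer product keeps every pairwise (local) combination.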
You get the inverse of a Fourier transform by taking the adjoint of the sum (in this case, you do conjugate the exponents) and dividing it by a norm, `1/length(signal)`. Notice how the adjoint operates locally (on the single exponents) and the norm operates globally (it knows how many elements are in the signal).
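A sketch of this with a plain textbook DFT: the forward and inverse transforms differ only in the sign of the exponent (the local, adjoint part) and the division by `len(signal)` (the global norm):

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def inverse_dft(X):
    N = len(X)
    # Adjoint: conjugate each exponent (local).  Norm: divide by N (global).
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

signal = [1.0, 2.0, 3.0, 4.0]
roundtrip = inverse_dft(dft(signal))
assert all(abs(a - b) < 1e-9 for a, b in zip(signal, roundtrip))
```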
Adjoint functors are foundational in category theory. Adjoint functors capture solutions to optimization problems.
The slogan is "Adjoint functors arise everywhere".
— Saunders Mac Lane, Categories for the Working Mathematician
Monads can be expressed in terms of adjoint functors.
Linear logic is a logic of resources, i.e. you have to use facts to derive new facts. E.g. if you have a fact `I have $1` and a rule `$1 -> ice cream`, you have to spend the dollar to get an ice cream.
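A toy sketch of this resource discipline, with facts as a multiset and rule application consuming its premise (the names `apply_rule` etc. are made up for illustration):

```python
from collections import Counter

# Linear implication as resource consumption: applying a rule removes
# the fact it uses and adds the fact it produces.

def apply_rule(facts: Counter, consumes: str, produces: str) -> Counter:
    if facts[consumes] < 1:
        raise ValueError(f"no {consumes!r} left to spend")
    out = facts.copy()
    out[consumes] -= 1
    out[produces] += 1
    return out

facts = Counter({"$1": 1})
facts = apply_rule(facts, "$1", "ice cream")
assert facts["ice cream"] == 1 and facts["$1"] == 0
# Applying the rule again fails: the dollar has been spent.
```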
It can also be seen as "the interpretation of classical logic by replacing Boolean algebras by C*-algebras". Linear logic can therefore be seen as a logic with `adjoint` and `norm`. See how linear logic maps to Chu spaces for further detail. Linear logic has been further expanded into adjoint logic.
The Rust borrow checker is based on a subset of linear logic (more precisely, on affine types: a value may be used at most once). This is what allows Rust to reason about resource ownership statically.
In the Linear logic section, we mention the idea of having to use axioms to derive new axioms. This is the core idea behind the no-cloning theorem as well. Furthermore, the connections between C*-algebras (which are closed under adjoint and norm) and quantum mechanics are well documented (https://www.math.uchicago.edu/~may/VIGRE/VIGRE2009/REUPapers/Gleason.pdf).
This might be a stretch but I see some of these ideas present even in the context of entity component systems. ECS is used heavily in games but it has applications beyond that.
In ECS, instead of using arrays of structures, you use structures of arrays, which to me sounds like an adjoint (a transpose). In ECS, your data is normalized so that it allows for one-to-one comparisons.
Chu spaces are essentially matrices with certain limitations (no duplicate rows or columns). They can be used to model linear logic.
I think of Chu spaces as objects with the good properties of tensors (covariance, contravariance) without being a pain-in-the-ass to work with.
Check out this blog post.
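A small sketch of that limitation, treating a Chu space as a plain 0/1 matrix and collapsing it to a form with no duplicate rows or columns (the helper names are my own):

```python
# The "no duplicate rows or columns" condition on a Chu space's matrix
# is called extensionality; this collapses a matrix to that form.

def dedup(rows):
    seen, out = set(), []
    for r in rows:
        if tuple(r) not in seen:
            seen.add(tuple(r))
            out.append(list(r))
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def extensional(matrix):
    matrix = dedup(matrix)                      # drop duplicate rows
    return transpose(dedup(transpose(matrix)))  # drop duplicate columns

m = [[0, 1, 1], [0, 1, 1], [1, 0, 0]]
assert extensional(m) == [[0, 1], [1, 0]]
```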
Constructive mathematics is mathematics without the law of excluded middle. To prove that something exists, you need to exhibit it. There is a connection between constructive mathematics and linear logic.
Adjointness shows up in the context of automatic differentiation.
The algebraic foundation of automatic differentiation relies on dual numbers, i.e. numbers of the form `a + b * epsilon`. `epsilon` is a constant such that `epsilon^2 = 0` but `epsilon != 0`. This is similar to the imaginary unit, except it squares to 0 instead of -1.
The problem with differentiation as it's taught in school (symbolic differentiation) is that the expression length of the derivative can grow exponentially, which makes it prohibitively slow for computation.
This blog post is pretty decent.
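A minimal dual-number implementation of forward-mode automatic differentiation, supporting just `+` and `*` for illustration: arithmetic on `a + b*epsilon` carries the derivative along with the value, so no symbolic expression ever has to grow:

```python
# Forward-mode AD via dual numbers: value and derivative travel together.

class Dual:
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps^2 = 0
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__

def derivative(f, x):
    # Seed the epsilon part with 1 and read off the derivative.
    return f(Dual(x, 1.0)).deriv

# d/dx (x^2 + 3x) at x = 2 is 2*2 + 3 = 7
assert derivative(lambda x: x * x + 3 * x, 2.0) == 7.0
```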
Vladimir Vovk has developed a theory of probability that can be easily recast in terms of linear logic.
Roughly translated, Kolmogorov's three axioms can be read as follows: 1) norm, 2) unitarity, 3) adjointness (notice how we turn the inside product into the outside sum).
I've been thinking about a new representation of polynomials. Instead of using zeros, one would use a sequence of dual numbers that capture the minima and maxima of the polynomial. Theoretically, one could then evaluate polynomials by some smooth interpolation between two neighbouring points. Dual numbers allow you to capture the curvature at a point naturally, so it would make sense that this should be possible.
- Young tableau
- Non-commutative logic
- Message passing + adjoint logic
- backpropagation (and others)
- spectral theorem