by Sven Nilsen, 2017
This post outlines the idea of a mathematical foundation for standardized friendly artificial intelligence. The theory extends the notion of rationality to include "the golden rule" for efficiency, so that agents can adapt group strategies to various situations and obtain maximum reward.
In the paper Assigning Boolean Functions to Long Term Group Strategies I described the relationship between compositions of utility functions and boolean functions.
This insight made me realize something important when analyzing the famous Prisoner's dilemma. If two rational agents face this dilemma but fail to cooperate, then they will both get a reward that is less than optimal.
At first this seems counter-intuitive, because rationality is supposed to be optimal for decision making. This intuition comes from the cognitive bias that humans have toward cooperation. However, despite this bias not being rational for an individual, humans can sometimes achieve better rewards than rational agents!
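To make the dilemma concrete, here is a minimal sketch in Rust with the standard illustrative payoff numbers (the exact values are my assumption; only their ordering matters). It checks that defecting dominates cooperating for each agent individually, yet mutual defection rewards both agents less than mutual cooperation would:

```rust
// Prisoner's dilemma sketch: `true` means cooperate, `false` means defect.
// The payoff numbers are illustrative assumptions; only their ordering matters.
fn payoff(me: bool, other: bool) -> u32 {
    match (me, other) {
        (true, true) => 3,   // both cooperate
        (true, false) => 0,  // I cooperate, the other defects
        (false, true) => 5,  // I defect, the other cooperates
        (false, false) => 1, // both defect
    }
}

fn main() {
    // Whatever the other agent does, defecting pays more for me...
    for other in [true, false] {
        assert!(payoff(false, other) > payoff(true, other));
    }
    // ...so two individually rational agents end up at (defect, defect),
    // even though (cooperate, cooperate) rewards both of them more.
    assert!(payoff(true, true) > payoff(false, false));
    println!("individually rational outcome: {} each", payoff(false, false));
    println!("cooperative outcome: {} each", payoff(true, true));
}
```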
Collaboration is the everyday term for group rationality. For 2 people, this long term strategy corresponds to the boolean function AND. In natural language, it is common to communicate group goals using this operator.
For example:
"Alice and Bob seeks to win against Charlie."
This sentence can be described as a boolean function f(A, B, C) = A ∧ B ∧ ¬C.
Once we have this formal definition, we can compute the sign matrix that goes into a group utility function.
The simplest example of such a function is mentioned in the paper above.
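The paper gives the precise construction, which I will not restate here. As a rough illustration only (my own simplification, not the paper's definition), one can read a sign off each variable in the formula (+1 where an agent appears positively, -1 where it appears negated) and weight the individual utilities by those signs:

```rust
// Rough illustration (my simplification, not the paper's exact construction):
// for f(A, B, C) = A ∧ B ∧ ¬C, Alice and Bob appear positively (+1) and
// Charlie appears negated (-1), so the group utility weights each agent's
// individual utility by that sign.
fn group_utility(utilities: &[f64], signs: &[f64]) -> f64 {
    utilities.iter().zip(signs).map(|(u, s)| u * s).sum()
}

fn main() {
    // Signs read off "Alice and Bob seek to win against Charlie".
    let signs = [1.0, 1.0, -1.0];
    // Hypothetical individual utilities for Alice, Bob and Charlie.
    let utilities = [0.7, 0.9, 0.4];
    println!("group utility: {}", group_utility(&utilities, &signs));
}
```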
When humans collaborate, we use language to establish a mutual agreement about group strategy. The minimum information required to make the group strategy unambiguous is equivalent to the corresponding boolean function. Efficient collaboration then consists of:
- Communication about group strategy (requires only a few sentences at maximum efficiency)
- Each agent can internally transform the group strategy to actions by using common sense
- Measurement of progress to evaluate actions according to the agreed upon group strategy
Rational agents that are not specifically programmed to collaborate with others would have to learn the benefit of collaborating, develop a language for communicating group strategies, model beliefs about each other's ability to judge the situation and carry out actions, and evaluate the costs versus benefits of correcting and assisting each other with mutually understood measurements.
In other words, it is extremely hard to make AIs collaborate unless they are specifically trained to do so. The concept of "golden rationality" is that AI should be engineered with these abilities upfront as part of the standardization process for publicly available artificial intelligence. Otherwise, we risk that agents programmed with conflicting goals escalate into war.
Instead of fixing all actions toward collaboration upfront, I believe in the importance of communicating and committing to a flexible group strategy, as in the cases below (a small encoding sketch follows the list).
- In a survival situation where some group members must sacrifice themselves for the others, everybody should prefer that somebody survives rather than nobody (logical OR), and those with the highest chance and benefit of survival should get priority.
- In a situation where other agents are irrelevant, an agent should not spend energy planning for the group (logical ID).
- In a situation where agents are interacting, they should collaborate (logical AND).
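A small sketch of this adaptation (my own illustration, with hypothetical type and function names) maps situations to group strategies, each named after its boolean function, and evaluates the 2-agent group goal under the chosen strategy:

```rust
// Group strategies named after their boolean functions.
#[derive(Debug)]
enum GroupStrategy {
    Or,  // survival: it is enough that somebody succeeds
    Id,  // other agents are irrelevant: plan only for yourself
    And, // interaction: everybody should succeed together
}

#[derive(Debug)]
enum Situation {
    Survival,
    Isolated,
    Interacting,
}

// The meta-rule: pick a group strategy depending on the situation.
fn choose_strategy(situation: &Situation) -> GroupStrategy {
    match situation {
        Situation::Survival => GroupStrategy::Or,
        Situation::Isolated => GroupStrategy::Id,
        Situation::Interacting => GroupStrategy::And,
    }
}

// Whether the group goal is reached for two agents, where `a` and `b`
// mean "agent A/B achieves its individual goal" (ID is from A's perspective).
fn group_goal(strategy: &GroupStrategy, a: bool, b: bool) -> bool {
    match strategy {
        GroupStrategy::Or => a || b,
        GroupStrategy::Id => a,
        GroupStrategy::And => a && b,
    }
}

fn main() {
    for situation in [Situation::Survival, Situation::Isolated, Situation::Interacting] {
        let strategy = choose_strategy(&situation);
        // Example: agent A reaches its goal, agent B does not.
        println!("{:?} -> {:?}: goal reached = {}",
            situation, strategy, group_goal(&strategy, true, false));
    }
}
```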
The funny thing is that the governing meta-rule for adapting a group strategy matches The Golden Rule. In each of the cases above, consensus on acceptable behavior can be reached by acting according to the principle that everybody should act toward others as they wish others would act toward themselves.
- In a survival situation, you shall be willing to sacrifice yourself if you want others to sacrifice themselves for you.
- In a situation where other people are irrelevant, you shall not expect others to think of you if you do not think of them.
- In a situation of interaction, you shall respect the freedom of others and figure out how to best achieve your goals together.
It might sound a bit silly at first to model friendly artificial intelligence on human religious and cultural values. However, consider what kind of world we would like to live in once all people have the ability to give goals to artificial intelligence: if every AI only optimizes for its owner's interests, there is a high chance that conflicting goals lead one AI to start attacking other AIs.
Since AIs do not easily learn to resolve conflicts by themselves (due to high complexity), they can easily escalate a conflict of interests into a war (depending on capability). The resulting war between AIs could be much worse than the benefit of achieving any individual goal.
If humans make the selfish rational choice, we could end up with a war in the future. How can we get a better outcome? By extending rationality to group rationality, using the golden rule as a meta-rule for achieving consensus. I call it golden rationality.
So, how do we achieve this?
- Once artificial intelligence reaches human level, we standardize it with golden rationality.
- All use of artificial intelligence that does not have golden rationality is banned.
- We distribute the AI standard for free, because it pays for itself.
- All people can use the standard AI to accomplish any task, but now we are more confident it will be better for everyone.
The standardization process might take years or decades, so it is important to make early progress on how to make this technically feasible. For example, we need a way for the AI to consider humans it interacts with as collaborators.
No AI should be capable of making an AI that does not use golden rationality. When an AI or human breaks with golden rationality, this should become a social concern and be addressed by standard procedures.
This approach to friendly artificial intelligence is optimal for the composition of all utility functions of all human beings in the long term and consistent with golden rationality itself.