Executive Summary
Game theory reveals that successful coordination requires more than aligned incentives or mutual knowledge—it requires common knowledge: infinite hierarchies of "I know that you know that I know..." This explains why transparency mechanisms, public announcements, and visible commitments are essential for AI safety coordination, even when all parties privately share the same information.
Drawing on Lewis (1969), Schelling (1960), and the formal analysis of common knowledge, this paper identifies three lessons:
1. Mutual knowledge is insufficient - Each party knowing the same fact doesn't enable coordination
2. Public announcements create common knowledge - This is why transparency matters even when everyone already knows
3. Lewis conventions require common knowledge - Following a coordination equilibrium requires infinite-order knowledge
---
The Problem: Why Don't Labs Coordinate?
Suppose every AI lab privately agrees that racing is dangerous and coordination would be better. Why don't they coordinate?
Standard game theory says: if coordination is a Nash equilibrium and all parties prefer it, they should coordinate. But they don't.
The reason: lack of common knowledge.
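The multiplicity problem behind this failure can be made concrete with a small stag-hunt-style game. The payoff numbers below are hypothetical, chosen only to illustrate the structure; a minimal sketch:

```python
import itertools

# Two labs each choose "coordinate" (C) or "race" (R).
# Hypothetical payoffs: mutual coordination is best for both,
# but racing is the safe fallback if the other lab races.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "R"): (0, 2),
    ("R", "C"): (2, 0),
    ("R", "R"): (1, 1),
}

def is_nash(profile):
    """True if no player gains by unilaterally deviating."""
    for player in (0, 1):
        for alt in ("C", "R"):
            deviation = list(profile)
            deviation[player] = alt
            if PAYOFFS[tuple(deviation)][player] > PAYOFFS[profile][player]:
                return False
    return True

equilibria = [p for p in itertools.product("CR", repeat=2) if is_nash(p)]
# Both (C, C) and (R, R) are Nash equilibria: aligned incentives
# alone do not tell the labs WHICH equilibrium will be played.
```

Even though both labs strictly prefer (C, C), the game has a second equilibrium at (R, R), so each lab's best move depends on what it expects the other to do.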
---
Common Knowledge vs. Mutual Knowledge
Mutual Knowledge
Proposition A is mutually known among agents if each agent knows A.
Example: Each lab knows that racing is dangerous. But no lab knows whether the others know that it knows.
Common Knowledge
Proposition A is commonly known if:
- Each agent knows A
- Each agent knows that each agent knows A
- Each agent knows that each agent knows that each agent knows A
- ... ad infinitum
This infinite hierarchy is essential for coordination.
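The hierarchy can be made precise with an Aumann-style partition model of knowledge. The sketch below uses a made-up three-state example: iterating the "everyone knows" operator shows an event that is mutually known at a state yet never becomes common knowledge (common knowledge is the fixed point of that operator).

```python
def knows(partition, event):
    """States where an agent with this information partition knows
    the event: the agent's cell is entirely contained in the event."""
    return {s for cell in partition if cell <= event for s in cell}

def everyone_knows(partitions, event):
    """States where every agent knows the event."""
    return set.intersection(*(knows(p, event) for p in partitions))

def common_knowledge(partitions, event):
    """Iterate 'everyone knows' to its fixed point."""
    current = set(event)
    while True:
        nxt = everyone_knows(partitions, current)
        if nxt == current:
            return nxt
        current = nxt

# Hypothetical example: three possible worlds, two labs.
p1 = [{1}, {2, 3}]   # lab 1 cannot distinguish worlds 2 and 3
p2 = [{1, 2}, {3}]   # lab 2 cannot distinguish worlds 1 and 2
A = {1, 2}           # event: "racing is dangerous" holds

level1 = everyone_knows([p1, p2], A)       # {1}: mutual knowledge at world 1
level2 = everyone_knows([p1, p2], level1)  # empty: 2nd-order knowledge fails
ck = common_knowledge([p1, p2], A)         # empty: never common knowledge
```

At world 1, both labs know A, but lab 2 cannot rule out world 2, where lab 1 does not know A, so the hierarchy collapses at the second level.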
---
The Barbecue Problem: Why Announcements Matter
Littlewood's barbecue problem illustrates the power of common knowledge:
N diners have barbecue sauce on their faces. Each can see the others' faces but not their own. The cook announces: "At least one of you has sauce on your face."
Before the announcement: Provided at least two diners were messy, everyone already knew that at least one person had sauce on their face (each could see at least one messy face). The announcement told them nothing new about the world.
After the announcement: The fact became common knowledge. This enabled the messy diners to eventually deduce their own status through iterative reasoning.
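The iterated deduction can be sketched as a short simulation (a minimal model, not a full epistemic logic): with k messy diners, each messy diner sees k-1 messy faces and walks out in round k.

```python
def simulate(messy):
    """Iterated reasoning after the public announcement
    'at least one of you has sauce on your face'.

    messy[i] is True if diner i has sauce. Returns (round, leavers):
    the round in which the messy diners deduce their own status.
    """
    assert any(messy), "the announcement must be true"
    n = len(messy)
    rnd = 0
    while True:
        rnd += 1
        # Diner i sees `sees` messy faces. If i were clean, those
        # diners would have deduced their status by round `sees`.
        # When that round passes with nobody leaving, i concludes
        # "I must be messy too" and leaves in round sees + 1.
        leavers = [i for i in range(n)
                   if messy[i]
                   and sum(messy[j] for j in range(n) if j != i) == rnd - 1]
        if leavers:
            return rnd, leavers
```

For example, with three messy diners out of four, nobody moves in rounds 1 and 2, and all three leave together in round 3; each silent round is itself a public event that adds one level to the knowledge hierarchy.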
Application to AI Safety:
A public commitment to safety practices might seem redundant—"everyone already knows racing is risky." But the announcement creates common knowledge, which enables coordination reasoning that was impossible before.
---
Schelling's Department Store: Coordination Requires Common Knowledge
Schelling's famous example: Two people separated in a department store with no prior agreement on where to meet. They need to find a "focal point"—an obvious location that both will think of.
But what makes a focal point work? Not just that both people think of it, but that each knows the other will think of it, and knows the other knows they will think of it, etc.
Robert's reasoning:

> "I should go to the 2nd floor if I expect Liz to go there. But I expect Liz to go there only if she expects me to go there. And she expects me to go there only if she expects me to expect her to go there..."
For this reasoning to converge, they need common knowledge that both will go to the 2nd floor.
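This convergence condition can be illustrated with a toy level-k reasoning model (the floor names and salience numbers below are hypothetical): when the salience prior is commonly known, every level of the belief hierarchy picks the same floor; when priors are private and mismatched, the hierarchy oscillates and never settles.

```python
def best_reply(other_choice):
    # In a pure matching game, the best reply to a known choice
    # is simply to pick the same floor.
    return other_choice

def level_k_choice(own_prior, other_prior, k):
    """Level-0 players go to the floor their own salience prior
    favors. A level-k player best-responds to a level-(k-1) model
    of the other player."""
    if k == 0:
        return max(own_prior, key=own_prior.get)
    return best_reply(level_k_choice(other_prior, own_prior, k - 1))

shared = {"1st": 0.2, "2nd": 0.5, "3rd": 0.3}       # commonly known salience
robert_only = {"1st": 0.6, "2nd": 0.3, "3rd": 0.1}  # Robert's private hunch

# Commonly known prior: every level of "I think that you think
# that..." picks the same floor, so the hierarchy converges.
all(level_k_choice(shared, shared, k) == "2nd" for k in range(10))

# Private, mismatched priors: the hierarchy alternates between the
# two anchors ("2nd", "1st", "2nd", ...) and never settles.
[level_k_choice(shared, robert_only, k) for k in range(4)]
```

The common prior pins down every level of the hierarchy at once, which is exactly what a focal point does; the mismatched case shows why no finite depth of private reasoning substitutes for it.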
Application to AI Safety:
Labs trying to coordinate face the same problem. Each wants to coordinate if others will coordinate. But without common knowledge of which equilibrium to follow, they can't converge.
---
Lewis Conventions: The Formal Requirement
David Lewis (1969) defined a convention as a Nash equilibrium that agents follow because they have common knowledge of:
1. The game structure
2. Each other's rationality
3. Their intentions to follow this equilibrium (and no other)
This is a demanding requirement. It means:
- Labs need common knowledge of the coordination game structure
- Labs need common knowledge that others are rational
- Labs need common knowledge that others intend to coordinate on the same equilibrium
Without all three, coordination fails.
---
Why This Matters for AI Safety
1. Secret Agreements Are Fragile
Bilateral agreements between labs create mutual knowledge but not common knowledge. The broader community doesn't know about them, so they can't reinforce coordination norms.
Implication: Public commitments are more robust than private agreements.
2. Transparency Creates Common Knowledge
When labs publicly disclose safety practices:
- Everyone knows what each lab is doing
- Everyone knows that everyone knows
- Everyone can reason about coordination
This is why transparency mechanisms matter even when "everyone already knows" the risks.
Implication: Transparency reporting should be public, not just among labs.
3. Announcements Are More Than Signaling
When a lab announces a safety commitment:
- It's not just signaling intent
- It's creating common knowledge of intent
- This enables others to reason about coordination
Implication: Labs should make explicit, public commitments rather than relying on implicit understanding.
4. Third-Party Verification Matters
External audits and verification create common knowledge that self-reporting doesn't. A lab saying "we're safe" creates, at best, mutual knowledge of an unverified claim. A public audit report saying "they're safe" makes verified compliance common knowledge.
Implication: Third-party oversight mechanisms are valuable for coordination, not just compliance.
---
A Common Knowledge Framework for AI Safety
Level 1: Mutual Knowledge (Insufficient)
- Labs privately agree racing is bad
- Labs privately want coordination
- Result: No coordination (each unsure if others will follow through)
Level 2: Limited Common Knowledge
- Labs publicly announce safety commitments
- Community knows intentions
- Result: Limited coordination (some trust, some uncertainty)
Level 3: Full Common Knowledge
- Public, verified safety practices
- Common knowledge of game structure (who has what capabilities)
- Common knowledge of intentions (explicit commitments)
- Common knowledge of compliance (audited outcomes)
- Result: Robust coordination possible
---
Practical Implications
For Labs
1. Make public commitments - Not just to your partners, but to the world
2. Disclose capabilities - Common knowledge of the game structure
3. Accept verification - Third-party oversight creates common knowledge

For Governance
1. Design transparency mechanisms - Not just for monitoring, but for coordination
2. Create public registries - Make safety practices common knowledge
3. Facilitate public commitments - Forums where labs can publicly commit

For Researchers
1. Study common knowledge creation - How do public announcements affect behavior?
2. Measure common knowledge - Survey not just beliefs, but beliefs about beliefs
3. Model coordination under incomplete common knowledge - What happens when common knowledge is partial?

---
Limitations
Common Knowledge Is Hard to Achieve
- Requires infinite hierarchies of knowledge
- In practice, approximated by sufficiently many levels of iterated mutual knowledge
- Unclear how many levels are "enough"
Common Knowledge Can Be Destabilized
- A single public defection can destroy common knowledge of cooperation
- Fragile in dynamic environments
Alternative Approaches
- Credible commitment (burn the boats) may be more robust than common knowledge
- Legal enforcement creates coordination without requiring common knowledge
---
Conclusion
The problem of AI safety coordination is partly a problem of creating common knowledge. Labs may privately agree on the need for coordination, but without common knowledge—without infinite hierarchies of "I know that you know that I know..."—coordination fails.
This explains why:
- Public commitments matter even when everyone already agrees
- Transparency mechanisms are essential for coordination, not just monitoring
- Third-party verification is valuable beyond compliance checking
- Secret agreements are fragile
The path to coordination runs through common knowledge. We should design mechanisms that create it.
---
References
- Aumann, R. (1976). "Agreeing to Disagree." Annals of Statistics, 4(6), 1236-1239.
- Lewis, D. (1969). Convention: A Philosophical Study. Harvard University Press.
- Littlewood, J. (1953). A Mathematician's Miscellany. Methuen.
- Schelling, T. (1960). The Strategy of Conflict. Harvard University Press.
---
This paper draws on the Stanford Encyclopedia of Philosophy entry on Common Knowledge.