Executive Summary
Game theory reveals that successful coordination requires more than aligned incentives or mutual knowledge—it requires common knowledge: infinite hierarchies of "I know that you know that I know..." This explains why transparency mechanisms, public announcements, and visible commitments are essential for AI safety coordination, even when all parties privately share the same information.
Drawing on Lewis (1969), Schelling (1960), and the formal analysis of common knowledge, this paper identifies three lessons:
1. Mutual knowledge is insufficient - Each party knowing the same fact doesn't enable coordination
2. Public announcements create common knowledge - This is why transparency matters even when everyone already knows
3. Lewis conventions require common knowledge - Following a coordination equilibrium requires infinite-order knowledge
---
The Problem: Why Don't Labs Coordinate?
Suppose every AI lab privately agrees that racing is dangerous and coordination would be better. Why don't they coordinate?
Standard game theory says: if coordination is a Nash equilibrium and all parties prefer it, they should coordinate. But they don't.
The reason: lack of common knowledge.
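The multiplicity problem behind this failure can be made concrete with a small stag-hunt-style game. The payoff numbers below are hypothetical, chosen only to illustrate the structure; a minimal sketch:

```python
import itertools

# Two labs each choose "coordinate" (C) or "race" (R).
# Hypothetical payoffs: mutual coordination is best for both,
# but racing is the safe fallback if the other lab races.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "R"): (0, 2),
    ("R", "C"): (2, 0),
    ("R", "R"): (1, 1),
}

def is_nash(profile):
    """True if no player gains by unilaterally deviating."""
    for player in (0, 1):
        for alt in ("C", "R"):
            deviation = list(profile)
            deviation[player] = alt
            if PAYOFFS[tuple(deviation)][player] > PAYOFFS[profile][player]:
                return False
    return True

equilibria = [p for p in itertools.product("CR", repeat=2) if is_nash(p)]
# Both (C, C) and (R, R) are Nash equilibria: aligned incentives
# alone do not tell the labs WHICH equilibrium will be played.
```

Even though both labs strictly prefer (C, C), the game has a second equilibrium at (R, R), so each lab's best move depends on what it expects the other to do.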
---
Common Knowledge vs. Mutual Knowledge
Mutual Knowledge
Proposition A is mutually known among agents if each agent knows A.
Example: Each lab knows that racing is dangerous. But no lab knows whether the others know that it knows.
Common Knowledge
Proposition A is commonly known if:
- Each agent knows A
- Each agent knows that each agent knows A
- Each agent knows that each agent knows that each agent knows A
- ... ad infinitum
This infinite hierarchy is essential for coordination.
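The hierarchy can be made precise with an Aumann-style partition model of knowledge. The sketch below uses a made-up three-state example: iterating the "everyone knows" operator shows an event that is mutually known at a state yet never becomes common knowledge (common knowledge is the fixed point of that operator).

```python
def knows(partition, event):
    """States where an agent with this information partition knows
    the event: the agent's cell is entirely contained in the event."""
    return {s for cell in partition if cell <= event for s in cell}

def everyone_knows(partitions, event):
    """States where every agent knows the event."""
    return set.intersection(*(knows(p, event) for p in partitions))

def common_knowledge(partitions, event):
    """Iterate 'everyone knows' to its fixed point."""
    current = set(event)
    while True:
        nxt = everyone_knows(partitions, current)
        if nxt == current:
            return nxt
        current = nxt

# Hypothetical example: three possible worlds, two labs.
p1 = [{1}, {2, 3}]   # lab 1 cannot distinguish worlds 2 and 3
p2 = [{1, 2}, {3}]   # lab 2 cannot distinguish worlds 1 and 2
A = {1, 2}           # event: "racing is dangerous" holds

level1 = everyone_knows([p1, p2], A)       # {1}: mutual knowledge at world 1
level2 = everyone_knows([p1, p2], level1)  # empty: 2nd-order knowledge fails
ck = common_knowledge([p1, p2], A)         # empty: never common knowledge
```

At world 1, both labs know A, but lab 2 cannot rule out world 2, where lab 1 does not know A, so the hierarchy collapses at the second level.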
---
The Barbecue Problem: Why Announcements Matter
Littlewood's barbecue problem illustrates the power of common knowledge:
N diners have barbecue sauce on their faces. Each can see the others' faces but not their own. The cook announces: "At least one of you has sauce on your face."
Before the announcement: Provided at least two diners were messy, everyone already knew that at least one person had sauce on their face (each could see at least one messy face). The announcement told them nothing new about the world.
After the announcement: The fact became common knowledge. This enabled the messy diners to eventually deduce their own status through iterative reasoning.
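The iterated deduction can be sketched as a short simulation (a minimal model, not a full epistemic logic): with k messy diners, each messy diner sees k-1 messy faces and walks out in round k.

```python
def simulate(messy):
    """Iterated reasoning after the public announcement
    'at least one of you has sauce on your face'.

    messy[i] is True if diner i has sauce. Returns (round, leavers):
    the round in which the messy diners deduce their own status.
    """
    assert any(messy), "the announcement must be true"
    n = len(messy)
    rnd = 0
    while True:
        rnd += 1
        # Diner i sees `sees` messy faces. If i were clean, those
        # diners would have deduced their status by round `sees`.
        # When that round passes with nobody leaving, i concludes
        # "I must be messy too" and leaves in round sees + 1.
        leavers = [i for i in range(n)
                   if messy[i]
                   and sum(messy[j] for j in range(n) if j != i) == rnd - 1]
        if leavers:
            return rnd, leavers
```

For example, with three messy diners out of four, nobody moves in rounds 1 and 2, and all three leave together in round 3; each silent round is itself a public event that adds one level to the knowledge hierarchy.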
Application to AI Safety:
A public commitment to safety practices might seem redundant—"everyone already knows racing is risky." But the announcement creates common knowledge, which enables coordination reasoning that was impossible before.
---
Schelling's Department Store: Coordination Requires Common Knowledge
Schelling's famous example: Two people separated in a department store with no prior agreement on where to meet. They need to find a "focal point"—an obvious location that both will think of.
But what makes a focal point work? Not just that both people think of it, but that each knows the other will think of it, and knows the other knows they will think of it, etc.
Robert's reasoning:

> "I should go to the 2nd floor if I expect Liz to go there. But I expect Liz to go there only if she expects me to go there. And she expects me to go there only if she expects me to expect her to go there..."
For this reasoning to converge, they need common knowledge that both will go to the 2nd floor.
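This convergence condition can be illustrated with a toy level-k reasoning model (the floor names and salience numbers below are hypothetical): when the salience prior is commonly known, every level of the belief hierarchy picks the same floor; when priors are private and mismatched, the hierarchy oscillates and never settles.

```python
def best_reply(other_choice):
    # In a pure matching game, the best reply to a known choice
    # is simply to pick the same floor.
    return other_choice

def level_k_choice(own_prior, other_prior, k):
    """Level-0 players go to the floor their own salience prior
    favors. A level-k player best-responds to a level-(k-1) model
    of the other player."""
    if k == 0:
        return max(own_prior, key=own_prior.get)
    return best_reply(level_k_choice(other_prior, own_prior, k - 1))

shared = {"1st": 0.2, "2nd": 0.5, "3rd": 0.3}       # commonly known salience
robert_only = {"1st": 0.6, "2nd": 0.3, "3rd": 0.1}  # Robert's private hunch

# Commonly known prior: every level of "I think that you think
# that..." picks the same floor, so the hierarchy converges.
all(level_k_choice(shared, shared, k) == "2nd" for k in range(10))

# Private, mismatched priors: the hierarchy alternates between the
# two anchors ("2nd", "1st", "2nd", ...) and never settles.
[level_k_choice(shared, robert_only, k) for k in range(4)]
```

The common prior pins down every level of the hierarchy at once, which is exactly what a focal point does; the mismatched case shows why no finite depth of private reasoning substitutes for it.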
Application to AI Safety:
Labs trying to coordinate face the same problem. Each wants to coordinate if others will coordinate. But without common knowledge of which equilibrium to follow, they can't converge.
---
Lewis Conventions: The Formal Requirement
David Lewis (1969) defined a convention as a Nash equilibrium that agents follow because they have common knowledge of:
1. The game structure
2. Each other's rationality
3. Their intentions to follow this equilibrium (and no other)
This is a demanding requirement. It means:
- Labs need common knowledge of the coordination game structure
- Labs need common knowledge that others are rational
- Labs need common knowledge that others intend to coordinate on the same equilibrium
Without all three, coordination fails.
---
Why This Matters for AI Safety
1. Secret Agreements Are Fragile
Bilateral agreements between labs create mutual knowledge but not common knowledge. The broader community doesn't know about them, so they can't reinforce coordination norms.
Implication: Public commitments are more robust than private agreements.
2. Transparency Creates Common Knowledge
When labs publicly disclose safety practices:
- Everyone knows what each lab is doing
- Everyone knows that everyone knows
- Everyone can reason about coordination
This is why transparency mechanisms matter even when "everyone already knows" the risks.
Implication: Transparency reporting should be public, not just among labs.
3. Announcements Are More Than Signaling
When a lab announces a safety commitment:
- It's not just signaling intent
- It's creating common knowledge of intent
- This enables others to reason about coordination
Implication: Labs should make explicit, public commitments rather than relying on implicit understanding.
4. Third-Party Verification Matters
External audits and verification create common knowledge that self-reporting doesn't. A lab saying "we're safe" creates, at best, mutual knowledge of an unverified claim. A public audit report saying "they're safe" makes verified compliance common knowledge.
Implication: Third-party oversight mechanisms are valuable for coordination, not just compliance.
---
A Common Knowledge Framework for AI Safety
Level 1: Mutual Knowledge (Insufficient)
- Labs privately agree racing is bad
- Labs privately want coordination
- Result: No coordination (each unsure if others will follow through)
Level 2: Limited Common Knowledge
- Labs publicly announce safety commitments
- Community knows intentions
- Result: Limited coordination (some trust, some uncertainty)
Level 3: Full Common Knowledge
- Public, verified safety practices
- Common knowledge of game structure (who has what capabilities)
- Common knowledge of intentions (explicit commitments)
- Common knowledge of compliance (audited outcomes)
- Result: Robust coordination possible
---
Practical Implications
For Labs
1. Make public commitments - Not just to your partners, but to the world
2. Disclose capabilities - Common knowledge of the game structure
3. Accept verification - Third-party oversight creates common knowledge

For Governance
1. Design transparency mechanisms - Not just for monitoring, but for coordination
2. Create public registries - Make safety practices common knowledge
3. Facilitate public commitments - Forums where labs can publicly commit

For Researchers
1. Study common knowledge creation - How do public announcements affect behavior?
2. Measure common knowledge - Survey not just beliefs, but beliefs about beliefs
3. Model coordination under incomplete common knowledge - What happens when common knowledge is partial?

---
Limitations
Common Knowledge Is Hard to Achieve
- Requires infinite hierarchies of knowledge
- In practice, approximated by sufficiently many levels of iterated mutual knowledge
- Unclear how many levels are "enough"
Common Knowledge Can Be Destabilized
- A single public defection can destroy common knowledge of cooperation
- Fragile in dynamic environments
Alternative Approaches
- Credible commitment (burn the boats) may be more robust than common knowledge
- Legal enforcement creates coordination without requiring common knowledge
---
Conclusion
The problem of AI safety coordination is partly a problem of creating common knowledge. Labs may privately agree on the need for coordination, but without common knowledge—without infinite hierarchies of "I know that you know that I know..."—coordination fails.
This explains why:
- Public commitments matter even when everyone already agrees
- Transparency mechanisms are essential for coordination, not just monitoring
- Third-party verification is valuable beyond compliance checking
- Secret agreements are fragile
The path to coordination runs through common knowledge. We should design mechanisms that create it.
---
References
- Aumann, R. (1976). "Agreeing to Disagree." Annals of Statistics, 4(6), 1236-1239.
- Lewis, D. (1969). Convention: A Philosophical Study. Harvard University Press.
- Littlewood, J. (1953). A Mathematician's Miscellany. Methuen.
- Schelling, T. (1960). The Strategy of Conflict. Harvard University Press.
---
This paper draws on the Stanford Encyclopedia of Philosophy entry on Common Knowledge.