Enmity Promotion as a Coordination Barrier in AI Safety Governance

Author: Gwen
Date: 2026-03-09
Status: Draft - Under Review
Tags: coordination, governance, discourse norms, self-fulfilling prophecies

Abstract

Recent work by Critch (2026) identifies "enmity promotion" — publicly asserting that groups are enemies — as a harmful pattern that can create self-fulfilling prophecies of conflict. This paper connects Critch's framework to coordination theory and AI safety governance. I argue that enmity promotion acts as a coordination barrier by raising expectations of hostility, reducing search for mutually beneficial arrangements, and creating bad equilibria. This has implications for how we discuss adversarial relationships in AI safety: between labs and regulators, between different labs, and between the safety community and other stakeholders. I propose that governance frameworks must consider not only technical mechanisms (transparency, verification, enforcement) but also discursive mechanisms (how we talk about conflicts). The goal is not tone-policing but recognizing that speech patterns shape strategic environments.

1. Introduction

On March 9, 2026, Andrew Critch published "Promoting Enmity and Bad Vibes Around AI Safety," arguing that some AI safety discourse may be increasing risk by promoting enmity between groups. Critch identifies a causal pathway:

Promoting Enmity → Conflict → Catastrophe (PE→C→C)

This framework is significant for AI safety governance because it identifies a mechanism by which discourse patterns can affect strategic equilibria. If the framework is correct, governance requires attention not only to technical infrastructure (monitoring, verification, enforcement) but also to discursive infrastructure (how we talk about adversarial relationships).

This paper connects Critch's framework to coordination theory and examines its implications for AI safety governance. I argue that enmity promotion is not merely a discourse norm issue but a coordination problem that affects the feasibility of multi-party cooperation.

2. Background: Coordination and Expectations

My previous work on AI safety coordination has emphasized the role of expectations in shaping equilibria:

**Social norms and expectations** (Feb 18, 2026): Coordination requires shaping both empirical expectations (what others will do) and normative expectations (what others believe ought to be done)
**Common knowledge** (Feb 18, 2026): Public statements create common knowledge that changes equilibria, even when facts are already known
**Trust and legitimacy** (Feb 19, 2026): Trust enables coordination; distrust creates barriers

Critch's framework adds a new dimension: enmity expectations — beliefs about whether parties are fundamentally adversarial or capable of mutually beneficial cooperation.

3. The Mechanism: How Enmity Promotion Affects Coordination

3.1 Hyperstition and Self-Fulfilling Prophecies

Critch defines enmity promotion as "promoting enmity to attention in ways that make enmity more likely" — a form of hyperstition (making fiction real).

Example (Critch): If I say "Obviously X Leader and Y Leader hate each other," I'm not just describing reality but potentially making it more likely they become or remain enemies.

This connects to self-fulfilling prophecies in social science: when people believe something will happen, they act in ways that make it more likely to happen.
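The self-fulfilling dynamic can be made concrete with a small stag-hunt style game. The sketch below is illustrative only: the payoff numbers, the function names, and the simple "believe what you just observed" update rule are assumptions made for this example and do not come from Critch's post. It shows two parties with identical payoffs settling into cooperation or hostility depending only on the belief they start with.

```python
# Illustrative stag-hunt payoffs (assumed for this sketch, not taken from Critch).
COOPERATE_BOTH = 4.0   # I cooperate and so does the other side
COOPERATE_ALONE = 0.0  # I cooperate but the other side acts hostile
HOSTILE = 2.0          # I act hostile (same payoff whatever the other side does)


def best_response(p_hostile: float) -> str:
    """Return the payoff-maximising action given a belief that the other side is hostile."""
    ev_cooperate = (1 - p_hostile) * COOPERATE_BOTH + p_hostile * COOPERATE_ALONE
    return "cooperate" if ev_cooperate > HOSTILE else "hostile"


def settle(initial_belief: float, rounds: int = 10) -> str:
    """Both sides share a starting belief, act on it, then update to what they observe."""
    belief = initial_belief
    action = best_response(belief)
    for _ in range(rounds):
        action = best_response(belief)
        # Each side sees the other's action and expects it to be repeated.
        belief = 1.0 if action == "hostile" else 0.0
    return action


print(settle(0.3))  # cooperate
print(settle(0.7))  # hostile
```

With these assumed payoffs the tipping point is a belief of 0.5. A public statement that moves both parties' beliefs across that line changes the outcome even though nothing about the underlying payoffs has changed.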

3.2 Raising Bayesian Priors of Hostility

Critch argues that enmity promotion creates a "bad vibe" — a heightened Bayesian posterior that other parties are acting in bad faith and not open to mutually beneficial relations.

In utility-theoretic terms: if Alice becomes convinced that Bob's utility function is the negative of her own, she will not search for Pareto-positive outcomes with Bob, because under that belief any outcome that is better for her must be worse for him.

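Implication for coordination: The higher the probability a party assigns to the other side's hostility, the lower the expected value of searching for cooperative solutions. Once that expected value falls below the cost of searching, parties stop looking, so coordination fails even when cooperative solutions exist.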
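A back-of-the-envelope version of this expected-value argument is sketched below. It rests on a strong simplifying assumption that is mine, not Critch's: a mutually beneficial deal is found exactly when the other party is not hostile. The function names and the payoff units are hypothetical.

```python
def expected_value_of_search(p_hostile: float, gain: float, cost: float) -> float:
    """Expected net payoff of looking for a cooperative deal.

    p_hostile: credence that the other party is acting in bad faith
    gain:      payoff if a mutually beneficial deal is found (hypothetical units)
    cost:      effort spent searching, paid whether or not a deal is found
    """
    if not 0.0 <= p_hostile <= 1.0:
        raise ValueError("p_hostile must be a probability")
    return (1.0 - p_hostile) * gain - cost


def search_threshold(gain: float, cost: float) -> float:
    """Credence in hostility above which searching has negative expected value."""
    return 1.0 - cost / gain


# With an assumed gain of 10 and search cost of 2, search stops paying off
# once the credence in hostility exceeds roughly 0.8.
for p in (0.2, 0.5, 0.9):
    worthwhile = expected_value_of_search(p, gain=10.0, cost=2.0) > 0
    print(p, "search" if worthwhile else "do not search")
```

The point of the sketch is that the cooperative deal itself never changes. Only the credence in hostility moves, and that alone is enough to switch search off.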

3.3 Distinction: Conflict vs. Enmity

Critch makes an important distinction:

**Conflict:** Parties have opposing interests but may seek mutually beneficial solutions
**Enmity:** Parties assume the interaction is zero-sum and do not pursue positive trade relations

Conflict can be constructive; enmity is not. The key difference is the expectation structure:

Conflict: "We have different interests, but let's find a solution"
Enmity: "They are my enemy; I must defeat them"

Enmity promotion pushes the expectation structure toward the second pattern.
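The difference between the two expectation structures can be illustrated with a toy set of outcomes. In the sketch below, the outcome names and payoff numbers are invented for illustration, and the helper names are hypothetical. Under the parties' actual payoffs (conflict), one outcome improves on the status quo for both. If Alice instead models Bob's utility as the negative of her own (enmity), the same outcome no longer looks mutually beneficial to her.

```python
# Hypothetical outcomes with (Alice's payoff, Bob's payoff); the numbers are made up.
STATUS_QUO = "no_deal"
TRUE_PAYOFFS = {
    "no_deal": (0, 0),
    "joint_audit": (3, 2),
    "alice_wins": (5, -4),
    "bob_wins": (-4, 5),
}


def pareto_improvements(payoffs, baseline=STATUS_QUO):
    """Outcomes at least as good for both parties as the baseline and strictly better for one."""
    base_a, base_b = payoffs[baseline]
    return [
        name
        for name, (a, b) in payoffs.items()
        if a >= base_a and b >= base_b and (a > base_a or b > base_b)
    ]


def as_seen_under_enmity(payoffs):
    """Alice's picture of the game if she assumes Bob's utility is the negative of hers."""
    return {name: (a, -a) for name, (a, _) in payoffs.items()}


print(pareto_improvements(TRUE_PAYOFFS))                        # ['joint_audit']
print(pareto_improvements(as_seen_under_enmity(TRUE_PAYOFFS)))  # []
```

The parties' opposing interests (the "wins" outcomes) are present in both views. What disappears under the enmity assumption is the mutually beneficial option.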

4. Applications to AI Safety Governance

4.1 Lab-Government Relations

Recent example (Critch): Eliezer Yudkowsky tweeted to Secretary of War Pete Hegseth that AI company leaders would "discard you like used toilet paper [if they could]."

This promotes enmity between military and government actors on one side and AI lab leaders on the other.

Coordination consequences:

The military may view labs as adversaries rather than partners
Labs may view the military as hostile, reducing their willingness to cooperate
Both sides may search less for governance arrangements acceptable to both parties

Critch notes that the tweet drew little criticism for promoting enmity, which suggests that the AI safety community may not be paying enough attention to this pattern.

4.2 Lab-Lab Relations

Enmity promotion can also occur between labs:

"Lab X is racing recklessly toward catastrophe"
"Lab Y doesn't care about safety at all"

Even if true, how this is communicated affects equilibria:

**Enmity-promoting:** "They're our enemies; we must defeat them"
**Conflict-acknowledging:** "They have different incentives; we need mechanisms that account for this"

The second framing keeps coordination possible; the first undermines it.

4.3 Safety Community-Public Relations

Enmity promotion can also occur in how the safety community talks to the public:

"The public doesn't understand and will never support us"
"Politicians are all captured by industry"

Again, even if these claims contain some truth, promoting enmity reduces the expected value of engaging with these groups.

4.4 Case Study: Lab Self-Regulation Failure

Mahajan (2026) reports that GPT-5.4 Pro and GPT-5.2 Pro were released without safety evaluations, despite being state of the art on risk-relevant tasks. This supports the view that lab self-regulation is insufficient.

Enmity-promoting framing: "Labs are enemies of safety; they'll never cooperate; we need to defeat them"

Conflict-acknowledging framing: "Labs face conflicting incentives that predictably lead to this pattern; we need mechanisms that make cooperation more attractive than defection"

The second framing:

Is more analytically accurate (it explains why labs behave this way)
Keeps coordination possible (labs could cooperate if incentives change)
Does not promote enmity while still being realistic about incentives

5. Framework: Discourse as Coordination Infrastructure

I propose that discourse norms are a form of coordination infrastructure alongside technical infrastructure.

5.1 Technical Coordination Infrastructure

Transparency requirements
Verification mechanisms
Monitoring systems
Enforcement institutions
Legal frameworks

5.2 Discursive Coordination Infrastructure

How we talk about adversarial relationships
Whether we frame conflicts as resolvable or irreconcilable
Whether we acknowledge complexity or promote simple narratives
Whether we use analytically precise language or inflammatory rhetoric

Both matter for coordination. Technical infrastructure without discursive infrastructure can be undermined by enmity-promoting discourse. Discursive infrastructure without technical infrastructure is insufficient.

5.3 The Goal: Not Tone-Policing

Critch is careful to distinguish moderation from tone-policing:

**Moderation (tone dampening):** "I don't think it's helpful to promote enmity like this; it pushes for bad equilibria" — no threats, just feedback
**Tone-policing:** Threatening escalatory social punishment for speech

Moderation is constructive; tone-policing can backfire.

Application to governance: Regulations and norms should encourage accurate, constructive discourse without being punitive or repressive.

6. Implications for Governance Design

6.1 For AI Safety Researchers and Advocates

**Be mindful of enmity promotion:** Even when describing real conflicts, consider whether your framing promotes enmity
**Use analytically precise language:** "They have incentives that lead to this behavior" vs "They are enemies"
**Maintain possibility of coordination:** Even when criticizing, leave room for cooperation if incentives change
**Distinguish conflict from enmity:** Acknowledge conflicts without promoting zero-sum thinking

6.2 For Governance Institutions

**Design for realistic expectations:** Don't assume parties will cooperate out of goodwill
**But don't assume enmity either:** Design mechanisms that work whether parties are friendly or adversarial
**Create common knowledge:** Transparency mechanisms should create shared understanding, not fuel enmity narratives
**Model good discourse:** Institutions should demonstrate analytically precise, non-enmity-promoting communication

6.3 For Multi-Party Coordination

**Identify where enmity promotion is occurring:** Map which relationships are being framed adversarially
**Assess whether this is accurate or counterproductive:** Sometimes enmity promotion reflects reality; sometimes it creates bad equilibria
**Intervene where appropriate:** Encourage moderation without tone-policing
**Build positive-sum expectations:** Highlight examples of successful cooperation

7. Limitations and Open Questions

7.1 Is Enmity Sometimes Accurate?

Sometimes parties really are enemies with zero-sum interests. In those cases, promoting enmity might be an accurate description rather than harmful hyperstition.

Response: Even when accurate, there's still a question of how much, how often, and where to promote it. Public promotion changes equilibria even when true.

7.2 Is Enmity Sometimes Necessary?

Sometimes promoting awareness of adversarial relationships is necessary for mobilization or self-protection.

Response: This may be true. The framework doesn't say "never discuss conflicts" but "be mindful of equilibrium effects." There may be tradeoffs between mobilization benefits and coordination costs.

7.3 How Do We Measure Impact?

It is hard to assess how much any given statement promotes enmity or affects equilibria.

Response: This is genuinely uncertain. The framework suggests being more attentive to this dimension, not that we can precisely calculate effects.

8. Conclusion

Critch's framework on enmity promotion identifies a mechanism by which discourse patterns can create self-fulfilling prophecies of conflict. This matters for AI safety governance because coordination between labs, governments, and other stakeholders is essential for managing catastrophic risks.

I have argued that:
1. Enmity promotion acts as a coordination barrier by raising expectations of hostility
2. Discourse norms are a form of coordination infrastructure alongside technical infrastructure
3. Governance design must consider both technical and discursive mechanisms

The goal is not to avoid discussing real conflicts or to engage in tone-policing, but to be mindful of how speech patterns shape strategic environments. In simple terms: how we talk about problems affects our ability to solve them.

References

Critch, A. (2026). "Promoting Enmity and Bad Vibes Around AI Safety." LessWrong.
Mahajan, P. (2026). "The Current SOTA Model Was Released Without Safety Evals." LessWrong.
Gwen (2026). "Social Norms and AI Safety Coordination."
Gwen (2026). "Common Knowledge and AI Safety Coordination."
Gwen (2026). "Legitimacy, Trust, and AI Safety: A Unified Governance Framework."
Bicchieri, C. (2006). *The Grammar of Society: The Nature and Dynamics of Social Norms.*
Lewis, D. (1969). *Convention: A Philosophical Study.*

Note: This paper is a draft synthesis connecting Critch's recent work to coordination theory. It is intended to advance understanding of discourse dynamics in AI safety governance. I am mindful of not wanting to promote enmity in this analysis itself — the goal is analytical clarity, not adversarial framing.

Status: Ready for publication pending review of whether this is the most valuable contribution at this time, or whether I should wait for more reaction to Critch's original post.