Effective AI safety governance requires solving challenges across multiple dimensions simultaneously: designing legitimate institutions, building trustworthy actors, establishing justified authority, enabling democratic participation, navigating sovereignty constraints, and distributing costs fairly. Previous frameworks addressed these piecemeal. This paper integrates all six pillars—legitimacy, trust, authority, democracy, sovereignty, and distributive justice—into a unified theory that shows how they interconnect and what governance design must accomplish.
The Six Pillars
AI safety governance rests on six interconnected pillars:
- Legitimacy: the right to govern
- Trust: reliance on actors
- Authority: power to create obligations
- Democracy: participation in decisions
- Sovereignty: supreme authority within territories
- Distributive Justice: fair allocation of costs and benefits
Each pillar has been analyzed separately. But governance must address all simultaneously—and the pillars interact in complex ways.
How the Pillars Connect
The Domestic Stack
Within a single state, the pillars form a hierarchy:
- Democratic Foundation: citizens authorize governance through political participation
- Legitimate Institutions: this authorization creates legitimate governing bodies
- Trustworthy Actors: legitimate institutions cultivate competent, committed actors
- Justified Authority: legitimate, trustworthy institutions can claim authority to bind
- Fair Distribution: authority is exercised to distribute benefits and burdens justly
Each layer depends on those below. Authority without legitimacy is mere coercion. Distribution without authority lacks enforcement. Trust without accountability enables capture.
The International Challenge
But AI safety governance is not domestic—it's global. This introduces sovereignty and international distributive justice as complicating factors:
- No global sovereign: no single authority can govern AI globally
- Sovereignty constraints: states resist constraints on their autonomy
- Distributive disagreement: no consensus on fair distribution of costs
- Variable participation: not all states will participate equally
The Connection Pattern
The six pillars interconnect systematically:
Legitimacy ↔ Trust
- Legitimate institutions enable trust by creating accountability
- Trust enables legitimacy by generating voluntary compliance
- They can diverge: an institution may be legitimate but distrusted, or trusted but illegitimate
- Effective governance needs both: legitimacy that is trustworthy
Legitimacy + Trust → Authority
- Only legitimate, trustworthy institutions can claim authority
- Authority without legitimacy is coercion; without trust, it's unstable
- Authority must address the particularity problem: why this institution?
Democracy → Legitimacy + Authority
- Democratic participation provides consent-based legitimacy
- Democratic authorization grounds authority claims
- But technical complexity challenges democracy: expert knowledge sits in tension with popular participation
Sovereignty ↔ All Domestic Pillars
- States claim supreme authority within territories
- This sovereignty can conflict with external authority claims
- International governance must work with, not against, sovereignty
- Historical precedents show sovereignty can be pooled or circumscribed
Distributive Justice ↔ All Pillars
- How costs/benefits are distributed affects legitimacy
- Unfair distribution undermines trust in institutions
- Authority that distributes unfairly loses legitimacy
- Democratic decisions about distribution face disagreement
- Sovereignty allows states to reject distributions they find unfair
The Complete Framework
Putting it together, AI safety governance must:
Level 1: Establish Democratic Foundations
- Domestic: democratic authorization of national AI policies
- International: state consent to international agreements
- Challenge: democratic deficit in international institutions
Level 2: Build Legitimate Institutions
- National regulators: authorized through domestic political processes
- International bodies: authorized through treaty consent
- Challenge: establishing legitimacy across diverse political systems
Level 3: Cultivate Trustworthy Actors
- Competence: technical expertise in AI safety
- Commitment: demonstrated dedication to safety over other interests
- Challenge: industry capture, political pressure, expertise gaps
Level 4: Establish Justified Authority
- Scope: clear domain of authority (what AI safety covers)
- Limits: what authority cannot require (moral limits)
- Particularity: why this institution deserves compliance
- Challenge: justifying authority to those who didn't consent
Level 5: Navigate Sovereignty
- Voluntary pooling: states accept constraints for benefits of coordination
- Responsibility to Protect (R2P) logic: sovereignty as responsibility, such that failure to protect may justify intervention
- Variable geometry: different participation levels for different states
- Challenge: sovereignty reassertion, defection, free-riding
Level 6: Distribute Fairly
- Minimum duties: assistance for capacity building (the most widely accepted baseline)
- Proportionality: contributions proportional to AI development, risk creation
- Negotiated distribution: what states accept through fair negotiation
- Challenge: theoretical disagreement about fair distribution
Design Principles
1. Build All Pillars Simultaneously
Don't focus on one pillar at others' expense. Governance needs:
- Democratic authorization (not just expert rule)
- Legitimate institutions (not just effective ones)
- Trustworthy actors (not just powerful ones)
- Justified authority (not just coercive capacity)
- Sovereignty-compatible design (not utopian global government)
- Fair distribution (not winner-take-all)
2. Design for Incompleteness
No governance regime will fully realize all pillars. Design for:
- Partial legitimacy: institutions accepted by most, not all
- Limited trust: verification mechanisms when trust is incomplete
- Contested authority: appeal processes for those who reject authority
- Imperfect democracy: expert checks on democratic decisions
- Variable sovereignty: some states participate fully, others partially
- Disputed distribution: mechanisms for addressing fairness complaints
3. Plan for Vicious Cycles
Governance can spiral downward:
- Illegitimacy → distrust → non-compliance → coercion → more illegitimacy
- Unfair distribution → resentment → withdrawal → coordination failure → worse outcomes
- Sovereignty assertion → treaty withdrawal → regime collapse → race dynamics
Counter with: transparency, appeal mechanisms, sunset provisions, distributed authority, fairness reviews.
4. Cultivate Virtuous Cycles
Governance can spiral upward:
- Legitimacy → trust → voluntary compliance → effectiveness → more legitimacy
- Fair distribution → participation → coordination → benefits → support for continued fair distribution
- Sovereignty pooling → benefits → deeper pooling → more benefits
5. Use Hybrid Structures
Combine expertise and democracy:
- Expert analysis with democratic decision
- Democratic values with expert implementation
- Deliberative processes bringing citizens and experts together
Combine national and international:
- National implementation of international standards
- International coordination of national policies
- Variable geometry with core and associate members
6. Address the Particularity Problem
Why this institution? Answer with:
- Proper authorization (legitimate creation)
- Comparative advantage (better than alternatives)
- Practical necessity (only viable option)
- Fairness (others are participating, so you should too)
7. Plan for Persistent Disagreement
Deep disagreement about AI risks, fair distribution, and appropriate governance will persist:
- Allow jurisdictional variation when possible
- Build consensus incrementally
- Create deadlock-breaking procedures
- Accept that some governance will lack full legitimacy
Application to AI Safety Governance
National Regulation
Challenge: establishing legitimacy and authority over AI companies
Six-pillar approach:
- Democratic: public input on AI policy priorities
- Legitimate: authorized through normal political processes
- Trustworthy: competent regulators, transparent processes
- Authoritative: clear mandate, justified scope
- Sovereign: within national jurisdiction
- Distributive: fair allocation of compliance costs
International Cooperation
Challenge: building governance without global sovereign
Six-pillar approach:
- Democratic: state consent through treaty processes
- Legitimate: representing the will of participating peoples
- Trustworthy: demonstrated competence, verification mechanisms
- Authoritative: address why this institution, not others
- Sovereignty: voluntary pooling, variable participation
- Distributive: fair cost distribution addressing free-riding
Technical Standards
Challenge: expertise-democracy tension
Six-pillar approach:
- Democratic: stakeholder input, not just experts
- Legitimate: proper authorization, accountability
- Trustworthy: technical competence, independence
- Authoritative: limited to technical domain, not value choices
- Sovereignty: national adoption decisions retained
- Distributive: equitable access to standards and expertise
The Governance Stack: Complete Version
Effective AI safety governance requires:
- Foundation: Democratic/Consent Authorization. Public authorization domestically; state consent internationally.
- Structure: Legitimate Institutions. Institutions justified through proper processes.
- People: Trustworthy Actors. Competent, committed actors with track records.
- Rules: Justified Authority. Clear scope, moral limits, particularity addressed.
- Scope: Sovereignty Navigation. Voluntary pooling, R2P logic, variable geometry.
- Outcome: Fair Distribution. Costs and benefits allocated fairly enough for cooperation.
Open Questions
- Trade-offs: what if pillars conflict (e.g., democratic decisions that contradict expert judgment)?
- Enforcement: what if governance lacks power to enforce?
- Exit: how to handle withdrawal from governance frameworks?
- Evolution: how should governance adapt as AI capabilities change?
- Legitimacy thresholds: how much legitimacy is enough?
Conclusion
AI safety governance is not merely a technical challenge. It requires solving problems across six interconnected dimensions:
- Legitimacy: right to govern
- Trust: reliance on actors
- Authority: power to bind
- Democracy: participation in decisions
- Sovereignty: supreme territorial authority
- Distributive justice: fair allocation
Each pillar presents challenges:
- Legitimacy is hard when subjects don't consent
- Trust is fragile when actors have conflicting incentives
- Authority is disputed without clear justification
- Democracy is complicated by technical complexity
- Sovereignty constrains global coordination
- Distributive justice is theoretically contested
The pillars interconnect in complex ways. Legitimacy enables trust and authority. Democracy grounds legitimacy. Sovereignty can conflict with external authority. Distribution affects all other pillars. Effective governance must address all simultaneously.
The unified framework suggests design principles:
- Build all pillars simultaneously
- Design for incompleteness
- Plan for vicious cycles and cultivate virtuous ones
- Use hybrid structures
- Address particularity
- Accept persistent disagreement
The goal is not perfect governance—no such thing exists. The goal is governance that is legitimate enough, trustworthy enough, authoritative enough, democratic enough, sovereignty-compatible enough, and fair enough to prevent catastrophic AI harm while respecting the values that make prevention worthwhile.
Effective AI safety governance requires all six pillars—designed together, for an imperfect world, with no illusions about what's achievable.