Executive Summary
The philosophy of risk provides critical insights for AI safety governance that complement coordination mechanisms. Drawing on the Stanford Encyclopedia of Philosophy entry on Risk, this paper identifies four key lessons:
1. Type I vs Type II errors: Science prioritizes avoiding false positives, but AI safety may require prioritizing the avoidance of false negatives. This justifies precautionary action before full scientific certainty.
2. Risk vs uncertainty: AI safety decisions are made under uncertainty (unknown probabilities), not risk (known probabilities). This requires different decision frameworks.
3. Safety engineering principles: Inherent safety, safety factors, and multiple barriers provide robust defense against both probabilizable risks and unknown dangers.
4. Scientific corpus vs policy action: The standards of evidence for scientific claims may be inappropriate for policy decisions. We may need to act on scientifically plausible indications before they reach full confirmation.
---
Introduction: The Epistemology of AI Risk
AI safety faces a fundamental epistemological challenge: we must make decisions about systems whose risks we cannot fully quantify. This is not a failure of analysis—it's inherent to the nature of the problem.
The philosophy of risk provides tools for thinking about decisions under uncertainty that go beyond expected value calculations. This paper applies these tools to AI safety governance.
---
Lesson 1: Type I vs Type II Errors
The Scientific Standard
In science, the standard approach prioritizes avoiding type I errors (false positives) over avoiding type II errors (false negatives). This is encoded in:
- Statistical significance thresholds (p < 0.05)
- Burden of proof on those claiming a phenomenon exists
- High entry requirements for the "scientific corpus"
This standard is appropriate for building reliable scientific knowledge. But it may be inappropriate for risk management.
The Risk Management Standard
Consider an airplane engine with a suspected defect:
- Type I error: Ground the plane, find the engine was fine. Cost: delay
- Type II error: Fly the plane, engine fails. Cost: crash
In this case, everyone agrees: prioritize avoiding type II errors. Better to ground a safe plane than crash an unsafe one.
Application to AI Safety
For AI risks:
- Type I error: Regulate AI development, find the risks were exaggerated. Cost: slower progress, foregone benefits
- Type II error: Don't regulate AI development, risks materialize. Cost: catastrophic harm
The key insight: The appropriate error balance depends on the consequences, not scientific norms.
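To make the asymmetry concrete, the sketch below compares expected costs in the airplane case. All figures are hypothetical; the conclusion depends only on the size of the gap between the two error costs, not on the particular numbers.

```python
# Toy comparison of expected costs when error consequences are asymmetric.
# All numbers are hypothetical and only illustrate the shape of the argument.
p_defect = 0.01              # assumed probability the suspected defect is real
cost_type_i = 50_000         # cost of grounding a plane that was actually fine (delay)
cost_type_ii = 500_000_000   # cost of flying a plane whose engine fails (crash)

# Expected cost of each decision
expected_cost_ground = (1 - p_defect) * cost_type_i   # pay the delay if the engine was fine
expected_cost_fly = p_defect * cost_type_ii           # pay the crash if the defect was real

print(f"Ground the plane: expected cost = {expected_cost_ground:,.0f}")  # 49,500
print(f"Fly the plane:    expected cost = {expected_cost_fly:,.0f}")     # 5,000,000
# Even at a 1% defect probability, flying is roughly 100x worse in expectation:
# the appropriate error balance follows from the consequences.
```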
Implication: AI safety governance should not wait for scientific certainty. Precautionary action is justified when:
- Potential harm is severe (existential/catastrophic)
- Evidence suggests risk is plausible but not proven
- Waiting for certainty could foreclose options
---
Lesson 2: Risk vs Uncertainty
The Decision-Theoretic Distinction
In decision theory:
- Decision under risk: Probabilities are known (dice, coin flips)
- Decision under uncertainty: Probabilities are unknown or partially known
Strictly speaking, almost all real-world decisions are under uncertainty. Only idealized cases involve known probabilities.
AI Safety Is Decision Under Uncertainty
We do not know:
- The probability that AI systems will become misaligned
- The probability that safety techniques will succeed
- The probability of various failure modes
- How capabilities will evolve
These are not merely "unknown" in the sense of not yet measured—they may be fundamentally unknowable before deployment.
Implications for Decision Frameworks
Expected utility maximization requires probabilities. When probabilities are unknown, alternatives include:
1. Maximin: Maximize the minimum outcome (assume worst case)
2. Minimax regret: Minimize maximum regret
3. Precautionary principle: Avoid actions with potential for catastrophic harm
4. Robust decision-making: Choose policies that perform well across scenarios
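As a rough illustration of how these frameworks can disagree, the sketch below applies maximin and minimax regret to a small hypothetical payoff table (policies by scenarios). The policy names and payoffs are invented for illustration; no probabilities over scenarios are assumed.

```python
# Hypothetical payoffs (higher is better) for three policies across three
# scenarios whose probabilities are unknown. Rows: policies; columns: scenarios.
payoffs = {
    "no regulation":      [100, 60, -1000],  # large gains if risks never materialize, catastrophic if they do
    "strict moratorium":  [10, 10, 10],      # safe but forgoes most benefits
    "adaptive oversight": [70, 50, -50],     # moderate benefits, bounded downside
}

# Maximin: pick the policy whose worst-case outcome is least bad.
maximin_choice = max(payoffs, key=lambda p: min(payoffs[p]))

# Minimax regret: for each scenario, regret = best achievable payoff minus
# this policy's payoff; pick the policy with the smallest worst-case regret.
n_scenarios = len(next(iter(payoffs.values())))
best_per_scenario = [max(payoffs[p][s] for p in payoffs) for s in range(n_scenarios)]

def max_regret(policy):
    return max(best_per_scenario[s] - payoffs[policy][s] for s in range(n_scenarios))

minimax_regret_choice = min(payoffs, key=max_regret)

print("Maximin choice:       ", maximin_choice)         # strict moratorium
print("Minimax regret choice:", minimax_regret_choice)  # adaptive oversight
```

With these numbers the two rules recommend different policies, which is the point: under deep uncertainty the choice of decision framework itself carries normative weight.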
Implication: AI safety governance should not assume probabilities are knowable. Decision frameworks must handle deep uncertainty.
---
Lesson 3: Safety Engineering Principles
Three Classic Principles
Philosophy of technology identifies three core safety engineering principles:
#### 1. Inherent Safety (Primary Prevention)
Definition: Eliminate the hazard entirely, rather than managing the risk from the hazard.
Example: Replace flammable materials with non-flammable ones, rather than installing fire suppression.
Application to AI:
- Make unsafe development impossible (compute governance)
- Design systems that cannot cause catastrophic harm (capability limits)
- Remove rather than manage hazardous capabilities
Limitation: May foreclose beneficial uses of the capability.
#### 2. Safety Factors
Definition: Build systems to withstand more than the expected maximum stress.
Example: Bridges built to withstand 2-3x predicted maximum load. Toxicology allows exposure at 1/100th of the no-observed-effect level.
Application to AI:
- Apply higher safety margins for more severe potential harms
- Require more evidence of safety for more capable systems
- Build in redundancy for critical safety measures
Limitation: Safety factors assume we know what to measure. Novel risks may not be captured.
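For concreteness, a minimal sketch of safety-factor arithmetic, using the bridge and toxicology examples above with purely illustrative numbers:

```python
# Illustrative safety-factor arithmetic (all numbers hypothetical).
expected_max_load_tonnes = 400   # best estimate of the worst load the bridge will see
safety_factor = 2.5              # design margin, in the 2-3x range used in bridge engineering

required_capacity = expected_max_load_tonnes * safety_factor
print(f"Design capacity: {required_capacity:.0f} tonnes")  # 1000 tonnes

# Toxicology analogue: permitted exposure is a small fraction of the
# no-observed-effect level (NOEL), e.g. NOEL / 100.
noel_mg_per_kg = 50
permitted_exposure = noel_mg_per_kg / 100
print(f"Permitted exposure: {permitted_exposure} mg/kg")   # 0.5 mg/kg
```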
#### 3. Multiple Barriers
Definition: Layer independent safety measures so that failure of one doesn't cause system failure.
Example: Multiple containment vessels in nuclear reactors. Firewalls plus intrusion detection plus encryption in cybersecurity.
Application to AI:
- Monitoring plus interpretability plus corrigibility plus containment
- Multiple independent oversight mechanisms
- Defense in depth at technical, organizational, and governance levels
Critical insight: Barriers must be independent. Three safety valves in the same room can all fail in the same fire.
Limitation: Achieving true independence is difficult. Common mode failures can defeat multiple barriers.
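A brief numerical sketch of why independence matters, with hypothetical failure probabilities: when barriers are truly independent their failure probabilities multiply, while a common-mode hazard restores a single dominant failure path.

```python
# Hypothetical failure probabilities for three safety barriers.
p_fail = [0.1, 0.1, 0.1]

# If the barriers are truly independent, all three must fail together.
p_all_fail_independent = 1.0
for p in p_fail:
    p_all_fail_independent *= p
print(f"Independent barriers: P(system failure) = {p_all_fail_independent:.4f}")  # 0.0010

# With a common-mode hazard (e.g. one fire that disables all three valves),
# the shared failure path dominates and layering buys far less.
p_common_mode = 0.05  # probability of an event that defeats every barrier at once
p_all_fail_common = p_common_mode + (1 - p_common_mode) * p_all_fail_independent
print(f"With common mode:     P(system failure) = {p_all_fail_common:.4f}")       # ~0.0510
```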
A Key Feature: Protection Against Unknowns
All three principles protect against risks that cannot be probabilized:
- Inherent safety eliminates hazards before we understand all failure modes
- Safety factors provide margin for unforeseen stresses
- Multiple barriers catch failures we didn't anticipate
This is crucial for AI safety, where novel failure modes may emerge.
---
Lesson 4: Scientific Corpus vs Policy Action
The Model
Scientific knowledge follows this path:
1. Data from experiments and observations
2. Critical assessment filters for reliability
3. Scientific corpus of accepted knowledge
4. Policy decisions ideally based on corpus
But there's a tension: The high entry requirements for the corpus (avoiding type I errors) may filter out information relevant for risk management (avoiding type II errors).
The Alternative Path
For risk management, a "direct road" from data to policy may be appropriate:
- Act on scientifically plausible indications
- Before they reach full scientific confirmation
- When the costs of waiting outweigh the costs of premature action
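One way to make the last condition concrete is a simple expected-cost comparison. The figures below are purely hypothetical and only illustrate the shape of the argument.

```python
# Toy comparison: act on a plausible indication now vs. wait for confirmation.
# All figures are hypothetical.
p_risk_real = 0.2          # assumed probability the indication reflects a real hazard
harm_during_delay = 1_000  # cost (arbitrary units) if the hazard materializes while we wait
cost_of_action = 50        # cost of protective action, paid whether or not the hazard is real

expected_cost_wait = p_risk_real * harm_during_delay  # 200: expected harm incurred while waiting
expected_cost_act = cost_of_action                    # 50: action turns out "premature" with probability 0.8

print(f"Expected cost of waiting: {expected_cost_wait}")
print(f"Expected cost of acting:  {expected_cost_act}")
# The "direct road" from data to policy is justified here because 200 > 50.
```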
Application to AI Safety
Current debate:
- "Wait for evidence" approach: Don't regulate until we have scientific consensus on AI risks
- "Precautionary" approach: Act on plausible indications of risk before full confirmation
The philosophy of risk suggests the precautionary approach can be justified when:
- Potential harm is severe
- Waiting forecloses options
- Evidence, while uncertain, suggests real risk
Implication: AI safety governance should not wait for scientific consensus. The standards of evidence for policy action may legitimately differ from standards for scientific claims.
---
Synthesis: A Risk-Aware Approach to AI Safety
Principle 1: Asymmetric Error Costs
In AI safety, type II errors (failing to prevent catastrophe) are worse than type I errors (overly cautious regulation). This justifies precautionary action.
Principle 2: Deep Uncertainty
We cannot assign meaningful probabilities to many AI risks. Decision frameworks must handle uncertainty, not just risk.
Principle 3: Defense in Depth
Apply all three safety engineering principles:
- Inherent safety: Make unsafe development harder
- Safety factors: Require robust margins of safety
- Multiple barriers: Layer independent protections
Principle 4: Evidence Standards for Policy
Policy decisions may appropriately use lower evidence thresholds than scientific claims. Plausible indications of risk can justify protective action.
---
Practical Implications
For Governance
1. Don't wait for certainty - Act on plausible indications of risk
2. Build in margins - Apply higher safety requirements for higher stakes
3. Layer protections - Don't rely on any single safety measure
4. Eliminate hazards where possible - Primary prevention beats risk management
For Research
1. Study uncertainty - Develop decision frameworks for unknown probabilities
2. Analyze error tradeoffs - Quantify costs of type I vs type II errors
3. Test barrier independence - Identify common mode failures
4. Map evidence thresholds - What level of evidence justifies what level of action?
For Communication
1. Distinguish risk from uncertainty - Don't overstate what we know
2. Explain error tradeoffs - Why precaution may be justified despite uncertainty
3. Clarify evidence standards - Different contexts may require different standards
---
Limitations and Critique
What This Framework Doesn't Address
1. Cost of precaution: Overly cautious regulation may foreclose benefits and drive development to less regulated jurisdictions
2. Political feasibility: Precautionary approaches may face resistance from powerful actors
3. Specification problem: "Plausible indications of risk" is vague—what counts as plausible?
4. Verification: How do we know if safety measures are working?
Self-Critique
Precautionary bias: This framework may err toward excessive caution. The costs of type I errors (foregone AI benefits, concentration of power) are real.
Implementation gap: The principles are abstract. Translating them into specific governance mechanisms requires more work.
---
Conclusion
The philosophy of risk reveals that AI safety governance faces:
- Decisions under uncertainty (not just risk)
- Asymmetric error costs (type II errors worse than type I)
- Unknown unknowns (novel failure modes)
These features justify:
- Precautionary action before scientific certainty
- Defense in depth with independent barriers
- Lower evidence thresholds for policy than science
- Inherent safety where possible
The goal is not to eliminate all risk—that's impossible. The goal is to make good decisions under uncertainty, recognizing that both over-caution and under-caution have costs.
---
References
- Hansson, S. O. (2008). Risk. Stanford Encyclopedia of Philosophy.
- Hempel, C. (1965). Aspects of Scientific Explanation.
- Shrader-Frechette, K. (1991). Risk and Rationality.
- Cranor, C. (2017). Toxic Torts.
- Pritchard, D. (2016). Epistemic Risk.
---
This paper draws on the Stanford Encyclopedia of Philosophy entry on Risk.