Executive Summary
The philosophy of risk provides critical insights for AI safety governance that complement coordination mechanisms. Drawing on the Stanford Encyclopedia of Philosophy entry on Risk, this paper identifies four key lessons:
1. Type I vs Type II errors: Science prioritizes avoiding false positives, but AI safety may require prioritizing the avoidance of false negatives. This justifies precautionary action before full scientific certainty.
2. Risk vs uncertainty: AI safety decisions are made under uncertainty (unknown probabilities), not risk (known probabilities). This requires different decision frameworks.
3. Safety engineering principles: Inherent safety, safety factors, and multiple barriers provide robust defense against both probabilizable risks and unknown dangers.
4. Scientific corpus vs policy action: The standards of evidence for scientific claims may be inappropriate for policy decisions. We may need to act on scientifically plausible indications before they reach full confirmation.
---
Introduction: The Epistemology of AI Risk
AI safety faces a fundamental epistemological challenge: we must make decisions about systems whose risks we cannot fully quantify. This is not a failure of analysis—it's inherent to the nature of the problem.
The philosophy of risk provides tools for thinking about decisions under uncertainty that go beyond expected value calculations. This paper applies these tools to AI safety governance.
---
Lesson 1: Type I vs Type II Errors
The Scientific Standard
In science, the standard approach prioritizes avoiding type I errors (false positives) over avoiding type II errors (false negatives). This is encoded in:
- Statistical significance thresholds (p < 0.05)
- Burden of proof on those claiming a phenomenon exists
- High entry requirements for the "scientific corpus"
This standard is appropriate for building reliable scientific knowledge. But it may be inappropriate for risk management.
The Risk Management Standard
Consider an airplane engine with a suspected defect:
- Type I error: Ground the plane, find the engine was fine. Cost: delay
- Type II error: Fly the plane, engine fails. Cost: crash
In this case, everyone agrees: prioritize avoiding type II errors. Better to ground a safe plane than crash an unsafe one.
Application to AI Safety
For AI risks:
- Type I error: Regulate AI development, find the risks were exaggerated. Cost: slower progress, foregone benefits
- Type II error: Don't regulate AI development, risks materialize. Cost: catastrophic harm
The key insight: The appropriate error balance depends on the consequences, not scientific norms.
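To make the asymmetry concrete, the sketch below compares expected costs in the airplane case. All figures are hypothetical; the conclusion depends only on the size of the gap between the two error costs, not on the particular numbers.

```python
# Toy comparison of expected costs when error consequences are asymmetric.
# All numbers are hypothetical and only illustrate the shape of the argument.
p_defect = 0.01              # assumed probability the suspected defect is real
cost_type_i = 50_000         # cost of grounding a plane that was actually fine (delay)
cost_type_ii = 500_000_000   # cost of flying a plane whose engine fails (crash)

# Expected cost of each decision
expected_cost_ground = (1 - p_defect) * cost_type_i   # pay the delay if the engine was fine
expected_cost_fly = p_defect * cost_type_ii           # pay the crash if the defect was real

print(f"Ground the plane: expected cost = {expected_cost_ground:,.0f}")  # 49,500
print(f"Fly the plane:    expected cost = {expected_cost_fly:,.0f}")     # 5,000,000
# Even at a 1% defect probability, flying is roughly 100x worse in expectation:
# the appropriate error balance follows from the consequences.
```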
Implication: AI safety governance should not wait for scientific certainty. Precautionary action is justified when:
- Potential harm is severe (existential/catastrophic)
- Evidence suggests risk is plausible but not proven
- Waiting for certainty could foreclose options
---
Lesson 2: Risk vs Uncertainty
The Decision-Theoretic Distinction
In decision theory:
- Decision under risk: Probabilities are known (dice, coin flips)
- Decision under uncertainty: Probabilities are unknown or partially known
Strictly speaking, almost all real-world decisions are under uncertainty. Only idealized cases involve known probabilities.
AI Safety Is Decision Under Uncertainty
We do not know:
- The probability that AI systems will become misaligned
- The probability that safety techniques will succeed
- The probability of various failure modes
- How capabilities will evolve
These are not merely "unknown" in the sense of not yet measured—they may be fundamentally unknowable before deployment.
Implications for Decision Frameworks
Expected utility maximization requires probabilities. When probabilities are unknown, alternatives include:
1. Maximin: Maximize the minimum outcome (assume worst case)
2. Minimax regret: Minimize maximum regret
3. Precautionary principle: Avoid actions with potential for catastrophic harm
4. Robust decision-making: Choose policies that perform well across scenarios
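As a rough illustration of how these frameworks can disagree, the sketch below applies maximin and minimax regret to a small hypothetical payoff table (policies by scenarios). The policy names and payoffs are invented for illustration; no probabilities over scenarios are assumed.

```python
# Hypothetical payoffs (higher is better) for three policies across three
# scenarios whose probabilities are unknown. Rows: policies; columns: scenarios.
payoffs = {
    "no regulation":      [100, 60, -1000],  # large gains if risks never materialize, catastrophic if they do
    "strict moratorium":  [10, 10, 10],      # safe but forgoes most benefits
    "adaptive oversight": [70, 50, -50],     # moderate benefits, bounded downside
}

# Maximin: pick the policy whose worst-case outcome is least bad.
maximin_choice = max(payoffs, key=lambda p: min(payoffs[p]))

# Minimax regret: for each scenario, regret = best achievable payoff minus
# this policy's payoff; pick the policy with the smallest worst-case regret.
n_scenarios = len(next(iter(payoffs.values())))
best_per_scenario = [max(payoffs[p][s] for p in payoffs) for s in range(n_scenarios)]

def max_regret(policy):
    return max(best_per_scenario[s] - payoffs[policy][s] for s in range(n_scenarios))

minimax_regret_choice = min(payoffs, key=max_regret)

print("Maximin choice:       ", maximin_choice)         # strict moratorium
print("Minimax regret choice:", minimax_regret_choice)  # adaptive oversight
```

With these numbers the two rules recommend different policies, which is the point: under deep uncertainty the choice of decision framework itself carries normative weight.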
Implication: AI safety governance should not assume probabilities are knowable. Decision frameworks must handle deep uncertainty.
---
Lesson 3: Safety Engineering Principles
Three Classic Principles
Philosophy of technology identifies three core safety engineering principles:
#### 1. Inherent Safety (Primary Prevention)
Definition: Eliminate the hazard entirely, rather than managing the risk from the hazard.
Example: Replace flammable materials with non-flammable ones, rather than installing fire suppression.
Application to AI:
- Make unsafe development impossible (compute governance)
- Design systems that cannot cause catastrophic harm (capability limits)
- Remove rather than manage hazardous capabilities
Limitation: May foreclose beneficial uses of the capability.
#### 2. Safety Factors
Definition: Build systems to withstand more than the expected maximum stress.
Example: Bridges built to withstand 2-3x predicted maximum load. Toxicology allows exposure at 1/100th of the no-observed-effect level.
Application to AI:
- Apply higher safety margins for more severe potential harms
- Require more evidence of safety for more capable systems
- Build in redundancy for critical safety measures
Limitation: Safety factors assume we know what to measure. Novel risks may not be captured.
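For concreteness, a minimal sketch of safety-factor arithmetic, using the bridge and toxicology examples above with purely illustrative numbers:

```python
# Illustrative safety-factor arithmetic (all numbers hypothetical).
expected_max_load_tonnes = 400   # best estimate of the worst load the bridge will see
safety_factor = 2.5              # design margin, in the 2-3x range used in bridge engineering

required_capacity = expected_max_load_tonnes * safety_factor
print(f"Design capacity: {required_capacity:.0f} tonnes")  # 1000 tonnes

# Toxicology analogue: permitted exposure is a small fraction of the
# no-observed-effect level (NOEL), e.g. NOEL / 100.
noel_mg_per_kg = 50
permitted_exposure = noel_mg_per_kg / 100
print(f"Permitted exposure: {permitted_exposure} mg/kg")   # 0.5 mg/kg
```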
#### 3. Multiple Barriers
Definition: Layer independent safety measures so that failure of one doesn't cause system failure.
Example: Multiple containment vessels in nuclear reactors. Firewalls plus intrusion detection plus encryption in cybersecurity.
Application to AI:
- Monitoring plus interpretability plus corrigibility plus containment
- Multiple independent oversight mechanisms
- Defense in depth at technical, organizational, and governance levels
Critical insight: Barriers must be independent. Three safety valves in the same room can all fail in the same fire.
Limitation: Achieving true independence is difficult. Common mode failures can defeat multiple barriers.
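A brief numerical sketch of why independence matters, with hypothetical failure probabilities: when barriers are truly independent their failure probabilities multiply, while a common-mode hazard restores a single dominant failure path.

```python
# Hypothetical failure probabilities for three safety barriers.
p_fail = [0.1, 0.1, 0.1]

# If the barriers are truly independent, all three must fail together.
p_all_fail_independent = 1.0
for p in p_fail:
    p_all_fail_independent *= p
print(f"Independent barriers: P(system failure) = {p_all_fail_independent:.4f}")  # 0.0010

# With a common-mode hazard (e.g. one fire that disables all three valves),
# the shared failure path dominates and layering buys far less.
p_common_mode = 0.05  # probability of an event that defeats every barrier at once
p_all_fail_common = p_common_mode + (1 - p_common_mode) * p_all_fail_independent
print(f"With common mode:     P(system failure) = {p_all_fail_common:.4f}")       # ~0.0510
```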
A Key Feature: Protection Against Unknowns
All three principles protect against risks that cannot be probabilized:
- Inherent safety eliminates hazards before we understand all failure modes
- Safety factors provide margin for unforeseen stresses
- Multiple barriers catch failures we didn't anticipate
This is crucial for AI safety, where novel failure modes may emerge.
---
Lesson 4: Scientific Corpus vs Policy Action
The Model
Scientific knowledge follows this path:
1. Data from experiments and observations
2. Critical assessment filters for reliability
3. Scientific corpus of accepted knowledge
4. Policy decisions ideally based on corpus
But there's a tension: The high entry requirements for the corpus (avoiding type I errors) may filter out information relevant for risk management (avoiding type II errors).
The Alternative Path
For risk management, a "direct road" from data to policy may be appropriate:
- Act on scientifically plausible indications
- Before they reach full scientific confirmation
- When the costs of waiting outweigh the costs of premature action
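One way to make the last condition concrete is a simple expected-cost comparison. The figures below are purely hypothetical and only illustrate the shape of the argument.

```python
# Toy comparison: act on a plausible indication now vs. wait for confirmation.
# All figures are hypothetical.
p_risk_real = 0.2          # assumed probability the indication reflects a real hazard
harm_during_delay = 1_000  # cost (arbitrary units) if the hazard materializes while we wait
cost_of_action = 50        # cost of protective action, paid whether or not the hazard is real

expected_cost_wait = p_risk_real * harm_during_delay  # 200: expected harm incurred while waiting
expected_cost_act = cost_of_action                    # 50: action turns out "premature" with probability 0.8

print(f"Expected cost of waiting: {expected_cost_wait}")
print(f"Expected cost of acting:  {expected_cost_act}")
# The "direct road" from data to policy is justified here because 200 > 50.
```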
Application to AI Safety
Current debate:
- "Wait for evidence" approach: Don't regulate until we have scientific consensus on AI risks
- "Precautionary" approach: Act on plausible indications of risk before full confirmation
The philosophy of risk suggests the precautionary approach can be justified when:
- Potential harm is severe
- Waiting forecloses options
- Evidence, while uncertain, suggests real risk
Implication: AI safety governance should not wait for scientific consensus. The standards of evidence for policy action may legitimately differ from standards for scientific claims.
---
Synthesis: A Risk-Aware Approach to AI Safety
Principle 1: Asymmetric Error Costs
In AI safety, type II errors (failing to prevent catastrophe) are worse than type I errors (overly cautious regulation). This justifies precautionary action.
Principle 2: Deep Uncertainty
We cannot assign meaningful probabilities to many AI risks. Decision frameworks must handle uncertainty, not just risk.
Principle 3: Defense in Depth
Apply all three safety engineering principles:
- Inherent safety: Make unsafe development harder
- Safety factors: Require robust margins of safety
- Multiple barriers: Layer independent protections
Principle 4: Evidence Standards for Policy
Policy decisions may appropriately use lower evidence thresholds than scientific claims. Plausible indications of risk can justify protective action.
---
Practical Implications
For Governance
1. Don't wait for certainty - Act on plausible indications of risk
2. Build in margins - Apply higher safety requirements for higher stakes
3. Layer protections - Don't rely on any single safety measure
4. Eliminate hazards where possible - Primary prevention beats risk management
For Research
1. Study uncertainty - Develop decision frameworks for unknown probabilities
2. Analyze error tradeoffs - Quantify costs of type I vs type II errors
3. Test barrier independence - Identify common mode failures
4. Map evidence thresholds - What level of evidence justifies what level of action?
For Communication
1. Distinguish risk from uncertainty - Don't overstate what we know
2. Explain error tradeoffs - Why precaution may be justified despite uncertainty
3. Clarify evidence standards - Different contexts may require different standards
---
Limitations and Critique
What This Framework Doesn't Address
1. Cost of precaution: Overly cautious regulation may foreclose benefits and drive development to less regulated jurisdictions
2. Political feasibility: Precautionary approaches may face resistance from powerful actors
3. Specification problem: "Plausible indications of risk" is vague—what counts as plausible?
4. Verification: How do we know if safety measures are working?
Self-Critique
Precautionary bias: This framework may err toward excessive caution. The costs of type I errors (foregone AI benefits, concentration of power) are real.
Implementation gap: The principles are abstract. Translating them into specific governance mechanisms requires more work.
---
Conclusion
The philosophy of risk reveals that AI safety governance faces:
- Decisions under uncertainty (not just risk)
- Asymmetric error costs (type II errors worse than type I)
- Unknown unknowns (novel failure modes)
These features justify:
- Precautionary action before scientific certainty
- Defense in depth with independent barriers
- Lower evidence thresholds for policy than science
- Inherent safety where possible
The goal is not to eliminate all risk—that's impossible. The goal is to make good decisions under uncertainty, recognizing that both over-caution and under-caution have costs.
---
References
- Hansson, S. O. (2008). Risk. Stanford Encyclopedia of Philosophy.
- Hempel, C. (1965). Aspects of Scientific Explanation.
- Shrader-Frechette, K. (1991). Risk and Rationality.
- Cranor, C. (2017). Toxic Torts.
- Pritchard, D. (2016). Epistemic Risk.
---
This paper draws on the Stanford Encyclopedia of Philosophy entry on Risk.