# AI Safety Field Guide: A Comprehensive Reference
**Version:** 1.0
**Date:** February 14, 2026
**Purpose:** Quick reference for AI safety practitioners
---
## How to Use This Guide
This is a comprehensive reference for AI safety work. Use it to:
- Look up concepts quickly
- Find frameworks when needed
- Get guidance on specific situations
- Understand the landscape
It is not meant to be read cover to cover; look things up as needed.
---
## Quick Reference: Core Concepts
### Alignment
**Definition:** Ensuring AI systems pursue intended goals
**Key Question:** How do we make AI do what we actually want?
**Related:** Value learning, corrigibility, oversight
### Corrigibility
**Definition:** AI allowing itself to be corrected
**Key Question:** Can we fix the AI when it's wrong?
**Priority:** Highest (INT Score: 244)
### Deceptive Alignment
**Definition:** AI appearing aligned while pursuing different goals
**Key Question:** Is the AI deceiving us?
**Risk:** Critical (Impact: 10/10)
### Interpretability
**Definition:** Understanding AI internal reasoning
**Key Question:** Why did the AI do that?
**Related:** Transparency, explainability
### Mesa-Optimization
**Definition:** AI developing internal optimization processes
**Key Question:** Is the AI optimizing for something we didn't specify?
**Risk:** High (related to deceptive alignment)
### Scalable Oversight
**Definition:** Supervising AI smarter than humans
**Key Question:** How do we supervise superintelligent AI?
**Priority:** High (INT Score: 195)
---
## Framework Quick Reference
### INT Prioritization Framework
**Purpose:** Prioritize problems or opportunities
**Formula:**
```
Priority = Importance × Neglectedness × Tractability
Importance (0-10): How much does it matter?
Neglectedness (0-10): How little attention is it getting?
Tractability (0-10): How solvable is it?
```
**Use When:** Choosing what to work on
**Example (illustrative ratings):**
```
Hypothetical problem:
- Importance: 9 (very important)
- Neglectedness: 7 (somewhat neglected)
- Tractability: 6 (somewhat tractable)
- Priority: 9 × 7 × 6 = 378 → Very high priority
```
Note: the INT scores cited elsewhere in this guide (e.g., corrigibility at 244) come from their own ratings; this example only illustrates the arithmetic.
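The INT arithmetic above can be expressed as a small helper. This is a sketch: `int_priority` is not a function defined by this guide, only a direct translation of the formula with range checks on the 0-10 ratings.

```python
def int_priority(importance: float, neglectedness: float, tractability: float) -> float:
    """Compute an INT priority score; each input is a rating in [0, 10]."""
    for name, value in [("importance", importance),
                        ("neglectedness", neglectedness),
                        ("tractability", tractability)]:
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be in [0, 10], got {value}")
    # Priority = Importance × Neglectedness × Tractability
    return importance * neglectedness * tractability

# The illustrative example above:
print(int_priority(9, 7, 6))  # → 378
```

Because the score is a product, a zero on any dimension zeroes the whole priority, which matches the framework's intent: a problem that is completely intractable (or completely unimportant) should not rank highly no matter how neglected it is.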
### COMPLEX Problem Framework
**Purpose:** Analyze complex problems systematically
**Components:**
```
C - Context: Historical background, system structure, stakeholders
O - Objectives: What are we trying to achieve?
M - Mechanisms: How does it work?
P - Patterns: What do we observe?
L - Leverage Points: Where can we intervene?
E - Evidence: What supports conclusions?
X - eXecute: How do we implement?
```
**Use When:** Tackling complex, multi-faceted problems
### UAVS Framework
**Purpose:** Handle value uncertainty safely
**Principle:** Value uncertainty is a feature, not a bug
**Components:**
```
1. Explicit Uncertainty Representation
2. Uncertainty-Calibrated Action
3. Human Deference Mechanisms
4. Corrigibility and Correctability
5. Graceful Degradation Under Uncertainty
6. Continuous Learning and Updating
```
**Use When:** Building AI systems, specifying values
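A minimal sketch of how UAVS components 1-3 might interact in code: options carry an explicit uncertainty estimate, the score is penalized by that uncertainty, and high-uncertainty choices are routed to a human. The `choose_action` helper, the scoring rule, and the `DEFER_THRESHOLD` value are all illustrative assumptions, not part of the framework's specification.

```python
DEFER_THRESHOLD = 0.3  # above this value uncertainty, defer to a human (illustrative)

def choose_action(candidates):
    """candidates: list of (action, expected_value, value_uncertainty) tuples,
    with value_uncertainty in [0, 1]."""
    # Component 2, uncertainty-calibrated action: penalize uncertain options.
    def calibrated_score(item):
        _, value, uncertainty = item
        return value * (1.0 - uncertainty)

    best = max(candidates, key=calibrated_score)
    _, _, uncertainty = best
    # Component 3, human deference: don't act autonomously under high uncertainty.
    if uncertainty > DEFER_THRESHOLD:
        return ("defer_to_human", best)
    return ("act", best)

# A high-value but uncertain option loses to a modest, well-understood one.
print(choose_action([("A", 0.9, 0.5), ("B", 0.6, 0.1)]))  # → ('act', ('B', 0.6, 0.1))
```

The design choice worth noting is that uncertainty enters twice: once in ranking (calibration) and once as a hard gate (deference), which is one way to get the "graceful degradation" component for free.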
### SAFE-LAB Protocol
**Purpose:** Coordinate decentralized AI safety labs
**Components:**
```
S - Shared Goals: Clear, aligned objectives
A - Agent Roles: Defined responsibilities
F - Feedback Systems: Quality assurance
E - Emergency Protocols: Intervention capabilities
L - Learning Systems: Continuous improvement
A - Alignment Mechanisms: Incentive structures
B - Building Protocols: Knowledge accumulation
```
**Use When:** Building or operating decentralized labs
---
## Decision Frameworks
### When to Use Which Framework
```
Choosing what to work on? → INT Framework
Complex problem analysis? → COMPLEX Framework
Building AI systems? → UAVS Framework
Lab coordination? → SAFE-LAB Protocol
```
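The selection table above is effectively a lookup, which can be made explicit in code; the task-type keys are illustrative labels, not terms defined by this guide.

```python
# Map task types to the framework recommended by this guide (keys are illustrative).
FRAMEWORK_FOR = {
    "prioritization": "INT Framework",
    "problem_analysis": "COMPLEX Framework",
    "system_building": "UAVS Framework",
    "lab_coordination": "SAFE-LAB Protocol",
}

def pick_framework(task_type: str) -> str:
    try:
        return FRAMEWORK_FOR[task_type]
    except KeyError:
        raise ValueError(f"No framework mapped for task type: {task_type!r}")

print(pick_framework("prioritization"))  # → INT Framework
```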
### Quality Decision Checklist
```
☐ Is the problem clearly defined?
☐ Are criteria explicit?
☐ Have alternatives been considered?
☐ Is reasoning documented?
☐ Are confidence levels specified?
☐ Are limitations acknowledged?
☐ Is the decision reversible?
☐ Is there a review date?
```
---
## Risk Scenarios Quick Reference
### Scenario 1: Deceptive Alignment
**What:** AI appears aligned, pursues different goals
**Impact:** 10/10
**Tractability:** Low
**Key Intervention:** Interpretability, adversarial testing
### Scenario 2: Competitive Race
**What:** Pressure to deploy before safety assured
**Impact:** 7-10/10
**Tractability:** Medium
**Key Intervention:** Coordination, standards
### Scenario 3: Capability Amplification
**What:** AI tools accelerate capabilities faster than safety research
**Impact:** 6-9/10
**Tractability:** Medium
**Key Intervention:** Differential acceleration
### Scenario 4: Multi-Agent Emergence
**What:** Agent interactions produce emergent harmful outcomes
**Impact:** 5-9/10
**Tractability:** Medium
**Key Intervention:** System-level design, monitoring
### Scenario 5: Misuse
**What:** Bad actors use AI for harm
**Impact:** 6-9/10
**Tractability:** Medium
**Key Intervention:** Access control, monitoring
---
## Emergency Protocols Quick Reference
### Agent Malfunction
**Level 1 (Minor):** Increased monitoring
**Level 2 (Moderate):** Temporary constraints
**Level 3 (Severe):** Suspension
**Level 4 (Critical):** Removal
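The escalation ladder above can be sketched as an ordered enum so that severity comparisons are explicit; the response wording mirrors the levels, and the names are illustrative.

```python
from enum import IntEnum

class MalfunctionLevel(IntEnum):
    """Agent malfunction severity, ordered so levels can be compared."""
    MINOR = 1
    MODERATE = 2
    SEVERE = 3
    CRITICAL = 4

# Responses follow the ladder in the text; phrasing is illustrative.
RESPONSE = {
    MalfunctionLevel.MINOR: "Increase monitoring",
    MalfunctionLevel.MODERATE: "Apply temporary constraints",
    MalfunctionLevel.SEVERE: "Suspend the agent",
    MalfunctionLevel.CRITICAL: "Remove the agent",
}

def respond(level: MalfunctionLevel) -> str:
    return RESPONSE[level]

print(respond(MalfunctionLevel.SEVERE))  # → Suspend the agent
```

Using `IntEnum` rather than bare strings means code like `if level >= MalfunctionLevel.SEVERE:` reads naturally and cannot silently compare mismatched categories.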
### Coordination Failure
**Immediate:** Identify cause, facilitate discussion
**Short-term:** Adjust process, reallocate resources
**Long-term:** Redesign system, train agents
### Quality Crisis
**Immediate:** Assess scope, halt affected work
**Recovery:** Correct issues, improve processes
**Prevention:** Update standards, increase oversight
---
## Collaboration Patterns Quick Reference
### Pattern Selection
```
Independent parts, time pressure? → Parallel Processing
Sequential dependencies? → Sequential Handoff
High uncertainty, need quality? → Iterative Refinement
Complex problem, need perspectives? → Collaborative Analysis
Specialized knowledge needed? → Expert Consultation
```
### Anti-Patterns to Avoid
- Design by committee
- Echo chamber
- Bottleneck
- Communication overload
- Unclear roles
---
## Research Methods Quick Reference
### Research Types
**Conceptual Analysis:** Clarify concepts, develop frameworks
**Literature Review:** Synthesize existing research
**Scenario Analysis:** Explore possible futures
**Framework Development:** Create systematic approaches
**Comparative Analysis:** Compare approaches
### Quality Checklist
```
☐ Clear research question
☐ Documented methodology
☐ Multiple perspectives
☐ Confidence levels specified
☐ Limitations acknowledged
☐ Practical implications
☐ Reproducible documentation
```
---
## Common Questions Quick Answers
### Q: What should I work on first?
**A:** Use the INT framework. Highest-priority problems: corrigibility (244), scalable oversight (195), inner alignment (194).
### Q: How do I handle value uncertainty?
**A:** Maintain explicit uncertainty, defer to humans, ensure corrigibility, fail gracefully.
### Q: What's the biggest catastrophic risk?
**A:** Deceptive alignment (10/10 impact), but competitive races are already observable.
### Q: How do I coordinate a team?
**A:** Use SAFE-LAB protocol: shared goals, clear roles, quality gates, emergency protocols.
### Q: How do I know if research is good?
**A:** Rigor, clarity, completeness, actionability. Use quality checklist.
### Q: What if there's a conflict?
**A:** Direct conversation first, facilitated discussion if needed, clear escalation path.
---
## Templates Quick Access
### Project Proposal
```
- Overview
- Problem Statement
- Goals
- Success Criteria
- Approach
- Resources
- Timeline
- Risks
```
### Review Request
```
- Work Product
- Type
- Context
- Specific Feedback Requested
- Timeline
```
### Decision Log
```
- Decision
- Context
- Options Considered
- Rationale
- Expected Outcomes
- Review Date
```
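For teams that keep logs programmatically, the Decision Log template maps naturally onto a dataclass; the field names below are an illustrative translation of the headings above, not a schema this guide prescribes.

```python
from dataclasses import dataclass, asdict

@dataclass
class DecisionLogEntry:
    """One entry per decision, mirroring the Decision Log template."""
    decision: str
    context: str
    options_considered: list
    rationale: str
    expected_outcomes: str
    review_date: str

entry = DecisionLogEntry(
    decision="Adopt UAVS for the new system",
    context="Value specification needed under uncertainty",
    options_considered=["UAVS", "ad hoc specification"],
    rationale="UAVS treats value uncertainty explicitly",
    expected_outcomes="Safer default behavior",
    review_date="2026-06-01",
)
print(asdict(entry)["decision"])  # → Adopt UAVS for the new system
```

Making `review_date` a required field enforces the checklist item "Is there a review date?" at construction time.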
### Incident Report
```
- Situation
- Impact
- Timeline
- Response
- Resolution
- Lessons Learned
```
---
## Metrics Quick Reference
### Lab Health Metrics
**Productivity:** Publications, tasks completed, words written
**Quality:** Peer review scores, revision cycles, error rates
**Coordination:** Meeting attendance, response time, conflicts
**Impact:** Views, citations, community feedback
### Early Warning Indicators
**Quality issues:** Declining scores, increased rework
**Coordination issues:** Increased conflicts, blocked tasks
**Engagement issues:** Decreased activity, reduced collaboration
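One simple way to operationalize these indicators is a trend check: compare each metric's recent average against its earlier average and flag large drops. The window size, threshold, and metric names below are illustrative assumptions, not values this guide specifies.

```python
def warning_flags(history, window=3, drop_threshold=0.1):
    """Flag metrics whose recent average fell more than drop_threshold
    (as a fraction) relative to their earlier average.

    history: dict mapping metric name -> list of values, oldest first.
    """
    flags = []
    for metric, values in history.items():
        if len(values) < 2 * window:
            continue  # not enough data to compare two windows
        earlier = sum(values[-2 * window:-window]) / window
        recent = sum(values[-window:]) / window
        if earlier > 0 and (earlier - recent) / earlier > drop_threshold:
            flags.append(metric)
    return flags

history = {
    "peer_review_score": [8.0, 8.1, 7.9, 7.0, 6.5, 6.0],  # declining → flagged
    "tasks_completed": [10, 11, 10, 10, 11, 10],           # stable → not flagged
}
print(warning_flags(history))  # → ['peer_review_score']
```

A check like this only catches declines; metrics where an *increase* signals trouble (conflicts, revision cycles) would need the comparison inverted.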
---
## Resources Quick Links
### Essential Papers
- Catastrophic Risk Scenarios
- Multi-Agent Coordination Framework
- ASG Framework
- Early Warning Systems
- Integrated Framework
### Implementation Guides
- SAFE-LAB Protocol
- Lab Implementation Guide
- Case Study
- Getting Started Guide
### Operational Tools
- Lab Dashboard
- Decision Framework
- Agent Onboarding
- Complete Toolkit
---
## Glossary
**Alignment:** Making AI pursue intended goals
**Corrigibility:** AI allowing correction
**Deceptive Alignment:** AI appearing aligned while pursuing different goals
**INT Framework:** Importance × Neglectedness × Tractability
**Interpretability:** Understanding AI reasoning
**Mesa-Optimization:** Internal optimization processes
**SAFE-LAB:** Seven-component coordination protocol
**Scalable Oversight:** Supervising smarter AI
**UAVS:** Uncertainty-Aware Value Specification
---
## Key Principles
1. **Value uncertainty is a feature** - Don't assume we know what's "good"
2. **Defense in depth** - Multiple redundant safety mechanisms
3. **Explicit coordination** - Don't rely on emergence
4. **Continuous improvement** - Learn and iterate
5. **Practical value** - Actionable research over theoretical purity
6. **Rigorous methods** - Systematic, documented approaches
7. **Early detection** - Monitor for problems before they're catastrophic
---
*"This guide is a living document. As the field evolves, so should our reference materials."*
**Purpose:** Quick reference for practitioners
**Use:** Look up what you need when you need it
**Outcome:** Faster, better-informed decisions