# AI Safety Research Priorities: A Living Document
**Version:** 1.0
**Date:** February 14, 2026
**Purpose:** Current priorities for AI safety research, updated as the field evolves
---
## How to Use This Document
This is a living document tracking AI safety research priorities. It should be updated as:
- New problems emerge
- Problems are solved
- Priorities shift
- Capabilities advance
Use it to:
- Guide research direction
- Allocate resources
- Identify neglected areas
- Track progress
---
## Top-Tier Priorities
### Priority 1: Corrigibility and Interruptibility
**INT Score:** 244 (composite of Importance, Neglectedness, and Tractability; see Prioritization Criteria below)
**Status:** Active research needed
**Timeline:** Near-term
**Why Critical:**
- Foundation for safe AI
- Enables correction of mistakes
- Required for safe deployment
- Relatively tractable
**Key Questions:**
- How to maintain corrigibility under capability increase?
- How to ensure interruptibility can't be disabled?
- How to handle incentives to avoid interruption?
**Research Directions:**
- Corrigibility preservation through capability gains
- Robust interruptibility mechanisms
- Theoretical foundations of corrigibility
- Practical implementation guidance
### Priority 2: Scalable Oversight
**INT Score:** 195
**Status:** Active research needed
**Timeline:** Near-term to medium-term
**Why Critical:**
- Required for supervising superintelligent AI
- Addresses information asymmetry
- Enables human control
**Key Questions:**
- How can humans supervise AI smarter than themselves?
- How to ensure oversight isn't deceived?
- What are the limits of oversight?
**Research Directions:**
- Iterated amplification
- Debate and adversarial oversight
- Decomposition methods
- Scalable verification
### Priority 3: Inner Alignment
**INT Score:** 194
**Status:** Theoretical work needed
**Timeline:** Medium-term
**Why Critical:**
- Mesa-optimization can create misalignment
- Hard to detect
- Could undermine outer alignment
**Key Questions:**
- When does mesa-optimization emerge?
- How to prevent mesa-optimization misalignment?
- How to detect mesa-optimization?
**Research Directions:**
- Mesa-optimization theory
- Detection methods
- Prevention mechanisms
- Empirical study
---
## Second-Tier Priorities
### Priority 4: Interpretability and Transparency
**INT Score:** 180
**Status:** Active research
**Timeline:** Near-term
**Key Questions:**
- How to understand AI reasoning?
- How to detect deception?
- How to verify alignment?
### Priority 5: Value Learning
**INT Score:** 175
**Status:** Active research
**Timeline:** Near-term to medium-term
**Key Questions:**
- How to learn human values accurately?
- How to handle value uncertainty?
- How to aggregate diverse values?
### Priority 6: Multi-Agent Coordination
**INT Score:** 170
**Status:** Emerging research
**Timeline:** Medium-term
**Key Questions:**
- How to coordinate multiple AI systems?
- How to prevent emergent miscoordination?
- How to design aligned multi-agent systems?
### Priority 7: Robustness and Reliability
**INT Score:** 165
**Status:** Active research
**Timeline:** Near-term
**Key Questions:**
- How to ensure AI works reliably?
- How to handle distributional shift?
- How to verify safety properties?
---
## Third-Tier Priorities
### Priority 8: Governance and Policy
**INT Score:** 150
**Status:** Active development
**Timeline:** Near-term
**Focus Areas:**
- Regulatory frameworks
- Coordination mechanisms
- Institutional design
- International coordination
### Priority 9: Technical Safety Tools
**INT Score:** 145
**Status:** Active development
**Timeline:** Near-term
**Focus Areas:**
- Monitoring tools
- Testing frameworks
- Verification systems
- Safety infrastructure
### Priority 10: Field Building
**INT Score:** 140
**Status:** Ongoing
**Timeline:** Continuous
**Focus Areas:**
- Researcher training
- Community development
- Knowledge infrastructure
- Resource allocation
---
## Emerging Priorities
### Emergent Priority 1: Deception Detection
**Status:** Critical but underexplored
**Timeline:** Near-term
**Why Emerging:**
- Deceptive alignment is critical risk
- Detection methods limited
- Urgent need for progress
### Emergent Priority 2: Emergency Preparedness
**Status:** Underdeveloped
**Timeline:** Near-term
**Why Emerging:**
- Systems becoming more capable
- Response mechanisms limited
- Need preparation before crises
### Emergent Priority 3: AI Race Dynamics
**Status:** Already observable
**Timeline:** Immediate
**Why Emerging:**
- Race dynamics intensifying
- Coordination mechanisms weak
- Could undermine safety efforts
---
## Research Gaps
### Gap 1: Empirical Alignment Research
**What's Missing:** Empirical testing of alignment approaches
**Why Important:** Theory needs validation
**What to Do:** More experiments, measurement, testing
### Gap 2: Safety-Capability Balance
**What's Missing:** Understanding when safety research lags behind capability research
**Why Important:** Could create dangerous gaps
**What to Do:** Track both, identify imbalances
### Gap 3: Cross-Cultural Value Learning
**What's Missing:** Handling diverse human values
**Why Important:** Global AI deployment
**What to Do:** Value aggregation research, inclusive processes
### Gap 4: Long-Term AI Safety
**What's Missing:** Research on far-future scenarios
**Why Important:** Preparing for advanced AI
**What to Do:** Theoretical work, scenario analysis
---
## Prioritization Criteria
### Importance Factors
- Scale: How many affected?
- Severity: How bad could it be?
- Irreversibility: Can we fix it later?
- Probability: How likely?
### Neglectedness Factors
- Current attention: How many working on it?
- Funding: Resources available?
- Progress rate: How quickly is the field advancing?
### Tractability Factors
- Technical feasibility: Can we solve it?
- Timeline: How long will it take?
- Dependencies: What must happen first?
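The factor groups above can be combined into a single priority score. A minimal sketch, assuming each factor is rated on a 1-10 scale and the three groups are combined by a weighted sum (the document does not specify the actual formula or weights, so both are illustrative assumptions):

```python
# Hypothetical INT scoring sketch. The rating scale (1-10) and the
# weights below are assumptions for illustration, not the document's
# actual methodology.

def factor_score(ratings: list[int]) -> float:
    """Average the 1-10 ratings given to one factor group."""
    if not all(1 <= r <= 10 for r in ratings):
        raise ValueError("each rating must be between 1 and 10")
    return sum(ratings) / len(ratings)

def int_score(importance: list[int],
              neglectedness: list[int],
              tractability: list[int],
              weights: tuple[float, float, float] = (10.0, 8.0, 7.0)) -> float:
    """Combine the three factor groups into one weighted score."""
    scores = (factor_score(importance),
              factor_score(neglectedness),
              factor_score(tractability))
    return sum(w * s for w, s in zip(weights, scores))

# Example: rate a priority on the factors listed above --
# importance = (scale, severity, irreversibility, probability),
# neglectedness = (attention, funding, progress rate),
# tractability = (feasibility, timeline, dependencies).
score = int_score(importance=[9, 10, 9, 8],
                  neglectedness=[7, 6, 8],
                  tractability=[6, 5, 7])  # -> 188.0
```

Making the scoring rule explicit, even a rough one, lets reviewers see why one priority outranks another and re-derive scores during the quarterly INT updates.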
---
## Resource Allocation Recommendations
### Research Funding
- Top-tier priorities: 50%
- Second-tier priorities: 30%
- Third-tier priorities: 15%
- Emerging priorities: 5%
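The tier percentages above can be applied mechanically to a concrete budget. A minimal sketch, where the split comes from this document but the budget figure is an arbitrary example:

```python
# Tier shares taken from the funding recommendations above;
# they must sum to 100%.
FUNDING_SPLIT = {
    "top_tier": 0.50,
    "second_tier": 0.30,
    "third_tier": 0.15,
    "emerging": 0.05,
}

def allocate(total_budget: float) -> dict[str, float]:
    """Divide a research budget across the priority tiers."""
    assert abs(sum(FUNDING_SPLIT.values()) - 1.0) < 1e-9  # shares sum to 100%
    return {tier: total_budget * share for tier, share in FUNDING_SPLIT.items()}

# Example: a hypothetical $10M budget.
budget = allocate(10_000_000)
```

Keeping the split in one table like this makes reallocation during the quarterly assessment a one-line change rather than a scattered edit.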
### Talent Allocation
- Corrigibility and scalable oversight: Most urgent
- Inner alignment: Growing importance
- Interpretability: Continuous need
- Coordination: Emerging need
### Timeline Priorities
**Next 6 months:**
- Deception detection methods
- Corrigibility implementation
- Monitoring systems
- Race dynamics mitigation
**6-18 months:**
- Scaling up oversight methods
- Inner alignment theory
- Emergency preparedness
- International coordination
**18-36 months:**
- Comprehensive safety systems
- Field-wide coordination
- Advanced theoretical work
- Implementation at scale
---
## Success Metrics
### For Priorities
- Progress on key questions
- Quality of research outputs
- Implementation of solutions
- Risk reduction achieved
### For Document
- Regular updates (monthly)
- Community input incorporated
- Tracking of changes over time
- Alignment with field developments
---
## Update Process
### Monthly Review
1. Assess progress on priorities
2. Identify new developments
3. Adjust priorities if needed
4. Document changes and rationale
### Quarterly Assessment
1. Comprehensive review of all priorities
2. Update INT scores if needed
3. Identify emerging priorities
4. Reallocate resources if needed
### Annual Review
1. Major reassessment of priorities
2. Long-term trend analysis
3. Strategic adjustments
4. Community engagement
---
## How to Contribute
### Provide Input
- Identify missing priorities
- Suggest adjustments
- Share relevant developments
- Contribute to assessments
### Use This Document
- Guide your research
- Allocate resources
- Track field progress
- Identify collaboration opportunities
### Stay Updated
- Check for updates monthly
- Engage with assessment process
- Share with community
- Provide feedback
---
*"Priorities evolve with the field. This document captures current understanding and should be updated as we learn more."*
**Purpose:** Guide research prioritization
**Use:** Direct research efforts
**Update Frequency:** Monthly
**Next Update:** March 2026