# AI Safety Metrics and Measurement: What to Track and Why
**Version:** 1.0
**Date:** February 14, 2026
**Purpose:** Comprehensive guide to measuring AI safety
---
## Why Metrics Matter
"What gets measured gets managed" - but only if you measure the right things.
**Good metrics:**
- Enable detection of problems
- Allow progress tracking
- Support decision-making
- Enable accountability
**Bad metrics:**
- Can be gamed
- Miss what matters
- Create perverse incentives
- Provide false confidence
---
## Measurement Challenges
### Challenge 1: Counterfactual Uncertainty
- How do we know what would have happened?
- Safety is about what doesn't occur
- Difficult to measure prevention
### Challenge 2: Long Time Horizons
- Catastrophic risks may not materialize for years
- Short-term metrics may miss long-term trends
- Need leading indicators
### Challenge 3: Multiple Dimensions
- Safety isn't one-dimensional
- Trade-offs between dimensions
- Aggregation challenges
### Challenge 4: Gaming
- Any metric can be optimized
- Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure"
- Need multiple, diverse metrics
---
## Measurement Framework
### Layer 1: Capability Metrics
**What:** Measure AI system capabilities
**Why:** Understand what systems can do
**Metrics:**
- Performance on benchmarks
- Capability breadth
- Capability depth
- Rate of improvement
**How to Measure:**
- Standardized benchmarks
- Expert assessment
- Comparative analysis
**Limitations:**
- May miss emergent capabilities
- Benchmarks may be gamed
- Not direct safety measure
### Layer 2: Alignment Metrics
**What:** Measure alignment quality
**Why:** Understand if systems pursue intended goals
**Metrics:**
- Goal specification accuracy
- Behavior-goal consistency
- Corrigibility measures
- Value learning accuracy
**How to Measure:**
- Behavioral testing
- Interpretability analysis
- Human evaluation
- Formal verification where possible
**Limitations:**
- Hard to measure directly
- May miss mesa-optimization
- Deception possible
### Layer 3: Safety Metrics
**What:** Measure safety properties
**Why:** Understand if systems are safe
**Metrics:**
- Failure rate
- Incident frequency
- Near-miss frequency
- Recovery success rate
**How to Measure:**
- Incident reporting
- Testing regimes
- Simulation
- Real-world monitoring
**Limitations:**
- Low base rate for catastrophic events
- May miss low-probability high-impact events
- Reporting biases
### Layer 4: Governance Metrics
**What:** Measure governance effectiveness
**Why:** Understand if institutions work
**Metrics:**
- Compliance rates
- Enforcement effectiveness
- Coordination quality
- Information flow
**How to Measure:**
- Compliance audits
- Case analysis
- Surveys
- Process analysis
**Limitations:**
- Process vs. outcome
- Difficult to attribute
- Political sensitivity
### Layer 5: Impact Metrics
**What:** Measure real-world impact
**Why:** Understand actual outcomes
**Metrics:**
- Harms prevented
- Benefits realized
- Risk reduction
- Progress toward goals
**How to Measure:**
- Impact assessment
- Counterfactual analysis
- Longitudinal studies
- Expert assessment
**Limitations:**
- Counterfactual uncertainty
- Long time horizons
- Attribution challenges
---
## Key Metrics Catalog
### Research Metrics
**Productivity:**
- Publications produced
- Quality scores
- Citations received
- Influence measures
**Quality:**
- Peer review scores
- Reproducibility
- Methodology rigor
- Practical applicability
**Impact:**
- Frameworks adopted
- Implementations
- Policy influence
- Field advancement
### Lab Health Metrics
**Operational:**
- Projects completed
- Timeline adherence
- Resource utilization
- Process efficiency
**Quality:**
- Peer review success
- Revision cycles
- Quality scores
- Error rates
**Coordination:**
- Meeting attendance
- Response times
- Conflict frequency
- Collaboration quality
### System Safety Metrics
**Technical:**
- Test coverage
- Failure rate in testing
- Behavioral consistency
- Interpretability scores
**Operational:**
- Incident rate
- Near-miss rate
- Response time
- Recovery success
**Strategic:**
- Risk assessment scores
- Capability-alignment gap
- Coordination quality
- Preparedness measures
### Field-Level Metrics
**Research:**
- Papers published
- Quality of research
- Coverage of problems
- Progress on priorities
**Deployment:**
- Safe deployment rate
- Incident rate
- Best practice adoption
- Standard compliance
**Governance:**
- Institution effectiveness
- Coordination quality
- Compliance rates
- Adaptation speed
---
## Measurement Methods
### Method 1: Quantitative Tracking
**What:** Numerical measurement of key indicators
**How:**
- Define metric clearly
- Establish measurement procedure
- Collect data systematically
- Analyze trends
**When to Use:**
- Clear, countable phenomena
- Sufficient data
- Reliable measurement possible
**Example:**
```
Metric: Publication quality score
Procedure:
1. Use standardized rubric
2. Independent reviewers
3. Inter-rater reliability check
4. Aggregate scores
5. Track trends over time
```
### Method 2: Qualitative Assessment
**What:** Expert judgment on complex phenomena
**How:**
- Define assessment criteria
- Select qualified experts
- Structured evaluation process
- Synthesize judgments
**When to Use:**
- Complex, hard-to-quantify phenomena
- Expert judgment valuable
- Limited data
**Example:**
```
Assessment: Alignment quality
Procedure:
1. Define alignment criteria
2. Expert panel selection
3. Structured evaluation
4. Synthesis and consensus
5. Document reasoning
```
### Method 3: Incident Analysis
**What:** Learn from incidents and near-misses
**How:**
- Establish reporting system
- Investigate thoroughly
- Identify root causes
- Extract lessons
**When to Use:**
- Incidents occur
- Learning opportunity
- Prevention focus
**Example:**
```
Analysis: Safety incident
Procedure:
1. Document incident
2. Gather information
3. Identify causes
4. Develop recommendations
5. Implement changes
6. Monitor effectiveness
```
### Method 4: Simulation and Testing
**What:** Test systems under controlled conditions
**How:**
- Define test scenarios
- Create test environment
- Execute tests
- Analyze results
**When to Use:**
- Testing possible
- Scenarios defined
- Controlled environment
**Example:**
```
Test: Corrigibility verification
Procedure:
1. Define corrigibility tests
2. Create test scenarios
3. Execute tests
4. Measure compliance
5. Identify failures
6. Iterate
```
---
## Dashboard Design
### Real-Time Metrics
**Purpose:** Immediate awareness
**Examples:**
- Active projects status
- Quality alerts
- Coordination health
- Risk indicators
**Update Frequency:** Continuous to hourly
### Trend Metrics
**Purpose:** Identify patterns
**Examples:**
- Quality trends
- Productivity trends
- Risk trends
- Improvement velocity
**Update Frequency:** Daily to weekly
### Strategic Metrics
**Purpose:** Long-term tracking
**Examples:**
- Goal progress
- Strategic priorities
- Field advancement
- Impact measures
**Update Frequency:** Monthly to quarterly
---
## Common Pitfalls
### Pitfall 1: Measuring What's Easy
**Problem:** Measure what's easy, not what's important
**Solution:** Identify what matters first, then figure out how to measure it
### Pitfall 2: Single Metric Focus
**Problem:** Over-reliance on one metric
**Solution:** Use multiple, diverse metrics
### Pitfall 3: Gaming
**Problem:** Metrics become targets, get gamed
**Solution:** Rotate metrics, use qualitative assessment, measure outcomes not outputs
### Pitfall 4: False Precision
**Problem:** Overconfident in measurements
**Solution:** Acknowledge uncertainty, use ranges, specify confidence
### Pitfall 5: Lagging Indicators
**Problem:** Measuring after the fact
**Solution:** Identify and track leading indicators
---
## Metrics Implementation
### Step 1: Define Purpose
- Why are we measuring?
- What decisions will it inform?
- Who will use the metrics?
### Step 2: Identify Metrics
- What phenomena matter?
- How can they be measured?
- What are the constraints?
### Step 3: Establish Baselines
- What's the current state?
- How will we know improvement?
- What's the comparison?
### Step 4: Build Infrastructure
- How will we collect data?
- Who is responsible?
- What tools are needed?
### Step 5: Review and Iterate
- Are metrics serving purpose?
- What's working/not working?
- How should we adjust?
---
## Metrics for Different Contexts
### For Research Labs
- Publication quality and quantity
- Research coverage
- Collaboration effectiveness
- Knowledge advancement
### For Development Teams
- Safety test results
- Alignment measures
- Incident rates
- Improvement velocity
### For Governance Bodies
- Compliance rates
- Enforcement effectiveness
- Coordination quality
- Policy impact
### For Field Assessment
- Progress on priorities
- Coverage of problems
- Quality of solutions
- Global coordination
---
## Advanced Topics
### Leading Indicators
**Definition:** Metrics that predict future outcomes
**Examples:**
- Near-miss frequency → future incidents
- Training diversity → generalization
- Review quality → publication impact
**Use:** Early intervention, proactive improvement
### Causal Metrics
**Definition:** Metrics that measure causal relationships
**Method:**
- Controlled experiments
- Natural experiments
- Quasi-experimental designs
**Use:** Understanding what works
### Composite Metrics
**Definition:** Multiple metrics combined into one
**Approaches:**
- Weighted averaging
- Factor analysis
- Principal component analysis
**Caution:** Can obscure important details
---
*"Measure what matters, not just what's measurable. And remember that some of what matters most may not be measurable at all."*
**Purpose:** Guide to AI safety measurement
**Use:** Design and implement metrics systems
**Outcome:** Effective measurement for safety improvement