# Error Recovery Patterns Skill
This skill provides comprehensive guidance on error handling patterns, recovery strategies, and debugging techniques in GitHub Agentic Workflows (gh-aw).
## Purpose
Guide developers in implementing robust error recovery patterns to:
- Reduce retry loops in agent sessions (target: under 10%, down from the current 23%)
- Implement circuit breakers to prevent infinite retry loops
- Add proactive recovery for installation, dependency, and API failures
- Improve debug logging for recovery attempts
## When to Use This Skill
Invoke this skill when:
- Implementing retry logic for network operations, installations, or API calls
- Debugging retry loop issues in workflows or agent sessions
- Adding error recovery patterns to new or existing code
- Understanding transient vs non-transient error classification
- Implementing circuit breakers or exponential backoff
- Adding debug logging for recovery attempts
## Key Concepts Covered
### 1. Circuit Breaker Pattern
- Maximum retry limits (standard: 3 attempts)
- Exponential backoff strategies
- Fail-fast on non-transient errors
- Implementation in JavaScript, Shell, and Go
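
The bullets above can be sketched in Go as a bounded retry loop. This is a minimal illustration, not the skill's own implementation: the names `retry`, `errPermanent`, `errTimeout`, `failTimes`, and `demo` are hypothetical, and the sentinel-error approach to marking non-transient failures is an assumption made for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical sentinel errors used only for this sketch.
var (
	errPermanent = errors.New("permanent error") // never worth retrying
	errTimeout   = errors.New("connection timed out")
)

// retry runs op at most maxAttempts times. After each transient failure it
// waits, doubling the delay every time. It stops at once if op returns an
// error wrapping errPermanent or if ctx is cancelled. It returns the number
// of attempts made.
func retry(ctx context.Context, maxAttempts int, baseDelay time.Duration, op func() error) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	delay := baseDelay
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return attempt, nil
		}
		if errors.Is(err, errPermanent) {
			// Fail fast: retrying cannot fix this error.
			return attempt, fmt.Errorf("attempt %d, not retrying: %w", attempt, err)
		}
		if attempt == maxAttempts {
			break
		}
		select {
		case <-time.After(delay):
			delay *= 2 // exponential backoff
		case <-ctx.Done():
			return attempt, ctx.Err()
		}
	}
	return maxAttempts, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

// failTimes returns an operation that fails n times with err, then succeeds.
func failTimes(n int, err error) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return err
		}
		return nil
	}
}

// demo retries a simulated operation with the 3-attempt limit named above.
func demo(failures int, err error) (int, error) {
	return retry(context.Background(), 3, time.Millisecond, failTimes(failures, err))
}

func main() {
	attempts, err := demo(2, errTimeout)
	fmt.Println("transient:", attempts, err)
	attempts, err = demo(5, errPermanent)
	fmt.Println("permanent:", attempts, err)
}
```

Passing a `context.Context` lets the caller cancel a pending backoff wait, so the loop cannot outlive the workflow step that started it.
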
### 2. Installation Failure Recovery
- npm installation with cache clearing and registry fallbacks
- Python pip installation with mirror alternatives
- Docker image pull with retry and rate limit handling
- Copilot CLI installation with network retry
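
As an illustration of the first bullet, the sketch below escalates through recovery steps for an npm install: a plain install, then a cache clear followed by a second install, then an install against an alternative registry. The function names, the `runner` type, and the registry URL are hypothetical. Only the npm commands themselves (`npm install`, `npm cache clean --force`, and the `--registry` option) are real. The command runner is injected so the logic can be exercised without touching the network.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runner executes a command and returns an error if it fails.
type runner func(name string, args ...string) error

// execRunner runs the command for real via os/exec.
func execRunner(name string, args ...string) error {
	return exec.Command(name, args...).Run()
}

// installWithRecovery returns the name of the step that succeeded.
func installWithRecovery(run runner, pkg, fallbackRegistry string) (string, error) {
	// Step 1: plain install.
	if err := run("npm", "install", pkg); err == nil {
		return "direct", nil
	}
	// Step 2: clear the cache, then install again.
	if err := run("npm", "cache", "clean", "--force"); err == nil {
		if err := run("npm", "install", pkg); err == nil {
			return "after-cache-clean", nil
		}
	}
	// Step 3: install from the fallback registry.
	err := run("npm", "install", pkg, "--registry="+fallbackRegistry)
	if err == nil {
		return "fallback-registry", nil
	}
	return "", fmt.Errorf("install of %s failed after all recovery steps: %w", pkg, err)
}

// fakeRunner simulates npm: the first failInstalls install calls fail,
// every other command succeeds.
func fakeRunner(failInstalls int) runner {
	installs := 0
	return func(name string, args ...string) error {
		if len(args) > 0 && args[0] == "install" {
			installs++
			if installs <= failInstalls {
				return errors.New("simulated install failure")
			}
		}
		return nil
	}
}

func main() {
	// Two simulated failures force the fallback registry step.
	step, err := installWithRecovery(fakeRunner(2), "typescript", "https://registry.example.com")
	fmt.Println(step, err)
	// For a real install, pass execRunner instead of fakeRunner(...).
	_ = execRunner
}
```
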
### 3. API Timeout and Rate Limit Handling
- GitHub API rate limit detection and backoff
- Transient error detection patterns
- Custom retry configuration for different APIs
- Rate limit-specific retry strategies
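
Rate limit detection can be sketched as a function that inspects a response and decides how long to wait. The sketch assumes the headers GitHub documents for its REST API: `Retry-After` in seconds, and `X-RateLimit-Remaining` with `X-RateLimit-Reset` as a Unix timestamp, returned with a 403 or 429 status. The function names `rateLimitWait` and `demoWait` are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// rateLimitWait reports whether a response signals a rate limit and, if so,
// how long to wait before the next attempt.
func rateLimitWait(status int, h http.Header, now time.Time) (time.Duration, bool) {
	if status != http.StatusForbidden && status != http.StatusTooManyRequests {
		return 0, false
	}
	// An explicit Retry-After value takes priority.
	if s := h.Get("Retry-After"); s != "" {
		if secs, err := strconv.Atoi(s); err == nil && secs >= 0 {
			return time.Duration(secs) * time.Second, true
		}
	}
	// Otherwise wait until the quota window resets.
	if h.Get("X-RateLimit-Remaining") == "0" {
		reset, err := strconv.ParseInt(h.Get("X-RateLimit-Reset"), 10, 64)
		if err == nil {
			wait := time.Unix(reset, 0).Sub(now)
			if wait < 0 {
				wait = 0
			}
			return wait, true
		}
	}
	// A 403 without rate limit headers is an ordinary permission error.
	return 0, false
}

// demoWait builds a header set relative to a fixed clock and returns the
// wait in whole seconds.
func demoWait(status int, retryAfter, remaining string, resetIn int64) (int, bool) {
	now := time.Unix(1700000000, 0)
	h := http.Header{}
	if retryAfter != "" {
		h.Set("Retry-After", retryAfter)
	}
	if remaining != "" {
		h.Set("X-RateLimit-Remaining", remaining)
		h.Set("X-RateLimit-Reset", strconv.FormatInt(now.Unix()+resetIn, 10))
	}
	wait, limited := rateLimitWait(status, h, now)
	return int(wait.Seconds()), limited
}

func main() {
	secs, limited := demoWait(403, "", "0", 30)
	fmt.Println("quota exhausted:", secs, limited)
	secs, limited = demoWait(403, "", "12", 30)
	fmt.Println("plain 403:", secs, limited)
}
```

Waiting for the server-specified interval, rather than applying a generic exponential backoff, avoids spending retry attempts on requests that are certain to be rejected.
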
### 4. Debug Logging for Recovery
- Logger package usage for retry attempts
- Category naming conventions (`pkg:filename`)
- `DEBUG` environment variable patterns
- Zero-overhead logging when disabled
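
The following sketch shows one way these points could fit together. It is not the gh-aw logger package, whose API is not reproduced here. The `logger` type, `newLogger`, and `matches` are hypothetical, and the pattern syntax (comma-separated entries, with a trailing `*` as a wildcard) is an assumption for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// matches reports whether category is selected by a comma-separated pattern
// list such as "workflow:*,cli:retry". A trailing "*" matches any suffix.
func matches(patterns, category string) bool {
	for _, p := range strings.Split(patterns, ",") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		if p == category {
			return true
		}
		if strings.HasSuffix(p, "*") && strings.HasPrefix(category, strings.TrimSuffix(p, "*")) {
			return true
		}
	}
	return false
}

// logger writes debug lines for one category.
type logger struct {
	category string
	enabled  bool
}

// newLogger reads DEBUG once, so each later call costs one boolean check.
func newLogger(category string) *logger {
	return &logger{category: category, enabled: matches(os.Getenv("DEBUG"), category)}
}

// Printf returns immediately when the category is disabled.
func (l *logger) Printf(format string, args ...any) {
	if !l.enabled {
		return
	}
	fmt.Fprintf(os.Stderr, "%s %s\n", l.category, fmt.Sprintf(format, args...))
}

func main() {
	// Category follows the pkg:filename convention.
	log := newLogger("workflow:retry")
	log.Printf("attempt %d/%d failed: %v; next delay %s", 1, 3, "connection reset", "2s")
}
```

Running the program with `DEBUG=workflow:*` prints the retry line to stderr, and running it without `DEBUG` prints nothing. In this sketch the caller still evaluates the arguments when logging is disabled, so expensive values should be guarded by a check on `enabled`.
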
### 5. Error Categorization
- Transient vs non-transient errors
- Network errors, timeout patterns
- HTTP error codes (502, 503, 504)
- GitHub-specific errors (rate limits, abuse detection)
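
A classification helper along these lines decides whether a failure is worth retrying. The sketch treats the three gateway status codes listed above, together with timeouts, as transient and everything else as non-transient. The function and variable names are hypothetical, and a real classifier would likely cover more cases, such as the GitHub-specific errors.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
)

// Example errors used to exercise the classifier.
var (
	errValidation     = errors.New("validation failed: missing field")
	errWrappedTimeout = fmt.Errorf("fetch release: %w", context.DeadlineExceeded)
)

// transientStatus reports whether an HTTP status usually clears on retry.
func transientStatus(code int) bool {
	switch code {
	case http.StatusBadGateway, http.StatusServiceUnavailable, http.StatusGatewayTimeout:
		return true
	}
	return false
}

// transientError reports whether err looks like a timeout, including when it
// is wrapped inside another error.
func transientError(err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, context.DeadlineExceeded) {
		return true
	}
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

func main() {
	for _, code := range []int{404, 422, 502, 503, 504} {
		fmt.Println(code, "transient:", transientStatus(code))
	}
	fmt.Println("wrapped timeout transient:", transientError(errWrappedTimeout))
	fmt.Println("validation transient:", transientError(errValidation))
}
```

Using `errors.Is` and `errors.As` instead of string matching keeps the classification working after an error has been wrapped with extra context.
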
## Anti-Patterns to Avoid
The skill calls out the following anti-patterns:
- ❌ Infinite retry loops without maximum limits
- ❌ Retrying validation errors that won't self-correct
- ❌ No backoff delay between attempts
- ❌ Silent retries without logging
- ❌ Retrying non-transient errors
## Code Examples Provided
The skill includes production-ready examples for:
- JavaScript retry with `withRetry()` function
- Shell script retry loops with exponential backoff
- Go retry patterns with context and timeouts
- npm, pip, and Docker installation recovery
- GitHub API rate limit handling
- Debug logging for all recovery attempts
## Related Skills
- **error-messages** - Error message formatting and style guide
- **error-pattern-safety** - Safety guidelines for error pattern regex
- **developer** - General development guidelines and conventions
## Full Documentation
Complete documentation is available at: `../../scratchpad/error-recovery-patterns.md`
This skill references the comprehensive error recovery patterns document, which includes:
- Console formatting requirements
- Error wrapping patterns
- Common error scenarios with step-by-step resolution
- Error message templates
- Debugging runbook
- Error categorization decision trees
- Metrics and monitoring strategies