# 🆘 Disaster Recovery Guide

## Overview

This guide covers recovering your Copilot CLI+ environment after losing the primary host, migrating it to a new server, and replicating it across several machines.

## 🔄 Recovery Scenarios

### Scenario 1: Complete Host Loss

**Situation:** The primary host is down and you need to restore on a new system.

**Recovery Steps:**

```bash
# 1. Get the backup (from secure storage, git, S3, etc.)
git clone <your-backup-repo> copilot-backup
cd copilot-backup

# 2. Get the API key from secure storage (1Password, Vault, encrypted file, etc.)
#    A silent prompt keeps the key out of your shell history
read -rsp "DeepSeek API key: " DEEPSEEK_API_KEY; echo

# 3. Run recovery on the new system
bash RECOVERY.sh "$DEEPSEEK_API_KEY"

# 4. Verify (-f makes curl exit non-zero on HTTP errors)
curl -fsS http://localhost:8888/health
```
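Services may need a few seconds to start after `RECOVERY.sh` finishes, so a single `curl` in step 4 can fail even when recovery worked. A minimal retry sketch, assuming `curl` is installed; `wait_for_health` is a hypothetical helper, and the defaults (30 attempts, 2 seconds apart) are arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a health endpoint until it returns 2xx or attempts run out.
# Usage: wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_health() {
    local url="$1" attempts="${2:-30}" delay="${3:-2}" i
    for ((i = 1; i <= attempts; i++)); do
        # -f: treat HTTP errors as failures, -sS: quiet except errors, -m: per-try timeout
        if curl -fsS -m 5 "$url" > /dev/null 2>&1; then
            echo "healthy after $i attempt(s): $url"
            return 0
        fi
        # No need to sleep after the last attempt
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    echo "not healthy after $attempts attempt(s): $url" >&2
    return 1
}

# Example: allow roughly a minute for the API to come up
# wait_for_health http://localhost:8888/health 30 2
```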

### Scenario 2: Quick Migration

**Situation:** Moving to a different server/cloud provider.

**Recovery Steps:**

```bash
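# 1. On the old system, create the backup
bash /opt/local-agent/backup.sh
# Follow the prompts to upload it

# 2. Copy the backup directory to the new system (creates ~/backup there)
#    Skip this if the new host can fetch the backup from where you uploaded it
scp -r <backup-dir> root@newhost:backup

# 3. Run recovery on the new system
ssh root@newhost "bash backup/RECOVERY.sh 'sk-your-key'"

# 4. Point DNS/references to the new host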
```
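On a fresh server it can save a failed run to confirm up front that the basic tools are installed. This is a hypothetical pre-flight sketch: `check_prereqs` is not part of the backup, the command list `bash git curl systemctl` is only a guess at what `RECOVERY.sh` needs, and the remote example assumes root's login shell on the new host is bash.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only if every named command is on PATH.
check_prereqs() {
    local cmd
    local -a missing=()
    for cmd in "$@"; do
        command -v "$cmd" > /dev/null 2>&1 || missing+=("$cmd")
    done
    if ((${#missing[@]} > 0)); then
        echo "missing: ${missing[*]}" >&2
        return 1
    fi
    echo "all prerequisites present"
}

# Run it on the new host without copying a file (declare -f prints the function source):
# ssh root@newhost "$(declare -f check_prereqs); check_prereqs bash git curl systemctl"
```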

### Scenario 3: Multi-Machine Deployment

**Situation:** You need the same setup on multiple machines.

**Deployment Steps:**

```bash
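# 1. Create the backup on the first machine
bash /opt/local-agent/backup.sh

# 2. Commit and push it to your backup repo
git add copilot-backup && git commit -m "Update Copilot backup" && git push

# 3. Deploy to the other machines
#    The closing CMD must start at column 0, otherwise the heredoc never ends
for host in host1 host2 host3; do
    ssh "$host" 'bash -s' << 'CMD'
git clone <your-backup-repo>
bash copilot-backup/RECOVERY.sh "sk-key"
CMD
done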
```
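The loop above handles one host at a time, and a failure on one host is easy to miss in the scrolling output. A parallel variant that writes one log per host and reports which hosts failed could look like this sketch; `deploy_one` and `deploy_all` are hypothetical helper names, and the `<your-backup-repo>` and `"sk-key"` placeholders still need real values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the recovery commands on one host over SSH.
# Replace the <your-backup-repo> and "sk-key" placeholders before use.
deploy_one() {
    ssh "$1" 'bash -s' << 'CMD'
git clone <your-backup-repo>
bash copilot-backup/RECOVERY.sh "sk-key"
CMD
}

# Deploy to every host given as an argument, in parallel.
# Each host's output goes to deploy-<host>.log in the current directory.
deploy_all() {
    local host i
    local -a pids=() hosts=() failed=()
    for host in "$@"; do
        deploy_one "$host" > "deploy-$host.log" 2>&1 &
        pids+=("$!")
        hosts+=("$host")
    done
    # wait PID returns that background job's exit status
    for i in "${!pids[@]}"; do
        wait "${pids[$i]}" || failed+=("${hosts[$i]}")
    done
    if ((${#failed[@]} > 0)); then
        echo "failed: ${failed[*]} (see deploy-<host>.log)" >&2
        return 1
    fi
    echo "all ${#hosts[@]} host(s) deployed"
}

# Example:
# deploy_all host1 host2 host3
```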

## 📦 What to Backup

**Essential (Must Have):**
- ✅ `bootstrap.sh` - Complete setup automation
- ✅ Custom configurations (if any)
- ✅ API key (stored separately)

**Important (Should Have):**
- ✅ Custom Ansible playbooks
- ✅ Documentation
- ✅ Systemd service configurations

**Optional:**
- Logs (can be regenerated)
- Downloaded models (can be re-pulled)
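Since the API key must stay out of the backup, a quick scan of the backup directory before pushing or uploading can catch accidents. A minimal sketch, assuming keys look like `sk-` followed by at least 20 letters or digits (adjust the pattern for your provider); `scan_for_keys` is a hypothetical helper and `--exclude-dir` assumes GNU grep.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail if any file under DIR appears to contain an API key.
# Assumed key shape: "sk-" followed by 20+ letters/digits.
scan_for_keys() {
    local dir="$1" hits
    # -r: recurse, -l: print matching file names only, -E: extended regex
    # grep exits 1 when nothing matches, so "|| true" keeps set -e callers alive
    hits=$(grep -rlE --exclude-dir=.git -e 'sk-[A-Za-z0-9]{20,}' "$dir" || true)
    if [ -n "$hits" ]; then
        echo "possible API key found in:" >&2
        printf '%s\n' "$hits" >&2
        return 1
    fi
    echo "no API keys found in $dir"
}

# Example: scan before committing or uploading
# scan_for_keys copilot-backup && git push
```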

## 🔐 Secure API Key Storage

### Option 1: Encrypted File

```bash
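# Encrypt (prompts for a passphrase; -pbkdf2 and -iter need OpenSSL 1.1.1 or newer)
openssl enc -aes-256-cbc -salt -pbkdf2 -iter 200000 -in deepseek-key.txt -out deepseek-key.enc
shred -u deepseek-key.txt   # don't leave the plaintext key behind

# Store the encrypted version in the repo
# Keep the passphrase elsewhere

# Decrypt during recovery (use the same cipher and key-derivation options)
DEEPSEEK_API_KEY="$(openssl enc -d -aes-256-cbc -pbkdf2 -iter 200000 -in deepseek-key.enc)"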
```
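A mistyped passphrase makes the encrypted key useless, so it is worth checking the round trip before deleting the plaintext. A minimal sketch, assuming OpenSSL 1.1.1 or newer and a passphrase exported in a `KEY_PASSPHRASE` variable (an assumed name; `-pass env:VAR` is a standard OpenSSL passphrase source). The 200000 iteration count is an example value, and decryption must use the same one. `encrypt_and_verify` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: encrypt PLAIN_FILE to ENC_FILE, then check that ENC_FILE
# decrypts back to the same bytes. Passphrase comes from exported KEY_PASSPHRASE.
encrypt_and_verify() {
    local plain="$1" enc="$2"
    if [ -z "${KEY_PASSPHRASE:-}" ]; then
        echo "export KEY_PASSPHRASE first" >&2
        return 1
    fi
    # -pbkdf2/-iter need OpenSSL 1.1.1+; decryption must repeat the same options
    openssl enc -aes-256-cbc -salt -pbkdf2 -iter 200000 \
        -pass env:KEY_PASSPHRASE -in "$plain" -out "$enc" || return 1
    # cmp reads the decrypted stream from stdin ("-") and compares it to the original
    if openssl enc -d -aes-256-cbc -pbkdf2 -iter 200000 \
        -pass env:KEY_PASSPHRASE -in "$enc" | cmp -s - "$plain"; then
        echo "verified: $enc"
    else
        echo "verification failed: $enc" >&2
        return 1
    fi
}

# Example:
# read -rs KEY_PASSPHRASE && export KEY_PASSPHRASE
# encrypt_and_verify deepseek-key.txt deepseek-key.enc && shred -u deepseek-key.txt
```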

### Option 2: 1Password / LastPass / Vault

```bash
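# Retrieve before recovery
# 1Password secret references have the form op://<vault>/<item>/<field>
op read "op://vault/deepseek-api-key/credential"

# Use in recovery
bash RECOVERY.sh "$(op read "op://vault/deepseek-api-key/credential")"

# Equivalents for other managers (item names and paths are examples)
# LastPass:        lpass show --password deepseek-api-key
# HashiCorp Vault: vault kv get -field=api_key secret/deepseek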
```
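To avoid typing the key on the command line, a recovery wrapper can try several sources in order. A hypothetical sketch: it uses `DEEPSEEK_API_KEY` if it is already set, then the 1Password CLI if `op` is installed (the `op://vault/deepseek-api-key/credential` reference, in particular the `credential` field name, is an assumption about how the item is stored), and finally a silent prompt. `get_api_key` is not part of the project.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the DeepSeek API key from the first source that has one.
get_api_key() {
    local key=""
    # 1. Already set in this shell
    if [ -n "${DEEPSEEK_API_KEY:-}" ]; then
        printf '%s\n' "$DEEPSEEK_API_KEY"
        return 0
    fi
    # 2. 1Password CLI ("credential" is an assumed field name)
    if command -v op > /dev/null 2>&1 &&
        key=$(op read "op://vault/deepseek-api-key/credential" 2> /dev/null) &&
        [ -n "$key" ]; then
        printf '%s\n' "$key"
        return 0
    fi
    # 3. Silent prompt; bash only shows the prompt when stdin is a terminal
    IFS= read -rs -p "DeepSeek API key: " key || true
    if [ -t 0 ]; then echo >&2; fi
    if [ -z "$key" ]; then
        echo "no API key available" >&2
        return 1
    fi
    printf '%s\n' "$key"
}

# Example:
# key=$(get_api_key) && bash RECOVERY.sh "$key"
```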

### Option 3: Environment Secret (GitHub/GitLab)

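```yaml
# GitHub Actions: store the key once as a repository secret,
# e.g. with the GitHub CLI: gh secret set DEEPSEEK_API_KEY
# ${{ secrets.* }} is workflow syntax; it only works inside workflow YAML, not in a shell.
# Pass the secret to the recovery step through the environment:
- name: Recover Copilot environment
  env:
    DEEPSEEK_API_KEY: ${{ secrets.DEEPSEEK_API_KEY }}
  run: bash RECOVERY.sh "$DEEPSEEK_API_KEY"

# GitLab CI/CD: add DEEPSEEK_API_KEY as a masked variable under
# Settings > CI/CD > Variables; jobs then see it as $DEEPSEEK_API_KEY.
```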

## 🚀 Automated Backup Strategy

### Setup Automated Backups

```bash
# 1. Create the backup script
#    If backup.sh asks questions (see Scenario 2), make sure it can run unattended:
#    cron and systemd provide no terminal to answer prompts
cat > /usr/local/bin/backup-copilot.sh << 'EOF'
#!/bin/bash
set -euo pipefail
bash /opt/local-agent/backup.sh
# Then upload to git/S3
EOF

chmod +x /usr/local/bin/backup-copilot.sh

# 2. Schedule it with ONE of the following, not both

# 2a. cron (daily at 02:00)
(crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/backup-copilot.sh") | crontab -

# 2b. systemd timer (the timer starts copilot-backup.service by matching name)
cat > /etc/systemd/system/copilot-backup.service << 'EOF'
[Unit]
Description=Copilot Backup
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup-copilot.sh
User=root
EOF

cat > /etc/systemd/system/copilot-backup.timer << 'EOF'
[Unit]
Description=Daily Copilot Backup

[Timer]
OnCalendar=daily
OnBootSec=10m
# Run a missed backup at next boot if the machine was off at the scheduled time
Persistent=true

[Install]
WantedBy=timers.target
EOF

systemctl daemon-reload
systemctl enable --now copilot-backup.timer
```
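Daily backups accumulate, so the backup script is also a natural place to cap local disk usage. A minimal rotation sketch, assuming archives are named `backup-YYYYmmdd-HHMMSS.tar.gz` (a hypothetical convention chosen so that alphabetical order matches chronological order); `prune_backups` is a hypothetical helper you could call from `backup-copilot.sh`, and keeping 7 is an arbitrary default.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only the newest KEEP archives in DIR.
# Relies on names like backup-20260325-020000.tar.gz, whose glob order is chronological.
prune_backups() {
    local dir="$1" keep="${2:-7}" f
    local -a files=("$dir"/backup-*.tar.gz)
    # With no matches the glob stays literal, so there is nothing to prune
    [ -e "${files[0]}" ] || return 0
    local excess=$(( ${#files[@]} - keep ))
    if (( excess <= 0 )); then
        return 0
    fi
    # The oldest archives sort first; remove them and any .sha256 sidecar
    for f in "${files[@]:0:excess}"; do
        rm -f -- "$f" "$f.sha256"
        echo "removed $f"
    done
}

# Example (inside backup-copilot.sh, after the upload step):
# prune_backups /var/backups/copilot 7
```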

## ✅ Recovery Checklist

- [ ] Bootstrap script available
- [ ] Configuration files backed up
- [ ] API key stored securely (separately)
- [ ] Recovery script tested
- [ ] Backup stored in multiple locations
- [ ] Team knows where to find backup
- [ ] Tested recovery on staging system

## 🧪 Test Recovery Periodically

```bash
# Once a month, test:

# 1. On a staging system
bash RECOVERY.sh "sk-test-key"

# 2. Verify all endpoints
curl -fsS http://localhost:8888/health
curl -fsS http://localhost:8888/services
curl -fsS -X POST http://localhost:8888/deepseek \
  -H 'Content-Type: application/json' \
  -d '{"query":"Test"}'

# 3. Test the Copilot CLI integration
copilot
# Inside the Copilot session, run /mcp and try a few commands

# 4. Clean up (stop the services before deleting their files)
systemctl stop local-agent-api ollama local-agent
rm -rf /opt/local-agent
```
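The endpoint checks in step 2 can be bundled into one pass/fail summary, which makes the monthly result easy to record. A hypothetical sketch: `smoke_test` is not part of the project, it assumes the API listens on `http://localhost:8888` as in the commands above, and the timeouts (10 s for GET, 60 s for the DeepSeek call) are arbitrary choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: hit each endpoint once and print a PASS/FAIL line per check.
# Usage: smoke_test [BASE_URL]   (defaults to http://localhost:8888)
smoke_test() {
    local base="${1:-http://localhost:8888}" path failures=0
    for path in /health /services; do
        if curl -fsS -m 10 "$base$path" > /dev/null; then
            echo "PASS GET $path"
        else
            echo "FAIL GET $path"
            failures=$((failures + 1))
        fi
    done
    # /deepseek presumably calls the external DeepSeek API, so allow it more time
    if curl -fsS -m 60 -X POST -H 'Content-Type: application/json' \
        -d '{"query":"Test"}' "$base/deepseek" > /dev/null; then
        echo "PASS POST /deepseek"
    else
        echo "FAIL POST /deepseek"
        failures=$((failures + 1))
    fi
    echo "$failures check(s) failed"
    [ "$failures" -eq 0 ]
}

# Example:
# smoke_test http://localhost:8888
```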

## 📞 Quick Reference

### Backup
```bash
# Create backup
bash /opt/local-agent/backup.sh

# Compress and upload
cd copilot-backup-*
tar -czf ../backup.tar.gz .
scp ../backup.tar.gz backup-server:/secure/storage/
```
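A timestamped archive with a checksum file lets you confirm on the storage side, and again before recovery, that the upload was not truncated or corrupted. A minimal sketch; `make_backup_archive` and the `backup-YYYYmmdd-HHMMSS.tar.gz` naming are hypothetical, and `sha256sum` is assumed to be GNU coreutils.

```bash
#!/usr/bin/env bash
# Hypothetical helper: archive SRC_DIR into OUT_DIR as backup-<timestamp>.tar.gz,
# write a matching .sha256 file, and print the archive path.
make_backup_archive() {
    local src="$1" out_dir="$2" name
    name="backup-$(date +%Y%m%d-%H%M%S).tar.gz"
    mkdir -p -- "$out_dir" || return 1
    # -C switches into SRC_DIR so the archive stores relative paths
    tar -czf "$out_dir/$name" -C "$src" . || return 1
    # Hash from inside OUT_DIR so `sha256sum -c` works wherever the pair is copied
    (cd -- "$out_dir" && sha256sum -- "$name" > "$name.sha256") || return 1
    printf '%s\n' "$out_dir/$name"
}
```

For example, `archive=$(make_backup_archive <backup-dir> /var/backups/copilot)`, then copy both `$archive` and `$archive.sha256` with `scp`, and on the storage host run `sha256sum -c` on the `.sha256` file from the directory that holds the archive.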

### Recovery
```bash
# Simple recovery
bash RECOVERY.sh "sk-your-key"

# With custom configs
bash RECOVERY.sh "sk-your-key"
tar -tzf custom-configs.tar.gz   # list the paths first so the -C target below matches
tar -xzf custom-configs.tar.gz -C /opt/local-agent/config/
systemctl restart local-agent-api
```

### Verify
```bash
# After recovery
curl -fsS http://localhost:8888/health
systemctl status --no-pager local-agent-api ollama ssh   # the SSH unit is "sshd" on RHEL/Fedora
tail -n 50 /opt/local-agent/logs/api.log
```

## 🎯 Best Practices

1. **Separate API Key from Backup**
   - Never commit API key to version control
   - Store in encrypted format or secrets manager
   - Revoke and regenerate the key if the backup may have been compromised

2. **Test Recovery Regularly**
   - Monthly test on staging system
   - Document any issues
   - Update procedures accordingly

3. **Multiple Backup Locations**
   - Local storage
   - Cloud storage (S3, Azure, etc.)
   - Git repository (without secrets)
   - USB drive (for air-gapped recovery)

4. **Document Everything**
   - Keep runbook updated
   - Document any customizations
   - Record API key location (securely)

5. **Automate Backup Process**
   - Use cron or systemd timer
   - Verify backups regularly
   - Alert on backup failures
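For alerting, one simple approach is to check that the newest archive is recent enough, which also catches jobs that silently stopped running. A hypothetical sketch, assuming GNU `find` and archives named `backup-*.tar.gz`; the 26-hour default gives a daily job two hours of slack. How the alert is sent (mail, chat webhook, monitoring system) is left open, and `check_backup_fresh` is not part of the project.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if DIR holds a backup-*.tar.gz newer than MAX_HOURS.
# Requires GNU find (for -quit).
check_backup_fresh() {
    local dir="$1" max_hours="${2:-26}" recent
    # -mmin -N matches files modified less than N minutes ago; -quit stops at the first hit
    recent=$(find "$dir" -maxdepth 1 -name 'backup-*.tar.gz' \
        -mmin "-$((max_hours * 60))" -print -quit)
    if [ -n "$recent" ]; then
        echo "OK: $recent"
        return 0
    fi
    echo "STALE: no backup newer than ${max_hours}h in $dir" >&2
    return 1
}

# Example (hourly from cron or a timer):
# check_backup_fresh /var/backups/copilot || <send-alert-command>
```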

## 🚨 Emergency Recovery

If everything is lost and you need to recover NOW:

```bash
# 1. Access any machine with internet access
wget https://raw.githubusercontent.com/<your-user>/<your-repo>/master/bootstrap.sh
less bootstrap.sh   # review the script before running it as root
chmod +x bootstrap.sh

# 2. Run with the API key from your password manager
./bootstrap.sh "sk-from-password-manager"

# 3. Restore custom configs if available
# Download from cloud storage, then check the archive's paths before extracting
tar -tzf custom-configs.tar.gz
tar -xzf custom-configs.tar.gz -C /opt/local-agent/

# 4. Restart and verify
systemctl restart local-agent-api
curl -fsS http://localhost:8888/health
```
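Running a freshly downloaded script as root is risky if the copy is tampered with or truncated. If you record the SHA-256 of a known-good `bootstrap.sh` in your password manager next to the API key (a suggested practice, not something the project provides), you can check the download before step 2. `verify_sha256` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-256 digest with an expected hex value.
verify_sha256() {
    local file="$1" expected="$2" actual
    # Reading from stdin makes sha256sum print "<digest>  -" regardless of the file name
    actual=$(sha256sum < "$file") || return 1
    actual=${actual%% *}
    # sha256sum prints lowercase hex, so lowercase the expected value too
    if [ "$actual" = "${expected,,}" ]; then
        echo "checksum OK: $file"
    else
        echo "checksum MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}

# Example:
# verify_sha256 bootstrap.sh "<sha256-from-password-manager>" && ./bootstrap.sh "sk-from-password-manager"
```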

## 📊 Recovery Time Estimates

| Scenario | Bootstrap | Tests | Custom Config | Total |
|----------|-----------|-------|---|-------|
| Clean Install | 2-3 min | 1 min | 1 min | **4-5 min** |
| From Backup | 2-3 min | 1 min | <1 min | **3-4 min** |
| Manual Steps | 15-20 min | 5 min | 10 min | **30-35 min** |

**By these estimates, bootstrapping is roughly 6-10x faster than manual setup.**

## 💡 Pro Tips

1. Keep bootstrap script in multiple places:
   - Git repository
   - S3 bucket
   - Gist
   - Portable drive

2. Test recovery with different API keys:
   - Main key
   - Backup key
   - Revoked key (should fail gracefully)

3. Document any customizations:
   - Custom Ansible playbooks
   - Modified configs
   - Non-standard ports

4. Have a rollback plan:
   - Previous version of bootstrap
   - Known-good configurations
   - Database snapshots (if applicable)

---

**Prepared:** 2026-03-25  
**Status:** Production Ready  
**Next Review:** Quarterly