
🆘 Disaster Recovery Guide

Overview

Complete guide for recovering your Copilot CLI+ environment after losing the primary host.

🔄 Recovery Scenarios

Scenario 1: Complete Host Loss

Situation: Primary host is down, you need to restore on a new system.

Recovery Steps:

# 1. Get the backup (from secure storage, git, S3, etc)
git clone <your-backup-repo>
cd copilot-backup

# 2. Get API key from secure storage
# (1Password, Vault, encrypted file, etc)
export DEEPSEEK_API_KEY="sk-your-key"

# 3. Run recovery on new system
bash RECOVERY.sh "$DEEPSEEK_API_KEY"

# 4. Verify
curl http://localhost:8888/health
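
Services can take a few seconds to come up after RECOVERY.sh finishes, so a single health check in step 4 may fail even when the recovery worked. A minimal sketch of a retry helper follows. The name `retry`, the attempt count, and the delay are assumptions for illustration and are not part of RECOVERY.sh.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Hypothetical helper, not part of RECOVERY.sh.
retry() {
    local attempts=$1 delay=$2 i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # Pause between attempts, but not after the final one.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    echo "retry: giving up after $attempts attempts: $*" >&2
    return 1
}
```

For example, `retry 30 2 curl -fsS http://localhost:8888/health` would poll the health endpoint for up to about a minute. The `-f` flag makes curl exit non-zero on HTTP error responses, which is what lets the loop detect a failing service.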

Scenario 2: Quick Migration

Situation: Moving to a different server/cloud provider.

Recovery Steps:

# 1. On old system, create backup
bash /opt/local-agent/backup.sh
# Follow prompts to upload

# 2. On new system (after downloading the backup into ~/backup there)
ssh root@newhost "bash backup/RECOVERY.sh 'sk-your-key'"

# 3. Point DNS/references to new host

Scenario 3: Multi-Machine Deployment

Situation: Need same setup on multiple machines.

Deployment Steps:

# 1. Create backup on first machine
bash /opt/local-agent/backup.sh

# 2. Commit and push to your backup repo
git add copilot-backup && git commit -m "Update Copilot backup" && git push

# 3. Deploy to other machines
# The closing CMD must start at column 0, or the heredoc never terminates
for host in host1 host2 host3; do
    ssh "$host" << 'CMD'
git clone <your-backup-repo>
bash copilot-backup/RECOVERY.sh "sk-key"
CMD
done
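
With several hosts, it helps to know which ones failed instead of scrolling back through SSH output. The sketch below wraps the loop from step 3 so that it keeps going after a failure and summarizes the result. `deploy_all` and `deploy_one` are hypothetical names, and `deploy_one` assumes the backup repo is already cloned on each host.

```shell
#!/usr/bin/env bash
# deploy_all DEPLOY_CMD HOST...
# Calls DEPLOY_CMD once per host, continues after a failure, and
# returns non-zero if any host failed. Hypothetical helper.
deploy_all() {
    local deploy_cmd=$1 host failed=""
    shift
    for host in "$@"; do
        if "$deploy_cmd" "$host"; then
            echo "OK     $host"
        else
            echo "FAILED $host"
            failed="$failed $host"
        fi
    done
    if [ -n "$failed" ]; then
        echo "Hosts needing attention:$failed" >&2
        return 1
    fi
}

# Example deploy command: runs the recovery script over SSH.
# Assumes copilot-backup is already present in the remote home directory.
deploy_one() {
    ssh "$1" 'bash copilot-backup/RECOVERY.sh "sk-key"'
}
```

A run would then look like `deploy_all deploy_one host1 host2 host3`. Passing the deploy command as an argument keeps the loop easy to dry-run with a stub before touching real machines.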

📦 What to Back Up

Essential (Must Have):

  • ✅ bootstrap.sh - Complete setup automation
  • ✅ Custom configurations (if any)
  • ✅ API key (stored separately)

Important (Should Have):

  • ✅ Custom Ansible playbooks
  • ✅ Documentation
  • ✅ Systemd service configurations

Optional:

  • Logs (can be regenerated)
  • Downloaded models (can be re-pulled)

🔐 Secure API Key Storage

Option 1: Encrypted File

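# Encrypt (prompts for a passphrase; -pbkdf2 needs OpenSSL 1.1.1+)
openssl enc -aes-256-cbc -pbkdf2 -salt -in deepseek-key.txt -out deepseek-key.enc

# Store encrypted version in repo
# Keep passphrase elsewhere

# Decrypt during recovery (prints the key to stdout; use the same options as above)
openssl enc -d -aes-256-cbc -pbkdf2 -in deepseek-key.enc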
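
Decrypting to the terminal leaves the key in scrollback, so it can be cleaner to capture it straight into a variable. The following is a sketch under stated assumptions: OpenSSL 1.1.1 or newer (for `-pbkdf2`), a passphrase supplied through an exported environment variable named `KEY_PASSPHRASE` (a hypothetical name), and a file that was encrypted with the same options as `encrypt_key` uses.

```shell
#!/usr/bin/env bash
# Assumptions: OpenSSL 1.1.1+ and a passphrase exported in KEY_PASSPHRASE
# (hypothetical variable name). Both functions are illustrative helpers.

# encrypt_key PLAINTEXT_FILE ENCRYPTED_FILE
encrypt_key() {
    openssl enc -aes-256-cbc -pbkdf2 -salt \
        -in "$1" -out "$2" -pass env:KEY_PASSPHRASE
}

# load_api_key ENCRYPTED_FILE
# Prints the decrypted key on stdout so the caller can capture it.
load_api_key() {
    openssl enc -d -aes-256-cbc -pbkdf2 \
        -in "$1" -pass env:KEY_PASSPHRASE
}
```

During recovery this could be used as `DEEPSEEK_API_KEY="$(load_api_key deepseek-key.enc)"` followed by `bash RECOVERY.sh "$DEEPSEEK_API_KEY"`, so the plaintext key is never written to disk.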

Option 2: 1Password / LastPass / Vault

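# Retrieve before recovery (1Password CLI shown)
# Secret references have the form op://<vault>/<item>/<field>
op read "op://vault/deepseek-api-key/<field>"

# Use in recovery
bash RECOVERY.sh "$(op read "op://vault/deepseek-api-key/<field>")"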

Option 3: Environment Secret (GitHub/GitLab)

# Add to Actions secrets
DEEPSEEK_API_KEY=sk-xxx

# Retrieve in CI/CD
bash RECOVERY.sh "${{ secrets.DEEPSEEK_API_KEY }}"
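
Whichever option supplies the key, a small guard can catch an empty or mangled value before the recovery script starts. This is a minimal sketch: `require_api_key` is a hypothetical helper, and the `sk-` prefix check is an assumption based only on the example keys in this guide.

```shell
#!/usr/bin/env bash
# require_api_key KEY
# Fails fast when the key is empty or lacks the "sk-" prefix used in this
# guide's examples. The prefix check is an assumption; drop it if your
# provider issues keys in another format.
require_api_key() {
    local key=${1:-}
    if [ -z "$key" ]; then
        echo "API key is empty; retrieve it from secure storage first" >&2
        return 1
    fi
    case "$key" in
        sk-?*) return 0 ;;
        *)
            echo "API key does not start with 'sk-'" >&2
            return 1
            ;;
    esac
}
```

A guarded recovery would then read `require_api_key "$DEEPSEEK_API_KEY" && bash RECOVERY.sh "$DEEPSEEK_API_KEY"`.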

🚀 Automated Backup Strategy

Setup Automated Backups

# 1. Create backup script
cat > /usr/local/bin/backup-copilot.sh << 'EOF'
#!/bin/bash
bash /opt/local-agent/backup.sh
# Then upload to git/S3
EOF

chmod +x /usr/local/bin/backup-copilot.sh

# 2. Schedule with cron (daily)
(crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/backup-copilot.sh") | crontab -

# 3. Or use systemd timer
cat > /etc/systemd/system/copilot-backup.service << EOF
[Unit]
Description=Copilot Backup
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup-copilot.sh
User=root
EOF

cat > /etc/systemd/system/copilot-backup.timer << EOF
[Unit]
Description=Daily Copilot Backup

[Timer]
OnCalendar=daily
OnBootSec=10m

[Install]
WantedBy=timers.target
EOF

systemctl daemon-reload
systemctl enable --now copilot-backup.timer

✅ Recovery Checklist

  • Bootstrap script available
  • Configuration files backed up
  • API key stored securely (separately)
  • Recovery script tested
  • Backup stored in multiple locations
  • Team knows where to find backup
  • Tested recovery on staging system

🧪 Test Recovery Periodically

# Once a month, test:

# 1. On staging system
bash RECOVERY.sh "sk-test-key"

# 2. Verify all endpoints
curl http://localhost:8888/health
curl http://localhost:8888/services
curl -X POST http://localhost:8888/deepseek \
  -H 'Content-Type: application/json' \
  -d '{"query":"Test"}'

# 3. Test Copilot CLI integration
copilot
# Inside the interactive Copilot session, run /mcp
# and try a few commands

# 4. Clean up (stop services before removing files)
systemctl stop local-agent-api ollama local-agent
rm -rf /opt/local-agent
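
The endpoint checks in step 2 can be wrapped in a function that reports each result and exits non-zero on any failure, which makes the monthly test easy to script. This is a sketch only: `http_ok` and `smoke_test` are hypothetical names, and the default paths are the two GET endpoints listed in step 2.

```shell
#!/usr/bin/env bash
# http_ok URL: succeeds when the URL answers with a non-error HTTP status.
http_ok() {
    curl -fsS -o /dev/null --max-time 10 "$1"
}

# smoke_test BASE_URL [PATH...]
# Probes each path (default: /health and /services) and returns
# non-zero if any probe fails. Hypothetical helper.
smoke_test() {
    local base=$1 path rc=0
    shift
    # Fall back to the default endpoints when no paths are given.
    [ "$#" -gt 0 ] || set -- /health /services
    for path in "$@"; do
        if http_ok "$base$path"; then
            echo "PASS $path"
        else
            echo "FAIL $path"
            rc=1
        fi
    done
    return "$rc"
}
```

Running `smoke_test http://localhost:8888` on the staging system prints one PASS or FAIL line per endpoint. Keeping the HTTP probe in its own function means it can be swapped for a stub when testing the script itself.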

📞 Quick Reference

Backup

# Create backup
bash /opt/local-agent/backup.sh

# Compress and upload
cd copilot-backup-*
tar -czf ../backup.tar.gz .
scp ../backup.tar.gz backup-server:/secure/storage/
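
An archive that was truncated or corrupted in transit is only discovered at restore time unless it is checked earlier. One way to catch this is to ship a SHA-256 checksum alongside the archive. The sketch below assumes GNU coreutils `sha256sum` is available; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumes GNU coreutils sha256sum is installed. Function names are hypothetical.

# write_checksum ARCHIVE: writes ARCHIVE.sha256 next to the archive.
write_checksum() {
    local dir name
    dir=$(dirname "$1")
    name=$(basename "$1")
    # Run inside the directory so the checksum file records a bare file name.
    ( cd "$dir" && sha256sum "$name" > "$name.sha256" )
}

# verify_checksum ARCHIVE: succeeds only if ARCHIVE matches ARCHIVE.sha256.
verify_checksum() {
    local dir name
    dir=$(dirname "$1")
    name=$(basename "$1")
    ( cd "$dir" && sha256sum --status -c "$name.sha256" )
}
```

On the source machine, `write_checksum ../backup.tar.gz` would produce `backup.tar.gz.sha256` to copy along with the archive. After the transfer, `verify_checksum /secure/storage/backup.tar.gz` on the backup server confirms the file arrived intact.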

Recovery

# Simple recovery
bash RECOVERY.sh "sk-your-key"

# With custom configs
bash RECOVERY.sh "sk-your-key"
tar -xzf custom-configs.tar.gz -C /opt/local-agent/config/
systemctl restart local-agent-api

Verify

# After recovery
curl http://localhost:8888/health
systemctl status local-agent-api ollama ssh
tail /opt/local-agent/logs/api.log

🎯 Best Practices

  1. Separate API Key from Backup

    • Never commit API key to version control
    • Store in encrypted format or secrets manager
    • Regenerate key if backup is compromised
  2. Test Recovery Regularly

    • Monthly test on staging system
    • Document any issues
    • Update procedures accordingly
  3. Multiple Backup Locations

    • Local storage
    • Cloud storage (S3, Azure, etc)
    • Git repository (without secrets)
    • USB drive (for air-gapped recovery)
  4. Document Everything

    • Keep runbook updated
    • Document any customizations
    • Record API key location (securely)
  5. Automate Backup Process

    • Use cron or systemd timer
    • Verify backups regularly
    • Alert on backup failures
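
For the "verify backups regularly" and "alert on backup failures" points, a freshness check is a simple starting place: if the newest archive is missing, empty, or too old, something in the backup job has broken. The helper below is a sketch; `backup_is_fresh` is a hypothetical name and the age threshold is an assumption to tune.

```shell
#!/usr/bin/env bash
# backup_is_fresh FILE MAX_AGE_MINUTES
# Succeeds only if FILE exists, is non-empty, and was modified within the
# last MAX_AGE_MINUTES minutes. Hypothetical helper.
backup_is_fresh() {
    local file=$1 max_age=$2
    [ -s "$file" ] || return 1
    # find prints the path only when the modification time is recent enough.
    [ -n "$(find "$file" -mmin "-$max_age" -print 2>/dev/null)" ]
}
```

With daily backups, a check such as `backup_is_fresh /secure/storage/backup.tar.gz 1500` (25 hours) could run from cron or a monitoring agent on the backup server, with a non-zero exit status used to trigger the alert.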

🚨 Emergency Recovery

If everything is lost and you need to recover NOW:

# 1. Access any machine with internet
# Raw URLs have the form raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>
wget "https://raw.githubusercontent.com/<owner>/your-repo/master/bootstrap.sh"
chmod +x bootstrap.sh

# 2. Run with API key from your password manager
./bootstrap.sh "sk-from-password-manager"

# 3. Restore custom configs if available
# Download from cloud storage
tar -xzf custom-configs.tar.gz -C /opt/local-agent/config/

# 4. Restart and verify
systemctl restart local-agent-api
curl http://localhost:8888/health
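
Under time pressure it is easy to run whatever wget saved, including a truncated file or an HTML error page. A quick sanity check between steps 1 and 2 costs a few seconds. This is a sketch with a hypothetical function name; it catches broken downloads but does not prove the script is trustworthy.

```shell
#!/usr/bin/env bash
# script_looks_sane FILE
# Succeeds if FILE is non-empty, starts with a shebang line, and parses as
# valid bash. Hypothetical helper; it does not verify the script's origin.
script_looks_sane() {
    local file=$1 first_line
    [ -s "$file" ] || { echo "$file is missing or empty" >&2; return 1; }
    # read fails on a final line without a newline but still fills the variable.
    IFS= read -r first_line < "$file" || true
    case "$first_line" in
        '#!'*) ;;
        *) echo "$file does not start with a shebang" >&2; return 1 ;;
    esac
    # bash -n parses the file without executing anything.
    bash -n "$file"
}
```

The emergency sequence then becomes `script_looks_sane bootstrap.sh && ./bootstrap.sh "sk-from-password-manager"`.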

📊 Recovery Time Estimates

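| Scenario      | Bootstrap | Tests | Custom Config | Total     |
|---------------|-----------|-------|---------------|-----------|
| Clean Install | 2-3 min   | 1 min | 1 min         | 4-5 min   |
| From Backup   | 2-3 min   | 1 min | <1 min        | 3-4 min   |
| Manual Steps  | 15-20 min | 5 min | 10 min        | 30-35 min |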

By these estimates, a bootstrap-based recovery is roughly 6-10x faster than manual setup.

💡 Pro Tips

  1. Keep bootstrap script in multiple places:

    • Git repository
    • S3 bucket
    • Gist
    • Portable drive
  2. Test recovery with different API keys:

    • Main key
    • Backup key
    • Revoked key (should fail gracefully)
  3. Document any customizations:

    • Custom Ansible playbooks
    • Modified configs
    • Non-standard ports
  4. Have a rollback plan:

    • Previous version of bootstrap
    • Known-good configurations
    • Database snapshots (if applicable)

Prepared: 2026-03-25
Status: Production Ready
Next Review: Quarterly