Disaster recovery is not about preventing disasters — it is about surviving them. Hardware fails, humans make mistakes, attackers find vulnerabilities, and software has bugs. The question is not whether something will go wrong, but whether you can recover when it does. A good disaster recovery plan is a document you hope you never need, tested regularly so that when you do need it, it works.
This guide covers what can go wrong, what your hosting provider already handles, what you are responsible for, how to build a recovery plan, and step-by-step recovery procedures for the three most common disaster scenarios.
MassiveGRID Ubuntu VPS includes: Ubuntu 24.04 LTS pre-installed · Proxmox HA cluster with automatic failover · Ceph 3x replicated NVMe storage · Independent CPU/RAM/storage scaling · 12 Tbps DDoS protection · 4 global datacenter locations · 100% uptime SLA · 24/7 human support rated 9.5/10
Deploy a self-managed VPS — from $1.99/mo
Need dedicated resources? — from $19.80/mo
Want fully managed hosting? — we handle everything
What "Disaster" Means for a VPS
Not all disasters are equal. Classify them by likelihood and impact to prioritize your preparation:
| Disaster Type | Likelihood | Impact | Who Handles It |
|---|---|---|---|
| Hardware failure (CPU, memory, motherboard) | Low | Total outage | Hosting provider (HA failover) |
| Disk failure | Low | Data loss | Hosting provider (Ceph replication) |
| DDoS attack | Medium | Service unavailability | Hosting provider (DDoS protection) |
| Network outage (datacenter) | Low | Total outage | Hosting provider |
| Accidental data deletion (DROP TABLE) | High | Data loss | You |
| Security breach / server compromise | Medium | Data loss + exposure | You |
| Bad deployment / application corruption | High | Service degradation | You |
| Configuration mistake (firewall, permissions) | High | Lockout or outage | You |
| Ransomware / data encryption | Low-Medium | Total data loss | You |
Notice the pattern: the disasters most likely to happen are the ones you are responsible for handling.
What MassiveGRID Already Handles
On a MassiveGRID Cloud VPS, three of the most common infrastructure disasters are already handled — before you do anything:
Hardware Failure: Proxmox HA Cluster
Your VPS runs on a Proxmox high-availability cluster. If the physical server hosting your VPS fails, the cluster automatically migrates your VM to a healthy node. This happens automatically, typically within seconds, with no data loss.
Disk Failure: Ceph 3x Replication
Your VPS storage uses Ceph with 3x replication across NVMe drives. Every block of data is written to three independent disks on different physical hosts. A single disk failure (or even an entire storage node failure) causes zero data loss — Ceph continues serving data from the remaining replicas while rebuilding the lost copy.
DDoS Attacks: 12 Tbps Protection
Volumetric DDoS attacks are absorbed by MassiveGRID's 12 Tbps DDoS mitigation infrastructure. Your VPS continues operating normally while attack traffic is filtered upstream.
Key insight: Infrastructure-level disasters are handled by your hosting provider's architecture. Application-level disasters are your responsibility. Everything in this guide focuses on the disasters that you must plan for.
What You Must Handle
The disasters your hosting provider cannot protect you from:
- Accidental data deletion: rm -rf /var/www, DROP TABLE users;, deleted Docker volumes
- Security breaches: Compromised SSH keys, exploited application vulnerabilities, stolen database credentials
- Application corruption: Bad migrations, corrupted configuration, broken deployments
- Human error: Wrong server, wrong database, wrong command
- Ransomware: Encrypted files with demands for payment
The common thread: all of these require backups, documentation, and tested recovery procedures.
The Disaster Recovery Plan
A disaster recovery plan is a document, not software. It should be stored outside your VPS (in a Git repository, a shared document, or a password manager note) and be accessible even when your server is completely unavailable.
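One low-friction way to keep the plan off the server is a small Git repository; a minimal sketch, assuming a local clone at ~/dr-plan with a remote hosted outside the VPS (the path, filename, and branch are placeholders):

```shell
# Version the DR plan in Git so it survives the server itself.
# ~/dr-plan, DR-PLAN.md, and the remote are placeholders for your own repo.
cd ~/dr-plan
git add DR-PLAN.md
git commit -m "Update DR plan $(date +%F)" || echo "No changes to commit"
git push origin main  # remote lives OUTSIDE this VPS (GitHub, GitLab, etc.)
```

Every edit to the plan then gets a timestamped history, and anyone with repo access can reach it while the server is down.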
DR Plan Template
====================================
DISASTER RECOVERY PLAN
Application: [Your App Name]
Last Updated: [Date]
Last Tested: [Date]
====================================
## 1. CONTACTS
- Primary admin: [Name] [Phone] [Email]
- Secondary admin: [Name] [Phone] [Email]
- Hosting support: MassiveGRID 24/7 support
- Domain registrar: [Name] [Login URL]
## 2. INFRASTRUCTURE INVENTORY
- VPS Provider: MassiveGRID
- Server IP: [IP Address]
- Datacenter: [Location]
- VPS Specs: [vCPU/RAM/Storage]
- OS: Ubuntu 24.04 LTS
- Domain: [yourdomain.com]
- DNS Provider: [Provider]
- SSL: Let's Encrypt (auto-renew)
## 3. APPLICATION STACK
- Web server: Nginx 1.x
- Runtime: Node.js 20.x / Python 3.12
- Database: PostgreSQL 16
- Cache: Redis 7
- Process manager: PM2 / systemd
- Containerized: Yes/No (Docker Compose)
## 4. BACKUP LOCATIONS
- Database backups: [Location, retention period]
- File backups: [Location, retention period]
- Configuration backups: [Git repo URL]
- Backup encryption key: [Stored in password manager]
## 5. RECOVERY OBJECTIVES
- RTO (Recovery Time Objective): [Target]
- RPO (Recovery Point Objective): [Target]
## 6. RECOVERY PROCEDURES
- [See sections below]
## 7. TEST SCHEDULE
- Full recovery test: Quarterly
- Backup verification: Monthly
- Last test results: [Date] [Pass/Fail]
RTO and RPO Explained
Two numbers define your disaster recovery requirements:
RTO (Recovery Time Objective): How long can your application be down before it causes unacceptable damage?
RPO (Recovery Point Objective): How much data can you afford to lose? This determines your backup frequency.
| Application Type | Typical RTO | Typical RPO | Backup Strategy |
|---|---|---|---|
| Personal blog | 24 hours | 7 days | Weekly backups |
| Company website | 4 hours | 24 hours | Daily backups |
| SaaS application | 1 hour | 1 hour | Hourly database backups + WAL archiving |
| E-commerce store | 30 minutes | 15 minutes | Continuous replication + frequent snapshots |
| Financial application | 15 minutes | 0 (zero data loss) | Synchronous replication + WAL shipping |
Be honest about your RTO/RPO. Saying "zero downtime, zero data loss" when you have daily backups and no replication is a fantasy, not a plan.
Backup Verification: Can You Actually Restore?
A backup you have never restored from is not a backup — it is a hope. Follow our automatic backups guide to set up backups, then verify them regularly.
#!/bin/bash
# verify-backup.sh — Monthly backup verification script
set -euo pipefail
BACKUP_DIR="/backups"
TEST_DIR="/tmp/backup-test"
LOG_FILE="/var/log/backup-verification.log"
echo "=== Backup Verification $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" | tee -a "$LOG_FILE"
# Find the most recent backup
# "|| true" prevents set -e/pipefail from aborting before the empty check below
LATEST_BACKUP=$(ls -t "$BACKUP_DIR"/db-backup-*.sql.gz 2>/dev/null | head -1 || true)
if [ -z "$LATEST_BACKUP" ]; then
echo "FAIL: No backup files found" | tee -a "$LOG_FILE"
exit 1
fi
echo "Testing backup: $LATEST_BACKUP" | tee -a "$LOG_FILE"
# Check backup file integrity
if ! gzip -t "$LATEST_BACKUP" 2>/dev/null; then
echo "FAIL: Backup file is corrupted (gzip integrity check failed)" | tee -a "$LOG_FILE"
exit 1
fi
echo "PASS: File integrity check" | tee -a "$LOG_FILE"
# Check backup age
BACKUP_AGE_HOURS=$(( ($(date +%s) - $(stat -c %Y "$LATEST_BACKUP")) / 3600 ))
MAX_AGE_HOURS=25 # Should be less than 25 hours for daily backups
if [ "$BACKUP_AGE_HOURS" -gt "$MAX_AGE_HOURS" ]; then
echo "FAIL: Backup is ${BACKUP_AGE_HOURS} hours old (max: ${MAX_AGE_HOURS})" | tee -a "$LOG_FILE"
exit 1
fi
echo "PASS: Backup age check (${BACKUP_AGE_HOURS}h old)" | tee -a "$LOG_FILE"
# Test restore to a temporary database
TEST_DB="backup_test_$(date +%s)"
sudo -u postgres createdb "$TEST_DB"
if gunzip -c "$LATEST_BACKUP" | sudo -u postgres psql "$TEST_DB" > /dev/null 2>&1; then
echo "PASS: Database restore successful" | tee -a "$LOG_FILE"
# Verify data integrity
TABLE_COUNT=$(sudo -u postgres psql -t -c "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public'" "$TEST_DB" | tr -d ' ')
echo "PASS: Restored database has $TABLE_COUNT tables" | tee -a "$LOG_FILE"
ROW_COUNT=$(sudo -u postgres psql -t -c "SELECT sum(n_tup_ins) FROM pg_stat_user_tables" "$TEST_DB" | tr -d ' ')
echo "INFO: Approximate row count: $ROW_COUNT" | tee -a "$LOG_FILE"
else
echo "FAIL: Database restore failed" | tee -a "$LOG_FILE"
sudo -u postgres dropdb "$TEST_DB"
exit 1
fi
# Cleanup
sudo -u postgres dropdb "$TEST_DB"
echo "=== Verification complete ===" | tee -a "$LOG_FILE"
# Schedule monthly verification
sudo crontab -e
# Add:
0 3 1 * * /home/admin/scripts/verify-backup.sh 2>&1 | mail -s "Backup Verification Report" admin@yourdomain.com
Recovery Scenario #1: "I Accidentally Deleted My Database"
This is the most common disaster. Someone runs DROP TABLE on the wrong database, a migration script deletes data instead of transforming it, or a bulk delete operation has a missing WHERE clause.
Immediate Response (First 5 Minutes)
# 1. STOP THE APPLICATION immediately
# Prevent new writes from making recovery harder
sudo systemctl stop myapp
# or
pm2 stop all
# or
docker compose stop app
# 2. DO NOT restart PostgreSQL
# The WAL (Write-Ahead Log) may still contain the deleted data
# 3. Assess the damage
sudo -u postgres psql -d myapp -c "\dt" # List remaining tables
sudo -u postgres psql -d myapp -c "SELECT count(*) FROM users;" # Check specific tables
Recovery from Backup
# 4. Identify the most recent backup
ls -la /backups/db-backup-*.sql.gz
# 5. Create a recovery database (don't overwrite the damaged one yet)
sudo -u postgres createdb myapp_recovery
# 6. Restore the backup
gunzip -c /backups/db-backup-2026-02-28-0300.sql.gz | sudo -u postgres psql myapp_recovery
# 7. Verify the restored data
sudo -u postgres psql myapp_recovery -c "SELECT count(*) FROM users;"
sudo -u postgres psql myapp_recovery -c "SELECT max(created_at) FROM users;"
# ^ This tells you the RPO — how much data you'll lose
# 8. If the restoration looks good, swap databases
sudo -u postgres psql -c "ALTER DATABASE myapp RENAME TO myapp_damaged;"
sudo -u postgres psql -c "ALTER DATABASE myapp_recovery RENAME TO myapp;"
# 9. Restart the application
sudo systemctl start myapp
# or
pm2 start all
# 10. Verify the application works
curl -s https://yourdomain.com/api/health/ready | jq .
Point-in-Time Recovery (If Using WAL Archiving)
If you have PostgreSQL WAL archiving configured, you can recover to a specific point in time — just before the accidental deletion:
# postgresql.conf settings for PITR (PostgreSQL 12+; the old recovery.conf no longer exists)
restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p'
recovery_target_time = '2026-02-28 14:55:00 UTC' # Just before the DELETE
recovery_target_action = 'promote'
# Step-by-step PITR
# 1. Stop PostgreSQL
sudo systemctl stop postgresql
# 2. Back up the current (damaged) data directory
sudo cp -r /var/lib/postgresql/16/main /var/lib/postgresql/16/main.damaged
# 3. Restore base backup
sudo rm -rf /var/lib/postgresql/16/main
sudo tar xzf /backups/base-backup-latest.tar.gz -C /var/lib/postgresql/16/
# 4. Create recovery signal file
sudo touch /var/lib/postgresql/16/main/recovery.signal
# 5. Configure recovery target in postgresql.conf
echo "restore_command = 'cp /backups/wal_archive/%f %p'" | sudo tee -a /var/lib/postgresql/16/main/postgresql.conf
echo "recovery_target_time = '2026-02-28 14:55:00 UTC'" | sudo tee -a /var/lib/postgresql/16/main/postgresql.conf
# 6. Start PostgreSQL (it will replay WAL up to the target time)
sudo chown -R postgres:postgres /var/lib/postgresql/16/main
sudo systemctl start postgresql
# 7. Check logs for recovery progress
sudo tail -f /var/log/postgresql/postgresql-16-main.log
Recovery Scenario #2: "My Server Was Compromised"
You discover unauthorized access — unusual processes, modified files, unfamiliar SSH keys, or alerts from your monitoring system. This is a security incident requiring a structured response.
Phase 1: Containment (First 15 Minutes)
# 1. DO NOT shut down the server yet — preserve evidence
# 2. Record what's running RIGHT NOW
ps auxf > /tmp/forensics-processes.txt
ss -tlnp > /tmp/forensics-listening-ports.txt
last -50 > /tmp/forensics-login-history.txt
cat /etc/passwd > /tmp/forensics-passwd.txt
crontab -l > /tmp/forensics-crontab.txt 2>/dev/null
sudo cat /var/log/auth.log > /tmp/forensics-auth.txt
# 3. Check for unauthorized SSH keys
find / -name "authorized_keys" 2>/dev/null -exec echo "=== {} ===" \; -exec cat {} \; > /tmp/forensics-ssh-keys.txt
# 4. Check for recently modified files
find / -mtime -1 -type f -not -path "/proc/*" -not -path "/sys/*" 2>/dev/null > /tmp/forensics-recent-files.txt
# 5. Copy forensics files OFF the server
scp /tmp/forensics-*.txt admin@safe-machine:/incident-response/
Phase 2: Isolation
# 6. Block all incoming connections except your IP
# (From the hosting provider's console, not from the compromised server)
# Or use iptables if you must do it from the server:
sudo iptables -I INPUT -s YOUR_IP -j ACCEPT
sudo iptables -I INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
sudo iptables -A INPUT -j DROP
# 7. Change all passwords and revoke all API keys
# Do this from a DIFFERENT machine:
# - Hosting provider password
# - Database passwords
# - API keys for external services
# - SSH keys (generate new ones)
# 8. Revoke compromised SSH keys
# On every server that trusted the compromised key:
# Remove the key from ~/.ssh/authorized_keys
Phase 3: Rebuild from Clean State
Never trust a compromised server. Rebuild from scratch.
# 9. Deploy a NEW VPS
# From the MassiveGRID control panel, create a fresh Ubuntu 24.04 VPS
# 10. Set up the new server using your configuration management
# If you followed our Ansible guide:
ansible-playbook -i inventory/production site.yml
# 11. Restore data from the MOST RECENT backup that predates the compromise
# Identify when the breach occurred from forensics logs
# Restore the backup from BEFORE that timestamp
# 12. Restore the application
cd /home/deploy/app
git clone https://github.com/yourorg/yourapp.git .
npm ci --production
pm2 start ecosystem.config.js
# 13. Update DNS to point to the new server
# Lower TTL first, then update the A record
# 14. Verify everything works on the new server before decommissioning the old one
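The DNS cutover in step 13 can be verified with a short polling loop rather than refreshing a browser. A sketch, assuming getent is available (it ships with glibc on every Ubuntu install); the domain and IP are placeholders:

```shell
#!/bin/bash
# wait-for-dns.sh — poll until the A record resolves to the new server.
# DOMAIN and NEW_IP are placeholders for your own cutover.
DOMAIN="${1:-yourdomain.com}"
NEW_IP="${2:-203.0.113.10}"
# Resolve the first IPv4 address the local resolver returns
resolve() { getent ahostsv4 "$1" | awk '{print $1; exit}'; }
for i in $(seq 1 30); do
current=$(resolve "$DOMAIN")
if [ "$current" = "$NEW_IP" ]; then
echo "DNS now resolves to the new server ($current)"
exit 0
fi
echo "Attempt $i: still seeing ${current:-no answer}; retrying in 60s"
sleep 60
done
echo "DNS has not propagated after 30 minutes — check the record and TTL" >&2
exit 1
```

Note this checks the resolver the script's host uses; other networks may lag behind until the old TTL expires, which is why lowering the TTL beforehand matters.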
Phase 4: Post-Incident
# 15. Analyze how the breach occurred
# Review forensics files from Phase 1
# Common entry points:
# - Weak SSH passwords (use key-only auth)
# - Unpatched application vulnerabilities
# - Exposed database ports
# - Stolen credentials from another breach
# - Insecure application code (SQL injection, RCE)
# 16. Harden the new server
# Follow the security hardening guide for EVERY item
For detailed hardening procedures, see our Ubuntu VPS security hardening guide.
Recovery Scenario #3: "My Application Is Corrupted After a Bad Deploy"
A deployment goes wrong: a database migration is destructive and irreversible, a configuration change breaks the application, or a code bug corrupts user data.
Immediate Rollback
# Option A: Git-based rollback (code issues)
cd /home/deploy/app
# See what changed
git log --oneline -5
# Roll back to the previous commit
git checkout HEAD~1
# Reinstall dependencies if they changed
npm ci --production
# Restart
pm2 reload all
# Verify
curl -s https://yourdomain.com/api/health/ready
# Option B: Docker-based rollback
# If using tagged Docker images:
# See running image
docker ps --format "{{.Image}}"
# myapp:v2.3.1
# Roll back to previous version
cd /home/deploy/app
# Edit docker-compose.yml to use previous image tag
sed -i 's/myapp:v2.3.1/myapp:v2.3.0/' docker-compose.yml
docker compose up -d
# Verify
docker compose logs --tail 50 app
Database Migration Rollback
# If the migration has a down/rollback function:
npm run db:migrate:undo
# or
python manage.py migrate previous_migration_name
# If the migration is irreversible (dropped column, deleted data):
# You MUST restore from backup
# 1. Stop the app
pm2 stop all
# 2. Restore database
sudo -u postgres dropdb myapp
sudo -u postgres createdb myapp
gunzip -c /backups/db-backup-pre-deploy.sql.gz | sudo -u postgres psql myapp
# 3. Roll back the code
cd /home/deploy/app
git checkout HEAD~1
npm ci --production
# 4. Restart
pm2 start all
Pre-Deploy Backup Script
Always create a backup immediately before deploying. Automate it:
#!/bin/bash
# pre-deploy-backup.sh — Run before every deployment
set -euo pipefail
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups/pre-deploy"
mkdir -p "$BACKUP_DIR"
echo "Creating pre-deploy backup: $TIMESTAMP"
# Database backup
sudo -u postgres pg_dump -F c myapp > "$BACKUP_DIR/db-${TIMESTAMP}.dump"
# Application files backup
tar czf "$BACKUP_DIR/app-${TIMESTAMP}.tar.gz" -C /home/deploy app/
# Keep only last 10 pre-deploy backups
ls -t "$BACKUP_DIR"/db-*.dump | tail -n +11 | xargs rm -f 2>/dev/null
ls -t "$BACKUP_DIR"/app-*.tar.gz | tail -n +11 | xargs rm -f 2>/dev/null
echo "Pre-deploy backup complete: $BACKUP_DIR"
echo " Database: db-${TIMESTAMP}.dump ($(du -sh "$BACKUP_DIR/db-${TIMESTAMP}.dump" | cut -f1))"
echo " Application: app-${TIMESTAMP}.tar.gz ($(du -sh "$BACKUP_DIR/app-${TIMESTAMP}.tar.gz" | cut -f1))"
Integrate it into your deployment script:
#!/bin/bash
# deploy.sh
# Step 1: Pre-deploy backup
./pre-deploy-backup.sh || { echo "Backup failed, aborting deploy"; exit 1; }
# Step 2: Deploy
git pull origin main
npm ci --production
npm run db:migrate
pm2 reload all
# Step 3: Verify
sleep 5
STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://yourdomain.com/api/health/ready)
if [ "$STATUS" != "200" ]; then
echo "Deploy verification FAILED. Run rollback manually."
echo "Backup location: /backups/pre-deploy/"
exit 1
fi
echo "Deployment successful"
Testing Your Recovery Plan
An untested recovery plan is just a document. Schedule quarterly recovery drills on a test VPS.
Test recovery on a temporary Cloud VDS — dedicated resources give accurate recovery time estimates without affecting production.
# Quarterly DR test procedure
# 1. Spin up a test VPS (MassiveGRID, same specs as production)
# 2. Time yourself: can you rebuild from scratch?
START_TIME=$(date +%s)
# 3. Follow your recovery procedures EXACTLY as documented
# Don't improvise — the point is to test the documentation
# 4. Restore from the most recent backup
# Time the database restore specifically
# 5. Verify the application works
# Run automated tests or manual verification
# 6. Record the results
END_TIME=$(date +%s)
RECOVERY_TIME=$(( (END_TIME - START_TIME) / 60 ))
echo "Recovery completed in ${RECOVERY_TIME} minutes"
echo "RTO target: 60 minutes"
echo "Result: $([ $RECOVERY_TIME -le 60 ] && echo 'PASS' || echo 'FAIL')"
# 7. Document what went wrong and update the plan
# 8. Destroy the test VPS
Common discoveries during DR tests:
- The backup is there but the restore command is wrong
- A dependency was installed manually and is not in the configuration scripts
- Database connection strings are hardcoded and point to the old server
- SSL certificates cannot be re-issued quickly because DNS propagation takes time
- The recovery procedure references a tool version that no longer exists
Each of these discoveries should result in an update to your recovery plan.
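The hardcoded-connection-string discovery in particular is cheap to catch before a drill. A grep sweep like the following finds literal IPs lurking in config files; the application path and file globs are assumptions to adapt to your stack:

```shell
# Sweep app config for hardcoded IP addresses before a recovery surprises you.
# /home/deploy/app and the extensions are assumptions — adjust to your layout.
grep -rnE '([0-9]{1,3}\.){3}[0-9]{1,3}' /home/deploy/app \
--include='*.env' --include='*.yml' --include='*.yaml' --include='*.conf' \
| grep -v '127.0.0.1' || echo "No hardcoded IPs found"
```

Anything this prints is a value that will silently point at the dead server after a rebuild — move it into environment variables or your configuration management.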
Documenting Your Infrastructure
If you were hit by a bus tomorrow, could someone else rebuild your server? Infrastructure documentation should answer: what is installed, how it is configured, and where the data is stored.
#!/bin/bash
# infrastructure-audit.sh — generate an infrastructure inventory automatically
# Run monthly, store output in your DR plan
echo "=== INFRASTRUCTURE AUDIT $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
echo ""
echo "## System"
echo "Hostname: $(hostname)"
echo "OS: $(lsb_release -ds)"
echo "Kernel: $(uname -r)"
echo "CPU: $(nproc) cores"
echo "RAM: $(free -h | awk '/Mem:/ {print $2}')"
echo "Disk: $(df -h / | awk 'NR==2 {print $2 " total, " $3 " used"}')"
echo ""
echo "## Installed Services"
systemctl list-units --type=service --state=running --no-pager | grep -v systemd
echo ""
echo "## Listening Ports"
sudo ss -tlnp | grep LISTEN
echo ""
echo "## Docker Containers"
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}" 2>/dev/null || echo "Docker not installed"
echo ""
echo "## Nginx Sites"
ls /etc/nginx/sites-enabled/ 2>/dev/null || echo "No Nginx sites"
echo ""
echo "## Cron Jobs"
crontab -l 2>/dev/null || echo "No cron jobs for $(whoami)"
sudo crontab -l 2>/dev/null || echo "No cron jobs for root"
echo ""
echo "## SSL Certificates"
sudo certbot certificates 2>/dev/null || echo "Certbot not installed"
echo ""
echo "## Firewall Rules"
sudo ufw status verbose 2>/dev/null || echo "UFW not active"
echo ""
echo "## Backup Configuration"
ls -la /backups/ 2>/dev/null || echo "No /backups directory"
echo "Latest backup:"
ls -lt /backups/*.gz 2>/dev/null | head -3 || echo "No backup files found"
echo ""
echo "## Package Versions"
nginx -v 2>&1
node --version 2>/dev/null || echo "Node.js not installed"
python3 --version 2>/dev/null || echo "Python not installed"
psql --version 2>/dev/null || echo "PostgreSQL not installed"
docker --version 2>/dev/null || echo "Docker not installed"
Ansible as a Disaster Recovery Tool
The best disaster recovery tool is infrastructure as code. If your entire server configuration is defined in Ansible playbooks, rebuilding from scratch is a single command.
# Your Ansible repository IS your disaster recovery plan
# Directory structure
ansible/
├── inventory/
│ ├── production
│ └── staging
├── playbooks/
│ ├── site.yml # Full server setup
│ ├── app-deploy.yml # Application deployment
│ └── db-restore.yml # Database restoration
├── roles/
│ ├── base/ # SSH hardening, UFW, fail2ban
│ ├── nginx/ # Nginx + SSL
│ ├── postgresql/ # PostgreSQL setup
│ ├── app/ # Application deployment
│ └── monitoring/ # Uptime Kuma, logging
└── group_vars/
└── all.yml # Encrypted variables (ansible-vault)
# Full server rebuild: one command
ansible-playbook -i inventory/production playbooks/site.yml
# Then restore the database from backup
ansible-playbook -i inventory/production playbooks/db-restore.yml \
-e "backup_file=/backups/db-backup-latest.sql.gz"
# Example db-restore.yml playbook
---
- hosts: database
  become: yes
  vars:
    db_name: myapp
    db_user: appuser
  # backup_file is supplied at runtime via -e "backup_file=..."
  tasks:
    - name: Copy backup file to server
      copy:
        src: "{{ backup_file }}"
        dest: /tmp/restore.sql.gz
    - name: Stop application
      systemd:
        name: myapp
        state: stopped
      delegate_to: "{{ groups['webservers'][0] }}"
    - name: Drop and recreate database
      become_user: postgres
      shell: |
        dropdb --if-exists {{ db_name }}
        createdb -O {{ db_user }} {{ db_name }}
    - name: Restore from backup
      become_user: postgres
      shell: gunzip -c /tmp/restore.sql.gz | psql {{ db_name }}
    - name: Start application
      systemd:
        name: myapp
        state: started
      delegate_to: "{{ groups['webservers'][0] }}"
    - name: Verify application health
      uri:
        url: https://yourdomain.com/api/health/ready
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 5
      delegate_to: "{{ groups['webservers'][0] }}"
With Ansible, your disaster recovery procedure becomes: provision a new VPS, point Ansible at it, and run the playbook. Everything — from SSH hardening to SSL certificates to application deployment — is automated and reproducible.
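For completeness, "point Ansible at it" means nothing more than editing the inventory file. A minimal production inventory for a freshly provisioned replacement VPS might look like this (the hostname, IP, and user are placeholders):

```ini
# inventory/production — pointing at the replacement VPS
[webservers]
new-vps ansible_host=203.0.113.10 ansible_user=deploy

[database]
new-vps ansible_host=203.0.113.10 ansible_user=deploy
```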
Offsite Backup Strategy
Your backups must exist outside the server they protect. If the server is compromised or destroyed, local backups are useless.
#!/bin/bash
# offsite-backup.sh — send backups to a remote location
set -euo pipefail
BACKUP_DIR="/backups"
REMOTE_DEST="backup-server:/offsite-backups/$(hostname)/"
RETENTION_DAYS=30
# Sync recent backups to remote server
rsync -avz --progress \
"$BACKUP_DIR/" \
"$REMOTE_DEST"
# Clean up old backups on remote (keep 30 days)
ssh backup-server "find /offsite-backups/$(hostname)/ -type f -mtime +${RETENTION_DAYS} -delete"
# Alternatively, upload to object storage (S3-compatible)
# aws s3 sync "$BACKUP_DIR/" "s3://your-backup-bucket/$(hostname)/" \
# --storage-class STANDARD_IA \
# --exclude "*.tmp"
echo "Offsite backup sync complete"
# Schedule daily offsite sync
sudo crontab -e
# Add:
30 4 * * * /home/admin/scripts/offsite-backup.sh >> /var/log/offsite-backup.log 2>&1
Follow the 3-2-1 backup rule:
- 3 copies of your data
- 2 different storage media or locations
- 1 copy offsite (different datacenter, different provider)
Disaster Recovery Is Included with Managed Hosting
If building, testing, and maintaining a disaster recovery plan is more operational overhead than you want to handle, MassiveGRID Managed Dedicated Cloud Servers include:
- Automated daily backups with verified restoration
- 24/7 server monitoring and incident response
- Security hardening and patch management
- DDoS protection and firewall management
- Full infrastructure documentation
You focus on your application. MassiveGRID handles the infrastructure, the backups, the security, and the disaster recovery.
Summary: The Disaster Recovery Checklist
Use this checklist to verify your disaster readiness:
DISASTER RECOVERY READINESS CHECKLIST
======================================
[ ] Automated daily database backups running
[ ] Backup files stored OFFSITE (not just on the VPS)
[ ] Backup restoration tested within the last 90 days
[ ] Recovery time measured and within RTO target
[ ] Data loss window acceptable for RPO target
[ ] Pre-deploy backups automated
[ ] Application rollback procedure documented and tested
[ ] Infrastructure documented (installed services, versions, config)
[ ] Configuration managed in code (Ansible, scripts, or Git)
[ ] Security hardening applied (SSH, firewall, updates)
[ ] Monitoring running on a SEPARATE server
[ ] Contact information documented and accessible
[ ] DR plan stored OUTSIDE the VPS it protects
[ ] DR plan tested quarterly
[ ] All team members know where to find the DR plan
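A few of these checklist items can be spot-checked automatically. A minimal sketch — the specific paths and script names are assumptions borrowed from earlier examples in this guide, not requirements:

```shell
#!/bin/bash
# dr-spot-check.sh — automate a few checklist items; paths are assumptions.
ok=0; fail=0
# check LABEL CONDITION — run CONDITION, tally pass/fail
check() {
if eval "$2" >/dev/null 2>&1; then echo "PASS: $1"; ok=$((ok+1)); else echo "FAIL: $1"; fail=$((fail+1)); fi
}
check "Backup newer than 24h exists" '[ -n "$(find /backups -name "*.gz" -mtime -1 2>/dev/null | head -1)" ]'
check "Offsite sync in root crontab" 'sudo crontab -l 2>/dev/null | grep -q offsite-backup'
check "Pre-deploy backup script present" '[ -x /home/admin/scripts/pre-deploy-backup.sh ]'
echo "Result: $ok passed, $fail failed"
[ "$fail" -eq 0 ]
```

The exit status makes it cron-friendly: schedule it and mail yourself the output, and a drifting checklist becomes an alert instead of a surprise.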
Disaster recovery is the difference between a bad day and a catastrophe. The infrastructure handled by your hosting provider (hardware, disk, DDoS) is the foundation. Everything you build on top — backups, documentation, tested recovery procedures, infrastructure as code — determines whether a disaster means 15 minutes of recovery or 15 days of rebuilding from memory.