Every application eventually needs an update — a bug fix, a new feature, a security patch. The simplest approach is to stop the old version and start the new one. But that "simplest approach" means downtime: users see error pages, API consumers get connection resets, search engines note your unavailability, and revenue stops flowing. For a personal project, 30 seconds of downtime during a deploy might not matter. For a production application serving customers, even brief downtime erodes trust and costs money.
Zero-downtime deployment isn't a luxury reserved for companies running Kubernetes clusters. With the right patterns, you can deploy updates to applications running on a single Ubuntu VPS without any user ever seeing an error. This guide covers six practical deployment patterns — from simple Nginx upstream switching to Docker Swarm rolling updates — so you can choose the approach that fits your stack and infrastructure.
Why Deployment Strategy Matters
The naive deployment process — git pull && systemctl restart myapp — creates a gap between when the old process dies and the new one is ready to serve requests. That gap might be 2 seconds or 30 seconds depending on your application's startup time, but during that window every request fails.
The consequences compound beyond the immediate downtime:
- User experience — users mid-session lose their work, see error pages, or get mysterious timeouts
- API reliability — webhook deliveries fail, integration partners see errors, retry storms follow
- SEO impact — search engine crawlers that hit 5xx errors during deployment may temporarily deindex pages
- Monitoring noise — every deployment triggers alerts, making it harder to distinguish real incidents from deploy artifacts
- Deploy anxiety — teams avoid deploying frequently because each deploy carries risk, leading to larger, riskier batches
A proper deployment strategy eliminates these problems. You deploy confidently, multiple times per day if needed, because you know users won't notice.
Deployment Strategies Compared
Four common deployment strategies, ordered from simplest to most sophisticated:
- Stop-start — stop old version, start new version. Simple but causes downtime proportional to startup time. Only acceptable for non-production or internal tools.
- Rolling — gradually replace instances of the old version with new ones. If you have 4 workers, restart them one at a time. Brief capacity reduction but no complete outage.
- Blue-green — run two identical environments ("blue" and "green"). Deploy to the inactive one, verify it works, then switch traffic. Instant rollback by switching back. Requires double the resources during transition.
- Canary — route a small percentage of traffic (5-10%) to the new version. Monitor for errors. Gradually increase traffic to the new version. Most sophisticated, requires traffic splitting capability.
For single-VPS deployments, blue-green and rolling are the most practical. Canary deployments typically require a load balancer with percentage-based routing, which is more common in multi-server setups.
Pattern 1: Nginx Upstream Blue-Green
The most straightforward zero-downtime pattern for any application behind Nginx. Run two instances of your application on different ports, and switch Nginx's upstream between them.
Create an upstream configuration that Nginx includes:
# /etc/nginx/conf.d/app-upstream.conf
# Currently pointing to "blue" (port 3001)
upstream app_backend {
    server 127.0.0.1:3001;
}
Your main Nginx site configuration proxies to this upstream:
server {
    listen 443 ssl http2;
    server_name app.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/app.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/app.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://app_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
The deployment script swaps which port the upstream points to:
#!/bin/bash
# deploy-blue-green.sh

CURRENT_PORT=$(grep -oP 'server 127.0.0.1:\K[0-9]+' /etc/nginx/conf.d/app-upstream.conf)
if [ "$CURRENT_PORT" = "3001" ]; then
    NEW_PORT=3002
    NEW_COLOR="green"
else
    NEW_PORT=3001
    NEW_COLOR="blue"
fi
echo "Current: port $CURRENT_PORT → Deploying to: $NEW_COLOR (port $NEW_PORT)"

# Deploy new version to inactive port
cd /opt/myapp-${NEW_COLOR}
git pull origin main
npm install --production
PORT=$NEW_PORT npm start &

# Wait for new version to be healthy
echo "Waiting for new version on port $NEW_PORT..."
for i in $(seq 1 30); do
    if curl -sf http://127.0.0.1:${NEW_PORT}/health > /dev/null 2>&1; then
        echo "New version is healthy!"
        break
    fi
    if [ $i -eq 30 ]; then
        echo "ERROR: New version failed health check. Aborting."
        kill $(lsof -ti :$NEW_PORT) 2>/dev/null
        exit 1
    fi
    sleep 1
done

# Switch Nginx upstream
sed -i "s/server 127.0.0.1:${CURRENT_PORT}/server 127.0.0.1:${NEW_PORT}/" /etc/nginx/conf.d/app-upstream.conf
nginx -t && nginx -s reload
echo "Traffic switched to $NEW_COLOR (port $NEW_PORT)"

# Gracefully stop old version after a brief drain period
sleep 5
kill $(lsof -ti :$CURRENT_PORT) 2>/dev/null
echo "Old version stopped. Deployment complete."
Nginx's reload is graceful — existing connections complete on the old upstream while new connections go to the new one. The switch is effectively instant from the user's perspective.
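Rollback is the same switch run in reverse. A small helper makes both directions a one-liner (a sketch; it assumes the upstream file layout shown above, and only attempts a reload where Nginx is actually installed):

```shell
# switch_upstream PORT [FILE] — rewrite the upstream's port, then reload Nginx.
switch_upstream() {
    local port="$1"
    local conf="${2:-/etc/nginx/conf.d/app-upstream.conf}"

    # Replace whatever port the upstream currently points at
    sed -i -E "s/(server 127\.0\.0\.1:)[0-9]+;/\1${port};/" "$conf"

    # Reload only if nginx is present and the rewritten config parses
    if command -v nginx > /dev/null 2>&1; then
        nginx -t && nginx -s reload
    fi
}
```

With this in place, deploying is `switch_upstream 3002` and rolling back is `switch_upstream 3001` — no separate rollback script to maintain.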
Pattern 2: PM2 Rolling Restart for Node.js
If you're running a Node.js application with PM2 (as covered in our Node.js deployment guide), PM2 has built-in zero-downtime restart capabilities.
PM2's cluster mode runs multiple instances of your application. A rolling restart replaces them one at a time:
# Start your app in cluster mode (uses all available CPUs)
pm2 start app.js -i max --name myapp
# Deploy new code
cd /opt/myapp
git pull origin main
npm install --production
# Zero-downtime restart — replaces instances one by one
pm2 reload myapp
# Check status
pm2 status
The reload command (not restart) is the key difference. pm2 restart kills all processes and starts new ones (downtime). pm2 reload starts a new process, waits for it to signal readiness, then kills the old one — repeating for each cluster worker.
For applications that need initialization time (database connections, cache warming), configure the ready signal:
// In your app.js — signal PM2 when ready
const app = require('./app');

const server = app.listen(process.env.PORT, () => {
    // Tell PM2 this instance is ready to receive traffic
    if (process.send) {
        process.send('ready');
    }
});

// Handle graceful shutdown
process.on('SIGINT', () => {
    server.close(() => {
        process.exit(0);
    });
});
// ecosystem.config.js
module.exports = {
    apps: [{
        name: 'myapp',
        script: 'app.js',
        instances: 'max',
        wait_ready: true,       // Wait for 'ready' signal
        listen_timeout: 10000,  // Max wait time (ms)
        kill_timeout: 5000,     // Grace period for old instance
    }]
};
Pattern 3: Gunicorn Graceful Reload for Python
Python applications running behind Gunicorn (as covered in our Python deployment guide) support graceful reloading natively.
# Deploy new code
cd /opt/myapp
git pull origin main
pip install -r requirements.txt
# Graceful reload — Gunicorn spawns new workers, old ones finish current requests
kill -HUP $(cat /run/gunicorn/gunicorn.pid)
# Or if using systemd
sudo systemctl reload myapp
When Gunicorn receives SIGHUP, it spawns new worker processes that load the updated code. Old workers finish processing their current requests and then exit. At no point are there zero available workers. One caveat: if you start Gunicorn with --preload, the master process keeps the old application code in memory and SIGHUP will not pick up your changes; use a full restart (or Gunicorn's SIGUSR2 re-exec sequence) in that case.
Configure your systemd service to support reload. RuntimeDirectory gives the unprivileged www-data user a writable location under /run for the PID file:
[Unit]
Description=Gunicorn application server
After=network.target
[Service]
User=www-data
RuntimeDirectory=gunicorn
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/venv/bin/gunicorn --workers 4 --bind 127.0.0.1:8000 --pid /run/gunicorn/gunicorn.pid wsgi:app
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/run/gunicorn/gunicorn.pid
Restart=on-failure
[Install]
WantedBy=multi-user.target
Pattern 4: Docker Compose Blue-Green
For Docker-based applications, the blue-green pattern uses two container definitions behind Nginx. This is one of the cleanest approaches because containers are inherently isolated.
# docker-compose.yml
services:
  app-blue:
    image: myapp:current
    container_name: app-blue
    ports:
      - "127.0.0.1:3001:3000"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
  app-green:
    image: myapp:current
    container_name: app-green
    ports:
      - "127.0.0.1:3002:3000"
    restart: unless-stopped
    profiles:
      - green
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d
      - /etc/letsencrypt:/etc/letsencrypt:ro
    depends_on:
      - app-blue
    restart: unless-stopped
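The nginx container needs an upstream file in the mounted ./nginx/conf.d directory. A minimal sketch (the filename and the app_backend name are assumptions carried over from Pattern 1); note that the nginx container reaches the apps by Compose service name on the shared network — it cannot use the host's 127.0.0.1 port mappings:

```nginx
# ./nginx/conf.d/upstream.conf — mounted into the nginx container
# Services on the same Compose network are reachable by name
upstream app_backend {
    server app-blue:3000;
}
```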
The deployment script builds the new image, starts the inactive container, verifies health, and switches:
#!/bin/bash
# docker-deploy.sh

# Build new image
docker build -t myapp:next .

# Determine which environment is active
if docker ps --format '{{.Names}}' | grep -q 'app-green'; then
    OLD="green"
    NEW="blue"
else
    OLD="blue"
    NEW="green"
fi
echo "Deploying to $NEW environment..."

# Tag image and start new container
docker tag myapp:next myapp:current
docker compose --profile $NEW up -d app-$NEW

# Wait for health
for i in $(seq 1 30); do
    STATUS=$(docker inspect --format='{{.State.Health.Status}}' app-$NEW 2>/dev/null)
    if [ "$STATUS" = "healthy" ]; then
        echo "Container app-$NEW is healthy"
        break
    fi
    [ $i -eq 30 ] && echo "Health check failed!" && docker compose stop app-$NEW && exit 1
    sleep 2
done

# Switch Nginx — the nginx container reaches the apps by Compose service
# name, so swap the upstream's server entry rather than a host port
sed -i "s/server app-$OLD:3000/server app-$NEW:3000/" ./nginx/conf.d/upstream.conf
docker compose exec nginx nginx -s reload

# Stop old container
sleep 5
docker compose stop app-$OLD
echo "Deployment complete: $NEW is live"
Pattern 5: Docker Swarm Rolling Update
If you've set up Docker Swarm (see our Docker Swarm guide), rolling updates are a first-class feature. Swarm handles the entire process — starting new containers, health checking them, draining old ones — automatically.
# Deploy or update a service with rolling update configuration
docker service update \
--image myapp:v2.1 \
--update-parallelism 1 \
--update-delay 10s \
--update-failure-action rollback \
--update-order start-first \
myapp
The key flags:
- `--update-parallelism 1` — update one container at a time
- `--update-delay 10s` — wait 10 seconds between updating each container
- `--update-failure-action rollback` — if the new version fails health checks, automatically roll back
- `--update-order start-first` — start the new container before stopping the old one (ensures capacity)
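If you deploy with a stack file rather than repeating these flags, the same policy can be declared under deploy.update_config (a sketch; the service and image names are illustrative):

```yaml
# stack.yml — equivalent rolling-update policy declared in the stack file
services:
  myapp:
    image: myapp:v2.1
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        delay: 10s
        order: start-first
        failure_action: rollback
```

Deploy it with `docker stack deploy -c stack.yml myapp`; subsequent image updates then follow the declared policy automatically.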
Pattern 6: HAProxy Connection Draining
HAProxy (covered in our HAProxy guide) provides the most sophisticated connection management with its drain mode. Unlike Nginx's reload, HAProxy can explicitly drain connections from a specific backend server before removing it.
# Via HAProxy's runtime API (stats socket)
# Put the old backend in drain mode — stop sending new connections but let existing ones finish
echo "set server app_backend/app-old state drain" | socat stdio /var/run/haproxy.sock

# Wait for active sessions to reach zero (scur is the 5th CSV field of "show stat")
while [ $(echo "show stat" | socat stdio /var/run/haproxy.sock | grep "app-old" | cut -d, -f5) -gt 0 ]; do
    echo "Waiting for connections to drain..."
    sleep 2
done

# Now safely stop the old instance
echo "set server app_backend/app-old state maint" | socat stdio /var/run/haproxy.sock
This is particularly valuable for long-lived connections — WebSocket connections, file uploads, or long-polling endpoints that might take minutes to complete naturally. Once the updated instance is running and healthy, return it to rotation with "set server app_backend/app-old state ready" on the same socket.
Health Check Implementation
Every deployment pattern depends on health checks to verify the new version is working before sending it traffic. A health check endpoint should verify that the application can actually serve requests, not just that the process is running.
// Express.js health endpoint example
app.get('/health', async (req, res) => {
    try {
        // Check database connectivity
        await db.query('SELECT 1');
        // Check Redis connectivity
        await redis.ping();
        res.status(200).json({
            status: 'healthy',
            version: process.env.APP_VERSION || 'unknown',
            uptime: process.uptime()
        });
    } catch (error) {
        res.status(503).json({
            status: 'unhealthy',
            error: error.message
        });
    }
});
# Flask health endpoint example
@app.route('/health')
def health():
    try:
        db.session.execute(text('SELECT 1'))
        return jsonify({'status': 'healthy'}), 200
    except Exception as e:
        return jsonify({'status': 'unhealthy', 'error': str(e)}), 503
A good health check should be fast (under 1 second), check real dependencies (database, cache, external services), and return appropriate HTTP status codes (200 for healthy, 503 for unhealthy).
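On the deploy-script side, a probe can enforce that one-second budget with curl's --max-time flag (a minimal sketch; the URL and port are assumptions for illustration):

```shell
# probe URL — succeed only on a 2xx/3xx response within 1 second
probe() {
    curl -sf --max-time 1 "$1" > /dev/null
}

if probe "http://127.0.0.1:3000/health"; then
    echo "healthy"
else
    echo "unhealthy"
fi
```

A slow health endpoint that would pass without the time limit is treated as unhealthy here, which is usually what you want during a traffic switch.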
Automated Rollback on Failed Health Checks
The deployment script should automatically roll back if the new version fails health checks. Here's a generic rollback wrapper:
#!/bin/bash
# deploy-with-rollback.sh
# NEW_PORT is assumed to be set by your deployment logic (e.g. 3001 or 3002)

MAX_HEALTH_ATTEMPTS=30
HEALTH_URL="http://127.0.0.1:${NEW_PORT}/health"

deploy_new_version() {
    # Your deployment logic here (start the new instance on $NEW_PORT)
    echo "Deploying..."
}

switch_traffic() {
    # Your traffic switch here (e.g. rewrite the Nginx upstream and reload)
    echo "Switching traffic..."
}

check_health() {
    for i in $(seq 1 $MAX_HEALTH_ATTEMPTS); do
        HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$HEALTH_URL")
        if [ "$HTTP_STATUS" = "200" ]; then
            return 0
        fi
        echo "Health check attempt $i/$MAX_HEALTH_ATTEMPTS: HTTP $HTTP_STATUS"
        sleep 2
    done
    return 1
}

rollback() {
    echo "ROLLING BACK — new version failed health checks"
    # Stop new version, restore old upstream config
    # Send alert via ntfy or similar
    curl -d "Deployment rollback triggered on $(hostname)" https://ntfy.yourdomain.com/deploys
    exit 1
}

deploy_new_version
if check_health; then
    switch_traffic
    echo "Deployment successful!"
else
    rollback
fi
Database Migrations During Deployment
Database schema changes are the hardest part of zero-downtime deployment. The old version and new version must both be able to work with the database during the transition period.
The golden rule: make migrations backward-compatible. Split breaking changes into multiple deployments:
- Adding a column — safe. Deploy the migration first, then deploy the code that uses it.
- Removing a column — deploy code that stops using the column first, then remove the column in a later migration.
- Renaming a column — add new column, deploy code that writes to both, backfill data, deploy code that reads from new column, then drop old column.
- Changing a column type — similar to renaming: add new column with new type, dual-write, backfill, switch reads, drop old.
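The rename sequence above can be sketched in SQL (a hypothetical users table with name being renamed to full_name; each numbered step ships as its own deploy):

```sql
-- Deploy 1: expand — add the new column alongside the old one
ALTER TABLE users ADD COLUMN full_name TEXT;

-- Deploy 2: application code now writes BOTH name and full_name

-- Backfill rows written before the dual-write code shipped
UPDATE users SET full_name = name WHERE full_name IS NULL;

-- Deploy 3: application code reads full_name only

-- Deploy 4: contract — drop the old column once nothing references it
ALTER TABLE users DROP COLUMN name;
```

At every step, both the currently deployed code and the previous version can read and write the table, so a rollback at any point is safe.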
Run migrations before deploying the new application code:
#!/bin/bash
# Migration runs against the current database
# Old version continues serving traffic during migration
python manage.py migrate --no-input
# Now deploy the new code that uses the migration
./deploy-blue-green.sh
Scripting Your Deployment Pipeline
Combine deployment patterns with CI/CD for fully automated zero-downtime deploys. Using a GitHub Actions self-hosted runner on your VPS (see our self-hosted runner guide), deployments trigger automatically on push to main:
# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: npm test
      - name: Build
        run: npm run build
      - name: Deploy with blue-green
        run: ./scripts/deploy-blue-green.sh
      - name: Verify deployment
        run: |
          sleep 5
          HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://app.yourdomain.com/health)
          if [ "$HTTP_STATUS" != "200" ]; then
            echo "Post-deploy health check failed!"
            ./scripts/rollback.sh
            exit 1
          fi
Temporary Resource Boost for Deployments
Zero-downtime deployment requires running two application versions simultaneously during the transition. On a MassiveGRID Cloud VPS, this temporarily doubles your application's resource usage. Independent scaling lets you add RAM before a deployment window, then scale back after the old version shuts down.
For applications with heavy startup costs (JVM warmup, cache pre-loading, ML model loading), the new instance might consume significant CPU during initialization. On a Cloud VDS with dedicated resources, health checks run at consistent speed — your deployment automation doesn't time out because of neighbor activity on the same host.
Prefer Managed Deployments?
Deployment automation, health checking, rollback procedures, and production monitoring during deploys are ongoing operational tasks that require maintenance and refinement. With Managed Cloud Dedicated Servers, our engineering team handles deployment pipeline setup, health check configuration, rollback automation, and monitoring — so your updates reach production safely without the operational overhead of maintaining deployment infrastructure yourself.