Metrics tell you that something went wrong. Logs tell you why. If you have followed our Prometheus and Grafana monitoring guide, you already have metrics covering CPU, memory, disk, and application performance. But when Prometheus fires an alert at 3 AM — CPU spike, increased error rate, disk I/O saturation — you need logs to diagnose the root cause. That is where Loki fits into the stack: a log aggregation system designed by the Grafana team to work seamlessly alongside Prometheus, using the same label-based approach and the same Grafana dashboards you already know.
Unlike the ELK stack (Elasticsearch, Logstash, Kibana), Loki does not index the full text of every log line. It indexes only the metadata labels (job, host, container name) and stores the raw log content compressed on disk. This architectural decision makes Loki dramatically more resource-efficient — ideal for VPS environments where RAM and CPU are shared resources. A complete observability stack (Prometheus + Loki + Promtail + Grafana) runs comfortably on hardware that would struggle with Elasticsearch alone.
MassiveGRID Ubuntu VPS includes: Ubuntu 24.04 LTS pre-installed · Proxmox HA cluster with automatic failover · Ceph 3x replicated NVMe storage · Independent CPU/RAM/storage scaling · 12 Tbps DDoS protection · 4 global datacenter locations · 100% uptime SLA · 24/7 human support rated 9.5/10
Deploy a self-managed VPS — from $1.99/mo
Need dedicated resources? — from $19.80/mo
Want fully managed hosting? — we handle everything
The Three Pillars of Observability
Modern observability relies on three complementary data types:
- Metrics (Prometheus) — Numerical measurements over time: CPU usage at 85%, request latency at 200ms, error rate at 2.3%. Metrics answer "what is happening" and "how much."
- Logs (Loki) — Timestamped text records from applications and systems: error messages, access logs, audit trails. Logs answer "why did it happen" and provide the context behind metric anomalies.
- Traces (Tempo, Jaeger) — Request paths through distributed systems: this HTTP request hit service A, then service B, then the database. Traces answer "where did time go" in multi-service architectures.
This guide adds the second pillar — logs — to your existing Prometheus and Grafana setup. With metrics and logs in the same Grafana instance, you can click from a metric alert directly into the relevant logs without switching tools.
Why Loki Over Elasticsearch
The ELK stack (Elasticsearch + Logstash + Kibana) is the incumbent solution for log aggregation. It is powerful but resource-hungry. Here is why Loki is the better choice for VPS environments:
- Memory efficiency — Elasticsearch needs 4-8GB of heap memory minimum. Loki runs with 256-512MB for moderate log volumes.
- No full-text indexing — Elasticsearch indexes every word in every log line, consuming CPU and storage. Loki indexes only labels, storing raw logs compressed.
- Same query patterns — If you know PromQL (Prometheus), LogQL (Loki) feels familiar. Same label selectors, same concepts.
- Native Grafana integration — Loki is a first-class data source in Grafana. No separate Kibana installation needed.
- Simpler operations — No cluster management, no shard rebalancing, no index lifecycle policies. Single binary, single configuration file.
The trade-off: Loki's grep-style searches are slower than Elasticsearch's indexed searches for ad-hoc full-text queries. But for operational use — finding error logs within a time range, filtering by service name, correlating with metrics — Loki is fast enough and far more efficient.
Prerequisites
This guide assumes you have Prometheus and Grafana running via Docker Compose, as described in our monitoring guide. Loki is resource-efficient. Added to your existing Grafana stack, Promtail uses approximately 50MB RAM and Loki uses 256-512MB. A Cloud VPS with 4 vCPU and 8GB RAM runs the complete observability stack — Prometheus, Grafana, Loki, and Promtail — with room for your application workloads.
Verify your existing stack is running:
docker compose -f /opt/monitoring/docker-compose.yml ps
Docker Compose Setup
Add Loki and Promtail to your existing monitoring Docker Compose file. If you are starting fresh, create the full stack:
# /opt/monitoring/docker-compose.yml
# Add these services to your existing Prometheus/Grafana compose file

services:
  # ... existing prometheus and grafana services ...

  loki:
    image: grafana/loki:3.3.0
    container_name: loki
    restart: unless-stopped
    ports:
      - "127.0.0.1:3100:3100"
    volumes:
      - ./loki/config.yml:/etc/loki/config.yml:ro
      - loki-data:/loki
    command: -config.file=/etc/loki/config.yml
    networks:
      - monitoring

  promtail:
    image: grafana/promtail:3.3.0
    container_name: promtail
    restart: unless-stopped
    volumes:
      - ./promtail/config.yml:/etc/promtail/config.yml:ro
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /run/docker.sock:/run/docker.sock:ro
    command: -config.file=/etc/promtail/config.yml
    depends_on:
      - loki
    networks:
      - monitoring

volumes:
  loki-data:

networks:
  monitoring:
    driver: bridge
Create the configuration directories:
mkdir -p /opt/monitoring/loki /opt/monitoring/promtail
Promtail Configuration
Promtail is the log collection agent. It tails log files, labels them, and ships them to Loki. Configure it to collect system logs, Nginx access and error logs, Docker container logs, and application logs:
# /opt/monitoring/promtail/config.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # ── System logs ──────────────────────────────────────────────
  - job_name: syslog
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          host: myserver
          __path__: /var/log/syslog

  - job_name: auth
    static_configs:
      - targets:
          - localhost
        labels:
          job: auth
          host: myserver
          __path__: /var/log/auth.log

  # ── Nginx logs ───────────────────────────────────────────────
  - job_name: nginx-access
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          type: access
          host: myserver
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      - regex:
          expression: '^(?P<remote_addr>[\w.]+) - (?P<remote_user>\S+) \[(?P<time_local>.+)\] "(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" (?P<status>\d+) (?P<body_bytes_sent>\d+)'
      - labels:
          method:
          status:

  - job_name: nginx-error
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          type: error
          host: myserver
          __path__: /var/log/nginx/error.log

  # ── Docker container logs ────────────────────────────────────
  - job_name: docker
    docker_sd_configs:
      - host: unix:///run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: container
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: stream
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: compose_service

  # ── Application logs ─────────────────────────────────────────
  - job_name: application
    static_configs:
      - targets:
          - localhost
        labels:
          job: myapp
          host: myserver
          __path__: /var/log/myapp/*.log
    pipeline_stages:
      - multiline:
          firstline: '^\d{4}-\d{2}-\d{2}'
          max_wait_time: 3s
      - regex:
          expression: '^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>\w+) (?P<message>.*)'
      - labels:
          level:
Key configuration details: the positions file tracks where Promtail left off in each log file, so restarts neither duplicate nor miss entries. Note that /tmp/positions.yaml lives inside the container; to survive container recreation as well, mount it on a volume (for example, a promtail-positions volume mapped to /tmp in the compose file). Pipeline stages parse log lines and extract labels for efficient querying in Loki.
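The positions file itself is just a small YAML map from file path to byte offset. Its contents look roughly like this (the paths come from the scrape config above; the offset values here are illustrative):

```yaml
# /tmp/positions.yaml — written and updated by Promtail (illustrative values)
positions:
  /var/log/syslog: "10485760"
  /var/log/nginx/access.log: "5242880"
```

If an entry's offset ever points past the end of a rotated file, Promtail simply starts reading from the beginning of the new file.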
Loki Configuration
Configure Loki for single-instance deployment with filesystem storage:
# /opt/monitoring/loki/config.yml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/tsdb-index
    cache_location: /loki/tsdb-cache

limits_config:
  retention_period: 30d
  max_query_series: 5000
  max_query_parallelism: 2
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  delete_request_store: filesystem
The retention_period: 30d setting automatically deletes logs older than 30 days. Adjust it based on your compliance requirements and available disk space. The compactor runs every 10 minutes to merge small index files and enforce retention.
Start the updated stack:
cd /opt/monitoring
docker compose up -d loki promtail
Verify Loki is accepting data:
# Check Loki readiness
curl -s http://127.0.0.1:3100/ready
# Check Promtail targets
curl -s http://127.0.0.1:9080/targets
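As a further sanity check, you can push a log line straight into Loki's HTTP API and query it back. This is a sketch assuming Loki is reachable on 127.0.0.1:3100 as configured above; the smoke-test job label is arbitrary:

```shell
# Loki's push API expects nanosecond timestamps as strings
TS="$(date +%s%N)"
PAYLOAD=$(printf '{"streams":[{"stream":{"job":"smoke-test"},"values":[["%s","hello loki"]]}]}' "$TS")
echo "$PAYLOAD"

# Push the entry (|| true keeps the script going if Loki is not up yet)
curl -s -H "Content-Type: application/json" \
  -X POST -d "$PAYLOAD" http://127.0.0.1:3100/loki/api/v1/push || true

# Query it back with the query_range endpoint
curl -s -G http://127.0.0.1:3100/loki/api/v1/query_range \
  --data-urlencode 'query={job="smoke-test"}' || true
```

If the push succeeded, the query response contains your "hello loki" line under the smoke-test stream.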
Adding Loki as a Data Source in Grafana
Open Grafana (typically at https://grafana.yourdomain.com), navigate to Connections > Data Sources > Add data source, and select Loki. Configure the connection:
- URL: http://loki:3100 (the Docker network hostname)
- Timeout: 60s (for large queries)
- Leave authentication disabled (Loki is on the internal Docker network)
Click Save & Test. Grafana will verify the connection and confirm "Data source successfully connected." You can now query logs from the Explore view alongside your Prometheus metrics.
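If you prefer configuration-as-code, the same data source can be provisioned from a file instead of the UI. The sketch below assumes Grafana's provisioning directory is mounted into the container; the file path is illustrative:

```yaml
# /opt/monitoring/grafana/provisioning/datasources/loki.yml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      timeout: 60
```

Grafana reads this directory at startup, so the data source appears automatically on a fresh deployment without any clicking.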
LogQL Basics
LogQL is Loki's query language. If you know PromQL, the syntax feels familiar. Queries start with a log stream selector (curly braces with labels) and optionally add filter expressions:
# ── Stream selectors ───────────────────────────────────────────
# All Nginx access logs
{job="nginx", type="access"}
# All logs from a specific Docker container
{container="myapp"}
# All syslog entries
{job="syslog"}
# ── Line filters ───────────────────────────────────────────────
# Nginx 500 errors
{job="nginx", type="access"} |= "500"
# Error-level application logs
{job="myapp"} |= "ERROR"
# Exclude health check noise
{job="nginx", type="access"} != "/health"
# Regex filter — 4xx and 5xx status codes
{job="nginx", type="access"} |~ "\" [45]\\d{2} "
# ── Parsing and filtering ─────────────────────────────────────
# Parse Nginx logs and filter by status
{job="nginx", type="access"} | pattern `<ip> - - [<timestamp>] "<method> <path> <_>" <status> <bytes>` | status >= 400
# ── Metric queries (aggregation) ──────────────────────────────
# Error rate per minute
rate({job="nginx", type="access"} |= "500" [1m])
# Log volume by container
sum by (container) (rate({job="docker"} [5m]))
# Count failed SSH attempts per hour
sum(count_over_time({job="auth"} |= "Failed password" [1h]))
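LogQL's |~ filter uses RE2 regular expressions, which for simple patterns like the status-code match above behave the same as extended grep. You can sanity-check the pattern locally against a sample access-log line before relying on it in a dashboard:

```shell
# Sample Nginx combined-log line with a 502 status
LINE='203.0.113.7 - - [10/Jan/2025:14:32:01 +0000] "GET /api/orders HTTP/1.1" 502 512'

# Same pattern as the LogQL filter: closing quote, space, 4xx/5xx code, space
echo "$LINE" | grep -E '" [45][0-9]{2} ' && echo "matches"
```

The trailing space in the pattern matters: it prevents the body-bytes field from matching as a status code.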
The power of LogQL becomes apparent when combined with Prometheus metrics. A spike in HTTP error rate (Prometheus) leads you to filter Nginx error logs (Loki) for the same time window — all within the same Grafana dashboard.
Building Log Dashboards in Grafana
Create a new dashboard that combines Prometheus metrics and Loki logs. Here is a practical layout for a web server monitoring dashboard:
Row 1 — Metrics overview (Prometheus):
- Request rate (requests/second)
- Error rate (4xx + 5xx percentage)
- Response latency (p50, p95, p99)
Row 2 — Log panels (Loki):
- Recent error logs: {job="nginx", type="error"}
- HTTP 5xx access logs: {job="nginx", type="access"} |~ "\" 5\\d{2} "
- Application exceptions: {job="myapp"} |= "Exception"
Row 3 — Log volume graphs (Loki metric queries):
- Error log volume over time: sum(rate({job="nginx", type="error"} [5m]))
- Log volume by container: sum by (compose_service) (rate({job="docker"} [5m]))
The key technique is Grafana's shared time range: every panel on a dashboard follows the same time picker, so when you spot a metric anomaly and zoom into it, all log panels update to show only logs from that window. Dashboard variables (for example a host variable mapped to the host label) let you additionally filter metrics and logs for a single server in one click.
Alert Rules on Log Patterns
Grafana alerting works with Loki queries just as it does with Prometheus. Create alerts on log patterns that indicate problems:
# Alert: More than 10 HTTP 500 errors in 5 minutes
# LogQL expression for Grafana alert rule:
sum(count_over_time({job="nginx", type="access"} |= " 500 " [5m])) > 10
# Alert: Failed SSH login attempts (brute force detection)
sum(count_over_time({job="auth"} |= "Failed password" [15m])) > 20
# Alert: Application out-of-memory errors
count_over_time({job="myapp"} |= "OutOfMemoryError" [10m]) > 0
# Alert: Disk space warnings in syslog
count_over_time({job="syslog"} |= "No space left on device" [5m]) > 0
In Grafana, navigate to Alerting > Alert Rules > New Alert Rule. Select Loki as the data source, enter the LogQL expression, set the threshold condition, and configure notification channels. This gives you proactive alerting on log patterns without manually watching dashboards.
Log Retention and Storage Management
Logs consume disk space proportionally to your traffic and verbosity. A moderately busy web server generates 1-5GB of raw logs per day. Loki compresses logs efficiently (typically 10:1), but storage still grows over time.
Monitor Loki's disk usage:
# Check Loki storage size
du -sh /var/lib/docker/volumes/*loki*
# Check overall disk usage
df -h /
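To turn the numbers above into a rough capacity plan, a quick back-of-envelope calculation helps. The daily volume below is an assumption; substitute your own measurement:

```shell
RAW_GB_PER_DAY=3       # assumed raw log volume — measure yours with du
COMPRESSION_RATIO=10   # Loki's typical ~10:1 compression, per the text above
RETENTION_DAYS=30      # matches retention_period: 30d in the Loki config

# Compressed storage needed over the full retention window
NEEDED_GB=$(( RAW_GB_PER_DAY * RETENTION_DAYS / COMPRESSION_RATIO ))
echo "Estimated Loki storage: ${NEEDED_GB} GB"
```

Leave generous headroom on top of the estimate: the TSDB index, the compactor's working directory, and ingestion spikes all add to the raw chunk storage.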
Strategies for managing log storage:
- Tune retention — Set retention_period in Loki's config based on your needs: 7d for development, 30d for production, 90d for compliance.
- Filter verbose logs in Promtail — Drop health check logs and other noise before they reach Loki:
# Add to Promtail pipeline_stages for nginx-access:
pipeline_stages:
  - drop:
      expression: '.*"GET /health HTTP.*"'
      drop_counter_reason: health_check
  - drop:
      expression: '.*"GET /favicon.ico HTTP.*"'
      drop_counter_reason: favicon
- Scale storage independently — With MassiveGRID's independent resource scaling, add disk space without changing CPU or RAM allocation.
For comprehensive disk management strategies, see our disk space management guide.
Practical Workflow: From Alert to Root Cause
Here is how the complete observability stack works together in a real incident:
- Prometheus alert fires — "HTTP error rate above 5% for 5 minutes."
- Open Grafana dashboard — Metrics panel shows error rate spike starting at 14:32. CPU is normal. Memory is normal. Disk I/O has a spike.
- Check Loki logs — Filter Nginx error logs for the 14:30-14:40 window:
{job="nginx", type="error"} |= "upstream"
Result: "upstream timed out (110: Connection timed out) while reading response header from upstream."
- Drill deeper — Check application logs for the same window:
{job="myapp"} |~ "timeout|slow query"
Result: "Slow query detected: SELECT * FROM orders WHERE... took 12.4s"
- Root cause identified — A missing database index caused full table scans, which saturated disk I/O, which caused application timeouts, which caused Nginx upstream timeouts, which caused HTTP 500 errors.
Without Loki, step 3 requires SSH-ing into the server and manually grepping log files. With Loki, you stay in Grafana and correlate metrics with logs in seconds. LogQL queries that scan log data on disk can be I/O intensive. Dedicated VPS resources ensure your log queries complete quickly without competing with your application workloads.
Connecting Loki Alerts to ntfy
If you have set up ntfy for push notifications (see our ntfy self-hosting guide), connect Grafana alerts to ntfy for instant mobile notifications on log anomalies.
In Grafana, go to Alerting > Contact Points > New Contact Point and add a webhook:
# Webhook URL for ntfy
https://ntfy.yourdomain.com/alerts
# Or use the Grafana alerting webhook with a script
# /opt/monitoring/scripts/ntfy-notify.sh
#!/bin/bash
TITLE="$1"
MESSAGE="$2"
curl -s \
-H "Title: $TITLE" \
-H "Priority: high" \
-H "Tags: warning" \
-d "$MESSAGE" \
https://ntfy.yourdomain.com/server-alerts
Now log-based alerts — error spikes, failed login attempts, application exceptions — trigger push notifications to your phone. Combined with Prometheus metric alerts, you get comprehensive coverage.
Prefer Managed Observability?
A complete observability stack gives you visibility into every layer of your infrastructure. But visibility is only valuable if someone reads the dashboards and acts on the alerts. MassiveGRID's fully managed hosting includes 24/7 monitoring by a human team that investigates alerts, diagnoses issues, and takes action — even at 3 AM. You get the dashboards and the peace of mind that someone is watching them when you are not.