n8n has quietly become one of the most capable AI agent platforms available. With over 70 AI and LangChain nodes, native support for OpenAI, Anthropic, Google Gemini, and open-source models through Ollama, n8n lets you build multi-step AI agent workflows that rival custom Python scripts — without writing code for orchestration, error handling, or API retry logic. The catch is that AI workloads are fundamentally different from standard automation, and the VPS that comfortably runs 50 webhook-triggered workflows will buckle under a single complex agent chain.

This guide covers the actual server requirements for running n8n AI agent workflows in production: what resources each component consumes, how to size your VPS for API-based versus local LLM deployments, and where infrastructure choices directly impact whether your agent chains complete or time out mid-execution.

If you are new to self-hosting n8n, start with our complete Docker setup guide first. This article assumes you already have a working n8n instance and want to add AI capabilities or scale existing AI workflows.

n8n as an AI Agent Platform

The AI landscape in n8n has matured significantly. What started as a simple OpenAI node for text completion has grown into a full agent orchestration framework. Here is what n8n offers as of early 2026:

  • LangChain integration: Native nodes for chains, agents, memory, output parsers, and tools. You can build ReAct agents, conversational agents with memory, and multi-tool agent chains entirely within n8n's visual editor.
  • LLM provider nodes: Direct integration with OpenAI (GPT-4o, o1, o3-mini), Anthropic (Claude 3.5 Sonnet, Claude 3 Opus), Google Gemini, Mistral, Cohere, and Hugging Face Inference API. Each provider node handles authentication, rate limiting, and response parsing.
  • Local LLM support via Ollama: Run open-source models (Llama 3, Mistral, Phi-3, Gemma) directly on your server. n8n connects to Ollama's local API, keeping all inference data on your infrastructure.
  • Vector database nodes: Qdrant, Pinecone, Supabase Vector, PostgreSQL with pgvector, and Weaviate. These enable retrieval-augmented generation (RAG) pipelines where your agent searches through your own documents before generating responses.
  • MCP (Model Context Protocol) integration: Connect to MCP servers that expose tools, resources, and prompts to your AI agents, extending their capabilities to interact with external systems through a standardized protocol.
  • Document loaders and text splitters: Process PDFs, CSVs, HTML, and other document formats, chunk them appropriately, and embed them into vector stores for RAG workflows.
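
The chunking step these splitter nodes perform can be sketched in a few lines. Below is a minimal sliding-window character splitter; the chunk and overlap sizes are illustrative assumptions, not n8n's defaults:

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks, similar in spirit to
    n8n's character text splitter. Sizes here are illustrative, not defaults."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks

document = "word " * 300  # 1,500 characters of dummy text
chunks = split_text(document)
print(len(chunks), len(chunks[0]))  # → 4 500
```

The overlap means the tail of each chunk reappears at the head of the next, which is what preserves context across chunk boundaries when the pieces are embedded separately.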

This makes n8n substantially more capable for AI workflows than Zapier or Make. Zapier's AI features are limited to single-step AI actions within Zaps. Make has basic AI modules but lacks the LangChain agent framework, vector database integrations, and local LLM support that n8n provides. For teams building complex AI agent workflows, n8n is the only visual automation platform that competes with custom code.

VPS Requirements for AI Workloads

AI workflows stress your server differently than standard automations. A typical webhook-to-CRM workflow executes in under a second and consumes minimal resources. An AI agent workflow can run for 30–120 seconds, make multiple LLM API calls, process large text payloads, and store or retrieve vector embeddings — all while holding state in memory. Understanding where the resource pressure comes from is essential for right-sizing your VPS.

CPU

For API-based AI workflows (OpenAI, Anthropic, etc.), CPU is not the primary bottleneck. Your server sends a request, waits for the LLM provider to process it, and receives a response. The actual computation happens on OpenAI's or Anthropic's infrastructure. However, CPU becomes critical in three scenarios:

  • Document processing: Parsing PDFs, splitting text into chunks, and computing embeddings locally (using models like all-MiniLM-L6-v2) is CPU-intensive. A 50-page PDF can take 10–30 seconds to process on a 2 vCPU server.
  • Concurrent agent chains: Each agent chain occupies a Node.js execution context for its full duration. With queue mode, multiple worker processes compete for CPU time. Four simultaneous agent chains on 2 vCPU will context-switch heavily, increasing latency for all of them.
  • Local LLM inference: Running Ollama with Llama 3 8B on CPU (no GPU) is possible but slow. Expect 5–15 tokens per second on an 8 vCPU server, compared to 60+ tokens per second with a GPU. Usable for batch processing, painful for interactive workflows.

RAM

RAM is where AI workflows diverge most from standard automation. Here is a realistic breakdown:

Component                    | RAM Usage     | Notes
n8n (main process)           | 400–600 MB    | Base process with loaded workflows
PostgreSQL                   | 500 MB – 1 GB | Execution history + workflow storage
Each n8n worker (queue mode) | 200–500 MB    | RAM spikes during large payload processing
Qdrant vector DB             | 256 MB – 2 GB | Depends on collection size and index type
Redis (queue mode)           | 128–256 MB    | Job queue and execution state
Ollama (Llama 3 8B)          | 6–8 GB        | Model loaded entirely in RAM without GPU
Docker + OS overhead         | 500 MB – 1 GB | Base system requirements
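
A quick sanity check on the table: summing the low and high ends for a hybrid stack (everything above except Ollama, with two workers) shows why 8 GB is the comfortable target. The figures are copied from the table; the totals are estimates, not guarantees:

```python
# Low/high RAM estimates in MB, taken from the table above
# (hybrid stack, no local LLM).
components = {
    "n8n main process": (400, 600),
    "postgresql":       (500, 1024),
    "workers (x2)":     (2 * 200, 2 * 500),
    "qdrant":           (256, 2048),
    "redis":            (128, 256),
    "docker + os":      (500, 1024),
}
low = sum(lo for lo, _ in components.values())
high = sum(hi for _, hi in components.values())
print(f"hybrid stack: {low / 1024:.1f}-{high / 1024:.1f} GB")  # → 2.1-5.8 GB
```

The high end leaves roughly 2 GB of headroom on an 8 GB server, which is what absorbs worker RAM spikes during large payload processing.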

The key insight: API-based AI workflows (calling OpenAI or Anthropic) add almost no RAM overhead beyond what n8n already uses. The LLM processing happens on the provider's servers. Your server just handles the request/response cycle and any intermediate data processing. But the moment you add a vector database or local LLM, RAM requirements jump substantially.

Storage

AI workflows generate more storage pressure than you might expect:

  • Execution history: AI agent executions store the full input and output of every LLM call. A single agent chain with 5 LLM calls can produce 50–200 KB of execution data. At 1,000 agent executions per day, that is 50–200 MB daily before pruning.
  • Vector embeddings: A Qdrant collection storing 100,000 document chunks with 384-dimension embeddings occupies roughly 200–400 MB on disk. Larger embedding models (1536 dimensions from OpenAI) consume 3–4x more.
  • Local LLM models: Ollama stores models on disk. Llama 3 8B in Q4 quantization requires approximately 4.7 GB. Mixtral 8x7B needs roughly 26 GB. If you experiment with multiple models, storage fills quickly.
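
The embedding figures above are straightforward to estimate yourself. Here is a rough calculator, assuming 4-byte floats and a 2x overhead factor for index structures and payload metadata; the overhead factor is an assumption, not a Qdrant guarantee:

```python
def vector_storage_mb(n_chunks: int, dims: int, overhead: float = 2.0) -> float:
    """Rough on-disk estimate: raw 4-byte float storage times an overhead
    factor for index structures and payload metadata (assumed, not exact)."""
    raw_mb = n_chunks * dims * 4 / 1024 / 1024
    return raw_mb * overhead

small = vector_storage_mb(100_000, 384)    # MiniLM-class embeddings
large = vector_storage_mb(100_000, 1536)   # OpenAI-class embeddings
print(f"{small:.0f} MB vs {large:.0f} MB")  # → 293 MB vs 1172 MB
```

The 384-dimension estimate lands inside the 200–400 MB range quoted above, and the 1536-dimension result is exactly 4x larger, consistent with the 3–4x figure.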

NVMe SSD storage matters here for the same reason it matters for PostgreSQL generally — write IOPS. Execution history writes, vector database updates, and model loading all benefit from low-latency storage. On a traditional SATA SSD, loading a 4.7 GB model from disk takes noticeably longer than on NVMe.

Network

API-based AI workflows are latency-sensitive rather than bandwidth-heavy. Each LLM call sends a prompt (often 1–10 KB) and receives a response (1–50 KB, or more for long generations). An agent chain making 5 sequential calls to GPT-4o with moderate context windows generates 50–200 KB of network traffic per execution. At scale, the total bandwidth is manageable, but latency is the real concern.

A round trip to OpenAI's API from a US East data center takes 30–80ms. From Singapore, it takes 200–350ms. For a 5-call agent chain, that latency difference compounds: 150–400ms total from New York versus 1–1.75 seconds from Singapore. If your agent chains interact primarily with US-based LLM APIs, hosting your n8n instance in New York or a similar US East location reduces cumulative latency meaningfully.
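
Sequential chains multiply round-trip time, so the compounding is easy to model. Using the midpoints of the latency ranges quoted above (a simplification that deliberately ignores provider-side inference time):

```python
def chain_network_ms(calls: int, rtt_ms: float) -> float:
    """Cumulative network overhead for a sequential agent chain.
    Provider-side inference time is deliberately excluded."""
    return calls * rtt_ms

# Midpoints of the round-trip ranges quoted above.
for region, rtt in [("US East", 55), ("Singapore", 275)]:
    print(f"{region}: {chain_network_ms(5, rtt):.0f} ms for a 5-call chain")
```

For a chain with 8 iterations instead of 5, the gap widens proportionally, which is why data center proximity to the LLM provider matters more as agent chains grow longer.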

Info

MassiveGRID operates data centers in New York, London, Frankfurt, and Singapore. For API-heavy AI workflows, choose the location closest to your primary LLM provider. OpenAI and Anthropic both serve from US infrastructure, making New York the lowest-latency option for most AI workloads.

Sizing Tiers: API-Only, Hybrid, and Local LLM

Not all AI workflows have the same infrastructure requirements. The right VPS configuration depends on where your LLM inference happens and what supporting services you run alongside n8n.

Tier 1: API-Only AI (OpenAI, Anthropic, Gemini)

This is the most common setup. Your n8n instance sends prompts to external LLM providers and processes the responses. All heavy computation happens on the provider's infrastructure. Your server handles orchestration, data processing, and storage.

Resource     | Recommended | Why
vCPU         | 2 cores     | Orchestration + data processing
RAM          | 4 GB        | n8n + PostgreSQL + headroom for payloads
SSD          | 64 GB       | OS + Docker + execution history
Monthly cost | $9.58/mo    |

This configuration handles up to 20–30 active AI workflows making API calls to external providers. The bottleneck is n8n's execution concurrency, not your server's hardware. If you hit limits, enable queue mode with Redis workers before upgrading hardware.

Tier 2: Hybrid (API + Vector Database + Processing)

Once you add RAG (retrieval-augmented generation) to your workflows, you need a vector database running alongside n8n. This tier supports workflows that search your own documents, knowledge bases, or product catalogs before generating LLM responses.

Resource     | Recommended | Why
vCPU         | 4 cores     | n8n + vector DB queries + document processing
RAM          | 8 GB        | n8n + PostgreSQL + Qdrant + Redis + workers
SSD          | 128 GB      | Vector storage + expanded execution history
Monthly cost | $19.16/mo   |

At this tier, you can run n8n with queue mode (1 main + 2 workers), Qdrant or ChromaDB for vector search, Redis for the job queue, and PostgreSQL — all on the same server. This handles 50–100 active workflows including complex RAG pipelines that search through tens of thousands of document chunks.

Run n8n AI workflows on reliable infrastructure

High-availability VPS with Proxmox failover, Ceph storage, and 24/7 human support.

Recommended for AI: 4 vCPU / 8 GB RAM / 128 GB SSD — $19.16/mo

Configure Your VPS →

Tier 3: Local LLM Hosting (Ollama + n8n)

Running an LLM locally eliminates API costs and keeps all data on your infrastructure — no prompts sent to external servers. The tradeoff is significantly higher resource requirements and slower inference speeds compared to cloud APIs.

Resource     | Recommended | Why
vCPU         | 8+ cores    | LLM inference is CPU-bound without GPU
RAM          | 16+ GB      | Model loaded in RAM (8B model = 6–8 GB) + n8n stack
SSD          | 256 GB      | Model files + vector DB + execution data
Monthly cost | $38.32/mo   |

Honest assessment: running an 8B-parameter model on CPU produces 5–15 tokens per second. At roughly 1.3 tokens per English word, a 500-word response takes anywhere from about 45 seconds to over two minutes to generate. For batch processing workflows (summarizing documents overnight, classifying support tickets), this is perfectly usable. For interactive or time-sensitive workflows, API-based inference with GPT-4o or Claude will deliver faster results at a lower infrastructure cost.
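
As a rough sanity check, assuming about 1.3 tokens per English word (a common rule of thumb, not a measured figure), throughput translates to response time like this:

```python
def response_seconds(words: int, tokens_per_sec: float,
                     tokens_per_word: float = 1.3) -> float:
    """Generation time for a response of a given length. The 1.3
    tokens-per-word ratio is a rule-of-thumb assumption for English text."""
    return words * tokens_per_word / tokens_per_sec

slow = response_seconds(500, 5)    # CPU inference, low end
fast = response_seconds(500, 15)   # CPU inference, high end
gpu = response_seconds(500, 60)    # GPU throughput quoted in the CPU section
print(f"{fast:.0f}-{slow:.0f} s on CPU vs {gpu:.0f} s on GPU")  # → 43-130 s vs 11 s
```

The same arithmetic explains why batch workloads tolerate CPU inference: a nightly job summarizing 200 documents cares about total throughput, not per-response latency.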

If you need fast local inference, consider MassiveGRID's GPU dedicated servers or Dedicated VPS with physically dedicated CPU cores ($5.74/core) for better single-threaded performance.

Tip

Start with Tier 1 (API-only). Most teams overestimate their need for local LLMs. OpenAI's API costs for a typical n8n AI workflow run $5–30/month — far less than the infrastructure premium for local hosting. Move to Tier 2 when you need RAG, and Tier 3 only when data sovereignty or API cost elimination is a hard requirement.

Docker Compose for an AI-Ready Stack

The following Docker Compose configuration extends the base n8n setup with Qdrant for vector storage and Redis for queue mode. This is the Tier 2 (Hybrid) configuration — the sweet spot for most teams running AI agent workflows.

version: "3.8"

services:
  postgres:
    image: postgres:16-alpine
    restart: unless-stopped
    environment:
      POSTGRES_USER: n8n
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: n8n
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U n8n"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - n8n-network

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    networks:
      - n8n-network

  qdrant:
    image: qdrant/qdrant:latest
    restart: unless-stopped
    volumes:
      - qdrant_data:/qdrant/storage
    environment:
      QDRANT__SERVICE__GRPC_PORT: 6334
    networks:
      - n8n-network

  n8n:
    image: n8nio/n8n:latest
    restart: unless-stopped
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      DB_TYPE: postgresdb
      DB_POSTGRESDB_HOST: postgres
      DB_POSTGRESDB_PORT: 5432
      DB_POSTGRESDB_DATABASE: n8n
      DB_POSTGRESDB_USER: n8n
      DB_POSTGRESDB_PASSWORD: ${POSTGRES_PASSWORD}
      N8N_HOST: ${N8N_HOST}
      N8N_PORT: 5678
      N8N_PROTOCOL: https
      WEBHOOK_URL: https://${N8N_HOST}/
      N8N_ENCRYPTION_KEY: ${N8N_ENCRYPTION_KEY}
      GENERIC_TIMEZONE: ${GENERIC_TIMEZONE:-UTC}
      EXECUTIONS_MODE: queue
      QUEUE_BULL_REDIS_HOST: redis
      QUEUE_BULL_REDIS_PORT: 6379
      EXECUTIONS_DATA_PRUNE: "true"
      EXECUTIONS_DATA_MAX_AGE: 168
      # Increase payload size for AI responses
      N8N_PAYLOAD_SIZE_MAX: 64
    volumes:
      - n8n_data:/home/node/.n8n
    networks:
      - n8n-network

  n8n-worker:
    image: n8nio/n8n:latest
    restart: unless-stopped
    depends_on:
      - n8n
    command: worker
    environment:
      DB_TYPE: postgresdb
      DB_POSTGRESDB_HOST: postgres
      DB_POSTGRESDB_PORT: 5432
      DB_POSTGRESDB_DATABASE: n8n
      DB_POSTGRESDB_USER: n8n
      DB_POSTGRESDB_PASSWORD: ${POSTGRES_PASSWORD}
      N8N_ENCRYPTION_KEY: ${N8N_ENCRYPTION_KEY}
      EXECUTIONS_MODE: queue
      QUEUE_BULL_REDIS_HOST: redis
      QUEUE_BULL_REDIS_PORT: 6379
      GENERIC_TIMEZONE: ${GENERIC_TIMEZONE:-UTC}
      N8N_PAYLOAD_SIZE_MAX: 64
    volumes:
      - n8n_data:/home/node/.n8n
    networks:
      - n8n-network

  caddy:
    image: caddy:2-alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config
    depends_on:
      - n8n
    networks:
      - n8n-network

volumes:
  postgres_data:
  redis_data:
  qdrant_data:
  n8n_data:
  caddy_data:
  caddy_config:

networks:
  n8n-network:
    driver: bridge

Key Configuration Details

  • N8N_PAYLOAD_SIZE_MAX: 64 — Increases the maximum payload size to 64 MB. AI responses, especially from models returning structured JSON or processing long documents, can exceed n8n's default 16 MB limit. Without this, large agent responses silently fail.
  • Qdrant runs on port 6333 (REST) and 6334 (gRPC) inside the Docker network. In your n8n Qdrant node credentials, set the host to qdrant and port to 6333. No API key is needed for local access within the Docker network.
  • Redis is configured with a 256 MB memory cap and LRU eviction. This prevents Redis from consuming unbounded memory if the job queue backs up during heavy AI workloads.
  • Queue mode separates the n8n UI/webhook handler from the execution workers. The main n8n service handles the editor and triggers, while n8n-worker processes the actual workflow executions. Scale workers with docker compose up -d --scale n8n-worker=3 as needed.

Warning

The N8N_ENCRYPTION_KEY must be identical across the main n8n process and all workers. If a worker uses a different key, it cannot decrypt credentials and every AI node that requires API keys will fail silently. Copy the same .env file for all services.
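
One way to generate a key of the right shape and keep it consistent across services, assuming openssl is installed on the host:

```shell
# Generate a 32-byte (64 hex character) key once and append it to the
# shared .env file that every Compose service reads.
N8N_ENCRYPTION_KEY=$(openssl rand -hex 32)
echo "N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}" >> .env
echo "generated key with ${#N8N_ENCRYPTION_KEY} hex characters"
```

Because both the n8n and n8n-worker services reference ${N8N_ENCRYPTION_KEY} from the same .env file, a key generated this way is automatically identical across services.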

Adding Ollama for Local LLMs

If you want to run local models alongside the API-based setup, add this service to the Compose file:

  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    networks:
      - n8n-network
    # For CPU-only inference (no GPU passthrough):
    deploy:
      resources:
        limits:
          memory: 10G

After starting the container, pull a model:

# Pull Llama 3 8B (about a 4.7 GB download).
# The container name follows Compose's <project>-<service>-1 pattern;
# adjust "n8n-docker" to match your project directory name.
docker exec -it n8n-docker-ollama-1 ollama pull llama3

# Verify the model is available
docker exec -it n8n-docker-ollama-1 ollama list

In n8n, add Ollama credentials with the base URL http://ollama:11434. The Ollama node will then list all locally available models.

Why Dedicated Resources Matter for AI

AI agent workflows have a property that standard automations do not: they are chain-dependent. A typical ReAct agent works like this:

  1. Receive a task (e.g., "Research competitor pricing and summarize findings").
  2. Call the LLM to decide which tool to use first (web search, database query, API call).
  3. Execute the chosen tool and collect results.
  4. Send results back to the LLM for analysis.
  5. LLM decides the next action or generates a final response.
  6. Repeat steps 2–5 until the task is complete (often 3–8 iterations).
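
The loop above can be sketched in code. This is a stand-in with a stubbed LLM and tool to show the control flow, not n8n's internal implementation; all names here are hypothetical:

```python
def react_agent(task, llm, tools, max_iterations=8):
    """Minimal ReAct-style loop: the LLM alternates between choosing a
    tool and deciding it is done. llm and tools are stand-ins."""
    observations = []
    for _ in range(max_iterations):
        decision = llm(task, observations)      # steps 2/5: LLM decides next action
        if decision["action"] == "finish":
            return decision["answer"]           # final response
        result = tools[decision["action"]](decision["input"])  # step 3: run tool
        observations.append(result)             # step 4: feed results back
    raise TimeoutError("agent did not converge")  # the failure mode timeouts cause

# Stubbed LLM: search once, then summarize whatever was observed.
def fake_llm(task, observations):
    if not observations:
        return {"action": "search", "input": task}
    return {"action": "finish", "answer": f"summary of {observations[0]}"}

answer = react_agent("competitor pricing", fake_llm,
                     {"search": lambda q: f"results for '{q}'"})
print(answer)  # → summary of results for 'competitor pricing'
```

Note that a failure partway through the loop discards the accumulated observations list, which is exactly the lost intermediate state that makes timed-out agent chains expensive.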

Each step depends on the previous step completing successfully. If step 3 takes 10 seconds because a noisy neighbor on shared infrastructure is consuming CPU during your vector database query, the entire chain extends. If the total execution exceeds n8n's execution timeout (set via EXECUTIONS_TIMEOUT, commonly 300 seconds in production), the agent chain fails entirely — not with a graceful error, but with lost intermediate state and potentially orphaned API calls.

On shared (oversold) VPS infrastructure, this manifests as intermittent failures. The same agent workflow completes in 45 seconds on Monday morning and times out at 300+ seconds on Tuesday afternoon when other tenants run CPU-intensive workloads. Debugging this is frustrating because the problem is not in your workflow — it is in your infrastructure.

Dedicated resources eliminate this variable. On MassiveGRID's Cloud VPS, your vCPU cores and RAM are allocated exclusively to your instance. Vector database queries take the same time on Tuesday afternoon as Monday morning. Agent chains complete with consistent timing. For Tier 3 workloads (local LLM inference), consider the Dedicated VPS with physically isolated CPU cores for maximum single-threaded performance.

GDPR and AI Data Sovereignty

Self-hosting n8n on an EU VPS (Frankfurt, for example) gives you sovereignty over three critical AI components:

  1. Workflow logic: Your agent definitions, prompt templates, tool configurations, and decision flows stay on EU infrastructure. A competitor analyzing your API traffic cannot reverse-engineer your automation strategy.
  2. Credentials: API keys for OpenAI, Anthropic, CRM systems, and databases are encrypted at rest with your N8N_ENCRYPTION_KEY and never leave the server.
  3. Execution history: Every input, output, and intermediate result from your AI agent chains is stored in PostgreSQL on your VPS. This data often contains customer information, business logic, and proprietary analysis.

However, self-hosting does not solve data sovereignty for the LLM inference itself. When your n8n workflow calls OpenAI's API, the prompt — including any customer data embedded in it — is sent to OpenAI's US-based infrastructure. The same applies to Anthropic, Google, and most commercial LLM providers.

If full data sovereignty is a hard requirement (GDPR Article 44 compliance, DORA, or industry-specific regulations), you have two options:

  • Local LLM inference with Ollama running on your EU VPS. All prompts and responses stay on your server. The tradeoff is slower inference and models that are less capable than GPT-4o or Claude.
  • EU-hosted LLM providers like Mistral AI (headquartered in Paris) or providers offering EU-only inference endpoints. n8n supports custom API endpoints for most LLM nodes.

For a comprehensive guide to GDPR-compliant self-hosting, see our GDPR-compliant n8n hosting guide.

Next Steps

The right VPS tier for n8n AI agents depends entirely on where your inference happens. For the majority of teams using OpenAI or Anthropic APIs, a 2 vCPU / 4 GB server at $9.58/mo handles AI workflows comfortably. Once you add vector databases and RAG pipelines, step up to 4 vCPU / 8 GB at $19.16/mo. Reserve the 8+ vCPU / 16+ GB tier for local LLM hosting or heavy concurrent workloads.

Getting started — If you do not have a self-hosted n8n instance yet, follow our complete Docker setup guide to get a production-ready deployment in 30 minutes. The base configuration in that guide uses the Tier 1 specs; you can upgrade to the AI-ready Docker Compose above when you are ready.

Right-sizing your VPS — For a broader look at n8n resource requirements beyond AI workloads, our best VPS for n8n guide covers sizing for standard automation, webhook-heavy deployments, and agency use cases.

Scaling with queue mode — AI workflows benefit disproportionately from queue mode because agent chains are long-running. Our queue mode guide walks through the architecture, worker scaling, and when to add more workers versus more CPU.

When MassiveGRID is not needed: If you are running a handful of simple AI workflows as a side project and cost is the primary concern, a basic shared VPS is fine for getting started. The dedicated resources and HA infrastructure matter when AI agent reliability becomes a business requirement — when a failed agent chain means a missed customer response or a broken data pipeline.