Complete Guide: OpenWebUI with Multiple Ollama Servers via Tailscale

OpenWebUI multi-Ollama setup architecture with Tailscale VPN

🌟 Architecture Overview

Your setup will consist of:

  • OpenWebUI running on a VPS (centralized web interface)
  • 3 Ollama servers on different devices (GPU workstations, laptops, etc.)
  • Tailscale VPN connecting all devices securely
  • Automatic load balancing and failover built into OpenWebUI

🔧 Prerequisites

Before starting, ensure you have:

  • A VPS with Docker installed
  • 3 devices capable of running Ollama
  • A Tailscale account
  • Basic command-line knowledge

📋 Step 1: Setting Up Tailscale Network

1.1 Install Tailscale on All Devices

On VPS (Ubuntu/Debian):

curl -fsSL https://tailscale.com/install.sh | sh

On Devices 1-3 (Linux):

curl -fsSL https://tailscale.com/install.sh | sh

On Windows devices:
Download from tailscale.com and install[1]

1.2 Connect All Devices to Tailscale

On each device:

sudo tailscale up

Verify connection:

tailscale ip -4

Note down the Tailscale IP addresses for each device:

  • VPS: e.g., 100.64.0.1
  • Device 1: e.g., 100.64.0.2
  • Device 2: e.g., 100.64.0.3
  • Device 3: e.g., 100.64.0.4
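
To see every device on your tailnet and its Tailscale IP in one place, you can also run this from any machine that has joined the network:

# Lists all tailnet devices with their Tailscale IPs and connection status
tailscale status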

🤖 Step 2: Configure Ollama Servers

2.1 Install Ollama on Each Device

On Linux/macOS:

curl -fsSL https://ollama.com/install.sh | sh

On Windows:
Download from ollama.com

2.2 Configure Ollama for Network Access

Important: By default, Ollama only listens on 127.0.0.1. We need to configure it to accept connections from other devices[2][3].

On Linux (systemd service):

sudo systemctl edit ollama

Add this configuration:

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

On Windows:

setx OLLAMA_HOST 0.0.0.0:11434

On macOS:

launchctl setenv OLLAMA_HOST 0.0.0.0:11434

2.3 Restart Ollama Services

Linux:

sudo systemctl daemon-reload
sudo systemctl restart ollama

Windows/macOS:
Restart the Ollama application.
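
Before moving on, it is worth confirming on each Linux device that Ollama is now bound to all interfaces rather than only to localhost:

# Should show a LISTEN socket on 0.0.0.0:11434 (or :::11434), not 127.0.0.1:11434
netstat -an | grep 11434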

2.4 Test Ollama Access

Test each server from your VPS:

# Replace with actual Tailscale IPs
curl http://100.64.0.2:11434/api/tags
curl http://100.64.0.3:11434/api/tags
curl http://100.64.0.4:11434/api/tags

2.5 Pull Models on Each Server

Download models on each Ollama server:

# Example: Download a model on each server
ollama pull llama3.2
ollama pull codellama
ollama pull mistral

🌐 Step 3: Deploy OpenWebUI on VPS

3.1 Create OpenWebUI with Multiple Ollama Endpoints

Method 1: Using Docker with Environment Variables

docker run -d -p 3000:8080 \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URLS="http://100.64.0.2:11434;http://100.64.0.3:11434;http://100.64.0.4:11434" \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Method 2: Using Docker Compose

# docker-compose.yml
version: '3.8'

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URLS=http://100.64.0.2:11434;http://100.64.0.3:11434;http://100.64.0.4:11434
    volumes:
      - open-webui:/app/backend/data
    restart: always

volumes:
  open-webui:

Deploy:

docker-compose up -d
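
A quick sanity check (assuming the service name open-webui from the compose file above) confirms that the container is running and that all three Ollama URLs were passed through from the environment:

# Confirm the container is up
docker-compose ps

# Confirm the endpoints were picked up (the output should list all three URLs)
docker-compose exec -T open-webui env | grep OLLAMA_BASE_URLS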

⚙️ Step 4: Configure OpenWebUI Admin Settings

4.1 Access OpenWebUI Admin Interface

  1. Navigate to http://your-vps-ip:3000
  2. Create an admin account
  3. Go to Admin Panel → Settings → Connections

OpenWebUI admin interface for configuring multiple Ollama servers

4.2 Configure Multiple Ollama Endpoints

In the admin interface:

  1. Navigate to Connections → Ollama API
  2. Add multiple endpoints:
    • http://100.64.0.2:11434 (Device 1)
    • http://100.64.0.3:11434 (Device 2)
    • http://100.64.0.4:11434 (Device 3)

Verify connections:
Each endpoint should show as "Connected" with a green indicator[4][5].

🔄 Step 5: Load Balancing and Failover Configuration

5.1 How OpenWebUI Handles Multiple Servers

OpenWebUI automatically provides:

  • Round-robin load balancing across available servers[6][7]
  • Automatic failover when servers are unavailable[8]
  • Smart routing to servers containing specific models[9]

OpenWebUI failover and load balancing flowchart

5.2 Load Balancing Behavior

OpenWebUI's built-in load balancing:

  • Distributes requests across all available Ollama instances
  • Uses a simple round-robin approach by default
  • Can be enhanced with weighted algorithms in future versions[8]

5.3 Failover Mechanism

Automatic failover process:

  1. OpenWebUI detects an unresponsive server
  2. Automatically routes requests to available servers
  3. Continues monitoring failed servers
  4. Restores traffic when servers come back online
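
OpenWebUI performs all of this internally, but the behaviour is easy to picture with a small client-side sketch that tries each server in order and uses the first one that responds (the addresses and model name below are the examples used throughout this guide):

#!/bin/bash
# failover_demo.sh - illustrative sketch only; OpenWebUI does this routing for you

SERVERS=(
  "100.64.0.2:11434"
  "100.64.0.3:11434"
  "100.64.0.4:11434"
)

for server in "${SERVERS[@]}"; do
  # --max-time prevents a hung server from blocking the fallback to the next one
  if curl -s --max-time 5 "http://$server/api/tags" > /dev/null; then
    echo "Routing request to $server"
    curl -s "http://$server/api/generate" \
      -d '{"model": "llama3.2", "prompt": "Hello", "stream": false}'
    break
  fi
  echo "$server unavailable, trying the next server..."
done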

🛠️ Step 6: Advanced Configuration

6.1 Model Consistency

Ensure model consistency across servers:

# Update all models on all servers
ollama pull llama3.2
ollama pull codellama
ollama pull mistral

Important: All servers should host the same models (identical names, tags, and versions) so that any request can be balanced across every server[7][9].
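
If you have SSH access to the devices over Tailscale, a small loop keeps the model lists in sync. This is only a sketch: it assumes key-based SSH logins to each device and that ollama is on the remote PATH; the IPs and model names are the examples from earlier steps.

#!/bin/bash
# sync_models.sh - pull the same set of models on every Ollama server via SSH

SERVERS=("100.64.0.2" "100.64.0.3" "100.64.0.4")
MODELS=("llama3.2" "codellama" "mistral")

for server in "${SERVERS[@]}"; do
  for model in "${MODELS[@]}"; do
    echo "Pulling $model on $server..."
    ssh "$server" "ollama pull $model"
  done
done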

6.2 Health Monitoring Script

Create a health check script:

#!/bin/bash
# health_check.sh - check that each Ollama server answers the /api/tags endpoint

SERVERS=(
  "100.64.0.2:11434"
  "100.64.0.3:11434"
  "100.64.0.4:11434"
)

for server in "${SERVERS[@]}"; do
  # --max-time keeps the check from hanging on an unresponsive server
  if curl -s --max-time 5 "http://$server/api/tags" > /dev/null; then
    echo "✅ $server is healthy"
  else
    echo "❌ $server is down"
  fi
done
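
To run the check on a schedule, a cron entry such as the following works (the script path and log location are placeholders; adjust them to your setup):

# Open your crontab
crontab -e

# Run the health check every 5 minutes and append the results to a log file
*/5 * * * * /path/to/health_check.sh >> "$HOME/ollama_health.log" 2>&1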

6.3 Custom Load Balancer (Optional)

For more advanced load balancing, consider using the Ollama Load Balancer tool:

# Install Ollama Load Balancer and make it executable
wget https://github.com/BigBIueWhale/ollama_load_balancer/releases/latest/download/ollama_load_balancer
chmod +x ollama_load_balancer

# Run the load balancer (check the project's README for the exact flags in your version)
./ollama_load_balancer \
  --servers "100.64.0.2:11434,100.64.0.3:11434,100.64.0.4:11434" \
  --port 11435

🔒 Security Considerations

Tailscale Security

Benefits:

  • End-to-end encryption
  • Zero-trust network architecture
  • No exposed ports to the internet
  • Automatic certificate management

Additional Security Measures

Firewall configuration:

# Only allow Tailscale traffic on Ollama port
sudo ufw allow from 100.64.0.0/10 to any port 11434
sudo ufw deny 11434
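
You can verify that the allow rule is in place ahead of the deny rule with:

# The allow rule for 100.64.0.0/10 should be listed before the deny rule for port 11434
sudo ufw status numbered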

Authentication:

  • Enable OpenWebUI authentication
  • Use strong passwords
  • Consider OAuth integration

🚀 Step 7: Testing and Verification

7.1 Test Load Balancing

Create test requests:

# Send several chat requests through OpenWebUI's OpenAI-compatible API to observe the distribution
# (generate an API key in OpenWebUI under Settings → Account and substitute it below)
for i in {1..10}; do
  curl -s -X POST http://your-vps-ip:3000/api/chat/completions \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello"}]}'
done
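
To confirm the requests really are being spread across machines, watch the Ollama service log on each device (Linux/systemd) while the loop runs; incoming requests show up there as they arrive:

# Run on each Ollama device to follow its request log in real time
journalctl -u ollama -f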

7.2 Test Failover

Simulate server failure:

  1. Stop Ollama on one device: sudo systemctl stop ollama
  2. Verify requests still work through OpenWebUI
  3. Restart the server: sudo systemctl start ollama
  4. Confirm it rejoins the pool

7.3 Monitor Performance

Check server metrics:

# Monitor system resources
htop
nvidia-smi  # For GPU usage
netstat -an | grep 11434

📊 Performance Optimization

8.1 Model Distribution Strategy

Distribute models based on hardware:

  • High-end GPU server: Large models (70B+)
  • Mid-range server: Medium models (13B-34B)
  • Low-end server: Small models (7B and below)

8.2 Network Optimization

Optimize Tailscale performance:

Tailscale is fastest when peers connect to each other directly rather than through a DERP relay, so check that your connections are direct (see the commands below). Subnet routing is optional and only needed if a device on your LAN cannot run Tailscale itself:

# Optional: advertise a LAN subnet so non-Tailscale devices stay reachable
# (the advertised route must also be approved in the Tailscale admin console)
sudo tailscale up --advertise-routes=192.168.1.0/24
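
To check whether a given connection is direct or relayed:

# Peers marked "direct" talk peer-to-peer; "relay" means traffic is going through a DERP server
tailscale status

# Pinging a peer also reports whether the path is direct or via DERP
tailscale ping 100.64.0.2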

8.3 Resource Management

Monitor and manage resources:

  • Use htop to monitor CPU usage
  • Use nvidia-smi for GPU monitoring
  • Set up alerts for high resource usage

🔧 Troubleshooting Common Issues

9.1 Connection Problems

Issue: OpenWebUI can't connect to Ollama servers

Solutions:

  1. Verify Tailscale connectivity: tailscale ping 100.64.0.2
  2. Check Ollama is listening on all interfaces: netstat -an | grep 11434
  3. Verify firewall settings
  4. Test direct API access: curl http://100.64.0.2:11434/api/tags

9.2 Load Balancing Issues

Issue: Requests only go to one server

Solutions:

  1. Check all servers are properly configured in OpenWebUI
  2. Verify model consistency across servers
  3. Review OpenWebUI logs: docker logs open-webui

9.3 Performance Issues

Issue: Slow response times

Solutions:

  1. Check network latency between servers
  2. Monitor resource usage on each server
  3. Optimize model distribution
  4. Consider upgrading hardware

🎯 Best Practices

10.1 Maintenance

Regular maintenance tasks:

  • Update models regularly across all servers
  • Monitor server health and performance
  • Keep Tailscale and OpenWebUI updated
  • Backup OpenWebUI configuration

10.2 Scaling

To add more servers:

  1. Install Ollama on the new device
  2. Add the device to your Tailscale network
  3. Configure OLLAMA_HOST as described in Step 2
  4. Add the new endpoint to the OpenWebUI configuration (see the example below)
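
For example, if the new device came up with the (hypothetical) Tailscale IP 100.64.0.5, the Docker Compose environment from Step 3 would simply gain one more URL, after which docker-compose up -d recreates the container with the new endpoint:

# docker-compose.yml (excerpt) - append the new server to OLLAMA_BASE_URLS
    environment:
      - OLLAMA_BASE_URLS=http://100.64.0.2:11434;http://100.64.0.3:11434;http://100.64.0.4:11434;http://100.64.0.5:11434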

10.3 Monitoring

Set up monitoring:

  • Use tools like Prometheus/Grafana
  • Monitor API response times
  • Track model usage statistics
  • Set up alerts for server failures

🎉 Conclusion

You now have a robust, scalable setup with:

  • One OpenWebUI instance on VPS
  • Three Ollama servers connected via Tailscale
  • Automatic load balancing and failover
  • Secure network connectivity
  • High availability architecture

This setup provides excellent performance, redundancy, and scalability for your AI workloads. The combination of OpenWebUI's built-in load balancing, Tailscale's secure networking, and multiple Ollama instances creates a professional-grade AI infrastructure that can handle multiple users and concurrent requests efficiently.

Happy AI computing! 🚀🤖


  1. https://dev.to/coderberry/how-to-set-up-ollama-on-windows-for-network-access-via-tailscale-4ph6
  2. https://aident.ai/blog/how-to-expose-ollama-service-api-to-network
  3. https://www.youtube.com/watch?v=UnPpUAt7hNM
  4. https://github.com/open-webui/open-webui/issues/788
  5. https://docs.openwebui.com/getting-started/quick-start/starting-with-ollama/
  6. https://docs.openwebui.com/features/
  7. https://docs.openwebui.com/tutorial/ollama/
  8. https://huggingface.co/spaces/open-webui/open-webui/blob/f3b054e21b69b8238d4077761d041bac54f6080f/backend/open_webui/routers/ollama.py
  9. https://docs.openwebui.com/features/ollama/
