Using OpenClaw with Ollama: Free, Private AI Agents with Local LLMs
Every request to a cloud LLM provider costs money. A typical OpenClaw conversation involving a few tool calls might cost $0.02--$0.10. If you run your agent frequently, monthly API bills can reach $20--$50 or more.
Ollama changes that equation entirely. By running an open-source language model on your own hardware, you can operate OpenClaw with zero API costs, complete data privacy, and no internet dependency. The tradeoff is capability -- local models are not as strong as frontier cloud models -- but for many everyday agent tasks, they are sufficient.
Why Run a Local LLM?
Cost: After the initial hardware investment, local inference costs only electricity -- a few cents per hour even with a dedicated GPU.
Privacy: A local LLM processes everything on your machine. Nothing leaves your network. This matters for sensitive data like medical records, financial documents, or proprietary code.
Availability: A local LLM works in airplane mode, during ISP outages, and without any API key. For always-on agents, local inference removes an entire category of failure modes.
Installing Ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# macOS
brew install ollama
# Verify and start
ollama --version
ollama serve
By default, Ollama listens on http://localhost:11434. Verify with:
curl http://localhost:11434/api/tags
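On a fresh install this should return an empty model list -- a JSON response along the lines of {"models":[]} -- which confirms the server is up before you pull anything.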
Choosing the Right Model
Agent workloads require strong instruction following, reliable tool-use formatting, and sufficient context length. Here are the best models for OpenClaw, ranked by capability.
Tier 1: Best Quality (Workstation GPU)
| Model | Parameters | VRAM | Context | Best For |
| --- | --- | --- | --- | --- |
| Qwen 2.5 72B (Q4) | 72B | 48 GB | 128K | Closest to cloud quality |
| Llama 3.3 70B (Q4) | 70B | 44 GB | 128K | Strong general-purpose |
| DeepSeek-R1 70B (Q4) | 70B | 44 GB | 64K | Reasoning-heavy tasks |
Tier 2: Sweet Spot (Consumer GPU)
| Model | Parameters | VRAM | Context | Best For |
| --- | --- | --- | --- | --- |
| Qwen 2.5 32B (Q4) | 32B | 20 GB | 128K | Best quality/speed balance |
| DeepSeek-R1 32B (Q4) | 32B | 20 GB | 64K | Complex reasoning |
| Mistral Small 24B (Q4) | 24B | 16 GB | 128K | Fast, good tool use |
# Recommended starting point for RTX 3090/4080/4090
ollama pull qwen2.5:32b-instruct-q4_K_M
A 32B model in Q4 quantization hits the sweet spot for most users with a modern gaming GPU[1].
Tier 3: Accessible (Any GPU or CPU)
| Model | Parameters | VRAM | Context | Best For |
| --- | --- | --- | --- | --- |
| Llama 3.1 8B (Q4) | 8B | 6 GB | 128K | Light tasks, fast responses |
| Mistral 7B (Q4) | 7B | 5 GB | 32K | Simple automations |
| Qwen 2.5 7B (Q4) | 7B | 5 GB | 128K | Multilingual tasks |
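For smaller GPUs or CPU-only machines, the 8B model from the table above is the analogous starting point. The default tag should resolve to a Q4-quantized instruct build; check the Ollama library page if you need a specific quantization.
# Starting point for 8 GB-class GPUs or CPU inference
ollama pull llama3.1:8b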
Context Length
OpenClaw includes conversation history, skill definitions, tool results, and system instructions in every prompt. Multi-step tasks can reach 30,000--50,000 tokens. Minimum context length: 64K tokens. Models with 128K context are preferred.
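Configuring OpenClaw
OpenClaw talks to Ollama through its OpenAI-compatible endpoint, configured as a provider in openclaw.json. The snippet below is a sketch, not a verbatim schema: the field names are the ones discussed in this article (plus the baseUrl reused later for remote access), while the exact nesting and the model key are assumptions to adapt to your install.
{
  "provider": {
    "type": "openai-compatible",
    "baseUrl": "http://localhost:11434/v1",
    "apiKey": "ollama",
    "model": "qwen2.5:32b-instruct-q4_K_M"
  },
  "maxConcurrentTasks": 1,
  "taskTimeout": 300
}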
Key settings: type: "openai-compatible" uses Ollama's OpenAI-compatible endpoint. apiKey: "ollama" is a placeholder -- Ollama needs no auth but the field cannot be empty. maxConcurrentTasks: 1 avoids memory pressure from parallel inference. taskTimeout: 300 gives local models adequate time.
Lower temperature (0.3) produces more consistent agent behavior. The repeat penalty prevents loops that local models sometimes fall into.
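One way to pin these generation parameters on the Ollama side -- not necessarily how OpenClaw passes them per request -- is a custom Modelfile; the base model tag, penalty value, and context size below are illustrative:
# Modelfile -- tuned variant for agent use
FROM qwen2.5:32b-instruct-q4_K_M
PARAMETER temperature 0.3
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 65536
# Build it, then reference the new name in openclaw.json
ollama create qwen2.5-agent -f Modelfile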
Limitations of Local Models
Local models are worse than frontier cloud models in several ways. Instruction following is less precise -- local models sometimes skip steps or hallucinate nonexistent tool calls. Complex reasoning is noticeably weaker; a 32B local model performs roughly at the level of a cloud model from 2--3 generations ago[3]. Tool use reliability is lower -- occasional malformed JSON or wrong parameter names. And long context quality degrades more than with cloud models.
The Hybrid Strategy
Use local for routine tasks, cloud for complex work:
openclaw chat "Check if my backup ran last night" # local
openclaw chat --provider anthropic "Review this PR and suggest fixes" # cloud
This typically reduces cloud API costs by 60--80%.
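As a rough worked example with the figures from the introduction: a $40/month cloud bill drops to roughly $8--$16 once the local model absorbs 60--80% of requests.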
Remote Ollama and Offline Operation
Run Ollama on a GPU machine and connect from a Raspberry Pi or laptop:
# On GPU machine
OLLAMA_HOST=0.0.0.0:11434 ollama serve
# In openclaw.json on the client
"baseUrl": "http://192.168.1.100:11434/v1"
For fully air-gapped environments, pre-download models on a connected machine and copy ~/.ollama/ to the offline machine:
# On connected machine
ollama pull qwen2.5:32b-instruct-q4_K_M
# Transfer to offline machine
rsync -av ~/.ollama/ user@offline-machine:~/.ollama/
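# On the offline machine, confirm the model copied over (assumes the default ~/.ollama location)
ollama list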
Then start Ollama and OpenClaw normally on the offline machine. This makes OpenClaw viable in classified environments, remote locations, and any scenario where data must not leave the local machine.
If managing this infrastructure is not for you, ClawTank offers hosted instances with provider management built in.
Security note: Ollama has no authentication by default. Only expose it on trusted networks, or use an SSH tunnel for remote access:
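# Forward the Ollama port over SSH (user and gpu-machine are placeholders)
ssh -N -L 11434:localhost:11434 user@gpu-machine
# The client's baseUrl can then stay at http://localhost:11434/v1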
Monitoring and Troubleshooting
# Check loaded models and memory usage
ollama ps
# Monitor inference in real-time
journalctl -u ollama -f
If performance degrades, check GPU temperature (thermal throttling), verify VRAM is not oversubscribed with nvidia-smi, ensure the model is fully loaded in VRAM without RAM spill, and restart Ollama to clear memory fragmentation.
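In practice those checks map to a couple of commands, assuming an NVIDIA GPU and the systemd service that the Linux install script sets up:
# GPU temperature, utilization, and VRAM usage
nvidia-smi
# Restart the Ollama service to clear memory fragmentation
sudo systemctl restart ollama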
Summary
Running OpenClaw with Ollama gives you a private, zero-cost AI agent. The practical path:
Install Ollama and pull a model matched to your hardware
Start with Qwen 2.5 32B (24 GB GPU) or Llama 3.1 8B (smaller GPUs)
Configure OpenClaw with openai-compatible provider type
Use the hybrid strategy for cost savings with quality where it matters
Tune context length and keep-alive for agent workloads
The local model handles 70--80% of everyday tasks well, and you always have the cloud for the rest.