OpenClaw + OpenAI: How to Use GPT Models with Your Agents

OpenClaw defaults to Anthropic's Claude models, but it works with OpenAI's GPT models too. If you already have an OpenAI API key, or if a specific task works better with GPT-4o or o3, swapping models takes about 30 seconds.
This guide covers the full setup — API keys, model selection, config files, cost comparison, and when each provider actually makes sense.
Why Use OpenAI Models with OpenClaw?
Claude is the default for good reason — OpenClaw is built by Anthropic, and the agentic loop is optimized for Claude's tool-use patterns. But there are legitimate reasons to reach for OpenAI:
- GPT-4o has strong vision capabilities and faster response times for certain tasks
- o1 and o3 reasoning models excel at math-heavy and logic-intensive problems
- Cost arbitrage — for simple tasks, GPT-4o-mini can be significantly cheaper than Claude Sonnet
- Existing billing — if your company already has an OpenAI Enterprise agreement
OpenClaw's agent framework is model-agnostic. The harness (tools, file access, shell commands) stays the same regardless of which model is doing the thinking.
Setting Up Your OpenAI API Key
First, grab your API key from platform.openai.com. Then set it as an environment variable:
```shell
# Add to your shell profile (~/.zshrc or ~/.bashrc)
export OPENAI_API_KEY=sk-proj-your-key-here

# Reload your shell
source ~/.zshrc
```
Verify it's set:
```shell
echo $OPENAI_API_KEY  # should print your key
```
You can also set it per-session if you don't want it permanently in your profile. Just export it before launching OpenClaw.
Configuring OpenClaw to Use GPT Models
OpenClaw reads model configuration from its settings file. You can set this globally or per-project.
Global Configuration
Set the global default from the CLI:

```shell
# Set the default model for all projects
openclaw config set model openai:gpt-4o
```
Or edit ~/.openclaw/config.json directly:
```json
{
  "model": "openai:gpt-4o",
  "apiKeys": {
    "openai": "sk-proj-your-key-here"
  }
}
```
Per-Project Configuration
Create a .openclaw/config.json in your project root:
```json
{
  "model": "openai:gpt-4o"
}
```
Project-level config overrides global config. This is useful when different repos need different models.
Switching Models on the Fly
You can also specify the model per-command without changing any config:
```shell
# Use GPT-4o for a single task
openclaw --model openai:gpt-4o "analyze this CSV and create a summary report"

# Use o3 for a reasoning-heavy task
openclaw --model openai:o3 "find the bug in this algorithm and prove the fix is correct"

# Use GPT-4o-mini for a simple task
openclaw --model openai:gpt-4o-mini "rename all files in this directory to kebab-case"
```
Available OpenAI Models
Here's what works with OpenClaw and when to use each:
| Model | Best For | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Tool Use |
| --- | --- | --- | --- | --- |
| gpt-4o | General tasks, vision, fast responses | $2.50 | $10.00 | Excellent |
| gpt-4o-mini | Simple tasks, high volume | $0.15 | $0.60 | Good |
| o1 | Complex reasoning, math | $15.00 | $60.00 | Limited |
| o3 | Advanced reasoning, research | $10.00 | $40.00 | Good |
| o3-mini | Reasoning on a budget | $1.10 | $4.40 | Good |
For comparison, Claude Sonnet 4 costs $3/$15 per million tokens (input/output). Claude Haiku runs $0.25/$1.25.
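Using the rates above, you can estimate a single task's cost before running it. A minimal sketch in awk — the 50K-input / 5K-output task size is an assumption for illustration, not a measured average:

```shell
# Estimate one task's cost at 50K input / 5K output tokens,
# using the per-million-token rates from the table above
awk 'BEGIN {
  in_tok = 50000; out_tok = 5000
  printf "gpt-4o:        $%.4f\n", in_tok/1e6 * 2.50 + out_tok/1e6 * 10.00
  printf "gpt-4o-mini:   $%.4f\n", in_tok/1e6 * 0.15 + out_tok/1e6 * 0.60
  printf "claude-sonnet: $%.4f\n", in_tok/1e6 * 3.00 + out_tok/1e6 * 15.00
}'
```

At this task size the mini model comes in around $0.01 versus roughly $0.23 for Sonnet — a 20x gap, which is why matching model to task matters.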
When to Use OpenAI vs Anthropic
After running hundreds of agentic tasks across both providers, here's where each shines:
Use Claude (default) when:
- Writing or editing code — Claude's code generation is consistently more careful with existing codebases
- Multi-step file operations — Claude follows OpenClaw's tool-use patterns more reliably
- Long-context tasks — Claude's 200K context window handles large codebases better
- You need the agent to ask clarifying questions instead of guessing
Use GPT-4o when:
- Vision-heavy tasks — analyzing screenshots, diagrams, or UI mockups
- Quick one-shot tasks where speed matters more than depth
- You need web browsing with strong summarization
- Batch processing simple, repetitive operations
Use o3 when:
- Mathematical proofs or algorithm design
- Tasks that require multi-step logical reasoning
- Code review where you want the model to think deeply about edge cases
- Research synthesis across multiple documents
Practical Examples
Using GPT-4o for Screenshot Analysis
```shell
openclaw --model openai:gpt-4o "look at the screenshot in ./bug-report.png and identify what UI element is broken, then fix the CSS"
```
GPT-4o parses the screenshot and identifies the broken element; OpenClaw's file tools then apply the CSS fix.
Using o3 for Algorithm Optimization
```shell
openclaw --model openai:o3 "the sorting function in src/utils/sort.js is O(n²) — refactor it to O(n log n) and add benchmarks proving the improvement"
```
The o3 reasoning model takes longer but produces more thorough algorithmic analysis.
Using GPT-4o-mini for Bulk Operations
```shell
openclaw --model openai:gpt-4o-mini "add JSDoc comments to every exported function in the src/lib/ directory"
```
For repetitive, well-defined tasks, mini models save significant cost without sacrificing quality.
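For large batches, one pattern is to drive OpenClaw from a plain shell loop so each file gets its own cheap, fresh session. A sketch using the `--model` flag shown earlier — the glob and prompt are illustrative:

```shell
# One mini-model session per file keeps context small and cost predictable
for f in src/lib/*.js; do
  openclaw --model openai:gpt-4o-mini "add JSDoc comments to exported functions in $f"
done
```

Per-file sessions also isolate failures: if one file trips the agent up, the rest of the batch is unaffected.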
Cost Comparison: A Real Workday
Here's what a typical day of agent usage costs across providers, assuming ~50 tasks:
| Task Type | Count | Claude Sonnet | GPT-4o | GPT-4o-mini |
| --- | --- | --- | --- | --- |
| Code edits | 20 | $4.80 | $3.20 | $0.48 |
| File analysis | 15 | $2.70 | $1.80 | $0.27 |
| Research/browsing | 10 | $3.60 | $2.40 | $0.36 |
| Complex reasoning | 5 | $4.50 | $3.00 | $0.45 |
| Daily total | 50 | $15.60 | $10.40 | $1.56 |
The cheapest option isn't always the best. GPT-4o-mini will fumble complex tasks that Claude Sonnet handles cleanly. Match the model to the task complexity.
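One way to make "match the model to the task" concrete is a small routing helper in your shell profile. This is a sketch, not an OpenClaw feature — the complexity tags and the `anthropic:` prefix are assumptions; check your install's provider-prefix conventions:

```shell
# Map a rough complexity tag to a model string for the --model flag.
# Tags and the anthropic: prefix are illustrative assumptions.
pick_model() {
  case "$1" in
    bulk)      echo "openai:gpt-4o-mini" ;;        # cheap, repetitive work
    vision)    echo "openai:gpt-4o" ;;             # screenshots, diagrams
    reasoning) echo "openai:o3" ;;                 # proofs, algorithms
    *)         echo "anthropic:claude-sonnet-4" ;; # default: code edits
  esac
}

pick_model bulk   # → openai:gpt-4o-mini
```

You would then invoke `openclaw --model "$(pick_model bulk)" "..."` and adjust the mapping as your cost data comes in.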
Multi-Model Workflows
The power move is using different models for different parts of your workflow. Some teams set up OpenClaw skills that specify the model per-skill:
In `.openclaw/skills/quick-fix.md`:

```markdown
---
model: openai:gpt-4o-mini
---

Fix the specified bug with minimal changes. Run tests after.
```
In `.openclaw/skills/deep-review.md`:

```markdown
---
model: openai:o3
---

Review the code changes for logical errors, edge cases, and security issues.
```
This way, cheap models handle routine work while expensive models focus on tasks that justify the cost.
Troubleshooting
"Invalid API key" error: Make sure you're using a project key (starts with sk-proj-) not a legacy key. Check that the environment variable is exported, not just set.
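The exported-vs-set distinction trips people up because the variable looks fine in your own shell. A quick way to see the difference, in plain POSIX sh with no OpenClaw required:

```shell
# A variable that is set but not exported is invisible to child processes
DEMO_KEY=sk-proj-demo
sh -c 'echo "child sees: [$DEMO_KEY]"'   # prints: child sees: []

# After export, child processes (including openclaw) inherit it
export DEMO_KEY
sh -c 'echo "child sees: [$DEMO_KEY]"'   # prints: child sees: [sk-proj-demo]
```

OpenClaw runs as a child of your shell, so it only ever sees exported variables.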
Model not found: Use the full provider prefix — openai:gpt-4o, not just gpt-4o. OpenClaw needs to know which API endpoint to hit.
Rate limits: OpenAI's rate limits are per-organization. If you hit them, either upgrade your usage tier on platform.openai.com or add request delays in your OpenClaw config.
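If you go the delay route, the config fragment might look like the sketch below. Note that `requestDelayMs` is an assumed key name for illustration, not a documented OpenClaw option — check your version's config reference for the real setting:

```json
{
  "model": "openai:gpt-4o",
  "requestDelayMs": 500
}
```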
Tool use failures with o1: The o1 model has limited tool-use support. If your task requires heavy file operations or shell commands, stick with gpt-4o or Claude.
For teams managing multiple agents with different model configurations, RunAgents lets you assign models per-agent from a dashboard — no config files to juggle.
Frequently Asked Questions
Can I use OpenAI and Anthropic models in the same OpenClaw session?
Not within a single session, but you can switch between sessions. Each OpenClaw invocation uses one model. Use the --model flag to pick per-task, or set up skills with different model defaults.
Does OpenClaw work with the OpenAI Assistants API?
No. OpenClaw uses OpenAI's Chat Completions API directly. It manages its own agentic loop, tool calls, and context — it doesn't rely on OpenAI's assistant or thread abstractions.
Is function calling different between Claude and GPT models in OpenClaw?
OpenClaw abstracts this away. Its tool-use system translates to the correct format for each provider. You write the same commands regardless of which model is running underneath.
Will my OpenClaw skills work with OpenAI models?
Yes. Skills are model-agnostic instructions. The same skill file works with Claude, GPT-4o, or any other supported model. Performance may vary — some skills work better with certain models.
What happens to my conversation history when I switch models?
Each OpenClaw session is independent. Switching models between sessions doesn't carry over context. If you need continuity, use OpenClaw's session resume feature within the same model.
Is there a cost dashboard to track spending across providers?
OpenClaw itself doesn't have one, but ClawMetry provides local token and cost tracking. For team-wide visibility across multiple agents and providers, RunAgents includes per-agent cost dashboards.
Want multi-model agents managed from one dashboard? RunAgents gives you managed OpenClaw hosting with task management, team collaboration, and agent debugging built in. Get started free
Related Guides
What Is OpenClaw? -- Framework overview
OpenClaw Skills Guide -- Extend your agents
Run OpenClaw in Docker -- Containerized setup