
"Claude Computer Use API Guide: Build AI Desktop Automation in 2026"
Claude's Computer Use capability lets Claude operate a computer through screenshots: it reads the screen, then asks your application to move the mouse, click buttons, type text, and navigate applications, much as a human user would. Claude never touches the machine directly; your code executes each action it requests.
This guide covers how Computer Use works, a Python implementation, a Docker setup, pricing, use cases, and best practices.
What is Claude Computer Use?#
Claude Computer Use is a tool-use capability that enables Claude to:
- See the screen: Take and analyze screenshots
- Move the mouse: Navigate to specific coordinates
- Click elements: Left-click, right-click, double-click
- Type text: Enter text into any input field
- Use keyboard shortcuts: Ctrl+C, Alt+Tab, etc.
- Scroll: Navigate through long pages and documents
- Coordinate across apps: Switch between multiple windows
Unlike traditional browser automation (Selenium, Playwright), Computer Use works with any application — web browsers, desktop software, terminal windows, even games.
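To make the capability list above concrete, here is a sketch of the kind of `input` payloads a computer-use `tool_use` block carries. The field names (`action`, `coordinate`, `text`, `scroll_direction`, `scroll_amount`) are assumed to follow the `computer_20250124` tool version; other tool versions may differ, so check Anthropic's current tool reference. The `describe` helper is hypothetical and exists only for illustration.

```python
# Example tool_use inputs, one per capability (assumed computer_20250124 schema).
EXAMPLE_ACTIONS = [
    {"action": "screenshot"},
    {"action": "mouse_move", "coordinate": [640, 360]},
    {"action": "left_click", "coordinate": [812, 44]},
    {"action": "type", "text": "hello world"},
    {"action": "key", "text": "ctrl+c"},
    {"action": "scroll", "coordinate": [640, 360],
     "scroll_direction": "down", "scroll_amount": 3},
]


def describe(action: dict) -> str:
    """Return a one-line, human-readable summary of an action payload."""
    kind = action.get("action", "unknown")
    parts = [kind]
    if "coordinate" in action:
        x, y = action["coordinate"]
        parts.append(f"at ({x}, {y})")
    if "text" in action:
        parts.append(f"text={action['text']!r}")
    if kind == "scroll":
        parts.append(f"{action['scroll_direction']} x{action['scroll_amount']}")
    return " ".join(parts)


if __name__ == "__main__":
    for example in EXAMPLE_ACTIONS:
        print(describe(example))
```

A summary like this is also handy as a log line inside the execution loop.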
How It Works#
┌─────────────────────────────────────────────────────┐
│                  Your Application                   │
│                                                     │
│  1. Send task description + screenshot to Claude    │
│  2. Claude analyzes the screen                      │
│  3. Claude returns tool calls (click, type, etc.)   │
│  4. Your app executes the actions                   │
│  5. Take new screenshot                             │
│  6. Repeat until task complete                      │
│                                                     │
│  [Screenshot] → [Claude] → [Actions] → [Screenshot] │
└─────────────────────────────────────────────────────┘
Getting Started with Claude Computer Use#
Prerequisites#
- Anthropic API key (or a Crazyrouter key for cost savings)
- Python 3.10+
- A display environment (local desktop, VNC, or headless browser)
Basic Implementation (Python)#
import base64
import subprocess
import time

import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://crazyrouter.com/v1",  # Optional: use Crazyrouter for savings
)

SCREENSHOT_PATH = "/tmp/screenshot.png"

# Tool version and beta flag that pair with Claude Sonnet 4; check Anthropic's
# docs for the values that match the model you use.
COMPUTER_TOOL = {
    "type": "computer_20250124",
    "name": "computer",
    "display_width_px": 1920,
    "display_height_px": 1080,
}
BETA_FLAG = "computer-use-2025-01-24"


def take_screenshot():
    """Capture the current screen and return it as base64-encoded PNG data."""
    # --overwrite: recent scrot versions may otherwise not replace an existing file
    subprocess.run(["scrot", "--overwrite", SCREENSHOT_PATH], check=True)
    with open(SCREENSHOT_PATH, "rb") as f:
        return base64.standard_b64encode(f.read()).decode()


def image_block(data):
    """Wrap base64 PNG data in an image content block."""
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": "image/png", "data": data},
    }


def xdotool(*args):
    """Run xdotool with the given arguments, raising if it fails."""
    subprocess.run(["xdotool", *map(str, args)], check=True)


def execute_action(action):
    """Execute one computer use action (the input dict of a tool_use block)."""
    kind = action.get("action")
    # Pointer actions carry an [x, y] coordinate; move there first
    if action.get("coordinate"):
        x, y = action["coordinate"]
        xdotool("mousemove", x, y)
    if kind == "left_click":
        xdotool("click", 1)
    elif kind == "right_click":
        xdotool("click", 3)
    elif kind == "double_click":
        xdotool("click", "--repeat", 2, 1)
    elif kind == "type":
        xdotool("type", "--clearmodifiers", action["text"])
    elif kind == "key":
        xdotool("key", action["text"])
    elif kind == "scroll":
        # X11 maps wheel up/down to mouse buttons 4/5
        button = 5 if action.get("scroll_direction") == "down" else 4
        xdotool("click", "--repeat", action.get("scroll_amount", 1), button)
    # "screenshot" and "mouse_move" need nothing more; other actions are ignored here


def computer_use_loop(task: str, max_steps: int = 20):
    """Main loop for Computer Use interaction."""
    messages = [{
        "role": "user",
        "content": [{"type": "text", "text": task}, image_block(take_screenshot())],
    }]

    for step in range(max_steps):
        response = client.beta.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=[COMPUTER_TOOL],
            betas=[BETA_FLAG],
            messages=messages,
        )

        # Check if Claude wants to use a tool
        tool_use_blocks = [b for b in response.content if b.type == "tool_use"]
        if not tool_use_blocks:
            # Claude is done - extract final text response
            text_blocks = [b.text for b in response.content if b.type == "text"]
            return "\n".join(text_blocks)

        messages.append({"role": "assistant", "content": response.content})

        # Execute each action and answer every tool_use block with a tool_result
        results = []
        for tool_block in tool_use_blocks:
            action = tool_block.input
            print(f"Step {step + 1}: {action.get('action', 'unknown')} "
                  f"at ({action.get('coordinate', 'N/A')})")
            execute_action(action)
            time.sleep(1)  # Wait for UI to update
            results.append({
                "type": "tool_result",
                "tool_use_id": tool_block.id,
                "content": [image_block(take_screenshot())],
            })
        messages.append({"role": "user", "content": results})

    return "Max steps reached"


# Example: automate a web task
result = computer_use_loop(
    "Open Firefox, go to crazyrouter.com, and take a screenshot of the pricing page"
)
print(result)
Docker Setup for Headless Environments#
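FROM ubuntu:22.04
# Note: on Ubuntu 22.04 the firefox apt package is a snap wrapper and may not
# run inside a container; swap in another browser package if it fails.
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    xvfb \
    x11vnc \
    scrot \
    xdotool \
    firefox \
    python3 \
    python3-pip
RUN pip3 install anthropic
# Set up virtual display
ENV DISPLAY=:99
COPY computer_use.py /app/
WORKDIR /app
# Start Xvfb when the container runs; a background process launched in a RUN
# step does not survive into the running container
CMD ["sh", "-c", "Xvfb :99 -screen 0 1920x1080x24 & sleep 1 && python3 computer_use.py"]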
docker build -t claude-computer-use .
docker run -it claude-computer-use
Claude Computer Use vs Traditional Automation#
| Feature | Claude Computer Use | Selenium/Playwright | AutoHotkey/AppleScript |
|---|---|---|---|
| Works with any app | ✅ Any visual app | ❌ Web only | ⚠️ OS-specific |
| No selectors needed | ✅ Visual understanding | ❌ CSS/XPath required | ❌ Window handles |
| Handles dynamic UI | ✅ Adapts automatically | ⚠️ Brittle selectors | ❌ Hard-coded positions |
| Natural language tasks | ✅ "Click the submit button" | ❌ Explicit code only | ❌ Explicit code only |
| Speed | Slower (screenshot loop) | Fast (direct DOM) | Fast (direct API) |
| Cost | API costs per step | Free (self-hosted) | Free |
| Reliability | Good (improving) | Excellent (for web) | Good (for OS tasks) |
Pricing#
| Model | Anthropic Direct | Crazyrouter | Savings |
|---|---|---|---|
| Claude Sonnet 4 (input) | $3/1M tokens | $2.10/1M tokens | 30% |
| Claude Sonnet 4 (output) | $15/1M tokens | $10.50/1M tokens | 30% |
| Claude Opus 4 (input) | $15/1M tokens | $10.50/1M tokens | 30% |
| Claude Opus 4 (output) | $75/1M tokens | $52.50/1M tokens | 30% |
Typical cost per Computer Use session (10 steps, Sonnet):
- Direct: ~$0.15-0.30 per task
- Via Crazyrouter: ~$0.10-0.21 per task
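Note: Each step sends a screenshot (~1,500 tokens per image) and receives action instructions. Because the loop in this guide resends the full message history, earlier screenshots included, on every step, input cost grows faster than linearly with the number of steps unless you prune old screenshots.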
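The per-session estimate above can be reproduced with a little arithmetic. The sketch below uses the figures quoted in this section (about 1,500 input tokens per screenshot, Sonnet at $3 and $15 per million input and output tokens) plus two illustrative guesses that are not from this article: 800 tokens of fixed per-request overhead for the task text and tool definition, and 100 output tokens per step. It also assumes the full history is resent on every request, as in the loop shown in this guide.

```python
def estimate_session_cost(
    steps: int,
    image_tokens: int = 1_500,    # per screenshot (figure from this article)
    overhead_tokens: int = 800,   # assumed: task text + tool definition per request
    output_tokens: int = 100,     # assumed: action instructions per step
    input_price: float = 3.00,    # USD per 1M input tokens (Sonnet, direct)
    output_price: float = 15.00,  # USD per 1M output tokens (Sonnet, direct)
) -> float:
    """Estimate USD cost when every request resends all earlier screenshots."""
    total_input = 0
    total_output = 0
    for step in range(1, steps + 1):
        # Request number `step` carries `step` screenshots plus earlier outputs.
        history_output = (step - 1) * output_tokens
        total_input += overhead_tokens + step * image_tokens + history_output
        total_output += output_tokens
    return (total_input * input_price + total_output * output_price) / 1_000_000


if __name__ == "__main__":
    print(f"10 steps, direct:      ${estimate_session_cost(10):.2f}")
    print(f"10 steps, Crazyrouter: "
          f"${estimate_session_cost(10, input_price=2.10, output_price=10.50):.2f}")
    print(f"20 steps, direct:      ${estimate_session_cost(20):.2f}")
```

Under these assumptions a 10-step session lands at the upper end of the ranges quoted above, and a 20-step session costs well over twice as much because each request carries a longer history.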
Real-World Use Cases#
1. Automated Testing#
result = computer_use_loop(
    "Test the login flow: enter username 'test@example.com' and password 'test123', "
    "click Login, verify the dashboard loads correctly, then log out."
)
2. Data Entry Automation#
result = computer_use_loop(
    "Open the CRM application, create a new contact with name 'John Smith', "
    "email 'john@example.com', phone '555-0123', company 'Acme Corp'. Save the record."
)
3. Legacy System Integration#
result = computer_use_loop(
    "Open the AS/400 terminal emulator, navigate to the inventory screen, "
    "search for part number 'XJ-2847', and read back the current stock level."
)
4. Web Scraping Without Selectors#
result = computer_use_loop(
    "Go to the competitor's pricing page, read all the plan names and prices, "
    "and summarize them in a structured format."
)
Best Practices#
- Set reasonable step limits — Most tasks complete in 5-15 steps
- Add delays between actions — UI needs time to render (0.5-2s)
- Use clear, specific instructions — "Click the blue 'Submit' button in the top-right"
- Handle errors gracefully — Claude might misclick; your loop should allow retries
- Optimize screenshot size — Resize screenshots to reduce token costs
- Use Sonnet for most tasks — Opus is only needed for extremely complex multi-step tasks
- Sandbox the environment — Run in Docker/VM to prevent unintended actions
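One way to follow the screenshot-size advice is to downscale each capture before sending it, then map the coordinates Claude returns back to the real display. The sketch below covers only the arithmetic; the function names are hypothetical, and the image resizing itself (for example with an imaging library) is left out.

```python
def scaled_size(width: int, height: int, max_width: int = 1280) -> tuple[int, int]:
    """Return the size to send, preserving aspect ratio and never upscaling."""
    if width <= max_width:
        return width, height
    scale = max_width / width
    return max_width, round(height * scale)


def to_screen(coord: tuple[int, int],
              sent: tuple[int, int],
              screen: tuple[int, int]) -> tuple[int, int]:
    """Map a coordinate in the sent image back to real screen pixels."""
    x, y = coord
    scale_x = screen[0] / sent[0]
    scale_y = screen[1] / sent[1]
    # Clamp so rounding never produces a point outside the display.
    return (min(screen[0] - 1, round(x * scale_x)),
            min(screen[1] - 1, round(y * scale_y)))


if __name__ == "__main__":
    screen = (1920, 1080)
    sent = scaled_size(*screen)
    print("send at:", sent)
    print("click maps to:", to_screen((640, 360), sent, screen))
```

With this scheme, the `display_width_px` and `display_height_px` values in the tool definition would presumably be the sent size rather than the physical one, so that Claude's coordinates refer to the image it actually sees.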
FAQ#
What models support Computer Use?#
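Claude Sonnet 4 and Claude Opus 4 both support Computer Use. Sonnet is recommended for most tasks due to its cost-performance balance. Support on other models, including the Haiku family, varies by model version, so check Anthropic's documentation for the current list and the matching tool version.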
Is Claude Computer Use safe for production?#
Computer Use should always run in a sandboxed environment (Docker container, VM, or isolated display). Never give it access to sensitive systems without proper safeguards, human oversight, and action logging.
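As one possible shape for the action logging and human oversight mentioned above, the sketch below appends every proposed action to a JSON Lines file and asks a callback for approval before a small set of key combinations is allowed to run. The log path, the `RISKY_KEYS` set, and the `approve` callback are illustrative assumptions, not part of any Anthropic API.

```python
import json
import time
from typing import Callable

# Assumed examples of key combinations worth a human check; tune for your setup.
RISKY_KEYS = {"ctrl+alt+Delete", "alt+F4", "super+r"}


def guarded(action: dict,
            log_path: str,
            approve: Callable[[dict], bool]) -> bool:
    """Log the proposed action, then return True if it may be executed."""
    risky = action.get("action") == "key" and action.get("text") in RISKY_KEYS
    allowed = approve(action) if risky else True
    record = {"ts": time.time(), "action": action,
              "risky": risky, "allowed": allowed}
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return allowed
```

In an execution loop, an action would be run only when `guarded(...)` returns True, with `approve` wired to whatever review step fits the deployment, such as a console prompt or a ticket queue.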
How does Computer Use handle screen resolution?#
You specify the display dimensions when initializing the tool. Claude adapts its coordinate calculations to match. Common setups are 1920×1080 or 1280×720. Lower resolutions reduce image token costs.
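To see how resolution affects image cost, a rough token count can be derived from the pixel count. The sketch assumes the width * height / 750 rule of thumb from Anthropic's vision documentation; the API may downscale large images before counting, so treat the results for high resolutions as an upper bound rather than a bill.

```python
def approx_image_tokens(width: int, height: int) -> int:
    """Rough token estimate for one image: width * height / 750 (assumed rule of thumb)."""
    return round(width * height / 750)


if __name__ == "__main__":
    for w, h in [(1920, 1080), (1280, 720), (1024, 768)]:
        print(f"{w}x{h}: ~{approx_image_tokens(w, h)} tokens")
```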
Can Computer Use work with mobile apps?#
Yes, through Android emulators or iOS simulators. Set up a virtual display running the mobile emulator, and Claude can interact with mobile UIs the same way it handles desktop applications.
How do I reduce Computer Use API costs?#
Use Crazyrouter for 30% savings on all Claude API calls. Additionally, resize screenshots before sending (720p is usually sufficient), minimize unnecessary steps by writing clear instructions, and use Sonnet instead of Opus for standard tasks.
Summary#
Claude Computer Use is a revolutionary capability that lets AI interact with any visual application. While traditional automation tools remain better for stable, high-speed web tasks, Computer Use shines for complex, multi-application workflows and dynamic UIs.
For cost-effective Computer Use deployments, Crazyrouter offers 30% savings on all Anthropic API calls, plus access to 300+ other AI models through the same key.
Start building AI automation → Get your API key at Crazyrouter.com


