
OpenClaw Advanced Techniques: Expert-Level Optimization
This comprehensive guide covers advanced OpenClaw techniques for production environments. From sophisticated caching strategies to multi-region deployments, you'll learn expert-level patterns that maximize performance, reliability, and cost-efficiency.
Advanced Memory Management Strategies
OpenClaw's memory system is crucial for maintaining context across conversations. Advanced memory management techniques can dramatically improve response quality and reduce costs.
Hierarchical Memory Architecture
Implement a multi-tier memory system for optimal performance:
Hot Memory: Recent interactions stored in Redis with sub-millisecond access. This tier handles the most frequently accessed context and should contain the last 10-20 conversation turns.
Warm Memory: Summarized conversations stored in PostgreSQL with indexed full-text search. This tier provides fast access to historical context while reducing storage costs.
Cold Memory: Complete conversation archives in object storage (S3/MinIO) with vector embeddings for semantic search. This tier enables long-term context retrieval at minimal cost.
```typescript
export class HierarchicalMemory {
  constructor(
    private redis: RedisClient,
    private postgres: PostgresClient,
    private s3: S3Client,
    private embeddings: EmbeddingService
  ) {}

  async store(conversationId: string, message: Message): Promise<void> {
    // Store in hot memory (newest first)
    await this.redis.lpush(
      `conv:${conversationId}:hot`,
      JSON.stringify(message)
    );
    await this.redis.ltrim(`conv:${conversationId}:hot`, 0, 19);

    // Check if we need to promote to warm memory
    const hotSize = await this.redis.llen(`conv:${conversationId}:hot`);
    if (hotSize >= 20) {
      await this.promoteToWarm(conversationId);
    }
  }

  private async promoteToWarm(conversationId: string): Promise<void> {
    // Get the oldest messages from hot memory (indices 10-19)
    const messages = await this.redis.lrange(
      `conv:${conversationId}:hot`,
      10,
      19
    );

    // Summarize with AI
    const summary = await this.summarizeMessages(messages);

    // Store in PostgreSQL
    await this.postgres.query(
      `INSERT INTO conversation_summaries (conversation_id, summary, message_count, created_at)
       VALUES ($1, $2, $3, NOW())`,
      [conversationId, summary, messages.length]
    );

    // Remove the promoted messages from hot memory
    await this.redis.ltrim(`conv:${conversationId}:hot`, 0, 9);
  }

  async retrieve(conversationId: string, query: string): Promise<Context> {
    // Search hot memory first
    const hotMessages = await this.redis.lrange(
      `conv:${conversationId}:hot`,
      0,
      -1
    );

    // Search warm memory with full-text search
    const warmResults = await this.postgres.query(
      `SELECT summary FROM conversation_summaries
       WHERE conversation_id = $1
         AND to_tsvector('english', summary) @@ plainto_tsquery('english', $2)
       ORDER BY created_at DESC LIMIT 5`,
      [conversationId, query]
    );

    // Search cold memory with vector similarity
    const queryEmbedding = await this.embeddings.embed(query);
    const coldResults = await this.searchVectorStore(
      conversationId,
      queryEmbedding
    );

    return this.combineContext(hotMessages, warmResults, coldResults);
  }
}
```
Intelligent Context Pruning
Automatically prune less relevant context to stay within token limits:
```typescript
export class ContextPruner {
  constructor(private embeddings: EmbeddingService) {}

  async prune(
    messages: Message[],
    maxTokens: number,
    currentQuery: string
  ): Promise<Message[]> {
    // Always keep system messages and the most recent messages
    const systemMessages = messages.filter(m => m.role === 'system');
    const recentMessages = messages.slice(-5);

    // Score the remaining messages by relevance to the current query
    // (exclude system messages so they are not added twice)
    const candidates = messages
      .slice(0, -5)
      .filter(m => m.role !== 'system');
    const queryEmbedding = await this.embeddings.embed(currentQuery);
    const scoredMessages = await Promise.all(
      candidates.map(async (msg) => {
        const msgEmbedding = await this.embeddings.embed(msg.content);
        const similarity = this.cosineSimilarity(queryEmbedding, msgEmbedding);
        return { message: msg, score: similarity };
      })
    );

    // Sort by relevance, highest first
    scoredMessages.sort((a, b) => b.score - a.score);

    // Add messages until we hit the token limit, reserving room for recents
    const result = [...systemMessages];
    const reservedTokens = this.countTokens(recentMessages);
    let tokenCount = this.countTokens(result);
    for (const { message } of scoredMessages) {
      const messageTokens = this.countTokens([message]);
      if (tokenCount + messageTokens <= maxTokens - reservedTokens) {
        result.push(message);
        tokenCount += messageTokens;
      }
    }

    // Add recent messages last
    result.push(...recentMessages);
    return result;
  }

  private cosineSimilarity(a: number[], b: number[]): number {
    const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
    const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
    const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
    return dotProduct / (magnitudeA * magnitudeB);
  }

  private countTokens(messages: Message[]): number {
    // Rough heuristic: ~4 characters per token
    return messages.reduce(
      (sum, msg) => sum + Math.ceil(msg.content.length / 4),
      0
    );
  }
}
```
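The budget logic can be exercised on its own before wiring in real embeddings. The sketch below uses hand-assigned relevance scores in place of cosine similarity; `pruneByBudget` and `Msg` are illustrative names, not part of OpenClaw:

```typescript
// Minimal sketch of token-budget pruning with toy relevance scores
// standing in for embedding similarity.
type Msg = { role: string; content: string; score: number };

// Same heuristic as above: ~4 characters per token
const countTokens = (msgs: Msg[]): number =>
  msgs.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);

function pruneByBudget(messages: Msg[], maxTokens: number, keepRecent: number): Msg[] {
  const recent = messages.slice(-keepRecent);
  // Older messages compete for the remaining budget by relevance
  const older = [...messages.slice(0, -keepRecent)].sort((a, b) => b.score - a.score);
  const reserved = countTokens(recent);
  const kept: Msg[] = [];
  let used = 0;
  for (const m of older) {
    const t = countTokens([m]);
    if (used + t <= maxTokens - reserved) {
      kept.push(m);
      used += t;
    }
  }
  return [...kept, ...recent];
}

const history: Msg[] = [
  { role: 'user', content: 'x'.repeat(400), score: 0.2 }, // ~100 tokens, low relevance
  { role: 'user', content: 'y'.repeat(80), score: 0.9 },  // ~20 tokens, high relevance
  { role: 'user', content: 'z'.repeat(40), score: 0.5 },  // recent, always kept
];

// Budget of 40 tokens: the bulky low-relevance message is dropped
const pruned = pruneByBudget(history, 40, 1);
```

The low-relevance 100-token message is sacrificed first, while the recent message survives regardless of score.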
Advanced Caching Patterns
Sophisticated caching strategies can reduce API costs by 70-90% while maintaining response quality.
Semantic Cache with Vector Similarity
Cache responses based on semantic similarity rather than exact matches:
```typescript
import { createClient } from 'redis';
import { OpenAI } from 'openai';

export class SemanticCache {
  private redis: ReturnType<typeof createClient>;
  private embeddings: OpenAI;

  constructor() {
    this.redis = createClient({ url: process.env.REDIS_URL });
    this.embeddings = new OpenAI({
      apiKey: process.env.CRAZYROUTER_API_KEY,
      baseURL: 'https://crazyrouter.com/v1'
    });
  }

  async connect(): Promise<void> {
    // node-redis clients must be connected before use
    await this.redis.connect();
  }

  async get(query: string, threshold: number = 0.95): Promise<string | null> {
    // Generate an embedding for the query
    const queryEmbedding = await this.embeddings.embeddings.create({
      model: 'text-embedding-3-small',
      input: query
    });

    // Search for similar cached queries
    // (assumes a RediSearch index `idx:cache` with a COSINE vector field)
    const results = await this.redis.ft.search(
      'idx:cache',
      `*=>[KNN 5 @embedding $vector AS score]`,
      {
        PARAMS: {
          vector: Buffer.from(
            new Float32Array(queryEmbedding.data[0].embedding).buffer
          )
        },
        SORTBY: 'score',
        DIALECT: 2
      }
    );

    // With a COSINE index the score is a distance, so similarity = 1 - score
    if (results.total > 0) {
      const topResult = results.documents[0];
      const similarity = 1 - parseFloat(topResult.value.score as string);
      if (similarity >= threshold) {
        return topResult.value.response as string;
      }
    }
    return null;
  }

  async set(
    query: string,
    response: string,
    ttl: number = 3600
  ): Promise<void> {
    // Generate an embedding for the cached query
    const embedding = await this.embeddings.embeddings.create({
      model: 'text-embedding-3-small',
      input: query
    });

    // Store in Redis alongside the vector
    const key = `cache:${Date.now()}:${Math.random()}`;
    await this.redis.hSet(key, {
      query,
      response,
      embedding: Buffer.from(
        new Float32Array(embedding.data[0].embedding).buffer
      ),
      timestamp: Date.now()
    });
    await this.redis.expire(key, ttl);
  }
}
```
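The core lookup idea is independent of Redis: find the nearest stored vector and return its response only when similarity clears the threshold. Here is a toy in-memory version over plain `number[]` vectors; `ToySemanticCache` is an illustrative name and the vectors are hand-picked, not real embeddings:

```typescript
// Toy semantic cache: nearest-neighbor lookup by cosine similarity.
const cosine = (a: number[], b: number[]): number => {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0);
  const mag = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (mag(a) * mag(b));
};

class ToySemanticCache {
  private entries: Array<{ vector: number[]; response: string }> = [];

  set(vector: number[], response: string): void {
    this.entries.push({ vector, response });
  }

  get(vector: number[], threshold = 0.95): string | null {
    // Scan for the most similar stored vector
    let best: { score: number; response: string } | null = null;
    for (const e of this.entries) {
      const score = cosine(vector, e.vector);
      if (!best || score > best.score) best = { score, response: e.response };
    }
    return best && best.score >= threshold ? best.response : null;
  }
}

const cache = new ToySemanticCache();
cache.set([1, 0, 0], 'cached answer');

const nearHit = cache.get([0.99, 0.05, 0]); // almost identical direction
const miss = cache.get([0, 1, 0]);          // orthogonal query
```

A near-duplicate query direction hits the cache; an orthogonal one misses, which is exactly the behavior the Redis KNN search provides at scale.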
Multi-Level Cache Strategy
Implement cascading cache levels for optimal hit rates:
```typescript
export class MultiLevelCache {
  constructor(
    private l1: Map<string, any>, // In-memory cache
    private l2: RedisClient,      // Redis cache
    private l3: SemanticCache     // Semantic cache
  ) {}

  async get(key: string, query: string): Promise<any> {
    // L1: Check in-memory cache (fastest)
    if (this.l1.has(key)) {
      return this.l1.get(key);
    }

    // L2: Check Redis cache (fast), promoting hits to L1
    const l2Result = await this.l2.get(key);
    if (l2Result) {
      const parsed = JSON.parse(l2Result);
      this.l1.set(key, parsed);
      return parsed;
    }

    // L3: Check semantic cache (slower but more flexible),
    // promoting hits to both L1 and L2
    const l3Result = await this.l3.get(query);
    if (l3Result) {
      const parsed = JSON.parse(l3Result);
      this.l1.set(key, parsed);
      await this.l2.setex(key, 3600, l3Result);
      return parsed;
    }

    return null;
  }

  async set(key: string, query: string, value: any): Promise<void> {
    // Store in all cache levels
    this.l1.set(key, value);
    await this.l2.setex(key, 3600, JSON.stringify(value));
    await this.l3.set(query, JSON.stringify(value));
  }
}
```
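The read-through-with-promotion pattern can be checked without Redis by standing in three synchronous `Map` tiers. `readThrough` is an illustrative helper, not part of the class above:

```typescript
// The cascade reduced to three Map tiers so promotion-on-hit is observable.
function readThrough(
  key: string,
  l1: Map<string, string>,
  l2: Map<string, string>,
  l3: Map<string, string>
): string | null {
  if (l1.has(key)) return l1.get(key)!;
  if (l2.has(key)) {
    const v = l2.get(key)!;
    l1.set(key, v); // promote to L1
    return v;
  }
  if (l3.has(key)) {
    const v = l3.get(key)!;
    l1.set(key, v); // promote to L1 and L2
    l2.set(key, v);
    return v;
  }
  return null;
}

const l1 = new Map<string, string>();
const l2 = new Map<string, string>();
const l3 = new Map<string, string>([['q1', 'answer']]);

const first = readThrough('q1', l1, l2, l3);  // found in L3, promoted upward
const second = readThrough('q1', l1, l2, l3); // now served straight from L1
```

After the first lookup, every faster tier holds the value, so subsequent reads never touch the slow tier.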
Production-Grade Error Handling
Robust error handling ensures reliability in production environments.
Circuit Breaker Pattern
Prevent cascading failures with circuit breakers:
```typescript
export class CircuitBreaker {
  private failures: number = 0;
  private lastFailureTime: number = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  constructor(
    private threshold: number = 5,
    private resetTimeout: number = 30000
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      // After the reset timeout, allow a single trial request through
      if (Date.now() - this.lastFailureTime > this.resetTimeout) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit breaker is open');
      }
    }

    try {
      const result = await fn();
      if (this.state === 'half-open') {
        this.reset();
      }
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  private recordFailure(): void {
    this.failures++;
    this.lastFailureTime = Date.now();
    // A failed trial request reopens the breaker immediately
    if (this.state === 'half-open' || this.failures >= this.threshold) {
      this.state = 'open';
    }
  }

  private reset(): void {
    this.failures = 0;
    this.state = 'closed';
  }
}
```
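The essential transition (closed, then open after repeated failures, then fast-fail) is easy to verify with a synchronous miniature. `MiniBreaker` is a stripped-down illustration with a threshold of 2 and no timers, not the class above:

```typescript
// Synchronous sketch of the breaker state machine: closed -> open -> fast-fail.
class MiniBreaker {
  private failures = 0;
  state: 'closed' | 'open' = 'closed';

  constructor(private threshold: number) {}

  execute<T>(fn: () => T): T {
    // While open, fail fast without invoking the backend
    if (this.state === 'open') throw new Error('circuit open');
    try {
      const result = fn();
      this.failures = 0; // success resets the failure count
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.threshold) this.state = 'open';
      throw err;
    }
  }
}

const breaker = new MiniBreaker(2);
const boom = () => { throw new Error('backend down'); };

// Two failures trip the breaker
let failures = 0;
for (let i = 0; i < 2; i++) {
  try { breaker.execute(boom); } catch { failures++; }
}
const stateAfter = breaker.state;

// A third call fails fast: the wrapped function never runs
let fastFailed = false;
try { breaker.execute(() => 'never runs'); } catch { fastFailed = true; }
```

Once open, the breaker rejects calls before they reach the failing backend, which is what prevents cascading failures.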
Graceful Degradation
Implement fallback strategies for service failures:
```typescript
export class ResilientOpenClaw {
  private circuitBreaker = new CircuitBreaker();

  constructor(
    private primary: CrazyrouterClient,
    private fallback: LocalModelClient,
    private cache: MultiLevelCache
  ) {}

  async complete(request: CompletionRequest): Promise<CompletionResponse> {
    // Try the cache first
    const cacheKey = this.getCacheKey(request);
    const cached = await this.cache.get(cacheKey, request.messages[0].content);
    if (cached) {
      return cached;
    }

    // Try the primary service, guarded by the circuit breaker
    try {
      const response = await this.circuitBreaker.execute(() =>
        this.primary.complete(request)
      );
      await this.cache.set(cacheKey, request.messages[0].content, response);
      return response;
    } catch (error) {
      console.warn('Primary service failed, trying fallback', error);
    }

    // Try the fallback service
    try {
      const response = await this.fallback.complete(request);
      await this.cache.set(cacheKey, request.messages[0].content, response);
      return response;
    } catch (error) {
      console.error('Fallback service failed', error);
    }

    // Return a degraded response as a last resort
    return {
      id: 'degraded',
      object: 'chat.completion',
      created: Date.now(),
      model: request.model,
      choices: [{
        index: 0,
        message: {
          role: 'assistant',
          content: 'I apologize, but I\'m experiencing technical difficulties. Please try again in a moment.'
        },
        finish_reason: 'stop'
      }],
      usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 }
    };
  }

  private getCacheKey(request: CompletionRequest): string {
    return `completion:${request.model}:${JSON.stringify(request.messages)}`;
  }
}
```
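Stripped of clients and caches, graceful degradation is a ladder: try each provider in order and hand back a canned reply only when everything fails. `completeWithFallback` is an illustrative helper, shown synchronously for clarity:

```typescript
// The degradation ladder as a plain function: first success wins,
// otherwise return the canned degraded reply.
type Provider = () => string;

function completeWithFallback(providers: Provider[], degraded: string): string {
  for (const provider of providers) {
    try {
      return provider();
    } catch {
      // swallow the error and fall through to the next provider
    }
  }
  return degraded;
}

const failingPrimary: Provider = () => { throw new Error('primary down'); };
const workingFallback: Provider = () => 'fallback answer';

const answer = completeWithFallback([failingPrimary, workingFallback], 'sorry');
const worstCase = completeWithFallback([failingPrimary, failingPrimary], 'sorry');
```

The caller always gets a response; only its quality degrades as providers drop out.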
Advanced Routing and Load Balancing
Sophisticated routing strategies optimize cost, latency, and reliability.
Weighted Round-Robin with Health Checks
Distribute load across multiple providers based on health and performance:
```typescript
export class SmartRouter {
  private providers: Map<string, ProviderConfig> = new Map();
  private healthScores: Map<string, number> = new Map();

  constructor(providers: ProviderConfig[]) {
    providers.forEach(p => {
      this.providers.set(p.name, p);
      this.healthScores.set(p.name, 1.0);
    });
    // Start health check loop
    this.startHealthChecks();
  }

  async route(request: CompletionRequest): Promise<string> {
    // Calculate effective weights based on health scores
    const weights = Array.from(this.providers.entries()).map(([name, config]) => ({
      name,
      weight: config.weight * (this.healthScores.get(name) || 0)
    }));

    // Filter out unhealthy providers
    const healthy = weights.filter(w => w.weight > 0);
    if (healthy.length === 0) {
      throw new Error('No healthy providers available');
    }

    // Weighted random selection
    const totalWeight = healthy.reduce((sum, w) => sum + w.weight, 0);
    let random = Math.random() * totalWeight;
    for (const { name, weight } of healthy) {
      random -= weight;
      if (random <= 0) {
        return name;
      }
    }
    return healthy[0].name;
  }

  private async startHealthChecks(): Promise<void> {
    setInterval(async () => {
      for (const [name, config] of this.providers) {
        try {
          const start = Date.now();
          await this.checkHealth(config);
          const latency = Date.now() - start;
          // Update health score based on latency
          const score = Math.max(0, 1 - latency / 5000);
          this.healthScores.set(name, score);
        } catch (error) {
          this.healthScores.set(name, 0);
        }
      }
    }, 30000); // Check every 30 seconds
  }

  private async checkHealth(config: ProviderConfig): Promise<void> {
    const response = await fetch(`${config.baseURL}/health`);
    if (!response.ok) {
      throw new Error('Health check failed');
    }
  }
}
```
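The selection step can be isolated and sanity-checked: providers whose effective weight drops to zero must never be picked. `pickWeighted` mirrors the `route` logic above in standalone form:

```typescript
// Weighted-random selection: zero-weight (unhealthy) entries are excluded.
type Weighted = { name: string; weight: number };

function pickWeighted(weights: Weighted[]): string {
  const healthy = weights.filter(w => w.weight > 0);
  if (healthy.length === 0) throw new Error('No healthy providers available');
  const total = healthy.reduce((sum, w) => sum + w.weight, 0);
  // Walk the cumulative weights until the random draw is exhausted
  let r = Math.random() * total;
  for (const { name, weight } of healthy) {
    r -= weight;
    if (r <= 0) return name;
  }
  return healthy[0].name;
}

const providers: Weighted[] = [
  { name: 'a', weight: 3 },
  { name: 'b', weight: 1 },
  { name: 'down', weight: 0 }, // unhealthy: must never be selected
];

// Over many draws, only healthy providers appear, roughly 3:1 in favor of 'a'
const picks = new Set<string>();
for (let i = 0; i < 200; i++) picks.add(pickWeighted(providers));
```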
Geographic Routing
Route requests to the nearest provider for minimal latency:
```typescript
export class GeoRouter {
  private regions: Map<string, RegionConfig> = new Map([
    ['us-east', { endpoint: 'https://us-east.crazyrouter.com', latency: 0 }],
    ['us-west', { endpoint: 'https://us-west.crazyrouter.com', latency: 0 }],
    ['eu-west', { endpoint: 'https://eu-west.crazyrouter.com', latency: 0 }],
    ['ap-south', { endpoint: 'https://ap-south.crazyrouter.com', latency: 0 }]
  ]);

  async route(clientIP: string): Promise<string> {
    // Get the client location, then pick the nearest region
    const location = await this.getLocation(clientIP);
    const nearest = this.findNearestRegion(location);
    return this.regions.get(nearest)!.endpoint;
  }

  private async getLocation(ip: string): Promise<{ lat: number; lon: number }> {
    // Use an IP geolocation service
    const response = await fetch(`https://ipapi.co/${ip}/json/`);
    const data = await response.json();
    return { lat: data.latitude, lon: data.longitude };
  }

  private findNearestRegion(location: { lat: number; lon: number }): string {
    const regionLocations = {
      'us-east': { lat: 39.0438, lon: -77.4874 },  // Northern Virginia
      'us-west': { lat: 47.6062, lon: -122.3321 }, // Seattle
      'eu-west': { lat: 51.5074, lon: -0.1278 },   // London
      'ap-south': { lat: 1.3521, lon: 103.8198 }   // Singapore
    };
    let nearest = 'us-east';
    let minDistance = Infinity;
    for (const [region, coords] of Object.entries(regionLocations)) {
      const distance = this.haversineDistance(location, coords);
      if (distance < minDistance) {
        minDistance = distance;
        nearest = region;
      }
    }
    return nearest;
  }

  private haversineDistance(
    a: { lat: number; lon: number },
    b: { lat: number; lon: number }
  ): number {
    const R = 6371; // Earth radius in km
    const dLat = (b.lat - a.lat) * Math.PI / 180;
    const dLon = (b.lon - a.lon) * Math.PI / 180;
    const x = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
      Math.cos(a.lat * Math.PI / 180) * Math.cos(b.lat * Math.PI / 180) *
      Math.sin(dLon / 2) * Math.sin(dLon / 2);
    const c = 2 * Math.atan2(Math.sqrt(x), Math.sqrt(1 - x));
    return R * c;
  }
}
```
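The haversine step is easy to verify against known geography: a client in New York should land on a US-east region rather than Singapore, and the distance from a point to itself must be zero. The helper below restates the formula standalone:

```typescript
// Haversine distance and nearest-region pick, checked against known geography.
type Coord = { lat: number; lon: number };

function haversineKm(a: Coord, b: Coord): number {
  const R = 6371; // Earth radius in km
  const dLat = (b.lat - a.lat) * Math.PI / 180;
  const dLon = (b.lon - a.lon) * Math.PI / 180;
  const x = Math.sin(dLat / 2) ** 2 +
    Math.cos(a.lat * Math.PI / 180) * Math.cos(b.lat * Math.PI / 180) *
    Math.sin(dLon / 2) ** 2;
  return R * 2 * Math.atan2(Math.sqrt(x), Math.sqrt(1 - x));
}

const regions: Record<string, Coord> = {
  'us-east': { lat: 39.0438, lon: -77.4874 },  // Northern Virginia
  'ap-south': { lat: 1.3521, lon: 103.8198 },  // Singapore
};

function nearest(client: Coord): string {
  let best = '';
  let min = Infinity;
  for (const [region, coords] of Object.entries(regions)) {
    const d = haversineKm(client, coords);
    if (d < min) { min = d; best = region; }
  }
  return best;
}

const newYork: Coord = { lat: 40.7128, lon: -74.006 };
const region = nearest(newYork);               // expect 'us-east'
const selfDistance = haversineKm(newYork, newYork); // expect 0
```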
Advanced Monitoring and Observability
Comprehensive monitoring enables proactive issue detection and resolution.
Distributed Tracing
Implement OpenTelemetry for end-to-end request tracing:
```typescript
import { trace, context, SpanStatusCode } from '@opentelemetry/api';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { JaegerExporter } from '@opentelemetry/exporter-jaeger';

const provider = new NodeTracerProvider();
provider.addSpanProcessor(
  new BatchSpanProcessor(
    new JaegerExporter({
      endpoint: 'http://jaeger:14268/api/traces'
    })
  )
);
provider.register();

const tracer = trace.getTracer('openclaw');

export class TracedOpenClaw {
  constructor(
    private cache: MultiLevelCache,
    private client: CrazyrouterClient
  ) {}

  async complete(request: CompletionRequest): Promise<CompletionResponse> {
    const span = tracer.startSpan('openclaw.complete');
    try {
      span.setAttributes({
        'request.model': request.model,
        'request.messages.count': request.messages.length,
        'request.max_tokens': request.max_tokens
      });

      // Cache lookup
      const cacheSpan = tracer.startSpan('cache.lookup', {}, context.active());
      const cached = await this.cache.get(request);
      cacheSpan.end();
      if (cached) {
        span.setAttribute('cache.hit', true);
        span.end();
        return cached;
      }

      // API call
      const apiSpan = tracer.startSpan('api.call', {}, context.active());
      const response = await this.client.complete(request);
      apiSpan.setAttributes({
        'response.tokens.prompt': response.usage.prompt_tokens,
        'response.tokens.completion': response.usage.completion_tokens
      });
      apiSpan.end();

      span.setStatus({ code: SpanStatusCode.OK });
      span.end();
      return response;
    } catch (error) {
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: (error as Error).message
      });
      span.end();
      throw error;
    }
  }
}
```
Custom Metrics Dashboard
Build a real-time metrics dashboard:
```typescript
import { Registry, Counter, Histogram, Gauge } from 'prom-client';

export class MetricsCollector {
  private registry: Registry;
  private requestCounter: Counter;
  private latencyHistogram: Histogram;
  private activeRequests: Gauge;
  private costCounter: Counter;

  constructor() {
    this.registry = new Registry();
    this.requestCounter = new Counter({
      name: 'openclaw_requests_total',
      help: 'Total number of requests',
      labelNames: ['model', 'status', 'cache_hit'],
      registers: [this.registry]
    });
    this.latencyHistogram = new Histogram({
      name: 'openclaw_request_duration_seconds',
      help: 'Request duration in seconds',
      labelNames: ['model'],
      buckets: [0.1, 0.5, 1, 2, 5, 10],
      registers: [this.registry]
    });
    this.activeRequests = new Gauge({
      name: 'openclaw_active_requests',
      help: 'Number of active requests',
      labelNames: ['model'],
      registers: [this.registry]
    });
    this.costCounter = new Counter({
      name: 'openclaw_cost_usd_total',
      help: 'Total cost in USD',
      labelNames: ['model', 'provider'],
      registers: [this.registry]
    });
  }

  // Call around each request so the active-requests gauge stays accurate
  startRequest(model: string): void {
    this.activeRequests.inc({ model });
  }

  endRequest(model: string): void {
    this.activeRequests.dec({ model });
  }

  recordRequest(
    model: string,
    status: string,
    cacheHit: boolean,
    duration: number,
    cost: number,
    provider: string
  ): void {
    this.requestCounter.inc({
      model,
      status,
      cache_hit: cacheHit.toString()
    });
    this.latencyHistogram.observe({ model }, duration);
    this.costCounter.inc({ model, provider }, cost);
  }

  async getMetrics(): Promise<string> {
    return this.registry.metrics();
  }
}
```
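It helps to know how Prometheus-style histogram buckets behave when reading the latency metric above: buckets are cumulative, so each observation counts toward every bucket whose upper bound it does not exceed. A small standalone sketch (`bucketize` is an illustrative helper, not part of prom-client):

```typescript
// Prometheus-style cumulative buckets: each observation increments
// every bucket whose upper bound is >= the observed value.
function bucketize(observations: number[], bounds: number[]): number[] {
  const counts = bounds.map(() => 0);
  for (const obs of observations) {
    bounds.forEach((bound, i) => {
      if (obs <= bound) counts[i]++;
    });
  }
  return counts;
}

// Same bounds as the latency histogram above (seconds)
const bounds = [0.1, 0.5, 1, 2, 5, 10];
const durations = [0.05, 0.3, 0.7, 1.5, 8];

const counts = bucketize(durations, bounds);
```

The last bucket always equals the total observation count, which is why dashboards derive quantiles from the cumulative shape rather than from per-bucket counts.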
Advanced Cost Optimization
Sophisticated cost optimization strategies can reduce expenses by 80%+ while maintaining quality.
Dynamic Model Selection
Automatically select the most cost-effective model for each request:
```typescript
export class CostOptimizer {
  // Illustrative pricing in USD per 1K tokens; check current provider rates
  private modelPricing = {
    'gpt-4': { prompt: 0.03, completion: 0.06, quality: 0.95 },
    'gpt-3.5-turbo': { prompt: 0.0015, completion: 0.002, quality: 0.85 },
    'claude-3-opus-20240229': { prompt: 0.015, completion: 0.075, quality: 0.93 },
    'claude-3-sonnet-20240229': { prompt: 0.003, completion: 0.015, quality: 0.88 },
    'claude-3-haiku-20240307': { prompt: 0.00025, completion: 0.00125, quality: 0.75 }
  };

  async selectModel(
    request: CompletionRequest,
    budget: number,
    minQuality: number = 0.8
  ): Promise<string> {
    // Estimate token usage
    const estimatedTokens = this.estimateTokens(request);

    // Calculate cost for each model, filter by constraints,
    // then rank by quality per dollar
    const options = Object.entries(this.modelPricing)
      .map(([model, pricing]) => {
        const cost = (
          estimatedTokens.prompt * pricing.prompt +
          estimatedTokens.completion * pricing.completion
        ) / 1000;
        return { model, cost, quality: pricing.quality };
      })
      .filter(o => o.quality >= minQuality && o.cost <= budget)
      .sort((a, b) => {
        const ratioA = a.quality / a.cost;
        const ratioB = b.quality / b.cost;
        return ratioB - ratioA;
      });

    if (options.length === 0) {
      throw new Error('No models available within budget and quality constraints');
    }
    return options[0].model;
  }

  private estimateTokens(request: CompletionRequest) {
    // Rough heuristic: ~4 characters per token
    const promptTokens = request.messages.reduce(
      (sum, msg) => sum + Math.ceil(msg.content.length / 4),
      0
    );
    const completionTokens = request.max_tokens || 1000;
    return { prompt: promptTokens, completion: completionTokens };
  }
}
```
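The selection arithmetic can be traced by hand with a subset of the illustrative price table. `selectModel` below is a standalone restatement of the filter-then-rank step; the prices are the same sample figures as above, not live rates:

```typescript
// Filter by quality and budget, then rank by quality per dollar.
type Pricing = { prompt: number; completion: number; quality: number };

// Sample pricing in USD per 1K tokens
const pricing: Record<string, Pricing> = {
  'gpt-4':         { prompt: 0.03,   completion: 0.06,    quality: 0.95 },
  'gpt-3.5-turbo': { prompt: 0.0015, completion: 0.002,   quality: 0.85 },
  'claude-3-haiku-20240307': { prompt: 0.00025, completion: 0.00125, quality: 0.75 },
};

function selectModel(
  promptTokens: number,
  completionTokens: number,
  budget: number,
  minQuality: number
): string {
  const options = Object.entries(pricing)
    .map(([model, p]) => ({
      model,
      quality: p.quality,
      cost: (promptTokens * p.prompt + completionTokens * p.completion) / 1000,
    }))
    .filter(o => o.quality >= minQuality && o.cost <= budget)
    .sort((a, b) => b.quality / b.cost - a.quality / a.cost);
  if (options.length === 0) throw new Error('No model fits the constraints');
  return options[0].model;
}

// 2K prompt + 1K completion tokens, $0.01 budget, quality floor 0.8:
// gpt-4 costs $0.12 (over budget), haiku misses the quality floor,
// so gpt-3.5-turbo ($0.005) is the only survivor.
const choice = selectModel(2000, 1000, 0.01, 0.8);
```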
Request Batching and Deduplication
Reduce costs by batching similar requests:
```typescript
export class RequestBatcher {
  private pending: Map<string, Promise<any>> = new Map();
  private queue: Array<{ request: any; resolve: Function; reject: Function }> = [];
  private batchTimer: NodeJS.Timeout | null = null;

  constructor(private client: CrazyrouterClient) {}

  async execute(request: CompletionRequest): Promise<CompletionResponse> {
    // Deduplicate identical in-flight requests
    const key = this.getRequestKey(request);
    if (this.pending.has(key)) {
      return this.pending.get(key)!;
    }

    // Create a promise for this request and queue it
    const promise = new Promise<CompletionResponse>((resolve, reject) => {
      this.queue.push({ request, resolve, reject });
      // Schedule batch processing on a short debounce window
      if (!this.batchTimer) {
        this.batchTimer = setTimeout(() => this.processBatch(), 50);
      }
    });
    this.pending.set(key, promise);
    return promise;
  }

  private async processBatch(): Promise<void> {
    this.batchTimer = null;
    const batch = this.queue.splice(0);
    if (batch.length === 0) return;

    // Group identical requests so each unique payload is sent once
    const groups = this.groupSimilarRequests(batch);

    for (const group of groups) {
      try {
        const response = await this.processGroup(group);
        group.forEach(({ resolve }) => resolve(response));
      } catch (error) {
        group.forEach(({ reject }) => reject(error));
      }
    }

    // Clear the pending map
    batch.forEach(({ request }) => {
      this.pending.delete(this.getRequestKey(request));
    });
  }

  private groupSimilarRequests(
    batch: Array<{ request: any; resolve: Function; reject: Function }>
  ): Array<Array<{ request: any; resolve: Function; reject: Function }>> {
    const groups: Map<string, typeof batch> = new Map();
    for (const item of batch) {
      const key = this.getGroupKey(item.request);
      if (!groups.has(key)) {
        groups.set(key, []);
      }
      groups.get(key)!.push(item);
    }
    return Array.from(groups.values());
  }

  private getRequestKey(request: CompletionRequest): string {
    return `${request.model}:${JSON.stringify(request.messages)}`;
  }

  private getGroupKey(request: CompletionRequest): string {
    // Group only truly identical payloads; a looser key (e.g. model +
    // temperature) would hand one response to requests with different prompts
    return this.getRequestKey(request);
  }

  private async processGroup(
    group: Array<{ request: any; resolve: Function; reject: Function }>
  ): Promise<CompletionResponse> {
    // All requests in a group are identical, so one representative call suffices
    return this.client.complete(group[0].request);
  }
}
```
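The grouping step is worth checking in isolation: identical payloads should collapse into a single upstream call, while distinct prompts stay separate. `groupRequests` and `Req` below are illustrative stand-ins for the batcher's internals:

```typescript
// Grouping by full request key: duplicates collapse, distinct prompts stay apart.
type Req = { model: string; messages: string };

const requestKey = (r: Req): string => `${r.model}:${r.messages}`;

function groupRequests(batch: Req[]): Map<string, Req[]> {
  const groups = new Map<string, Req[]>();
  for (const req of batch) {
    const key = requestKey(req);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key)!.push(req);
  }
  return groups;
}

const batch: Req[] = [
  { model: 'gpt-4', messages: 'What is 2+2?' },
  { model: 'gpt-4', messages: 'What is 2+2?' }, // exact duplicate
  { model: 'gpt-4', messages: 'What is 3+3?' }, // different prompt
];

const groups = groupRequests(batch);
const upstreamCalls = groups.size; // one call per unique payload
```

Three queued requests become two upstream calls: both duplicates share one response, and the distinct prompt gets its own.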
Conclusion
These advanced OpenClaw techniques enable production-grade deployments that are performant, reliable, and cost-effective. By implementing hierarchical memory, semantic caching, circuit breakers, intelligent routing, and comprehensive monitoring, you can build AI applications that scale to millions of users.
Key takeaways:
- Implement multi-tier memory architecture for optimal context management
- Use semantic caching to achieve 70-90% cost reduction
- Deploy circuit breakers and graceful degradation for reliability
- Leverage geographic routing and weighted load balancing
- Monitor with distributed tracing and custom metrics
- Optimize costs with dynamic model selection and request batching
With these advanced techniques, you're ready to deploy OpenClaw at enterprise scale with Crazyrouter!
Congratulations on completing the OpenClaw Mastery series! Start with OpenClaw Tutorial: Getting Started to begin your journey, or visit Crazyrouter for unified API access to 300+ AI models.


