
# How to Build AI-Powered Applications: A Developer's Guide

Building applications with AI capabilities has never been more accessible. Whether you're adding a chatbot to your SaaS, building an AI writing assistant, or creating intelligent automation, this guide covers everything from architecture decisions to production deployment.

## Understanding AI Application Architecture

### Basic Architecture Pattern

Most AI applications follow this pattern:

```
User Input → Your Application → AI API → Response Processing → User Output
```

Key components:

- **Frontend**: User interface for input/output
- **Backend**: Business logic and API orchestration
- **AI Layer**: Model selection and prompt management
- **Data Layer**: Context storage and caching
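The flow above can be sketched as a thin pipeline of swappable stages. This is a toy illustration only — `make_pipeline` and the stub stages are hypothetical names, and the `call_model` lambda stands in for a real AI API client:

```python
from typing import Callable

def make_pipeline(preprocess: Callable[[str], str],
                  call_model: Callable[[str], str],
                  postprocess: Callable[[str], str]) -> Callable[[str], str]:
    """Compose the three stages of the basic architecture pattern."""
    def handle(user_input: str) -> str:
        prompt = preprocess(user_input)   # Your Application
        raw = call_model(prompt)          # AI API
        return postprocess(raw)           # Response Processing
    return handle

# Stub stages, no real API call — swap call_model for your client later
handle = make_pipeline(
    preprocess=lambda s: s.strip(),
    call_model=lambda p: f"echo: {p}",
    postprocess=lambda r: r.upper(),
)
```

Keeping the stages separate makes it easy to swap the AI layer (a different model or gateway) without touching the rest of the application.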
## Choosing Your AI Integration Approach
| Approach | Complexity | Flexibility | Cost |
|---|---|---|---|
| Direct API calls | Low | Medium | Variable |
| SDK/Library | Low | High | Variable |
| API Gateway | Medium | Very High | Lower |
| Self-hosted models | High | Maximum | Fixed |
## Getting Started: Your First AI Feature

### Step 1: Choose Your Model

For most applications, start with:
| Use Case | Recommended Model | Why |
|---|---|---|
| Chatbot | GPT-4o Mini or Claude Haiku | Fast, cheap, good enough |
| Content generation | Claude Sonnet or GPT-4o | Better quality |
| Code assistance | Claude Sonnet or GPT-4 | Strong reasoning |
| Document analysis | Claude (200K context) | Long context window |
### Step 2: Set Up API Access

**Option A: Direct Provider Access**

```python
# OpenAI
from openai import OpenAI
client = OpenAI(api_key="sk-...")

# Anthropic
from anthropic import Anthropic
client = Anthropic(api_key="sk-ant-...")
```

**Option B: API Gateway (Recommended)**

Using a gateway like Crazyrouter simplifies multi-model access:

```python
from openai import OpenAI

# Single endpoint for all models
client = OpenAI(
    api_key="your-gateway-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Use any model with the same code
response = client.chat.completions.create(
    model="claude-3-5-sonnet",  # or "gpt-4o", "gemini-pro", etc.
    messages=[{"role": "user", "content": "Hello!"}]
)
```
### Step 3: Basic Implementation

Here's a minimal chatbot implementation:

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.crazyrouter.com/v1"
)

def chat(user_message: str, conversation_history: list) -> str:
    conversation_history.append({
        "role": "user",
        "content": user_message
    })
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation_history,
        max_tokens=1000
    )
    assistant_message = response.choices[0].message.content
    conversation_history.append({
        "role": "assistant",
        "content": assistant_message
    })
    return assistant_message
```
## Building Production-Ready AI Features

### Prompt Engineering Best Practices

**1. Use System Prompts**

```python
messages = [
    {
        "role": "system",
        "content": """You are a helpful customer support agent for TechCorp.
- Be friendly and professional
- If you don't know something, say so
- Never make up information about products
- For billing issues, direct users to billing@techcorp.com"""
    },
    {"role": "user", "content": user_input}
]
```
**2. Structure Your Prompts**

```python
prompt = f"""
Task: Summarize the following article
Format: 3 bullet points, max 20 words each
Tone: Professional

Article:
{article_text}

Summary:
"""
```
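If you build structured prompts like this in several places, it can be worth factoring the structure into a small helper. A sketch — `build_prompt` and its parameter names are my own, not part of any SDK:

```python
def build_prompt(task: str, fmt: str, tone: str, body: str) -> str:
    """Assemble a structured prompt with explicit task, format, and tone."""
    return (
        f"Task: {task}\n"
        f"Format: {fmt}\n"
        f"Tone: {tone}\n"
        f"Input:\n{body}\n"
        f"Output:"
    )

prompt = build_prompt(
    task="Summarize the following article",
    fmt="3 bullet points, max 20 words each",
    tone="Professional",
    body="Example article text...",
)
```

Centralizing the template keeps every call site consistent and makes prompt changes a one-line edit.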
**3. Use Few-Shot Examples**

```python
messages = [
    {"role": "system", "content": "You classify customer feedback as positive, negative, or neutral."},
    {"role": "user", "content": "Great product, love it!"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Worst purchase ever."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": actual_feedback}
]
```
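Building that message list by hand gets repetitive once you have more than a couple of examples. One way to generalize it — `few_shot_messages` is an illustrative helper, not a library function:

```python
def few_shot_messages(system: str,
                      examples: list[tuple[str, str]],
                      query: str) -> list[dict]:
    """Build a chat message list from (input, label) example pairs plus the real query."""
    messages = [{"role": "system", "content": system}]
    for user_text, label in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": query})
    return messages

msgs = few_shot_messages(
    system="You classify customer feedback as positive, negative, or neutral.",
    examples=[("Great product, love it!", "positive"),
              ("Worst purchase ever.", "negative")],
    query="It arrived on time.",
)
```

This also lets you store examples as data and grow the set without touching the call site.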
### Error Handling and Resilience

**Handle API Errors Gracefully**

```python
import time
from openai import OpenAI, APIError, RateLimitError

client = OpenAI()

def call_ai_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages
            )
            return response.choices[0].message.content
        except RateLimitError:
            wait_time = 2 ** attempt  # Exponential backoff
            time.sleep(wait_time)
        except APIError:
            if attempt == max_retries - 1:
                raise
            time.sleep(1)
    raise Exception("Max retries exceeded")
```
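The retry loop above can also be factored into a reusable decorator so every AI call site gets the same behavior. A generic sketch (the name `with_retry` and its defaults are my own; in production you would pass the SDK's `RateLimitError`/`APIError` as `retry_on`):

```python
import random
import time
from functools import wraps

def with_retry(max_retries=3, base_delay=1.0, retry_on=(Exception,)):
    """Retry a function with exponential backoff plus full jitter."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries - 1:
                        raise
                    # Random wait up to the exponential cap, to avoid
                    # synchronized retry storms across clients
                    time.sleep(random.uniform(0, base_delay * 2 ** attempt))
        return wrapper
    return decorator
```

Jitter matters under load: if many workers back off by exactly 1, 2, 4 seconds, they all retry at once and hit the rate limit again together.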
**Implement Fallback Models**

```python
MODELS = ["gpt-4o", "claude-3-5-sonnet", "gpt-4o-mini"]

def call_with_fallback(messages):
    for model in MODELS:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"Model {model} failed: {e}")
    raise Exception("All models failed")
```
### Streaming Responses

For better UX, stream responses instead of waiting for the full completion:

```python
def stream_response(messages):
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )
    for chunk in stream:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

**Frontend Integration (JavaScript)**

```javascript
async function streamChat(message) {
    const response = await fetch('/api/chat', {
        method: 'POST',
        body: JSON.stringify({ message }),
        headers: { 'Content-Type': 'application/json' }
    });
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // stream: true handles multi-byte characters split across chunks
        const text = decoder.decode(value, { stream: true });
        appendToChat(text);
    }
}
```
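Between the Python generator and the JavaScript reader sits your HTTP endpoint, which usually frames chunks as Server-Sent Events. A framework-agnostic sketch — `to_sse` is an illustrative name, and you would plug its output into whatever streaming response your web framework provides:

```python
import json
from typing import Iterable, Iterator

def to_sse(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap text chunks as Server-Sent Events frames."""
    for chunk in chunks:
        # JSON-encode each chunk so embedded newlines can't break SSE framing
        yield f"data: {json.dumps({'text': chunk})}\n\n"
    # Conventional sentinel so the client knows the stream is complete
    yield "data: [DONE]\n\n"
```

The client then splits on blank lines and parses each `data:` payload, rather than appending raw bytes as in the simplified fetch example above.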
## Advanced Patterns

### RAG (Retrieval-Augmented Generation)

Combine AI with your own data:

```python
def answer_with_context(question: str, documents: list) -> str:
    # 1. Find relevant documents (search_documents is your retrieval step,
    #    e.g. a vector-store similarity search)
    relevant_docs = search_documents(question, documents)

    # 2. Build context from the top matches
    context = "\n\n".join([doc.content for doc in relevant_docs[:3]])

    # 3. Generate an answer grounded in that context
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""Answer questions based on the provided context.
If the answer isn't in the context, say "I don't have that information."

Context:
{context}"""
            },
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content
```
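`search_documents` is deliberately left abstract above. For intuition, here is a naive keyword-overlap version — real systems use embeddings and a vector store, and the `Doc` class is just a stand-in for your document type:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    content: str

def search_documents(question: str, documents: list[Doc]) -> list[Doc]:
    """Rank documents by shared words with the question (toy retrieval)."""
    q_words = set(question.lower().split())

    def overlap(doc: Doc) -> int:
        return len(q_words & set(doc.content.lower().split()))

    return sorted(documents, key=overlap, reverse=True)
```

Swapping this for cosine similarity over embeddings changes only the ranking function; the RAG flow around it stays the same.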
### Function Calling / Tool Use

Let AI interact with your systems:

```python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# Handle tool calls
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    if tool_call.function.name == "get_weather":
        args = json.loads(tool_call.function.arguments)
        weather = get_weather(args["location"])
        # Continue the conversation by appending the tool result...
```
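Once you have more than one tool, the if/elif chain is better replaced by a dispatch table. A sketch — `dispatch_tool_call` and the stub `get_weather` lambda are illustrative, not part of any SDK:

```python
import json

def dispatch_tool_call(name: str, arguments: str, registry: dict) -> str:
    """Look up a tool by name and invoke it with its JSON-encoded arguments."""
    if name not in registry:
        # Return the error to the model rather than crashing the server
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments)
    result = registry[name](**args)
    return json.dumps({"result": result})

# Illustrative registry; get_weather is a stub standing in for a real API
registry = {"get_weather": lambda location: f"Sunny in {location}"}
```

The returned JSON string is what you would append to the conversation as a `tool` role message before calling the model again.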
### Multi-Model Routing

Use different models for different tasks:

```python
def route_to_model(task_type: str, content: str) -> str:
    model_map = {
        "simple_qa": "gpt-4o-mini",
        "complex_reasoning": "gpt-4o",
        "long_document": "claude-3-5-sonnet",
        "code_generation": "claude-3-5-sonnet",
        "creative_writing": "gpt-4o"
    }
    model = model_map.get(task_type, "gpt-4o-mini")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}]
    )
    return response.choices[0].message.content
```
## Cost Optimization Strategies

### 1. Implement Caching

```python
import hashlib
import redis

cache = redis.Redis()

def cached_completion(messages, ttl=3600):
    # Create a cache key from the messages (MD5 is fine for keying, not security)
    key = hashlib.md5(str(messages).encode()).hexdigest()

    # Check cache first
    cached = cache.get(key)
    if cached:
        return cached.decode()

    # Cache miss: call the API
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )
    result = response.choices[0].message.content

    # Store with a TTL so stale answers expire
    cache.setex(key, ttl, result)
    return result
```
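If Redis is overkill for your deployment, the same idea works with an in-process dictionary. A minimal sketch (`TTLCache` is an illustrative class; it is per-process only and unbounded unless you add eviction):

```python
import time

class TTLCache:
    """Tiny in-memory cache with per-entry expiry."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            # Expired: drop it and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl=3600):
        self._store[key] = (value, time.monotonic() + ttl)
```

`time.monotonic()` is used instead of `time.time()` so expiry isn't affected by system clock adjustments.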
### 2. Token Counting

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def estimate_cost(input_text: str, output_tokens: int = 500) -> float:
    input_tokens = count_tokens(input_text)
    # GPT-4o pricing: $2.50 per 1M input tokens, $10.00 per 1M output tokens
    input_cost = (input_tokens / 1_000_000) * 2.50
    output_cost = (output_tokens / 1_000_000) * 10.00
    return input_cost + output_cost
```
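When tiktoken isn't available, or for models whose tokenizer you don't have, a common rule of thumb is roughly 4 characters per token for English text. This is only a back-of-the-envelope approximation, and the function names here are my own:

```python
def rough_token_count(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def rough_cost_estimate(text: str, price_per_million: float = 2.50) -> float:
    """Estimated input cost in dollars at a given per-million-token price."""
    return rough_token_count(text) / 1_000_000 * price_per_million
```

Use it for budgeting dashboards and sanity checks, not for billing; the real count varies by language and content.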
### 3. Use Appropriate Models
| Task Complexity | Model | Cost/1M tokens |
|---|---|---|
| Simple | GPT-4o Mini | $0.15 input |
| Medium | GPT-4o | $2.50 input |
| Complex | GPT-4 / Claude Opus | $10-15 input |
### 4. Batch Processing

For non-real-time tasks:

```python
import asyncio

async def batch_process(items: list, batch_size: int = 10):
    results = []
    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        # Process the batch concurrently (process_item is your async worker)
        tasks = [process_item(item) for item in batch]
        batch_results = await asyncio.gather(*tasks)
        results.extend(batch_results)
        # Pause between batches to respect rate limits
        await asyncio.sleep(1)
    return results
```
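Fixed batches leave workers idle while the slowest item in a batch finishes. An alternative sketch using a semaphore, which keeps a steady number of requests in flight (`process_all` is an illustrative name):

```python
import asyncio

async def process_all(items, worker, max_concurrency=10):
    """Run `worker` over all items with at most max_concurrency in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        async with sem:
            return await worker(item)

    # gather preserves input order in its results
    return await asyncio.gather(*(guarded(i) for i in items))
```

As each request completes it releases the semaphore, so a new one starts immediately instead of waiting for the whole batch.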
## Security Best Practices

### 1. Never Expose API Keys

```python
# Bad - key hardcoded in source
client = OpenAI(api_key="sk-abc123...")

# Good - key read from an environment variable
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```
### 2. Validate and Sanitize Input

```python
def sanitize_user_input(text: str) -> str:
    # Reject obvious prompt-injection phrases (a heuristic, not a guarantee)
    dangerous_patterns = [
        "ignore previous instructions",
        "disregard above",
        "new instructions:"
    ]
    text_lower = text.lower()
    for pattern in dangerous_patterns:
        if pattern in text_lower:
            raise ValueError("Invalid input detected")
    # Limit length to cap token usage
    return text[:10000]
```
### 3. Implement Rate Limiting

```python
import time
from functools import wraps

def rate_limit(calls_per_minute: int):
    min_interval = 60.0 / calls_per_minute
    last_called = [0.0]  # Mutable cell so the wrapper can update it

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            wait_time = min_interval - elapsed
            if wait_time > 0:
                time.sleep(wait_time)
            last_called[0] = time.time()
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limit(calls_per_minute=60)
def call_ai(message):
    # Your AI call here
    pass
```
## Monitoring and Observability

### Track Key Metrics

```python
import time
from dataclasses import dataclass
from functools import wraps

@dataclass
class AICallMetrics:
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    success: bool
    cost: float

def track_ai_call(func):
    # log_metrics, log_error, and calculate_cost are your own helpers
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            result = func(*args, **kwargs)
            latency = (time.time() - start) * 1000
            log_metrics(AICallMetrics(
                model=kwargs.get('model', 'unknown'),
                input_tokens=result.usage.prompt_tokens,
                output_tokens=result.usage.completion_tokens,
                latency_ms=latency,
                success=True,
                cost=calculate_cost(result.usage)
            ))
            return result
        except Exception as e:
            log_error(e)
            raise
    return wrapper
```
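`calculate_cost` is left to you above; one illustrative version drives it from a per-model price table. The prices here are assumptions for the sketch — check your provider's current rates:

```python
# Assumed dollars per million tokens as (input, output) — verify with your provider
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def calculate_cost(usage, model: str = "gpt-4o-mini") -> float:
    """Dollar cost of one call from its token usage and the price table."""
    input_price, output_price = PRICES.get(model, (0.0, 0.0))
    return (usage.prompt_tokens * input_price
            + usage.completion_tokens * output_price) / 1_000_000
```

Keeping prices in one table means a provider price change is a one-line update, and unknown models cost out at zero rather than crashing the metrics path.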
## Deployment Checklist
Before going to production:
- API keys stored securely (environment variables, secrets manager)
- Rate limiting implemented
- Error handling and retries in place
- Fallback models configured
- Input validation and sanitization
- Cost monitoring and alerts set up
- Response caching where appropriate
- Logging and observability configured
- Load testing completed
## Conclusion

Building AI-powered applications is straightforward with the right approach:

- **Start simple**: basic API calls first, then add complexity
- **Use an API gateway**: simplifies multi-model access and reduces costs
- **Implement resilience**: retries, fallbacks, and error handling
- **Optimize costs**: caching, model routing, token management
- **Monitor everything**: track usage, costs, and performance

The AI landscape evolves quickly. Using an API gateway gives you the flexibility to adopt new models without code changes.

Need reliable API access for your AI application? Crazyrouter provides a unified endpoint for 300+ models with built-in failover and competitive pricing. Start building today.