
# How to Build AI-Powered Applications: A Developer's Guide

Building applications with AI capabilities has never been more accessible. Whether you're adding a chatbot to your SaaS, building an AI writing assistant, or creating intelligent automation, this guide covers everything from architecture decisions to production deployment.

## Understanding AI Application Architecture

### Basic Architecture Pattern

Most AI applications follow this pattern:

```
User Input → Your Application → AI API → Response Processing → User Output
```

Key components:

- **Frontend**: User interface for input/output
- **Backend**: Business logic and API orchestration
- **AI Layer**: Model selection and prompt management
- **Data Layer**: Context storage and caching
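The flow above can be sketched as a thin pipeline of swappable stages. This is a toy illustration only — `make_pipeline` and the stub stages are hypothetical names, and the `call_model` lambda stands in for a real AI API client:

```python
from typing import Callable

def make_pipeline(preprocess: Callable[[str], str],
                  call_model: Callable[[str], str],
                  postprocess: Callable[[str], str]) -> Callable[[str], str]:
    """Compose the three stages of the basic architecture pattern."""
    def handle(user_input: str) -> str:
        prompt = preprocess(user_input)   # Your Application
        raw = call_model(prompt)          # AI API
        return postprocess(raw)           # Response Processing
    return handle

# Stub stages, no real API call — swap call_model for your client later
handle = make_pipeline(
    preprocess=lambda s: s.strip(),
    call_model=lambda p: f"echo: {p}",
    postprocess=lambda r: r.upper(),
)
```

Keeping the stages separate makes it easy to swap the AI layer (a different model or gateway) without touching the rest of the application.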
## Choosing Your AI Integration Approach
| Approach | Complexity | Flexibility | Cost |
|---|---|---|---|
| Direct API calls | Low | Medium | Variable |
| SDK/Library | Low | High | Variable |
| API Gateway | Medium | Very High | Lower |
| Self-hosted models | High | Maximum | Fixed |
## Getting Started: Your First AI Feature

### Step 1: Choose Your Model

For most applications, start with:
| Use Case | Recommended Model | Why |
|---|---|---|
| Chatbot | GPT-4o Mini or Claude Haiku | Fast, cheap, good enough |
| Content generation | Claude Sonnet or GPT-4o | Better quality |
| Code assistance | Claude Sonnet or GPT-4 | Strong reasoning |
| Document analysis | Claude (200K context) | Long context window |
### Step 2: Set Up API Access

**Option A: Direct Provider Access**

```python
# OpenAI
from openai import OpenAI
client = OpenAI(api_key="sk-...")

# Anthropic
from anthropic import Anthropic
client = Anthropic(api_key="sk-ant-...")
```

**Option B: API Gateway (Recommended)**

Using a gateway like Crazyrouter simplifies multi-model access:

```python
from openai import OpenAI

# Single endpoint for all models
client = OpenAI(
    api_key="your-gateway-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Use any model with the same code
response = client.chat.completions.create(
    model="claude-3-5-sonnet",  # or "gpt-4o", "gemini-pro", etc.
    messages=[{"role": "user", "content": "Hello!"}]
)
```
### Step 3: Basic Implementation

Here's a minimal chatbot implementation:

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.crazyrouter.com/v1"
)

def chat(user_message: str, conversation_history: list) -> str:
    conversation_history.append({
        "role": "user",
        "content": user_message
    })
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation_history,
        max_tokens=1000
    )
    assistant_message = response.choices[0].message.content
    conversation_history.append({
        "role": "assistant",
        "content": assistant_message
    })
    return assistant_message
```
## Building Production-Ready AI Features

### Prompt Engineering Best Practices

**1. Use System Prompts**

```python
messages = [
    {
        "role": "system",
        "content": """You are a helpful customer support agent for TechCorp.
- Be friendly and professional
- If you don't know something, say so
- Never make up information about products
- For billing issues, direct users to billing@techcorp.com"""
    },
    {"role": "user", "content": user_input}
]
```
**2. Structure Your Prompts**

```python
prompt = f"""
Task: Summarize the following article
Format: 3 bullet points, max 20 words each
Tone: Professional

Article:
{article_text}

Summary:
"""
```
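If you build structured prompts like this in several places, it can be worth factoring the structure into a small helper. A sketch — `build_prompt` and its parameter names are my own, not part of any SDK:

```python
def build_prompt(task: str, fmt: str, tone: str, body: str) -> str:
    """Assemble a structured prompt with explicit task, format, and tone."""
    return (
        f"Task: {task}\n"
        f"Format: {fmt}\n"
        f"Tone: {tone}\n"
        f"Input:\n{body}\n"
        f"Output:"
    )

prompt = build_prompt(
    task="Summarize the following article",
    fmt="3 bullet points, max 20 words each",
    tone="Professional",
    body="Example article text...",
)
```

Centralizing the template keeps every call site consistent and makes prompt changes a one-line edit.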
**3. Use Few-Shot Examples**

```python
messages = [
    {"role": "system", "content": "You classify customer feedback as positive, negative, or neutral."},
    {"role": "user", "content": "Great product, love it!"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Worst purchase ever."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": actual_feedback}
]
```
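Building that message list by hand gets repetitive once you have more than a couple of examples. One way to generalize it — `few_shot_messages` is an illustrative helper, not a library function:

```python
def few_shot_messages(system: str,
                      examples: list[tuple[str, str]],
                      query: str) -> list[dict]:
    """Build a chat message list from (input, label) example pairs plus the real query."""
    messages = [{"role": "system", "content": system}]
    for user_text, label in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": query})
    return messages

msgs = few_shot_messages(
    system="You classify customer feedback as positive, negative, or neutral.",
    examples=[("Great product, love it!", "positive"),
              ("Worst purchase ever.", "negative")],
    query="It arrived on time.",
)
```

This also lets you store examples as data and grow the set without touching the call site.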
### Error Handling and Resilience

**Handle API Errors Gracefully**

```python
import time
from openai import OpenAI, APIError, RateLimitError

client = OpenAI()

def call_ai_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages
            )
            return response.choices[0].message.content
        except RateLimitError:
            wait_time = 2 ** attempt  # Exponential backoff
            time.sleep(wait_time)
        except APIError:
            if attempt == max_retries - 1:
                raise
            time.sleep(1)
    raise Exception("Max retries exceeded")
```
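The retry loop above can also be factored into a reusable decorator so every AI call site gets the same behavior. A generic sketch (the name `with_retry` and its defaults are my own; in production you would pass the SDK's `RateLimitError`/`APIError` as `retry_on`):

```python
import random
import time
from functools import wraps

def with_retry(max_retries=3, base_delay=1.0, retry_on=(Exception,)):
    """Retry a function with exponential backoff plus full jitter."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries - 1:
                        raise
                    # Random wait up to the exponential cap, to avoid
                    # synchronized retry storms across clients
                    time.sleep(random.uniform(0, base_delay * 2 ** attempt))
        return wrapper
    return decorator
```

Jitter matters under load: if many workers back off by exactly 1, 2, 4 seconds, they all retry at once and hit the rate limit again together.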
**Implement Fallback Models**

```python
MODELS = ["gpt-4o", "claude-3-5-sonnet", "gpt-4o-mini"]

def call_with_fallback(messages):
    for model in MODELS:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"Model {model} failed: {e}")
    raise Exception("All models failed")
```
### Streaming Responses

For better UX, stream responses instead of waiting for the full completion:

```python
def stream_response(messages):
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )
    for chunk in stream:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

**Frontend Integration (JavaScript)**

```javascript
async function streamChat(message) {
    const response = await fetch('/api/chat', {
        method: 'POST',
        body: JSON.stringify({ message }),
        headers: { 'Content-Type': 'application/json' }
    });
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // stream: true handles multi-byte characters split across chunks
        const text = decoder.decode(value, { stream: true });
        appendToChat(text);
    }
}
```
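Between the Python generator and the JavaScript reader sits your HTTP endpoint, which usually frames chunks as Server-Sent Events. A framework-agnostic sketch — `to_sse` is an illustrative name, and you would plug its output into whatever streaming response your web framework provides:

```python
import json
from typing import Iterable, Iterator

def to_sse(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap text chunks as Server-Sent Events frames."""
    for chunk in chunks:
        # JSON-encode each chunk so embedded newlines can't break SSE framing
        yield f"data: {json.dumps({'text': chunk})}\n\n"
    # Conventional sentinel so the client knows the stream is complete
    yield "data: [DONE]\n\n"
```

The client then splits on blank lines and parses each `data:` payload, rather than appending raw bytes as in the simplified fetch example above.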
## Advanced Patterns

### RAG (Retrieval-Augmented Generation)

Combine AI with your own data:

```python
def answer_with_context(question: str, documents: list) -> str:
    # 1. Find relevant documents (search_documents is your retrieval step,
    #    e.g. a vector-store similarity search)
    relevant_docs = search_documents(question, documents)

    # 2. Build context from the top matches
    context = "\n\n".join([doc.content for doc in relevant_docs[:3]])

    # 3. Generate an answer grounded in that context
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""Answer questions based on the provided context.
If the answer isn't in the context, say "I don't have that information."

Context:
{context}"""
            },
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content
```
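`search_documents` is deliberately left abstract above. For intuition, here is a naive keyword-overlap version — real systems use embeddings and a vector store, and the `Doc` class is just a stand-in for your document type:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    content: str

def search_documents(question: str, documents: list[Doc]) -> list[Doc]:
    """Rank documents by shared words with the question (toy retrieval)."""
    q_words = set(question.lower().split())

    def overlap(doc: Doc) -> int:
        return len(q_words & set(doc.content.lower().split()))

    return sorted(documents, key=overlap, reverse=True)
```

Swapping this for cosine similarity over embeddings changes only the ranking function; the RAG flow around it stays the same.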
### Function Calling / Tool Use

Let AI interact with your systems:

```python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# Handle tool calls
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    if tool_call.function.name == "get_weather":
        args = json.loads(tool_call.function.arguments)
        weather = get_weather(args["location"])
        # Continue the conversation by appending the tool result...
```
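Once you have more than one tool, the if/elif chain is better replaced by a dispatch table. A sketch — `dispatch_tool_call` and the stub `get_weather` lambda are illustrative, not part of any SDK:

```python
import json

def dispatch_tool_call(name: str, arguments: str, registry: dict) -> str:
    """Look up a tool by name and invoke it with its JSON-encoded arguments."""
    if name not in registry:
        # Return the error to the model rather than crashing the server
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments)
    result = registry[name](**args)
    return json.dumps({"result": result})

# Illustrative registry; get_weather is a stub standing in for a real API
registry = {"get_weather": lambda location: f"Sunny in {location}"}
```

The returned JSON string is what you would append to the conversation as a `tool` role message before calling the model again.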
### Multi-Model Routing

Use different models for different tasks:

```python
def route_to_model(task_type: str, content: str) -> str:
    model_map = {
        "simple_qa": "gpt-4o-mini",
        "complex_reasoning": "gpt-4o",
        "long_document": "claude-3-5-sonnet",
        "code_generation": "claude-3-5-sonnet",
        "creative_writing": "gpt-4o"
    }
    model = model_map.get(task_type, "gpt-4o-mini")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}]
    )
    return response.choices[0].message.content
```
## Cost Optimization Strategies

### 1. Implement Caching

```python
import hashlib
import redis

cache = redis.Redis()

def cached_completion(messages, ttl=3600):
    # Create a cache key from the messages (MD5 is fine for keying, not security)
    key = hashlib.md5(str(messages).encode()).hexdigest()

    # Check cache first
    cached = cache.get(key)
    if cached:
        return cached.decode()

    # Cache miss: call the API
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )
    result = response.choices[0].message.content

    # Store with a TTL so stale answers expire
    cache.setex(key, ttl, result)
    return result
```
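If Redis is overkill for your deployment, the same idea works with an in-process dictionary. A minimal sketch (`TTLCache` is an illustrative class; it is per-process only and unbounded unless you add eviction):

```python
import time

class TTLCache:
    """Tiny in-memory cache with per-entry expiry."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            # Expired: drop it and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl=3600):
        self._store[key] = (value, time.monotonic() + ttl)
```

`time.monotonic()` is used instead of `time.time()` so expiry isn't affected by system clock adjustments.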
### 2. Token Counting

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def estimate_cost(input_text: str, output_tokens: int = 500) -> float:
    input_tokens = count_tokens(input_text)
    # GPT-4o pricing: $2.50 per 1M input tokens, $10.00 per 1M output tokens
    input_cost = (input_tokens / 1_000_000) * 2.50
    output_cost = (output_tokens / 1_000_000) * 10.00
    return input_cost + output_cost
```
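When tiktoken isn't available, or for models whose tokenizer you don't have, a common rule of thumb is roughly 4 characters per token for English text. This is only a back-of-the-envelope approximation, and the function names here are my own:

```python
def rough_token_count(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def rough_cost_estimate(text: str, price_per_million: float = 2.50) -> float:
    """Estimated input cost in dollars at a given per-million-token price."""
    return rough_token_count(text) / 1_000_000 * price_per_million
```

Use it for budgeting dashboards and sanity checks, not for billing; the real count varies by language and content.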
### 3. Use Appropriate Models
| Task Complexity | Model | Cost/1M tokens |
|---|---|---|
| Simple | GPT-4o Mini | $0.15 input |
| Medium | GPT-4o | $2.50 input |
| Complex | GPT-4 / Claude Opus | $10-15 input |
### 4. Batch Processing

For non-real-time tasks:

```python
import asyncio

async def batch_process(items: list, batch_size: int = 10):
    results = []
    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        # Process the batch concurrently (process_item is your async worker)
        tasks = [process_item(item) for item in batch]
        batch_results = await asyncio.gather(*tasks)
        results.extend(batch_results)
        # Pause between batches to respect rate limits
        await asyncio.sleep(1)
    return results
```
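Fixed batches leave workers idle while the slowest item in a batch finishes. An alternative sketch using a semaphore, which keeps a steady number of requests in flight (`process_all` is an illustrative name):

```python
import asyncio

async def process_all(items, worker, max_concurrency=10):
    """Run `worker` over all items with at most max_concurrency in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        async with sem:
            return await worker(item)

    # gather preserves input order in its results
    return await asyncio.gather(*(guarded(i) for i in items))
```

As each request completes it releases the semaphore, so a new one starts immediately instead of waiting for the whole batch.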
## Security Best Practices

### 1. Never Expose API Keys

```python
# Bad - key hardcoded in source
client = OpenAI(api_key="sk-abc123...")

# Good - key read from an environment variable
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```
### 2. Validate and Sanitize Input

```python
def sanitize_user_input(text: str) -> str:
    # Reject obvious prompt-injection phrases (a heuristic, not a guarantee)
    dangerous_patterns = [
        "ignore previous instructions",
        "disregard above",
        "new instructions:"
    ]
    text_lower = text.lower()
    for pattern in dangerous_patterns:
        if pattern in text_lower:
            raise ValueError("Invalid input detected")
    # Limit length to cap token usage
    return text[:10000]
```
### 3. Implement Rate Limiting

```python
import time
from functools import wraps

def rate_limit(calls_per_minute: int):
    min_interval = 60.0 / calls_per_minute
    last_called = [0.0]  # Mutable cell so the wrapper can update it

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            wait_time = min_interval - elapsed
            if wait_time > 0:
                time.sleep(wait_time)
            last_called[0] = time.time()
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limit(calls_per_minute=60)
def call_ai(message):
    # Your AI call here
    pass
```
## Monitoring and Observability

### Track Key Metrics

```python
import time
from dataclasses import dataclass
from functools import wraps

@dataclass
class AICallMetrics:
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    success: bool
    cost: float

def track_ai_call(func):
    # log_metrics, log_error, and calculate_cost are your own helpers
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            result = func(*args, **kwargs)
            latency = (time.time() - start) * 1000
            log_metrics(AICallMetrics(
                model=kwargs.get('model', 'unknown'),
                input_tokens=result.usage.prompt_tokens,
                output_tokens=result.usage.completion_tokens,
                latency_ms=latency,
                success=True,
                cost=calculate_cost(result.usage)
            ))
            return result
        except Exception as e:
            log_error(e)
            raise
    return wrapper
```
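`calculate_cost` is left to you above; one illustrative version drives it from a per-model price table. The prices here are assumptions for the sketch — check your provider's current rates:

```python
# Assumed dollars per million tokens as (input, output) — verify with your provider
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def calculate_cost(usage, model: str = "gpt-4o-mini") -> float:
    """Dollar cost of one call from its token usage and the price table."""
    input_price, output_price = PRICES.get(model, (0.0, 0.0))
    return (usage.prompt_tokens * input_price
            + usage.completion_tokens * output_price) / 1_000_000
```

Keeping prices in one table means a provider price change is a one-line update, and unknown models cost out at zero rather than crashing the metrics path.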
## Deployment Checklist
Before going to production:
- API keys stored securely (environment variables, secrets manager)
- Rate limiting implemented
- Error handling and retries in place
- Fallback models configured
- Input validation and sanitization
- Cost monitoring and alerts set up
- Response caching where appropriate
- Logging and observability configured
- Load testing completed
## Conclusion

Building AI-powered applications is straightforward with the right approach:

- **Start simple**: basic API calls first, then add complexity
- **Use an API gateway**: simplifies multi-model access and reduces costs
- **Implement resilience**: retries, fallbacks, and error handling
- **Optimize costs**: caching, model routing, token management
- **Monitor everything**: track usage, costs, and performance

The AI landscape evolves quickly. Using an API gateway gives you the flexibility to adopt new models without code changes.

Need reliable API access for your AI application? Crazyrouter provides a unified endpoint for 300+ models with built-in failover and competitive pricing. Start building today.