How to Build AI-Powered Applications: A Developer's Guide

Crazyrouter Team
January 26, 2026

Building applications with AI capabilities has never been more accessible. Whether you're adding a chatbot to your SaaS, building an AI writing assistant, or creating intelligent automation, this guide covers everything from architecture decisions to production deployment.

Understanding AI Application Architecture#

Basic Architecture Pattern#

Most AI applications follow this pattern:

code
User Input → Your Application → AI API → Response Processing → User Output

Key components:

  1. Frontend: User interface for input/output
  2. Backend: Business logic and API orchestration
  3. AI Layer: Model selection and prompt management
  4. Data Layer: Context storage and caching
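
The components above can be sketched as a plain function pipeline (every name here is illustrative, and `call_model` is stubbed instead of calling a real API):

```python
# Minimal sketch of the request flow; all names are illustrative.

def build_prompt(user_input: str) -> str:
    # Backend: business logic wraps the raw input in task instructions.
    return f"Answer concisely:\n{user_input}"

def call_model(prompt: str) -> str:
    # AI layer: stubbed here; in practice this calls your provider or gateway.
    return f"[model reply to: {prompt!r}]"

def postprocess(raw: str) -> str:
    # Response processing: trim, validate, or format before returning.
    return raw.strip()

def handle_request(user_input: str) -> str:
    return postprocess(call_model(build_prompt(user_input)))
```

Each stage can then grow independently: swap the stub for a real API call, add caching in the data layer, and so on.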

Choosing Your AI Integration Approach#

| Approach | Complexity | Flexibility | Cost |
|---|---|---|---|
| Direct API calls | Low | Medium | Variable |
| SDK/Library | Low | High | Variable |
| API Gateway | Medium | Very High | Lower |
| Self-hosted models | High | Maximum | Fixed |

Getting Started: Your First AI Feature#

Step 1: Choose Your Model#

For most applications, start with:

| Use Case | Recommended Model | Why |
|---|---|---|
| Chatbot | GPT-4o Mini or Claude Haiku | Fast, cheap, good enough |
| Content generation | Claude Sonnet or GPT-4o | Better quality |
| Code assistance | Claude Sonnet or GPT-4 | Strong reasoning |
| Document analysis | Claude (200K context) | Long context window |

Step 2: Set Up API Access#

Option A: Direct Provider Access

python
# OpenAI
from openai import OpenAI
client = OpenAI(api_key="sk-...")

# Anthropic
from anthropic import Anthropic
client = Anthropic(api_key="sk-ant-...")

Option B: API Gateway (Recommended)

Using a gateway like Crazyrouter simplifies multi-model access:

python
from openai import OpenAI

# Single endpoint for all models
client = OpenAI(
    api_key="your-gateway-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Use any model with the same code
response = client.chat.completions.create(
    model="claude-3-5-sonnet",  # or "gpt-4o", "gemini-pro", etc.
    messages=[{"role": "user", "content": "Hello!"}]
)

Step 3: Basic Implementation#

Here's a minimal chatbot implementation:

python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.crazyrouter.com/v1"
)

def chat(user_message: str, conversation_history: list) -> str:
    conversation_history.append({
        "role": "user",
        "content": user_message
    })

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation_history,
        max_tokens=1000
    )

    assistant_message = response.choices[0].message.content
    conversation_history.append({
        "role": "assistant",
        "content": assistant_message
    })

    return assistant_message

Building Production-Ready AI Features#

Prompt Engineering Best Practices#

1. Use System Prompts

python
messages = [
    {
        "role": "system",
        "content": """You are a helpful customer support agent for TechCorp.
        - Be friendly and professional
        - If you don't know something, say so
        - Never make up information about products
        - For billing issues, direct users to billing@techcorp.com"""
    },
    {"role": "user", "content": user_input}
]

2. Structure Your Prompts

python
prompt = f"""
Task: Summarize the following article
Format: 3 bullet points, max 20 words each
Tone: Professional

Article:
{article_text}

Summary:
"""

3. Use Few-Shot Examples

python
messages = [
    {"role": "system", "content": "You classify customer feedback as positive, negative, or neutral."},
    {"role": "user", "content": "Great product, love it!"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Worst purchase ever."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": actual_feedback}
]

Error Handling and Resilience#

Handle API Errors Gracefully

python
import time
from openai import OpenAI, APIError, RateLimitError

def call_ai_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages
            )
            return response.choices[0].message.content

        except RateLimitError:
            wait_time = 2 ** attempt  # Exponential backoff
            time.sleep(wait_time)
            continue

        except APIError as e:
            if attempt == max_retries - 1:
                raise
            time.sleep(1)
            continue

    raise Exception("Max retries exceeded")

Implement Fallback Models

python
MODELS = ["gpt-4o", "claude-3-5-sonnet", "gpt-4o-mini"]

def call_with_fallback(messages):
    for model in MODELS:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"Model {model} failed: {e}")
            continue

    raise Exception("All models failed")

Streaming Responses#

For better UX, stream tokens to the user as they arrive instead of waiting for the full completion:

python
def stream_response(messages):
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )

    for chunk in stream:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
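
On the server, this generator is usually wrapped in a streaming HTTP response; the exact wiring depends on your web framework, but a framework-agnostic sketch that formats each chunk as a server-sent event (SSE) could be:

```python
def to_sse(chunks):
    # Wrap each text chunk in the SSE wire format ("data: ...\n\n"),
    # then emit a sentinel so the client knows the stream is finished.
    for chunk in chunks:
        yield f"data: {chunk}\n\n"
    yield "data: [DONE]\n\n"
```

You would pass `to_sse(stream_response(messages))` as the body of a streaming response in your framework of choice.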

Frontend Integration (JavaScript)

javascript
async function streamChat(message) {
    const response = await fetch('/api/chat', {
        method: 'POST',
        body: JSON.stringify({ message }),
        headers: { 'Content-Type': 'application/json' }
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        const text = decoder.decode(value);
        appendToChat(text);
    }
}

Advanced Patterns#

RAG (Retrieval-Augmented Generation)#

Combine AI with your own data:

python
from openai import OpenAI

def answer_with_context(question: str, documents: list) -> str:
    # 1. Find relevant documents (simplified)
    relevant_docs = search_documents(question, documents)

    # 2. Build context
    context = "\n\n".join([doc.content for doc in relevant_docs[:3]])

    # 3. Generate answer
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""Answer questions based on the provided context.
                If the answer isn't in the context, say "I don't have that information."

                Context:
                {context}"""
            },
            {"role": "user", "content": question}
        ]
    )

    return response.choices[0].message.content

Function Calling / Tool Use#

Let AI interact with your systems:

python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# Handle tool calls
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    if tool_call.function.name == "get_weather":
        args = json.loads(tool_call.function.arguments)
        weather = get_weather(args["location"])
        # Continue conversation with result...

Multi-Model Routing#

Use different models for different tasks:

python
def route_to_model(task_type: str, content: str) -> str:
    model_map = {
        "simple_qa": "gpt-4o-mini",
        "complex_reasoning": "gpt-4o",
        "long_document": "claude-3-5-sonnet",
        "code_generation": "claude-3-5-sonnet",
        "creative_writing": "gpt-4o"
    }

    model = model_map.get(task_type, "gpt-4o-mini")

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}]
    )

    return response.choices[0].message.content

Cost Optimization Strategies#

1. Implement Caching#

python
import hashlib
import redis

cache = redis.Redis()

def cached_completion(messages, ttl=3600):
    # Create cache key from messages
    key = hashlib.md5(str(messages).encode()).hexdigest()

    # Check cache
    cached = cache.get(key)
    if cached:
        return cached.decode()

    # Call API
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )
    result = response.choices[0].message.content

    # Store in cache
    cache.setex(key, ttl, result)

    return result

2. Token Counting#

python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def estimate_cost(input_text: str, output_tokens: int = 500):
    input_tokens = count_tokens(input_text)

    # GPT-4o pricing
    input_cost = (input_tokens / 1_000_000) * 2.50
    output_cost = (output_tokens / 1_000_000) * 10.00

    return input_cost + output_cost

3. Use Appropriate Models#

| Task Complexity | Model | Cost/1M tokens |
|---|---|---|
| Simple | GPT-4o Mini | $0.15 input |
| Medium | GPT-4o | $2.50 input |
| Complex | GPT-4 / Claude Opus | $10-15 input |

4. Batch Processing#

For non-real-time tasks:

python
import asyncio

# Assumes an async process_item(item) worker defined elsewhere.
async def batch_process(items: list, batch_size: int = 10):
    results = []

    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]

        # Process batch concurrently
        tasks = [process_item(item) for item in batch]
        batch_results = await asyncio.gather(*tasks)
        results.extend(batch_results)

        # Respect rate limits
        await asyncio.sleep(1)

    return results

Security Best Practices#

1. Never Expose API Keys#

python
# Bad - key in code
client = OpenAI(api_key="sk-abc123...")

# Good - environment variable
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

2. Validate and Sanitize Input#

python
def sanitize_user_input(text: str) -> str:
    # Remove potential prompt injection attempts
    dangerous_patterns = [
        "ignore previous instructions",
        "disregard above",
        "new instructions:"
    ]

    text_lower = text.lower()
    for pattern in dangerous_patterns:
        if pattern in text_lower:
            raise ValueError("Invalid input detected")

    # Limit length
    return text[:10000]

3. Implement Rate Limiting#

python
from functools import wraps
import time

def rate_limit(calls_per_minute: int):
    min_interval = 60.0 / calls_per_minute
    last_called = [0.0]

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            wait_time = min_interval - elapsed

            if wait_time > 0:
                time.sleep(wait_time)

            last_called[0] = time.time()
            return func(*args, **kwargs)

        return wrapper
    return decorator

@rate_limit(calls_per_minute=60)
def call_ai(message):
    # Your AI call here
    pass

Monitoring and Observability#

Track Key Metrics#

python
import time
from dataclasses import dataclass

@dataclass
class AICallMetrics:
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    success: bool
    cost: float

def track_ai_call(func):
    def wrapper(*args, **kwargs):
        start = time.time()

        try:
            result = func(*args, **kwargs)
            latency = (time.time() - start) * 1000

            # Log metrics
            log_metrics(AICallMetrics(
                model=kwargs.get('model', 'unknown'),
                input_tokens=result.usage.prompt_tokens,
                output_tokens=result.usage.completion_tokens,
                latency_ms=latency,
                success=True,
                cost=calculate_cost(result.usage)
            ))

            return result

        except Exception as e:
            log_error(e)
            raise

    return wrapper

Deployment Checklist#

Before going to production:

  • API keys stored securely (environment variables, secrets manager)
  • Rate limiting implemented
  • Error handling and retries in place
  • Fallback models configured
  • Input validation and sanitization
  • Cost monitoring and alerts set up
  • Response caching where appropriate
  • Logging and observability configured
  • Load testing completed

Conclusion#

Building AI-powered applications is straightforward with the right approach:

  1. Start simple - Basic API calls, then add complexity
  2. Use an API gateway - Simplifies multi-model access and reduces costs
  3. Implement resilience - Retries, fallbacks, and error handling
  4. Optimize costs - Caching, model routing, token management
  5. Monitor everything - Track usage, costs, and performance

The AI landscape evolves quickly. Using an API gateway gives you flexibility to adopt new models without code changes.


Need reliable API access for your AI application? Crazyrouter provides a unified endpoint for 300+ models with built-in failover and competitive pricing. Start building today.
