
Text-Embedding-3-Small Complete Guide: OpenAI's Cost-Effective Embedding Model

A practical guide to OpenAI's text-embedding-3-small model. Covers API usage, dimension reduction, performance benchmarks, and how to build semantic search with code examples.

Crazyrouter Team
February 23, 2026

If you're building semantic search, RAG pipelines, or recommendation systems, you need an embedding model. OpenAI's text-embedding-3-small hits the sweet spot between quality and cost — and it's what most production systems should default to.

Here's everything you need to know.

What Is text-embedding-3-small?#

text-embedding-3-small is OpenAI's compact embedding model that converts text into numerical vectors (embeddings). These vectors capture semantic meaning, so similar texts produce similar vectors.
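
To make "similar texts produce similar vectors" concrete, here is a minimal sketch that compares hand-made 3-dimensional vectors with cosine similarity. The vectors and the `cosine_similarity` helper are illustrative assumptions, not real model output; actual embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors invented for illustration (not produced by any model).
password_reset = [0.9, 0.1, 0.0]
forgot_login = [0.8, 0.2, 0.1]
pizza = [0.0, 0.1, 0.9]

sim_related = cosine_similarity(password_reset, forgot_login)
sim_unrelated = cosine_similarity(password_reset, pizza)
print(sim_related > sim_unrelated)  # True
```

The same comparison applies unchanged to real embeddings: a score near 1 means the vectors point in nearly the same direction.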

Key Specs#

| Feature | text-embedding-3-small | text-embedding-3-large | text-embedding-ada-002 (legacy) |
| --- | --- | --- | --- |
| Dimensions | 1536 (default) | 3072 (default) | 1536 (fixed) |
| Adjustable Dimensions | ✅ (down to 256) | ✅ (down to 256) | ❌ |
| MTEB Score | 62.3% | 64.6% | 61.0% |
| Max Input Tokens | 8191 | 8191 | 8191 |
| Price (per 1M tokens) | $0.02 | $0.13 | $0.10 |
| Relative Cost | 1x | 6.5x | 5x |

The key insight: text-embedding-3-small is 5x cheaper than the legacy ada-002 model while delivering better quality. There's almost no reason to use ada-002 anymore.

Adjustable Dimensions#

One of the best features is dimension reduction. You can request fewer dimensions to save storage and speed up similarity search:

| Dimensions | MTEB Score | Vector Size | Use Case |
| --- | --- | --- | --- |
| 1536 (default) | 62.3% | 6.1 KB | Best quality |
| 768 | 61.5% | 3.1 KB | Good balance |
| 512 | 60.8% | 2.0 KB | Large-scale search |
| 256 | 59.2% | 1.0 KB | Memory-constrained |

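By these figures, 512 dimensions keeps 97.6% of the full-size MTEB score (60.8 / 62.3) at one third of the storage (512 / 1536). OpenAI's own announcement reports a slightly higher 61.6% for this model at 512 dimensions, so treat the reduced-dimension scores above as approximate. Either way, it is a good tradeoff for most applications.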
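
The vector sizes in the table follow from simple arithmetic. The sketch below assumes each dimension is stored as a 32-bit float (4 bytes) and ignores any index overhead a vector database adds; the function names are illustrative.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats

def vector_bytes(dimensions: int) -> int:
    """Raw size of one embedding vector in bytes."""
    return dimensions * BYTES_PER_FLOAT32

def index_size_gb(num_vectors: int, dimensions: int) -> float:
    """Raw vector storage in GB (10^9 bytes), excluding index overhead."""
    return num_vectors * vector_bytes(dimensions) / 1e9

for dims in (1536, 768, 512, 256):
    print(f"{dims:>4} dims: {vector_bytes(dims) / 1000:.1f} KB per vector, "
          f"{index_size_gb(1_000_000, dims):.2f} GB per 1M vectors")
```

At one million documents, dropping from 1536 to 512 dimensions takes raw vector storage from roughly 6 GB to roughly 2 GB.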

How to Use text-embedding-3-small#

Python (via Crazyrouter)#

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Basic embedding
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")  # 1536
print(f"First 5 values: {embedding[:5]}")

# With dimension reduction
response_small = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?",
    dimensions=512  # Reduce to 512 dimensions
)

embedding_small = response_small.data[0].embedding
print(f"Reduced dimensions: {len(embedding_small)}")  # 512
```

Batch Embedding#

```python
texts = [
    "How do I reset my password?",
    "I forgot my login credentials",
    "What's the weather like today?",
    "Can I change my email address?",
    "Password recovery not working"
]

# One request embeds the whole list; results come back in input order
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
    dimensions=512
)

embeddings = [item.embedding for item in response.data]
print(f"Generated {len(embeddings)} embeddings of {len(embeddings[0])} dimensions")
```

Node.js Example#

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-crazyrouter-key',
  baseURL: 'https://api.crazyrouter.com/v1'
});

async function getEmbedding(text, dimensions = 1536) {
  const response = await client.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
    dimensions
  });
  return response.data[0].embedding;
}

// Semantic similarity (pure computation, so no async/await needed)
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Top-level await requires an ES module (e.g. a .mjs file)
const emb1 = await getEmbedding('How to reset password');
const emb2 = await getEmbedding('I forgot my login');
const emb3 = await getEmbedding('Best pizza in New York');

// Exact scores vary; the related pair should score clearly higher
console.log('Similar:', cosineSimilarity(emb1, emb2));
console.log('Different:', cosineSimilarity(emb1, emb3));
```

cURL Example#

```bash
curl -X POST https://api.crazyrouter.com/v1/embeddings \
  -H "Authorization: Bearer your-crazyrouter-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "How do I reset my password?",
    "dimensions": 512
  }'
```

Building Semantic Search#

Here's a complete example of building a semantic search system:

```python
import numpy as np
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)

DIMENSIONS = 512

# Step 1: Index your documents
documents = [
    "Python is a high-level programming language known for its simplicity.",
    "JavaScript runs in web browsers and powers interactive websites.",
    "Docker containers package applications with their dependencies.",
    "Kubernetes orchestrates container deployments at scale.",
    "PostgreSQL is a powerful open-source relational database.",
    "Redis is an in-memory data store used for caching.",
    "GraphQL is a query language for APIs developed by Facebook.",
    "REST APIs use HTTP methods to perform CRUD operations.",
]

def get_embeddings(texts):
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
        dimensions=DIMENSIONS
    )
    return [item.embedding for item in response.data]

# Generate embeddings for all documents
doc_embeddings = np.array(get_embeddings(documents))

# Step 2: Search
def search(query, top_k=3):
    query_embedding = np.array(get_embeddings([query])[0])

    # Cosine similarity between the query and every document
    similarities = np.dot(doc_embeddings, query_embedding) / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
    )

    # Indices of the top_k highest scores, best first
    top_indices = np.argsort(similarities)[::-1][:top_k]

    results = []
    for idx in top_indices:
        results.append({
            "document": documents[idx],
            "score": float(similarities[idx])
        })
    return results

# Try it
results = search("How do I cache data?")
for r in results:
    print(f"[{r['score']:.3f}] {r['document']}")
# Illustrative output (exact scores and lower-ranked results will vary):
# [0.742] Redis is an in-memory data store used for caching.
# [0.385] PostgreSQL is a powerful open-source relational database.
# [0.312] Docker containers package applications with their dependencies.
```

text-embedding-3-small vs Alternatives#

Embedding Model Comparison#

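| Model | Provider | MTEB Score | Price (1M tokens) | Dimensions | Open Source |
| --- | --- | --- | --- | --- | --- |
| text-embedding-3-small | OpenAI | 62.3% | $0.02 | 256-1536 | ❌ |
| text-embedding-3-large | OpenAI | 64.6% | $0.13 | 256-3072 | ❌ |
| Gemini Embedding | Google | 63.8% | $0.01 | 768 | ❌ |
| Voyage-3 | Voyage AI | 67.1% | $0.06 | 1024 | ❌ |
| BGE-M3 | BAAI | 63.5% | Free (self-host) | 1024 | ✅ |
| Nomic Embed v2 | Nomic | 62.8% | Free (self-host) | 768 | ✅ |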

When to Choose text-embedding-3-small#

  • Best for: Production systems where cost matters and quality is "good enough"
  • Not ideal for: Research benchmarks or when you need absolute best retrieval quality

For most RAG and search applications, a gap of a few MTEB points (62% versus 67%) rarely produces a noticeable difference in results. The 3x to 6.5x price difference against the paid alternatives applies to every token you embed.

Pricing#

Cost Comparison via Crazyrouter#

| Model | Official Price | Crazyrouter Price | Savings |
| --- | --- | --- | --- |
| text-embedding-3-small | $0.02/1M tokens | $0.015/1M tokens | 25% |
| text-embedding-3-large | $0.13/1M tokens | $0.10/1M tokens | 23% |

Real-World Cost Estimates#

| Use Case | Documents | Tokens | Embedding Cost (Crazyrouter) |
| --- | --- | --- | --- |
| Small FAQ bot | 1,000 docs | ~500K | < $0.01 |
| Medium knowledge base | 50,000 docs | ~25M | ~$0.38 |
| Large search engine | 1M docs | ~500M | ~$7.50 |
| Enterprise RAG | 10M docs | ~5B | ~$75 |

Embedding is a one-time cost per document. You only re-embed when content changes. Through Crazyrouter, you can access text-embedding-3-small alongside 300+ other models with a single API key.
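
The estimates above can be reproduced with a few lines of arithmetic. This is a rough sketch: the table's rows imply an average of about 500 tokens per document, and the price constant is the Crazyrouter rate quoted above. The function name and the 500-token average are assumptions for illustration.

```python
PRICE_PER_MILLION_TOKENS = 0.015  # USD, Crazyrouter rate quoted in this article

def embedding_cost(num_docs: int, avg_tokens_per_doc: int,
                   price_per_million: float = PRICE_PER_MILLION_TOKENS) -> float:
    """One-time cost in USD to embed num_docs documents."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million

# Assumption: ~500 tokens per document, as implied by the table's rows.
scenarios = [
    ("Small FAQ bot", 1_000),
    ("Medium knowledge base", 50_000),
    ("Large search engine", 1_000_000),
    ("Enterprise RAG", 10_000_000),
]
for label, docs in scenarios:
    print(f"{label}: ${embedding_cost(docs, 500):,.4f}")
```

Swap in your own document count and average length to budget a re-embedding run after a large content change.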

Best Practices#

  1. Use 512 dimensions for most applications — best quality/storage tradeoff
  2. Batch your requests — send up to 2048 texts per API call
  3. Cache embeddings — store in a vector database, don't re-compute
  4. Chunk long documents — split into 200-500 token chunks for better retrieval
  5. Normalize vectors — vectors returned by the API are already unit length, including those requested with the dimensions parameter; re-normalize any vector you truncate yourself
  6. Use metadata filtering — combine vector search with traditional filters for better results
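
Practices 2 and 3 can be combined in one small helper. The sketch below is a minimal illustration under stated assumptions: `embed_fn` is a hypothetical callable that takes a list of texts and returns one vector per text in the same order (in practice it would wrap `client.embeddings.create`), and a plain dict stands in for a vector database or persistent cache.

```python
import hashlib
from typing import Callable, Dict, Iterator, List

Vector = List[float]
MAX_BATCH_SIZE = 2048  # per-request input limit cited in the list above

def batched(items: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def cache_key(text: str) -> str:
    """Stable key for a text, so unchanged content is never re-embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def embed_with_cache(texts: List[str],
                     embed_fn: Callable[[List[str]], List[Vector]],
                     cache: Dict[str, Vector]) -> List[Vector]:
    """Embed texts, calling embed_fn only for texts missing from the cache."""
    # De-duplicate while preserving order, then keep only uncached texts.
    missing = list(dict.fromkeys(t for t in texts if cache_key(t) not in cache))
    for batch in batched(missing):
        for text, vector in zip(batch, embed_fn(batch)):
            cache[cache_key(text)] = vector
    return [cache[cache_key(t)] for t in texts]

# Stand-in for a real API call, used here only to show the control flow.
def fake_embed(batch: List[str]) -> List[Vector]:
    return [[float(len(text))] for text in batch]

cache: Dict[str, Vector] = {}
vectors = embed_with_cache(["reset password", "pizza", "reset password"],
                           fake_embed, cache)
print(len(vectors), len(cache))  # 3 vectors returned, 2 distinct texts cached
```

Keying the cache on a hash of the text means a document is re-embedded only when its content actually changes.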

FAQ#

What's the difference between text-embedding-3-small and text-embedding-3-large?#

The large model has higher quality (64.6% vs 62.3% MTEB) but costs 6.5x more. For most production use cases, the small model is sufficient. Use the large model only when retrieval quality is critical and cost isn't a concern.

Can I reduce dimensions after generating embeddings?#

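Yes. You can truncate the embedding vector to fewer dimensions and re-normalize it to unit length. In most cases it is simpler to request the reduced size directly via the dimensions parameter, since the API then returns vectors that are already shortened and normalized. Manual truncation is mainly useful when you have stored full-size vectors and want a smaller index without re-embedding.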
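
The truncate-and-re-normalize step can be sketched in a few lines. The helper name `truncate_and_normalize` and the toy input vector are assumptions for illustration; a real input would be a stored 1536-dimension embedding.

```python
import math
from typing import List

def truncate_and_normalize(embedding: List[float], dimensions: int) -> List[float]:
    """Keep the first `dimensions` values and rescale to unit length."""
    if not 0 < dimensions <= len(embedding):
        raise ValueError("dimensions must be between 1 and len(embedding)")
    truncated = embedding[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in truncated]

# Toy unit-length vector standing in for a stored full-size embedding.
full = [0.5, 0.5, 0.5, 0.5]
short = truncate_and_normalize(full, 2)
print(len(short), round(sum(x * x for x in short), 6))  # 2 1.0
```

Without the rescaling step the shortened vector has a norm below 1, which skews any search that relies on a plain dot product as the similarity score.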

How many tokens does a typical document use?#

Roughly 1 token per 4 characters in English, or 1 token per 1-2 characters in Chinese. A 500-word English paragraph is about 650 tokens.
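
These rules of thumb are easy to turn into a quick estimator for budgeting. The sketch below uses only the ratios stated above (about 4 characters per token for English, and about 1.3 tokens per word as implied by 500 words being roughly 650 tokens); the function names are illustrative. For exact counts you need the model's tokenizer, for example via OpenAI's tiktoken library.

```python
import math

CHARS_PER_TOKEN_EN = 4   # rule of thumb for English text
TOKENS_PER_WORD_EN = 1.3  # implied by 500 words being about 650 tokens

def estimate_tokens_from_chars(text: str) -> int:
    """Rough English token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN_EN)

def estimate_tokens_from_words(word_count: int) -> int:
    """Rough English token count from a word count."""
    return round(word_count * TOKENS_PER_WORD_EN)

print(estimate_tokens_from_words(500))         # 650
print(estimate_tokens_from_chars("a" * 400))   # 100
```

Treat the result as a budgeting estimate only; code, URLs, and non-English text can tokenize very differently.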

Which vector database should I use?#

Popular choices: Pinecone (managed), Weaviate (open source), Qdrant (open source), pgvector (PostgreSQL extension). For getting started, pgvector is the simplest if you already use PostgreSQL.
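
For readers taking the pgvector route, the sketch below shows what the schema and a similarity query could look like. Nothing here connects to a database: the SQL is held in strings, the table name `documents` is hypothetical, and the `%s` placeholders assume a psycopg-style driver. The `vector(n)` column type and the `<=>` cosine-distance operator come from pgvector itself.

```python
from typing import List

EMBEDDING_DIMENSIONS = 512  # must match the dimensions you request from the API

# Schema: hypothetical `documents` table with a pgvector column.
CREATE_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector({EMBEDDING_DIMENSIONS})
);
"""

# <=> is pgvector's cosine distance; smaller distance means more similar.
SEARCH_SQL = """
SELECT content, 1 - (embedding <=> %s::vector) AS cosine_similarity
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

def to_pgvector_literal(embedding: List[float]) -> str:
    """Format a vector as pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

print(to_pgvector_literal([0.1, 0.25, -1]))  # [0.1,0.25,-1.0]
```

With a driver such as psycopg, the query vector literal would be passed twice (once for the score, once for the ordering) followed by the row limit.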

Is text-embedding-3-small good for multilingual content?#

Yes. It handles English, Chinese, Japanese, Korean, and European languages well. For specialized multilingual needs, consider BGE-M3 or Cohere's multilingual model.

Summary#

text-embedding-3-small is the default choice for production embedding workloads. It's cheap, fast, supports dimension reduction, and delivers quality that's more than adequate for search, RAG, and classification tasks.

Access it through Crazyrouter for the best pricing and the convenience of a unified API that also gives you access to GPT-5, Claude, Gemini, and 300+ other models.
