
"Text-Embedding-3-Small Complete Guide: OpenAI's Cost-Effective Embedding Model"
If you're building semantic search, RAG pipelines, or recommendation systems, you need an embedding model. OpenAI's text-embedding-3-small hits the sweet spot between quality and cost — and it's what most production systems should default to.
Here's everything you need to know.
## What Is text-embedding-3-small?
text-embedding-3-small is OpenAI's compact embedding model that converts text into numerical vectors (embeddings). These vectors capture semantic meaning, so similar texts produce similar vectors.
### Key Specs
| Feature | text-embedding-3-small | text-embedding-3-large | text-embedding-ada-002 (legacy) |
|---|---|---|---|
| Dimensions | 1536 (default) | 3072 (default) | 1536 (fixed) |
| Adjustable Dimensions | ✅ (down to 256) | ✅ (down to 256) | ❌ |
| MTEB Score | 62.3% | 64.6% | 61.0% |
| Max Tokens | 8191 | 8191 | 8191 |
| Price (per 1M tokens) | $0.02 | $0.13 | $0.10 |
| Relative Cost | 1x | 6.5x | 5x |
The key insight: text-embedding-3-small is 5x cheaper than the legacy ada-002 model while delivering better quality. There's almost no reason to use ada-002 anymore.
## Adjustable Dimensions
One of the best features is dimension reduction. You can request fewer dimensions to save storage and speed up similarity search:
| Dimensions | MTEB Score | Vector Size | Use Case |
|---|---|---|---|
| 1536 (default) | 62.3% | 6.1 KB | Best quality |
| 768 | 61.5% | 3.1 KB | Good balance |
| 512 | 60.8% | 2.0 KB | Large-scale search |
| 256 | 59.2% | 1.0 KB | Memory-constrained |
At 512 dimensions, you get 97.6% of the full quality at 33% of the storage cost. That's a great tradeoff for most applications.
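The arithmetic behind that claim is easy to verify. Here's a quick sketch (pure Python; it assumes 4-byte float32 values per dimension, the typical vector-database storage format, and reuses the MTEB scores from the table above):

```python
# Back-of-the-envelope check of the dimension-reduction tradeoff.
# Assumes 4 bytes (float32) per dimension; MTEB scores from the table above.
FULL_DIMS, FULL_MTEB = 1536, 62.3

def vector_kb(dims, bytes_per_value=4):
    """Raw vector size in kilobytes (decimal KB, as in the table)."""
    return dims * bytes_per_value / 1000

for dims, mteb in [(1536, 62.3), (768, 61.5), (512, 60.8), (256, 59.2)]:
    print(f"{dims:>4} dims: {vector_kb(dims):.1f} KB, "
          f"{mteb / FULL_MTEB:.1%} of full quality, "
          f"{dims / FULL_DIMS:.0%} of full storage")
```

At 512 dimensions this prints 2.0 KB, 97.6% quality, 33% storage, matching the numbers quoted above.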
## How to Use text-embedding-3-small

### Python (via Crazyrouter)

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Basic embedding
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")  # 1536
print(f"First 5 values: {embedding[:5]}")

# With dimension reduction
response_small = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?",
    dimensions=512  # Reduce to 512 dimensions
)

embedding_small = response_small.data[0].embedding
print(f"Reduced dimensions: {len(embedding_small)}")  # 512
```
### Batch Embedding

```python
texts = [
    "How do I reset my password?",
    "I forgot my login credentials",
    "What's the weather like today?",
    "Can I change my email address?",
    "Password recovery not working"
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
    dimensions=512
)

embeddings = [item.embedding for item in response.data]
print(f"Generated {len(embeddings)} embeddings of {len(embeddings[0])} dimensions")
```
### Node.js Example

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-crazyrouter-key',
  baseURL: 'https://api.crazyrouter.com/v1'
});

async function getEmbedding(text, dimensions = 1536) {
  const response = await client.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
    dimensions
  });
  return response.data[0].embedding;
}

// Semantic similarity (a plain synchronous function — no await needed)
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const emb1 = await getEmbedding('How to reset password');
const emb2 = await getEmbedding('I forgot my login');
const emb3 = await getEmbedding('Best pizza in New York');

console.log('Similar:', cosineSimilarity(emb1, emb2));   // ~0.85
console.log('Different:', cosineSimilarity(emb1, emb3)); // ~0.15
```
### cURL Example

```bash
curl -X POST https://api.crazyrouter.com/v1/embeddings \
  -H "Authorization: Bearer your-crazyrouter-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "How do I reset my password?",
    "dimensions": 512
  }'
```
## Building Semantic Search

Here's a complete example of building a semantic search system:

```python
import numpy as np
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)

DIMENSIONS = 512

# Step 1: Index your documents
documents = [
    "Python is a high-level programming language known for its simplicity.",
    "JavaScript runs in web browsers and powers interactive websites.",
    "Docker containers package applications with their dependencies.",
    "Kubernetes orchestrates container deployments at scale.",
    "PostgreSQL is a powerful open-source relational database.",
    "Redis is an in-memory data store used for caching.",
    "GraphQL is a query language for APIs developed by Facebook.",
    "REST APIs use HTTP methods to perform CRUD operations.",
]

def get_embeddings(texts):
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
        dimensions=DIMENSIONS
    )
    return [item.embedding for item in response.data]

# Generate embeddings for all documents
doc_embeddings = np.array(get_embeddings(documents))

# Step 2: Search
def search(query, top_k=3):
    query_embedding = np.array(get_embeddings([query])[0])
    # Cosine similarity
    similarities = np.dot(doc_embeddings, query_embedding) / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
    )
    top_indices = np.argsort(similarities)[::-1][:top_k]
    results = []
    for idx in top_indices:
        results.append({
            "document": documents[idx],
            "score": float(similarities[idx])
        })
    return results

# Try it
results = search("How do I cache data?")
for r in results:
    print(f"[{r['score']:.3f}] {r['document']}")
# [0.742] Redis is an in-memory data store used for caching.
# [0.385] PostgreSQL is a powerful open-source relational database.
# [0.312] Docker containers package applications with their dependencies.
```
## text-embedding-3-small vs Alternatives

### Embedding Model Comparison

| Model | Provider | MTEB Score | Price (1M tokens) | Dimensions | Open Source |
|---|---|---|---|---|---|
| text-embedding-3-small | OpenAI | 62.3% | $0.02 | 256-1536 | ❌ |
| text-embedding-3-large | OpenAI | 64.6% | $0.13 | 256-3072 | ❌ |
| Gemini Embedding | Google | 63.8% | $0.01 | 768 | ❌ |
| Voyage-3 | Voyage AI | 67.1% | $0.06 | 1024 | ❌ |
| BGE-M3 | BAAI | 63.5% | Free (self-host) | 1024 | ✅ |
| Nomic Embed v2 | Nomic | 62.8% | Free (self-host) | 768 | ✅ |
### When to Choose text-embedding-3-small
- Best for: Production systems where cost matters and quality is "good enough"
- Not ideal for: Research benchmarks or when you need absolute best retrieval quality
For most RAG and search applications, the difference between 62% and 67% MTEB score is negligible in practice. The 3x-6x cost difference is not.
## Pricing

### Cost Comparison via Crazyrouter
| Model | Official Price | Crazyrouter Price | Savings |
|---|---|---|---|
| text-embedding-3-small | $0.02/1M tokens | $0.015/1M tokens | 25% |
| text-embedding-3-large | $0.13/1M tokens | $0.10/1M tokens | 23% |
### Real-World Cost Estimates

| Use Case | Documents | Tokens | One-Time Embedding Cost (Crazyrouter) |
|---|---|---|---|
| Small FAQ bot | 1,000 docs | ~500K | < $0.01 |
| Medium knowledge base | 50,000 docs | ~25M | ~$0.38 |
| Large search engine | 1M docs | ~500M | ~$7.50 |
| Enterprise RAG | 10M docs | ~5B | ~$75 |
Embedding is a one-time cost per document. You only re-embed when content changes. Through Crazyrouter, you can access text-embedding-3-small alongside 300+ other models with a single API key.
## Best Practices
- Use 512 dimensions for most applications — best quality/storage tradeoff
- Batch your requests — send up to 2048 texts per API call
- Cache embeddings — store in a vector database, don't re-compute
- Chunk long documents — split into 200-500 token chunks for better retrieval
- Normalize vectors — OpenAI embeddings are already normalized, but verify after dimension reduction
- Use metadata filtering — combine vector search with traditional filters for better results
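The batching tip above can be sketched as a small helper (pure Python; the 2048 figure is the embeddings endpoint's per-request input limit, and `batched` is a hypothetical utility, not part of the OpenAI SDK):

```python
MAX_BATCH = 2048  # per-request input limit for the embeddings endpoint

def batched(texts, batch_size=MAX_BATCH):
    """Split a list of texts into API-sized batches."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

# 5,000 documents fit in 3 API calls instead of 5,000
batches = list(batched([f"doc {i}" for i in range(5000)]))
print(len(batches))      # 3
print(len(batches[-1]))  # 904
```

Each batch can then be passed as the `input` list of a single `embeddings.create` call, as in the batch example earlier.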
## FAQ

### What's the difference between text-embedding-3-small and text-embedding-3-large?
The large model has higher quality (64.6% vs 62.3% MTEB) but costs 6.5x more. For most production use cases, the small model is sufficient. Use the large model only when retrieval quality is critical and cost isn't a concern.
### Can I reduce dimensions after generating embeddings?
Yes. You can truncate the embedding vector to fewer dimensions and re-normalize. However, it's better to request the reduced dimensions directly via the dimensions parameter, as the model optimizes for the target dimensionality.
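As a minimal sketch of what truncate-and-renormalize looks like (pure Python, no API call; the 4-value vector is a toy stand-in for a real 1536-dimension embedding):

```python
import math

def truncate_and_renormalize(embedding, dims):
    """Keep the first `dims` values, then rescale back to unit length."""
    truncated = embedding[:dims]
    norm = math.sqrt(sum(v * v for v in truncated))
    return [v / norm for v in truncated]

vec = [0.5, 0.5, 0.5, 0.5]  # toy unit vector
reduced = truncate_and_renormalize(vec, 2)
print(math.isclose(sum(v * v for v in reduced), 1.0))  # True
```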
### How many tokens does a typical document use?
Roughly 1 token per 4 characters in English, or 1 token per 1-2 characters in Chinese. A 500-word English paragraph is about 650 tokens.
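Those ratios translate into a rough estimator (an approximation only; for exact counts, use a real tokenizer such as OpenAI's tiktoken library):

```python
def estimate_tokens(text, lang="en"):
    """Very rough token estimate: ~4 chars/token in English, ~1.5 in Chinese."""
    chars_per_token = 4 if lang == "en" else 1.5
    return round(len(text) / chars_per_token)

# 500 words of 4 letters plus a space each = 2,500 chars
print(estimate_tokens("word " * 500))  # 625
```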
### Which vector database should I use?
Popular choices: Pinecone (managed), Weaviate (open source), Qdrant (open source), pgvector (PostgreSQL extension). For getting started, pgvector is the simplest if you already use PostgreSQL.
### Is text-embedding-3-small good for multilingual content?
Yes. It handles English, Chinese, Japanese, Korean, and European languages well. For specialized multilingual needs, consider BGE-M3 or Cohere's multilingual model.
## Summary
text-embedding-3-small is the default choice for production embedding workloads. It's cheap, fast, supports dimension reduction, and delivers quality that's more than adequate for search, RAG, and classification tasks.
Access it through Crazyrouter for the best pricing and the convenience of a unified API that also gives you access to GPT-5, Claude, Gemini, and 300+ other models.


