
"Text-Embedding-3-Small Complete Guide: OpenAI's Cost-Effective Embedding Model"
If you're building semantic search, RAG pipelines, or recommendation systems, you need an embedding model. OpenAI's text-embedding-3-small hits the sweet spot between quality and cost — and it's what most production systems should default to.
Here's everything you need to know.
## What Is text-embedding-3-small?
text-embedding-3-small is OpenAI's compact embedding model that converts text into numerical vectors (embeddings). These vectors capture semantic meaning, so similar texts produce similar vectors.
### Key Specs
| Feature | text-embedding-3-small | text-embedding-3-large | text-embedding-ada-002 (legacy) |
|---|---|---|---|
| Dimensions | 1536 (default) | 3072 (default) | 1536 (fixed) |
| Adjustable Dimensions | ✅ (down to 256) | ✅ (down to 256) | ❌ |
| MTEB Score | 62.3% | 64.6% | 61.0% |
| Max Tokens | 8191 | 8191 | 8191 |
| Price (per 1M tokens) | $0.02 | $0.13 | $0.10 |
| Relative Cost | 1x | 6.5x | 5x |
The key insight: text-embedding-3-small is 5x cheaper than the legacy ada-002 model while delivering better quality. There's almost no reason to use ada-002 anymore.
## Adjustable Dimensions
One of the best features is dimension reduction. You can request fewer dimensions to save storage and speed up similarity search:
| Dimensions | MTEB Score | Vector Size | Use Case |
|---|---|---|---|
| 1536 (default) | 62.3% | 6.1 KB | Best quality |
| 768 | 61.5% | 3.1 KB | Good balance |
| 512 | 60.8% | 2.0 KB | Large-scale search |
| 256 | 59.2% | 1.0 KB | Memory-constrained |
At 512 dimensions, you get 97.6% of the full quality at 33% of the storage cost. That's a great tradeoff for most applications.
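The arithmetic behind that claim is easy to verify. Here's a quick sketch (pure Python; it assumes 4-byte float32 values per dimension, the typical vector-database storage format, and reuses the MTEB scores from the table above):

```python
# Back-of-the-envelope check of the dimension-reduction tradeoff.
# Assumes 4 bytes (float32) per dimension; MTEB scores from the table above.
FULL_DIMS, FULL_MTEB = 1536, 62.3

def vector_kb(dims, bytes_per_value=4):
    """Raw vector size in kilobytes (decimal KB, as in the table)."""
    return dims * bytes_per_value / 1000

for dims, mteb in [(1536, 62.3), (768, 61.5), (512, 60.8), (256, 59.2)]:
    print(f"{dims:>4} dims: {vector_kb(dims):.1f} KB, "
          f"{mteb / FULL_MTEB:.1%} of full quality, "
          f"{dims / FULL_DIMS:.0%} of full storage")
```

At 512 dimensions this prints 2.0 KB, 97.6% quality, 33% storage, matching the numbers quoted above.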
## How to Use text-embedding-3-small

### Python (via Crazyrouter)

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Basic embedding
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")  # 1536
print(f"First 5 values: {embedding[:5]}")

# With dimension reduction
response_small = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?",
    dimensions=512  # Reduce to 512 dimensions
)

embedding_small = response_small.data[0].embedding
print(f"Reduced dimensions: {len(embedding_small)}")  # 512
```
### Batch Embedding

```python
texts = [
    "How do I reset my password?",
    "I forgot my login credentials",
    "What's the weather like today?",
    "Can I change my email address?",
    "Password recovery not working"
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
    dimensions=512
)

embeddings = [item.embedding for item in response.data]
print(f"Generated {len(embeddings)} embeddings of {len(embeddings[0])} dimensions")
```
### Node.js Example

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-crazyrouter-key',
  baseURL: 'https://api.crazyrouter.com/v1'
});

async function getEmbedding(text, dimensions = 1536) {
  const response = await client.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
    dimensions
  });
  return response.data[0].embedding;
}

// Semantic similarity (a plain synchronous function — no await needed)
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const emb1 = await getEmbedding('How to reset password');
const emb2 = await getEmbedding('I forgot my login');
const emb3 = await getEmbedding('Best pizza in New York');

console.log('Similar:', cosineSimilarity(emb1, emb2));   // ~0.85
console.log('Different:', cosineSimilarity(emb1, emb3)); // ~0.15
```
### cURL Example

```bash
curl -X POST https://api.crazyrouter.com/v1/embeddings \
  -H "Authorization: Bearer your-crazyrouter-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "How do I reset my password?",
    "dimensions": 512
  }'
```
## Building Semantic Search

Here's a complete example of building a semantic search system:

```python
import numpy as np
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)

DIMENSIONS = 512

# Step 1: Index your documents
documents = [
    "Python is a high-level programming language known for its simplicity.",
    "JavaScript runs in web browsers and powers interactive websites.",
    "Docker containers package applications with their dependencies.",
    "Kubernetes orchestrates container deployments at scale.",
    "PostgreSQL is a powerful open-source relational database.",
    "Redis is an in-memory data store used for caching.",
    "GraphQL is a query language for APIs developed by Facebook.",
    "REST APIs use HTTP methods to perform CRUD operations.",
]

def get_embeddings(texts):
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
        dimensions=DIMENSIONS
    )
    return [item.embedding for item in response.data]

# Generate embeddings for all documents
doc_embeddings = np.array(get_embeddings(documents))

# Step 2: Search
def search(query, top_k=3):
    query_embedding = np.array(get_embeddings([query])[0])
    # Cosine similarity
    similarities = np.dot(doc_embeddings, query_embedding) / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
    )
    top_indices = np.argsort(similarities)[::-1][:top_k]
    results = []
    for idx in top_indices:
        results.append({
            "document": documents[idx],
            "score": float(similarities[idx])
        })
    return results

# Try it
results = search("How do I cache data?")
for r in results:
    print(f"[{r['score']:.3f}] {r['document']}")
# [0.742] Redis is an in-memory data store used for caching.
# [0.385] PostgreSQL is a powerful open-source relational database.
# [0.312] Docker containers package applications with their dependencies.
```
## text-embedding-3-small vs Alternatives

### Embedding Model Comparison

| Model | Provider | MTEB Score | Price (1M tokens) | Dimensions | Open Source |
|---|---|---|---|---|---|
| text-embedding-3-small | OpenAI | 62.3% | $0.02 | 256-1536 | ❌ |
| text-embedding-3-large | OpenAI | 64.6% | $0.13 | 256-3072 | ❌ |
| Gemini Embedding | Google | 63.8% | $0.01 | 768 | ❌ |
| Voyage-3 | Voyage AI | 67.1% | $0.06 | 1024 | ❌ |
| BGE-M3 | BAAI | 63.5% | Free (self-host) | 1024 | ✅ |
| Nomic Embed v2 | Nomic | 62.8% | Free (self-host) | 768 | ✅ |
### When to Choose text-embedding-3-small
- Best for: Production systems where cost matters and quality is "good enough"
- Not ideal for: Research benchmarks or when you need absolute best retrieval quality
For most RAG and search applications, the difference between 62% and 67% MTEB score is negligible in practice. The 3x-6x cost difference is not.
## Pricing

### Cost Comparison via Crazyrouter
| Model | Official Price | Crazyrouter Price | Savings |
|---|---|---|---|
| text-embedding-3-small | $0.02/1M tokens | $0.015/1M tokens | 25% |
| text-embedding-3-large | $0.13/1M tokens | $0.10/1M tokens | 23% |
### Real-World Cost Estimates

| Use Case | Documents | Tokens | One-Time Embedding Cost (Crazyrouter) |
|---|---|---|---|
| Small FAQ bot | 1,000 docs | ~500K | < $0.01 |
| Medium knowledge base | 50,000 docs | ~25M | ~$0.38 |
| Large search engine | 1M docs | ~500M | ~$7.50 |
| Enterprise RAG | 10M docs | ~5B | ~$75 |
Embedding is a one-time cost per document. You only re-embed when content changes. Through Crazyrouter, you can access text-embedding-3-small alongside 300+ other models with a single API key.
## Best Practices
- Use 512 dimensions for most applications — best quality/storage tradeoff
- Batch your requests — send up to 2048 texts per API call
- Cache embeddings — store in a vector database, don't re-compute
- Chunk long documents — split into 200-500 token chunks for better retrieval
- Normalize vectors — OpenAI embeddings are already normalized, but verify after dimension reduction
- Use metadata filtering — combine vector search with traditional filters for better results
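The batching tip above can be sketched as a small helper (pure Python; the 2048 figure is the embeddings endpoint's per-request input limit, and `batched` is a hypothetical utility, not part of the OpenAI SDK):

```python
MAX_BATCH = 2048  # per-request input limit for the embeddings endpoint

def batched(texts, batch_size=MAX_BATCH):
    """Split a list of texts into API-sized batches."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

# 5,000 documents fit in 3 API calls instead of 5,000
batches = list(batched([f"doc {i}" for i in range(5000)]))
print(len(batches))      # 3
print(len(batches[-1]))  # 904
```

Each batch can then be passed as the `input` list of a single `embeddings.create` call, as in the batch example earlier.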
## FAQ

### What's the difference between text-embedding-3-small and text-embedding-3-large?
The large model has higher quality (64.6% vs 62.3% MTEB) but costs 6.5x more. For most production use cases, the small model is sufficient. Use the large model only when retrieval quality is critical and cost isn't a concern.
### Can I reduce dimensions after generating embeddings?
Yes. You can truncate the embedding vector to fewer dimensions and re-normalize. However, it's better to request the reduced dimensions directly via the dimensions parameter, as the model optimizes for the target dimensionality.
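As a minimal sketch of what truncate-and-renormalize looks like (pure Python, no API call; the 4-value vector is a toy stand-in for a real 1536-dimension embedding):

```python
import math

def truncate_and_renormalize(embedding, dims):
    """Keep the first `dims` values, then rescale back to unit length."""
    truncated = embedding[:dims]
    norm = math.sqrt(sum(v * v for v in truncated))
    return [v / norm for v in truncated]

vec = [0.5, 0.5, 0.5, 0.5]  # toy unit vector
reduced = truncate_and_renormalize(vec, 2)
print(math.isclose(sum(v * v for v in reduced), 1.0))  # True
```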
### How many tokens does a typical document use?
Roughly 1 token per 4 characters in English, or 1 token per 1-2 characters in Chinese. A 500-word English paragraph is about 650 tokens.
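Those ratios translate into a rough estimator (an approximation only; for exact counts, use a real tokenizer such as OpenAI's tiktoken library):

```python
def estimate_tokens(text, lang="en"):
    """Very rough token estimate: ~4 chars/token in English, ~1.5 in Chinese."""
    chars_per_token = 4 if lang == "en" else 1.5
    return round(len(text) / chars_per_token)

# 500 words of 4 letters plus a space each = 2,500 chars
print(estimate_tokens("word " * 500))  # 625
```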
### Which vector database should I use?
Popular choices: Pinecone (managed), Weaviate (open source), Qdrant (open source), pgvector (PostgreSQL extension). For getting started, pgvector is the simplest if you already use PostgreSQL.
### Is text-embedding-3-small good for multilingual content?
Yes. It handles English, Chinese, Japanese, Korean, and European languages well. For specialized multilingual needs, consider BGE-M3 or Cohere's multilingual model.
## Summary
text-embedding-3-small is the default choice for production embedding workloads. It's cheap, fast, supports dimension reduction, and delivers quality that's more than adequate for search, RAG, and classification tasks.
Access it through Crazyrouter for the best pricing and the convenience of a unified API that also gives you access to GPT-5, Claude, Gemini, and 300+ other models.


