Text-Embedding-3-Small: Complete Guide to OpenAI's Most Popular Embedding Model (2026)

Crazyrouter Team
May 3, 2026

text-embedding-3-small is OpenAI's cost-effective embedding model, released in January 2024. It converts text into 1536-dimensional vectors that capture semantic meaning — the foundation for semantic search, RAG pipelines, recommendation systems, and classification tasks.

This guide covers everything: pricing, token limits, dimensions, API usage, dimension reduction, performance benchmarks, and how it compares to text-embedding-3-large.

Text-Embedding-3-Small Quick Reference#

| Spec | Value |
| --- | --- |
| Model name | text-embedding-3-small |
| Provider | OpenAI |
| Default dimensions | 1536 |
| Adjustable dimensions | 256 – 1536 |
| Max input tokens | 8,191 |
| Max batch size | 2,048 inputs per request |
| Pricing (OpenAI direct) | $0.020 per 1M tokens |
| Pricing (Crazyrouter) | $0.016 per 1M tokens |
| MTEB benchmark score | 62.3 |
| Multilingual | Yes |
| Output format | float or base64 |
| Release date | January 25, 2024 |
| Status | Active (not deprecated) |

Text-Embedding-3-Small Pricing#

text-embedding-3-small costs **$0.020 per 1 million tokens** on OpenAI directly. Through [Crazyrouter](https://crazyrouter.com), the price drops to **$0.016 per 1M tokens** — a 20% discount.

To put that in perspective:

| Document Volume | Approx. Tokens | Cost (OpenAI) | Cost (Crazyrouter) |
| --- | --- | --- | --- |
| 100 pages of text | ~75,000 | $0.0015 | $0.0012 |
| 10,000 pages | ~7.5M | $0.15 | $0.12 |
| 1 million pages | ~750M | $15.00 | $12.00 |
| Wikipedia (English, full) | ~4.4B | $88.00 | $70.40 |
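These figures are straight proportions of the per-token price; a one-line estimator (defaulting to the OpenAI direct rate) makes the arithmetic explicit:

```python
def embedding_cost(tokens, price_per_million=0.020):
    """Estimate embedding cost in USD for a given token count."""
    return tokens / 1_000_000 * price_per_million

# 10,000 pages at ~7.5M tokens
print(embedding_cost(7_500_000))         # OpenAI direct
print(embedding_cost(7_500_000, 0.016))  # Crazyrouter rate
```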

Cost Comparison with Other Embedding Models#

| Model | Price / 1M Tokens | Price via Crazyrouter | Dimensions |
| --- | --- | --- | --- |
| text-embedding-3-small | $0.020 | $0.016 | 1536 |
| text-embedding-3-large | $0.130 | $0.100 | 3072 |
| text-embedding-ada-002 | $0.100 | N/A | 1536 |
| Google text-embedding-005 | $0.00625 | $0.005 | 768 |
| Cohere embed-v4 | $0.100 | N/A | 1024 |
| Voyage voyage-3-large | $0.180 | N/A | 2048 |

text-embedding-3-small is 6.5x cheaper than text-embedding-3-large and 5x cheaper than the older ada-002 — while outperforming ada-002 on benchmarks.

For a deeper comparison of all embedding models, see our AI Embeddings Comparison 2026 Guide.

Text-Embedding-3-Small Dimensions#

The default output is a 1536-dimensional vector. But text-embedding-3-small supports dimension reduction via the dimensions parameter — you can request any value from 256 to 1536.

This is done using Matryoshka Representation Learning (MRL). The model is trained so that the first N dimensions of the vector carry the most important information. Truncating to fewer dimensions loses some nuance but keeps most of the semantic signal.
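Because of MRL, you can also slice a full 1536-dimensional vector client-side; after truncating, re-normalize to unit length before computing cosine similarity. A minimal sketch (the helper name is ours):

```python
import numpy as np

def truncate_embedding(embedding, dims):
    """Keep the first `dims` components of an MRL embedding and re-normalize."""
    v = np.asarray(embedding, dtype=np.float64)[:dims]
    return v / np.linalg.norm(v)
```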

Dimension vs. Quality Tradeoff#

| Dimensions | MTEB Score | Storage per Vector | Relative Quality |
| --- | --- | --- | --- |
| 1536 (default) | 62.3 | 6,144 bytes | 100% |
| 1024 | ~61.5 | 4,096 bytes | ~98.7% |
| 768 | ~60.8 | 3,072 bytes | ~97.6% |
| 512 | ~59.7 | 2,048 bytes | ~95.8% |
| 256 | ~57.8 | 1,024 bytes | ~92.8% |

When to Reduce Dimensions#

  • 256 dimensions: Prototyping, low-resource environments, or when storage is the bottleneck
  • 512 dimensions: Good balance for mobile apps or edge deployments
  • 768 dimensions: Matches Google's embedding size — useful for migration
  • 1536 dimensions: Production workloads where quality matters most
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Request 512 dimensions instead of the default 1536
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How does semantic search work?",
    dimensions=512
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")  # 512
```

Text-Embedding-3-Small Token Limit and Context Length#

text-embedding-3-small accepts up to 8,191 tokens per input string. This is the model's context window for embedding.

Key details:

  • Tokenizer: cl100k_base (same as GPT-4)
  • 1 token ≈ 4 characters in English, ≈ 0.75 words
  • 8,191 tokens ≈ 6,100 words ≈ 24,000 characters
  • Text exceeding 8,191 tokens is truncated (not rejected)

How to Count Tokens Before Sending#

```python
import tiktoken

encoder = tiktoken.get_encoding("cl100k_base")

text = "Your document text here..."
tokens = encoder.encode(text)
print(f"Token count: {len(tokens)}")

# If over limit, chunk the text
MAX_TOKENS = 8191
if len(tokens) > MAX_TOKENS:
    chunks = [tokens[i:i+MAX_TOKENS] for i in range(0, len(tokens), MAX_TOKENS)]
    texts = [encoder.decode(chunk) for chunk in chunks]
    print(f"Split into {len(texts)} chunks")
```

Handling Long Documents#

For documents longer than 8,191 tokens, you have two options:

  1. Chunking: Split into overlapping segments and embed each one
  2. Summarize first: Use an LLM to summarize, then embed the summary
```python
import tiktoken

def chunk_text(text, max_tokens=8000, overlap=200):
    """Split text into overlapping chunks that fit the token limit."""
    encoder = tiktoken.get_encoding("cl100k_base")
    tokens = encoder.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        end = start + max_tokens
        chunk = encoder.decode(tokens[start:end])
        chunks.append(chunk)
        start = end - overlap  # overlap for context continuity
    return chunks
```

How to Use the Text-Embedding-3-Small API#

Python (OpenAI SDK)#

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Single text embedding
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="What is retrieval augmented generation?"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")  # 1536
print(f"Tokens used: {response.usage.total_tokens}")
```

Batch Embedding (Multiple Texts)#

Send up to 2,048 texts in a single request for better throughput:

```python
texts = [
    "First document about machine learning",
    "Second document about natural language processing",
    "Third document about computer vision",
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

embeddings = [item.embedding for item in response.data]
print(f"Got {len(embeddings)} embeddings, each with {len(embeddings[0])} dimensions")
print(f"Total tokens: {response.usage.total_tokens}")
```

cURL#

```bash
curl https://api.crazyrouter.com/v1/embeddings \
  -H "Authorization: Bearer your-crazyrouter-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "What is retrieval augmented generation?",
    "dimensions": 1536
  }'
```

Response Format#

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, 0.015797347, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```

Base64 Output Format#

For bandwidth-sensitive applications, request base64-encoded output:

```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Your text here",
    encoding_format="base64"
)
# Returns a base64-encoded string instead of a float array
```
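The base64 payload is the raw float32 buffer of the embedding, so decoding is a two-step operation. A small helper (the function name is ours, assuming NumPy is available):

```python
import base64

import numpy as np

def decode_embedding(b64_string):
    """Decode a base64-encoded embedding into a float32 NumPy array."""
    raw = base64.b64decode(b64_string)
    return np.frombuffer(raw, dtype=np.float32)
```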

Error Handling#

```python
import time

from openai import OpenAI, RateLimitError, APIError

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)

def get_embedding(text, model="text-embedding-3-small", max_retries=3):
    """Get embedding with exponential-backoff retry logic for rate limits."""
    for attempt in range(max_retries):
        try:
            response = client.embeddings.create(
                model=model,
                input=text
            )
            return response.data[0].embedding
        except RateLimitError:
            wait = 2 ** attempt
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)
        except APIError as e:
            print(f"API error: {e}")
            raise
    raise Exception("Max retries exceeded")
```

Text-Embedding-3-Small vs Text-Embedding-3-Large#

This is the most common comparison. Here's the full breakdown:

| Feature | text-embedding-3-small | text-embedding-3-large |
| --- | --- | --- |
| Default dimensions | 1536 | 3072 |
| Adjustable dimensions | 256 – 1536 | 256 – 3072 |
| Max tokens | 8,191 | 8,191 |
| MTEB score | 62.3 | 64.6 |
| Price / 1M tokens | $0.020 | $0.130 |
| Price via Crazyrouter | $0.016 | $0.100 |
| Relative quality | Baseline | +3.7% better |
| Relative cost | Baseline | 6.5x more expensive |
| Storage (float32) | 6 KB/vector | 12 KB/vector |
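The storage figures follow directly from float32 size: 4 bytes per dimension. A quick estimator for sizing an index (raw vectors only, excluding index overhead):

```python
def index_storage_bytes(num_vectors, dims, bytes_per_float=4):
    """Raw float32 storage for an embedding index, in bytes."""
    return num_vectors * dims * bytes_per_float

# One million documents at the two default sizes
print(index_storage_bytes(1_000_000, 1536) / 1e9, "GB")  # small
print(index_storage_bytes(1_000_000, 3072) / 1e9, "GB")  # large
```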

When to Choose text-embedding-3-small#

  • Budget-conscious projects
  • High-volume embedding workloads (millions of documents)
  • RAG applications where "good enough" retrieval is fine
  • Prototyping and development
  • Applications where latency matters (smaller vectors = faster similarity search)

When to Choose text-embedding-3-large#

  • Search quality is the top priority and budget allows it
  • Legal, medical, or financial domains where precision matters
  • Small document collections where the 6.5x cost difference is negligible
  • You can use dimension reduction (e.g., 1024 dims) to get large-model quality at reduced storage

The Dimension Reduction Trick#

text-embedding-3-large at 1024 dimensions often outperforms text-embedding-3-small at 1536 dimensions — while using less storage:

```python
# Large model at reduced dimensions — often the sweet spot
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Your text here",
    dimensions=1024  # Reduced from 3072
)
# Better quality than small@1536, with less storage than small@1536
```

Text-Embedding-3-Small vs Ada-002#

text-embedding-ada-002 was OpenAI's previous generation embedding model. Here's why you should migrate:

| Feature | text-embedding-3-small | text-embedding-ada-002 |
| --- | --- | --- |
| MTEB score | 62.3 | 61.0 |
| Price / 1M tokens | $0.020 | $0.100 |
| Dimensions | 1536 (adjustable) | 1536 (fixed) |
| Dimension reduction | ✅ Yes | ❌ No |
| Multilingual (MIRACL) | 44.0 | 31.4 |

text-embedding-3-small is 5x cheaper, higher quality, and supports dimension reduction. There's no reason to stay on ada-002.

Migration note: Embeddings from different models are not compatible. Switching requires re-embedding all your documents.
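The re-embedding pass itself is just a batched loop over your corpus. A minimal sketch, assuming a `client` configured as in the examples above (the `reembed` helper is ours):

```python
def reembed(client, texts, model="text-embedding-3-small", batch_size=2048):
    """Re-embed a document collection in batches of up to the API maximum."""
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = client.embeddings.create(model=model, input=batch)
        embeddings.extend(item.embedding for item in response.data)
    return embeddings
```

Write the results to a fresh index and cut over atomically, so queries never mix old ada-002 vectors with new ones.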

Text-Embedding-3-Small Multilingual Support#

text-embedding-3-small supports multilingual text natively. On the MIRACL multilingual benchmark, it scores 44.0 — a massive improvement over ada-002's 31.4.

Supported languages include English, Chinese, Japanese, Korean, Spanish, French, German, Portuguese, Russian, Arabic, Hindi, and many more.

```python
import numpy as np

# Multilingual embeddings — same API, no configuration needed
texts = [
    "How does machine learning work?",          # English
    "机器学习是如何工作的?",                      # Chinese
    "機械学習はどのように機能しますか?",            # Japanese
    "¿Cómo funciona el aprendizaje automático?",  # Spanish
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

# Cross-lingual similarity works out of the box
embeddings = [item.embedding for item in response.data]
en_embedding = np.array(embeddings[0])
zh_embedding = np.array(embeddings[1])

similarity = np.dot(en_embedding, zh_embedding) / (
    np.linalg.norm(en_embedding) * np.linalg.norm(zh_embedding)
)
print(f"EN-ZH similarity: {similarity:.4f}")  # High similarity for same meaning
```

For applications that are primarily multilingual (100+ languages, cross-lingual retrieval at scale), also consider Cohere embed-v4 or the open-source BGE-M3.

Is Text-Embedding-3-Small Deprecated?#

No. As of 2026, text-embedding-3-small is fully active and supported by OpenAI. It is not deprecated and there is no announced deprecation date.

The model that is deprecated is the older text-embedding-ada-002. OpenAI recommends migrating from ada-002 to the text-embedding-3 series.

Timeline:

  • text-embedding-ada-002: Released December 2022, now legacy
  • text-embedding-3-small: Released January 2024, current recommended model
  • text-embedding-3-large: Released January 2024, current premium model

Building Semantic Search with Text-Embedding-3-Small#

Here's a complete working example:

```python
from openai import OpenAI
import numpy as np

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Step 1: Embed your document collection
documents = [
    "Python is a high-level programming language known for its readability.",
    "JavaScript is the language of the web, running in every browser.",
    "Rust provides memory safety without garbage collection.",
    "Go was designed at Google for building scalable network services.",
    "TypeScript adds static typing to JavaScript for better tooling.",
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=documents
)
doc_embeddings = np.array([item.embedding for item in response.data])

# Step 2: Search by meaning
query = "Which language is best for web development?"
query_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=query
)
query_embedding = np.array(query_response.data[0].embedding)

# Step 3: Rank by cosine similarity
similarities = np.dot(doc_embeddings, query_embedding) / (
    np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
)

# Step 4: Return top results
ranked = sorted(enumerate(similarities), key=lambda x: x[1], reverse=True)
print("Search results:")
for idx, score in ranked[:3]:
    print(f"  [{score:.4f}] {documents[idx]}")
```

Scaling with a Vector Database#

For production workloads with millions of documents, use a vector database instead of in-memory numpy:

```python
# Example with Pinecone
import pinecone

pc = pinecone.Pinecone(api_key="your-pinecone-key")
index = pc.Index("my-embeddings")

# Upsert embeddings
vectors = [
    {"id": f"doc_{i}", "values": emb.tolist(), "metadata": {"text": doc}}
    for i, (doc, emb) in enumerate(zip(documents, doc_embeddings))
]
index.upsert(vectors=vectors)

# Query
results = index.query(
    vector=query_embedding.tolist(),
    top_k=3,
    include_metadata=True
)
for match in results.matches:
    print(f"  [{match.score:.4f}] {match.metadata['text']}")
```

Other popular vector databases that work well with text-embedding-3-small:

  • Pinecone: Fully managed, easiest to start
  • Weaviate: Open source, supports hybrid search
  • Qdrant: Open source, high performance
  • ChromaDB: Lightweight, great for prototyping
  • pgvector: PostgreSQL extension, no new infrastructure needed

Performance Benchmarks#

MTEB (Massive Text Embedding Benchmark)#

| Model | Average | Classification | Clustering | Retrieval |
| --- | --- | --- | --- | --- |
| text-embedding-3-small | 62.3 | 67.1 | 41.2 | 51.7 |
| text-embedding-3-large | 64.6 | 69.8 | 44.1 | 55.4 |
| text-embedding-ada-002 | 61.0 | 66.3 | 40.1 | 49.3 |
| Cohere embed-v4 | 66.2 | 71.5 | 46.3 | 56.8 |
| Voyage voyage-3-large | 67.1 | 72.0 | 47.2 | 58.1 |

Multilingual (MIRACL)#

| Model | MIRACL Score |
| --- | --- |
| text-embedding-3-small | 44.0 |
| text-embedding-3-large | 54.9 |
| text-embedding-ada-002 | 31.4 |

Latency#

text-embedding-3-small is one of the fastest commercial embedding models. Typical latency through Crazyrouter:

| Batch Size | Avg Latency |
| --- | --- |
| 1 text | ~50ms |
| 10 texts | ~80ms |
| 100 texts | ~200ms |
| 1000 texts | ~800ms |

Best Practices#

  1. Batch requests: Send multiple texts per API call (up to 2,048) to reduce overhead
  2. Cache embeddings: Never re-embed the same text — store results in a vector database or cache
  3. Normalize vectors: The API returns normalized vectors by default, but verify if using dimension reduction
  4. Choose dimensions wisely: Start with 1536, reduce only if storage or latency is a real constraint
  5. Use consistent models: Never mix embeddings from different models in the same index
  6. Chunk long documents: Split texts over 8,191 tokens with overlap for context continuity
  7. Monitor token usage: Track usage.total_tokens in responses to manage costs
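Practice 2 can be as simple as an in-memory dictionary keyed by a hash of model plus text; a minimal sketch (the class name and API are ours, not from any library):

```python
import hashlib

class EmbeddingCache:
    """In-memory cache so the same text is never embedded twice per model."""

    def __init__(self):
        self._store = {}

    def _key(self, text, model):
        return hashlib.sha256(f"{model}:{text}".encode()).hexdigest()

    def get_or_compute(self, text, model, embed_fn):
        # embed_fn is only called on a cache miss
        k = self._key(text, model)
        if k not in self._store:
            self._store[k] = embed_fn(text)
        return self._store[k]
```

For persistence across restarts, swap the dictionary for Redis or a vector database with the same keying scheme.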

Frequently Asked Questions#

What is text-embedding-3-small?#

text-embedding-3-small is OpenAI's cost-effective text embedding model. It converts text into 1536-dimensional numerical vectors that capture semantic meaning, enabling applications like semantic search, RAG, classification, and clustering.

How much does text-embedding-3-small cost?#

$0.020 per 1 million tokens on OpenAI directly. Through [Crazyrouter](https://crazyrouter.com), it costs $0.016 per 1M tokens — 20% cheaper. Embedding 10,000 pages of text costs roughly $0.15.

What is the token limit for text-embedding-3-small?#

8,191 tokens per input string. That's approximately 6,100 words or 24,000 characters in English. Text exceeding this limit is silently truncated.

How many dimensions does text-embedding-3-small output?#

1,536 dimensions by default. You can reduce this to any value between 256 and 1,536 using the dimensions parameter in the API request.

Is text-embedding-3-small deprecated?#

No. It is fully active and supported as of 2026. The deprecated model is the older text-embedding-ada-002.

What's the difference between text-embedding-3-small and text-embedding-3-large?#

text-embedding-3-large outputs 3,072 dimensions (vs 1,536), scores 3.7% higher on MTEB benchmarks, and costs 6.5x more ($0.13 vs $0.02 per 1M tokens). For most applications, the small model is sufficient.

Does text-embedding-3-small support multilingual text?#

Yes. It handles multiple languages natively and scores 44.0 on the MIRACL multilingual benchmark. No special configuration is needed — just pass text in any supported language.

Can I use text-embedding-3-small with LangChain?#

Yes. LangChain has built-in support:

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    openai_api_key="your-crazyrouter-key",
    openai_api_base="https://api.crazyrouter.com/v1"
)

vectors = embeddings.embed_documents(["Your text here"])
```

Can I use text-embedding-3-small with LlamaIndex?#

Yes:

```python
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key="your-crazyrouter-key",
    api_base="https://api.crazyrouter.com/v1"
)
```

How does text-embedding-3-small compare to free/open-source models?#

Open-source models like BGE-M3 (MTEB 63.2) and E5-Mistral-7B (MTEB 66.6) can match or exceed its quality. The trade-off is you need GPU infrastructure to run them. text-embedding-3-small wins on convenience and total cost for small-to-medium workloads.

Summary#

text-embedding-3-small is the default choice for production embedding workloads in 2026. At $0.02/1M tokens (or $0.016 via Crazyrouter), it delivers strong quality across search, RAG, and classification — with the flexibility of dimension reduction and multilingual support.

Choose text-embedding-3-small when: you want the best price-to-performance ratio for embedding workloads.

Choose text-embedding-3-large when: you need maximum quality and the 6.5x cost increase is acceptable.

Access it through Crazyrouter for 20% lower pricing and a unified API that also gives you GPT-5, Claude, Gemini, and 300+ other models — all with one API key.
