Text-Embedding-3-Small: Complete Guide to OpenAI's Most Popular Embedding Model (2026)

Crazyrouter Team
May 3, 2026

text-embedding-3-small is OpenAI's cost-effective embedding model, released in January 2024. It converts text into 1536-dimensional vectors that capture semantic meaning — the foundation for semantic search, RAG pipelines, recommendation systems, and classification tasks.

This guide covers everything: pricing, token limits, dimensions, API usage, dimension reduction, performance benchmarks, and how it compares to text-embedding-3-large.

Text-Embedding-3-Small Quick Reference#

| Spec | Value |
| --- | --- |
| Model name | text-embedding-3-small |
| Provider | OpenAI |
| Default dimensions | 1536 |
| Adjustable dimensions | 256 – 1536 |
| Max input tokens | 8,191 |
| Max batch size | 2,048 inputs per request |
| Pricing (OpenAI direct) | $0.020 per 1M tokens |
| Pricing (Crazyrouter) | $0.016 per 1M tokens |
| MTEB benchmark score | 62.3 |
| Multilingual | Yes |
| Output format | float or base64 |
| Release date | January 25, 2024 |
| Status | Active (not deprecated) |

Text-Embedding-3-Small Pricing#

text-embedding-3-small costs **$0.020 per 1 million tokens** on OpenAI directly. Through [Crazyrouter](https://crazyrouter.com), the price drops to **$0.016 per 1M tokens** — a 20% discount.

To put that in perspective:

| Document Volume | Approx. Tokens | Cost (OpenAI) | Cost (Crazyrouter) |
| --- | --- | --- | --- |
| 100 pages of text | ~75,000 | $0.0015 | $0.0012 |
| 10,000 pages | ~7.5M | $0.15 | $0.12 |
| 1 million pages | ~750M | $15.00 | $12.00 |
| Wikipedia (English, full) | ~4.4B | $88.00 | $70.40 |
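These figures are straight proportions of the per-token price; a one-line estimator (defaulting to the OpenAI direct rate) makes the arithmetic explicit:

```python
def embedding_cost(tokens, price_per_million=0.020):
    """Estimate embedding cost in USD for a given token count."""
    return tokens / 1_000_000 * price_per_million

# 10,000 pages at ~7.5M tokens
print(embedding_cost(7_500_000))         # OpenAI direct
print(embedding_cost(7_500_000, 0.016))  # Crazyrouter rate
```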

Cost Comparison with Other Embedding Models#

| Model | Price / 1M Tokens | Price via Crazyrouter | Dimensions |
| --- | --- | --- | --- |
| text-embedding-3-small | $0.020 | $0.016 | 1536 |
| text-embedding-3-large | $0.130 | $0.100 | 3072 |
| text-embedding-ada-002 | $0.100 | N/A | 1536 |
| Google text-embedding-005 | $0.00625 | $0.005 | 768 |
| Cohere embed-v4 | $0.100 | N/A | 1024 |
| Voyage voyage-3-large | $0.180 | N/A | 2048 |

text-embedding-3-small is 6.5x cheaper than text-embedding-3-large and 5x cheaper than the older ada-002 — while outperforming ada-002 on benchmarks.

For a deeper comparison of all embedding models, see our AI Embeddings Comparison 2026 Guide.

Text-Embedding-3-Small Dimensions#

The default output is a 1536-dimensional vector. But text-embedding-3-small supports dimension reduction via the dimensions parameter — you can request any value from 256 to 1536.

This is done using Matryoshka Representation Learning (MRL). The model is trained so that the first N dimensions of the vector carry the most important information. Truncating to fewer dimensions loses some nuance but keeps most of the semantic signal.
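Because of MRL, you can also slice a full 1536-dimensional vector client-side; after truncating, re-normalize to unit length before computing cosine similarity. A minimal sketch (the helper name is ours):

```python
import numpy as np

def truncate_embedding(embedding, dims):
    """Keep the first `dims` components of an MRL embedding and re-normalize."""
    v = np.asarray(embedding, dtype=np.float64)[:dims]
    return v / np.linalg.norm(v)
```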

Dimension vs. Quality Tradeoff#

| Dimensions | MTEB Score | Storage per Vector | Relative Quality |
| --- | --- | --- | --- |
| 1536 (default) | 62.3 | 6,144 bytes | 100% |
| 1024 | ~61.5 | 4,096 bytes | ~98.7% |
| 768 | ~60.8 | 3,072 bytes | ~97.6% |
| 512 | ~59.7 | 2,048 bytes | ~95.8% |
| 256 | ~57.8 | 1,024 bytes | ~92.8% |

When to Reduce Dimensions#

  • 256 dimensions: Prototyping, low-resource environments, or when storage is the bottleneck
  • 512 dimensions: Good balance for mobile apps or edge deployments
  • 768 dimensions: Matches Google's embedding size — useful for migration
  • 1536 dimensions: Production workloads where quality matters most
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Request 512 dimensions instead of the default 1536
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How does semantic search work?",
    dimensions=512
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")  # 512
```

Text-Embedding-3-Small Token Limit and Context Length#

text-embedding-3-small accepts up to 8,191 tokens per input string. This is the model's context window for embedding.

Key details:

  • Tokenizer: cl100k_base (same as GPT-4)
  • 1 token ≈ 4 characters in English, ≈ 0.75 words
  • 8,191 tokens ≈ 6,100 words ≈ 24,000 characters
  • Text exceeding 8,191 tokens is truncated (not rejected)

How to Count Tokens Before Sending#

```python
import tiktoken

encoder = tiktoken.get_encoding("cl100k_base")

text = "Your document text here..."
tokens = encoder.encode(text)
print(f"Token count: {len(tokens)}")

# If over limit, chunk the text
MAX_TOKENS = 8191
if len(tokens) > MAX_TOKENS:
    chunks = [tokens[i:i+MAX_TOKENS] for i in range(0, len(tokens), MAX_TOKENS)]
    texts = [encoder.decode(chunk) for chunk in chunks]
    print(f"Split into {len(texts)} chunks")
```

Handling Long Documents#

For documents longer than 8,191 tokens, you have two options:

  1. Chunking: Split into overlapping segments and embed each one
  2. Summarize first: Use an LLM to summarize, then embed the summary
```python
import tiktoken

def chunk_text(text, max_tokens=8000, overlap=200):
    """Split text into overlapping chunks that fit the token limit."""
    encoder = tiktoken.get_encoding("cl100k_base")
    tokens = encoder.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        end = start + max_tokens
        chunk = encoder.decode(tokens[start:end])
        chunks.append(chunk)
        start = end - overlap  # overlap for context continuity
    return chunks
```

How to Use the Text-Embedding-3-Small API#

Python (OpenAI SDK)#

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Single text embedding
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="What is retrieval augmented generation?"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")  # 1536
print(f"Tokens used: {response.usage.total_tokens}")
```

Batch Embedding (Multiple Texts)#

Send up to 2,048 texts in a single request for better throughput:

```python
texts = [
    "First document about machine learning",
    "Second document about natural language processing",
    "Third document about computer vision",
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

embeddings = [item.embedding for item in response.data]
print(f"Got {len(embeddings)} embeddings, each with {len(embeddings[0])} dimensions")
print(f"Total tokens: {response.usage.total_tokens}")
```

cURL#

```bash
curl https://api.crazyrouter.com/v1/embeddings \
  -H "Authorization: Bearer your-crazyrouter-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "What is retrieval augmented generation?",
    "dimensions": 1536
  }'
```

Response Format#

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, 0.015797347, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```

Base64 Output Format#

For bandwidth-sensitive applications, request base64-encoded output:

```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Your text here",
    encoding_format="base64"
)
# Returns a base64-encoded string instead of a float array
```
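The base64 payload is the raw float32 buffer of the embedding, so decoding is a two-step operation. A small helper (the function name is ours, assuming NumPy is available):

```python
import base64

import numpy as np

def decode_embedding(b64_string):
    """Decode a base64-encoded embedding into a float32 NumPy array."""
    raw = base64.b64decode(b64_string)
    return np.frombuffer(raw, dtype=np.float32)
```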

Error Handling#

```python
import time

from openai import OpenAI, RateLimitError, APIError

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)

def get_embedding(text, model="text-embedding-3-small", max_retries=3):
    """Get embedding with exponential-backoff retry logic for rate limits."""
    for attempt in range(max_retries):
        try:
            response = client.embeddings.create(
                model=model,
                input=text
            )
            return response.data[0].embedding
        except RateLimitError:
            wait = 2 ** attempt
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)
        except APIError as e:
            print(f"API error: {e}")
            raise
    raise Exception("Max retries exceeded")
```

Text-Embedding-3-Small vs Text-Embedding-3-Large#

This is the most common comparison. Here's the full breakdown:

| Feature | text-embedding-3-small | text-embedding-3-large |
| --- | --- | --- |
| Default dimensions | 1536 | 3072 |
| Adjustable dimensions | 256 – 1536 | 256 – 3072 |
| Max tokens | 8,191 | 8,191 |
| MTEB score | 62.3 | 64.6 |
| Price / 1M tokens | $0.020 | $0.130 |
| Price via Crazyrouter | $0.016 | $0.100 |
| Relative quality | Baseline | +3.7% better |
| Relative cost | Baseline | 6.5x more expensive |
| Storage (float32) | 6 KB/vector | 12 KB/vector |
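The storage figures follow directly from float32 size: 4 bytes per dimension. A quick estimator for sizing an index (raw vectors only, excluding index overhead):

```python
def index_storage_bytes(num_vectors, dims, bytes_per_float=4):
    """Raw float32 storage for an embedding index, in bytes."""
    return num_vectors * dims * bytes_per_float

# One million documents at the two default sizes
print(index_storage_bytes(1_000_000, 1536) / 1e9, "GB")  # small
print(index_storage_bytes(1_000_000, 3072) / 1e9, "GB")  # large
```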

When to Choose text-embedding-3-small#

  • Budget-conscious projects
  • High-volume embedding workloads (millions of documents)
  • RAG applications where "good enough" retrieval is fine
  • Prototyping and development
  • Applications where latency matters (smaller vectors = faster similarity search)

When to Choose text-embedding-3-large#

  • Search quality is the top priority and budget allows it
  • Legal, medical, or financial domains where precision matters
  • Small document collections where the 6.5x cost difference is negligible
  • You can use dimension reduction (e.g., 1024 dims) to get large-model quality at reduced storage

The Dimension Reduction Trick#

text-embedding-3-large at 1024 dimensions often outperforms text-embedding-3-small at 1536 dimensions — while using less storage:

```python
# Large model at reduced dimensions — often the sweet spot
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Your text here",
    dimensions=1024  # Reduced from 3072
)
# Better quality than small@1536, with less storage than small@1536
```

Text-Embedding-3-Small vs Ada-002#

text-embedding-ada-002 was OpenAI's previous generation embedding model. Here's why you should migrate:

| Feature | text-embedding-3-small | text-embedding-ada-002 |
| --- | --- | --- |
| MTEB score | 62.3 | 61.0 |
| Price / 1M tokens | $0.020 | $0.100 |
| Dimensions | 1536 (adjustable) | 1536 (fixed) |
| Dimension reduction | ✅ Yes | ❌ No |
| Multilingual (MIRACL) | 44.0 | 31.4 |

text-embedding-3-small is 5x cheaper, higher quality, and supports dimension reduction. There's no reason to stay on ada-002.

Migration note: Embeddings from different models are not compatible. Switching requires re-embedding all your documents.
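The re-embedding pass itself is just a batched loop over your corpus. A minimal sketch, assuming a `client` configured as in the examples above (the `reembed` helper is ours):

```python
def reembed(client, texts, model="text-embedding-3-small", batch_size=2048):
    """Re-embed a document collection in batches of up to the API maximum."""
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = client.embeddings.create(model=model, input=batch)
        embeddings.extend(item.embedding for item in response.data)
    return embeddings
```

Write the results to a fresh index and cut over atomically, so queries never mix old ada-002 vectors with new ones.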

Text-Embedding-3-Small Multilingual Support#

text-embedding-3-small supports multilingual text natively. On the MIRACL multilingual benchmark, it scores 44.0 — a massive improvement over ada-002's 31.4.

Supported languages include English, Chinese, Japanese, Korean, Spanish, French, German, Portuguese, Russian, Arabic, Hindi, and many more.

```python
import numpy as np

# Multilingual embeddings — same API, no configuration needed
texts = [
    "How does machine learning work?",          # English
    "机器学习是如何工作的?",                      # Chinese
    "機械学習はどのように機能しますか?",            # Japanese
    "¿Cómo funciona el aprendizaje automático?",  # Spanish
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

# Cross-lingual similarity works out of the box
embeddings = [item.embedding for item in response.data]
en_embedding = np.array(embeddings[0])
zh_embedding = np.array(embeddings[1])

similarity = np.dot(en_embedding, zh_embedding) / (
    np.linalg.norm(en_embedding) * np.linalg.norm(zh_embedding)
)
print(f"EN-ZH similarity: {similarity:.4f}")  # High similarity for same meaning
```

For applications that are primarily multilingual (100+ languages, cross-lingual retrieval at scale), also consider Cohere embed-v4 or the open-source BGE-M3.

Is Text-Embedding-3-Small Deprecated?#

No. As of 2026, text-embedding-3-small is fully active and supported by OpenAI. It is not deprecated and there is no announced deprecation date.

The model that is deprecated is the older text-embedding-ada-002. OpenAI recommends migrating from ada-002 to the text-embedding-3 series.

Timeline:

  • text-embedding-ada-002: Released December 2022, now legacy
  • text-embedding-3-small: Released January 2024, current recommended model
  • text-embedding-3-large: Released January 2024, current premium model

Building Semantic Search with Text-Embedding-3-Small#

Here's a complete working example:

```python
from openai import OpenAI
import numpy as np

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Step 1: Embed your document collection
documents = [
    "Python is a high-level programming language known for its readability.",
    "JavaScript is the language of the web, running in every browser.",
    "Rust provides memory safety without garbage collection.",
    "Go was designed at Google for building scalable network services.",
    "TypeScript adds static typing to JavaScript for better tooling.",
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=documents
)
doc_embeddings = np.array([item.embedding for item in response.data])

# Step 2: Search by meaning
query = "Which language is best for web development?"
query_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=query
)
query_embedding = np.array(query_response.data[0].embedding)

# Step 3: Rank by cosine similarity
similarities = np.dot(doc_embeddings, query_embedding) / (
    np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
)

# Step 4: Return top results
ranked = sorted(enumerate(similarities), key=lambda x: x[1], reverse=True)
print("Search results:")
for idx, score in ranked[:3]:
    print(f"  [{score:.4f}] {documents[idx]}")
```

Scaling with a Vector Database#

For production workloads with millions of documents, use a vector database instead of in-memory numpy:

```python
# Example with Pinecone
import pinecone

pc = pinecone.Pinecone(api_key="your-pinecone-key")
index = pc.Index("my-embeddings")

# Upsert embeddings
vectors = [
    {"id": f"doc_{i}", "values": emb.tolist(), "metadata": {"text": doc}}
    for i, (doc, emb) in enumerate(zip(documents, doc_embeddings))
]
index.upsert(vectors=vectors)

# Query
results = index.query(
    vector=query_embedding.tolist(),
    top_k=3,
    include_metadata=True
)
for match in results.matches:
    print(f"  [{match.score:.4f}] {match.metadata['text']}")
```

Other popular vector databases that work well with text-embedding-3-small:

  • Pinecone: Fully managed, easiest to start
  • Weaviate: Open source, supports hybrid search
  • Qdrant: Open source, high performance
  • ChromaDB: Lightweight, great for prototyping
  • pgvector: PostgreSQL extension, no new infrastructure needed

Performance Benchmarks#

MTEB (Massive Text Embedding Benchmark)#

| Model | Average | Classification | Clustering | Retrieval |
| --- | --- | --- | --- | --- |
| text-embedding-3-small | 62.3 | 67.1 | 41.2 | 51.7 |
| text-embedding-3-large | 64.6 | 69.8 | 44.1 | 55.4 |
| text-embedding-ada-002 | 61.0 | 66.3 | 40.1 | 49.3 |
| Cohere embed-v4 | 66.2 | 71.5 | 46.3 | 56.8 |
| Voyage voyage-3-large | 67.1 | 72.0 | 47.2 | 58.1 |

Multilingual (MIRACL)#

| Model | MIRACL Score |
| --- | --- |
| text-embedding-3-small | 44.0 |
| text-embedding-3-large | 54.9 |
| text-embedding-ada-002 | 31.4 |

Latency#

text-embedding-3-small is one of the fastest commercial embedding models. Typical latency through Crazyrouter:

| Batch Size | Avg Latency |
| --- | --- |
| 1 text | ~50ms |
| 10 texts | ~80ms |
| 100 texts | ~200ms |
| 1000 texts | ~800ms |

Best Practices#

  1. Batch requests: Send multiple texts per API call (up to 2,048) to reduce overhead
  2. Cache embeddings: Never re-embed the same text — store results in a vector database or cache
  3. Normalize vectors: The API returns normalized vectors by default, but verify if using dimension reduction
  4. Choose dimensions wisely: Start with 1536, reduce only if storage or latency is a real constraint
  5. Use consistent models: Never mix embeddings from different models in the same index
  6. Chunk long documents: Split texts over 8,191 tokens with overlap for context continuity
  7. Monitor token usage: Track usage.total_tokens in responses to manage costs
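Practice 2 can be as simple as an in-memory dictionary keyed by a hash of model plus text; a minimal sketch (the class name and API are ours, not from any library):

```python
import hashlib

class EmbeddingCache:
    """In-memory cache so the same text is never embedded twice per model."""

    def __init__(self):
        self._store = {}

    def _key(self, text, model):
        return hashlib.sha256(f"{model}:{text}".encode()).hexdigest()

    def get_or_compute(self, text, model, embed_fn):
        # embed_fn is only called on a cache miss
        k = self._key(text, model)
        if k not in self._store:
            self._store[k] = embed_fn(text)
        return self._store[k]
```

For persistence across restarts, swap the dictionary for Redis or a vector database with the same keying scheme.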

Frequently Asked Questions#

What is text-embedding-3-small?#

text-embedding-3-small is OpenAI's cost-effective text embedding model. It converts text into 1536-dimensional numerical vectors that capture semantic meaning, enabling applications like semantic search, RAG, classification, and clustering.

How much does text-embedding-3-small cost?#

$0.020 per 1 million tokens on OpenAI directly. Through [Crazyrouter](https://crazyrouter.com), it costs $0.016 per 1M tokens — 20% cheaper. Embedding 10,000 pages of text costs roughly $0.15.

What is the token limit for text-embedding-3-small?#

8,191 tokens per input string. That's approximately 6,100 words or 24,000 characters in English. Text exceeding this limit is silently truncated.

How many dimensions does text-embedding-3-small output?#

1,536 dimensions by default. You can reduce this to any value between 256 and 1,536 using the dimensions parameter in the API request.

Is text-embedding-3-small deprecated?#

No. It is fully active and supported as of 2026. The deprecated model is the older text-embedding-ada-002.

What's the difference between text-embedding-3-small and text-embedding-3-large?#

text-embedding-3-large outputs 3,072 dimensions (vs 1,536), scores 3.7% higher on MTEB benchmarks, and costs 6.5x more ($0.13 vs $0.02 per 1M tokens). For most applications, the small model is sufficient.

Does text-embedding-3-small support multilingual text?#

Yes. It handles multiple languages natively and scores 44.0 on the MIRACL multilingual benchmark. No special configuration is needed — just pass text in any supported language.

Can I use text-embedding-3-small with LangChain?#

Yes. LangChain has built-in support:

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    openai_api_key="your-crazyrouter-key",
    openai_api_base="https://api.crazyrouter.com/v1"
)

vectors = embeddings.embed_documents(["Your text here"])
```

Can I use text-embedding-3-small with LlamaIndex?#

Yes:

```python
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key="your-crazyrouter-key",
    api_base="https://api.crazyrouter.com/v1"
)
```

How does text-embedding-3-small compare to free/open-source models?#

Open-source models like BGE-M3 (MTEB 63.2) and E5-Mistral-7B (MTEB 66.6) can match or exceed its quality. The trade-off is you need GPU infrastructure to run them. text-embedding-3-small wins on convenience and total cost for small-to-medium workloads.

Summary#

text-embedding-3-small is the default choice for production embedding workloads in 2026. At $0.02/1M tokens (or $0.016 via Crazyrouter), it delivers strong quality across search, RAG, and classification — with the flexibility of dimension reduction and multilingual support.

Choose text-embedding-3-small when: you want the best price-to-performance ratio for embedding workloads.

Choose text-embedding-3-large when: you need maximum quality and the 6.5x cost increase is acceptable.

Access it through Crazyrouter for 20% lower pricing and a unified API that also gives you GPT-5, Claude, Gemini, and 300+ other models — all with one API key.
