
# Google Veo 3 API Guide: Video Generation with Audio for Developers
Google Veo 3 was one of the first major AI video models to generate synchronized audio alongside the video itself. Instead of stitching together separate video and audio pipelines, you get complete audiovisual content from a single text prompt.
This guide covers everything developers need to know about integrating Veo 3 into production applications as of May 2026.
## What Is Google Veo 3?
Veo 3 is a video generation model from Google DeepMind. Key capabilities:
- Native audio generation: dialogue, sound effects, and ambient audio synchronized with the video
- Up to 8 seconds of high-quality video at 720p or 1080p
- Cinematic quality: realistic lighting, physics, and human motion
- Text-to-video: generate from natural language prompts
- Image-to-video: animate a reference image
What sets Veo 3 apart from competitors such as Runway Gen-4 and Kling 2.1 is the integrated audio. Those models generate silent video, so sound has to be added in a separate processing step. Sora also generates audio natively, as the comparison table below shows.
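The duration and resolution limits in the capability list can be checked client-side so an invalid request never uses up quota. The following is a minimal sketch. It assumes that 8 seconds and 720p/1080p are the only constraints relevant to your model version. `validate_video_request` is a hypothetical helper and is not part of any Google SDK.

```python
# Assumption: limits taken from this guide's capability list; other Veo versions may differ
MAX_DURATION_SECONDS = 8
SUPPORTED_RESOLUTIONS = {"720p", "1080p"}


def validate_video_request(duration_seconds: int, resolution: str) -> None:
    """Raise ValueError if a request falls outside the documented Veo 3 limits."""
    if not 0 < duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration_seconds must be between 1 and {MAX_DURATION_SECONDS}, got {duration_seconds}"
        )
    if resolution not in SUPPORTED_RESOLUTIONS:
        raise ValueError(
            f"resolution must be one of {sorted(SUPPORTED_RESOLUTIONS)}, got {resolution!r}"
        )
```

Failing fast locally is cheaper than a rejected API round trip, particularly under the tight rate limits described later in this guide.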
## API Access Options
| Method | Endpoint | Pricing | Rate Limits |
|---|---|---|---|
| Google AI Studio | generativelanguage.googleapis.com | Free tier + paid | 5 req/min |
| Vertex AI | us-central1-aiplatform.googleapis.com | Enterprise pricing | Higher limits |
| Crazyrouter | crazyrouter.com/v1 | ~40-60% below Google list prices | Pooled limits |
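Both Google endpoints in this table are REST services. The sketch below builds the URL that starts a generation job on each one. It assumes the long-running `predictLongRunning` method and the model ID `veo-3.0-generate-001`, both taken from Google's docs at the time of writing. Model IDs change between releases, so verify them. `veo_endpoint_url` is a hypothetical helper.

```python
def veo_endpoint_url(
    provider: str,
    model: str = "veo-3.0-generate-001",  # assumed model ID; check the current model list
    project: str = "",
    location: str = "us-central1",
) -> str:
    """Return the REST URL that starts a Veo generation job for the given provider."""
    if provider == "ai_studio":
        # Gemini API (AI Studio keys); send the key in an x-goog-api-key header
        return (
            "https://generativelanguage.googleapis.com/v1beta/models/"
            f"{model}:predictLongRunning"
        )
    if provider == "vertex":
        if not project:
            raise ValueError("Vertex AI URLs need a Google Cloud project ID")
        # Vertex AI; authenticate with an OAuth access token
        return (
            f"https://{location}-aiplatform.googleapis.com/v1/projects/{project}"
            f"/locations/{location}/publishers/google/models/{model}:predictLongRunning"
        )
    raise ValueError(f"unknown provider: {provider!r}")
```

Crazyrouter is omitted here because its OpenAI-compatible endpoint is normally reached through the OpenAI SDK, as shown further down.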
## Getting Started with Google AI Studio
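```python
import time

from google import genai  # pip install google-genai
from google.genai import types

client = genai.Client(api_key="your-google-api-key")

# Start a long-running video job. Veo 3 generates audio natively,
# so the sounds you want are described in the prompt itself.
operation = client.models.generate_videos(
    model="veo-3.0-generate-001",  # check the models list for the current Veo 3 ID
    prompt=(
        "A barista making a latte in a cozy coffee shop. The espresso machine hisses, "
        "milk steams, and soft jazz plays in the background. Morning light streams through "
        "the window. Cinematic, shallow depth of field."
    ),
    config=types.GenerateVideosConfig(
        aspect_ratio="16:9",
        resolution="720p",
        # Supported clip durations vary by model version; see the Veo docs
    ),
)

# Poll until the video is ready (generation typically takes tens of seconds)
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download the generated MP4 (video plus audio track) and save it locally
generated_video = operation.response.generated_videos[0]
client.files.download(file=generated_video.video)
generated_video.video.save("output.mp4")
```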
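Veo 3 requests run as long-running operations that take tens of seconds (see the Speed row in the comparison table). An unbounded `while not done` loop can therefore hang if a job stalls. The following is a minimal sketch of polling with a deadline. `poll_until_done` is a hypothetical helper: `fetch` re-reads the job status, and `is_done` decides when to stop. The `sleep` parameter is injectable so the function can be tested without real waits.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until_done(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval_s: float = 10.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() every interval_s seconds until is_done(result) or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation not finished after {timeout_s:.0f}s")
        sleep(interval_s)
```

With the `google-genai` SDK, for example, `fetch` could be `lambda: client.operations.get(operation)` and `is_done` could be `lambda op: op.done`.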
## Using Vertex AI (Production)
```python
import time

from google import genai  # pip install google-genai
from google.genai import types

# Vertex AI mode authenticates with Application Default Credentials
client = genai.Client(vertexai=True, project="your-project", location="us-central1")

operation = client.models.generate_videos(
    model="veo-3.0-generate-001",  # check Vertex AI's model list for the current ID
    prompt=(
        "A drone shot flying over a tropical beach at sunset. "
        "Waves crash on the shore, seagulls call overhead. "
        "Cinematic quality."
    ),
    config=types.GenerateVideosConfig(
        duration_seconds=8,
        resolution="1080p",
        aspect_ratio="16:9",
        generate_audio=True,  # which sounds (dialogue, SFX, music) is steered by the prompt
        output_gcs_uri="gs://your-bucket/veo-output/",  # Vertex writes the MP4 here
    ),
)

# Poll the long-running operation until the video is ready
while not operation.done:
    time.sleep(15)
    operation = client.operations.get(operation)

video_uri = operation.response.generated_videos[0].video.uri  # gs:// path to the MP4
print(video_uri)
```
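If you call Vertex AI over REST rather than through the Python SDK, the same request becomes a JSON body for the `predictLongRunning` method. This sketch assumes the parameter names from Google's Veo REST reference at the time of writing (`durationSeconds`, `resolution`, `generateAudio`, `sampleCount`, `storageUri`). Confirm them against the current reference before relying on them. `build_veo_request` is hypothetical.

```python
import json
from typing import Optional


def build_veo_request(
    prompt: str,
    duration_seconds: int = 8,
    resolution: str = "1080p",
    generate_audio: bool = True,
    storage_uri: Optional[str] = None,
) -> dict:
    """Build a predictLongRunning body for one Veo 3 clip (field names are assumptions)."""
    parameters = {
        "durationSeconds": duration_seconds,
        "resolution": resolution,
        "generateAudio": generate_audio,
        "sampleCount": 1,  # one video per request
    }
    if storage_uri:
        parameters["storageUri"] = storage_uri  # gs:// prefix for the output MP4
    return {"instances": [{"prompt": prompt}], "parameters": parameters}


body = build_veo_request(
    "A drone shot flying over a tropical beach at sunset.",
    storage_uri="gs://your-bucket/veo-output/",
)
print(json.dumps(body, indent=2))
```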
## Using Crazyrouter (OpenAI-Compatible)
```python
import base64

from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://crazyrouter.com/v1",
)

# Crazyrouter wraps Veo 3 in an OpenAI-compatible chat format.
# Video settings go in extra_body, which the OpenAI SDK merges into the request JSON.
response = client.chat.completions.create(
    model="veo-3",
    messages=[{
        "role": "user",
        "content": "Generate a 6-second video: A cat knocking a glass off a table "
                   "in slow motion. The glass shatters on the floor. Include sound effects.",
    }],
    extra_body={
        "video_config": {
            "duration_seconds": 6,
            "resolution": "720p",
            "include_audio": True,
        }
    },
)

# Assumes the gateway returns the MP4 as base64 text in the message content
video_data = base64.b64decode(response.choices[0].message.content)
with open("cat_video.mp4", "wb") as f:
    f.write(video_data)
```
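The Crazyrouter snippet assumes the message content is a bare base64 string. Some OpenAI-compatible gateways instead wrap media in a data URI (`data:video/mp4;base64,...`) or return a download URL. The defensive sketch below handles all three shapes; those shapes are an assumption, not documented Crazyrouter behavior. `save_video_payload` is hypothetical, and fetching the URL case is left to your HTTP client.

```python
import base64
import binascii


def save_video_payload(content: str, path: str) -> str:
    """Write base64 video content to path; return 'saved', or 'url' if content is a link."""
    content = content.strip()
    if content.startswith(("http://", "https://")):
        return "url"  # caller should download this with an HTTP client
    if content.startswith("data:"):
        # Drop a data-URI header such as "data:video/mp4;base64,"
        _, _, content = content.partition(",")
    try:
        data = base64.b64decode(content, validate=True)
    except binascii.Error as exc:
        raise ValueError("response content is not valid base64 video data") from exc
    with open(path, "wb") as f:
        f.write(data)
    return "saved"
```

Passing `validate=True` makes malformed responses fail loudly. Without it, `b64decode` silently discards stray characters and can write a corrupt file.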
## Prompt Engineering for Veo 3
Veo 3 responds well to cinematic language. Here's a framework for effective prompts:
### Prompt Structure

```text
[Subject] + [Action] + [Setting] + [Audio Description] + [Style/Camera]
```
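The five-part structure maps onto a small template function, which helps keep prompts consistent across a batch. The following is a minimal sketch. `build_prompt` and its parameter names are hypothetical. Subject, action, and setting form the first sentence, and the audio and style parts follow as their own sentences.

```python
def build_prompt(subject: str, action: str, setting: str, audio: str, style: str) -> str:
    """Assemble a prompt in [Subject] + [Action] + [Setting] + [Audio] + [Style/Camera] order."""
    # Subject, action, and setting read naturally as one sentence; skip empty pieces
    scene = " ".join(part.strip() for part in (subject, action, setting) if part.strip())
    sentences = [scene, audio, style]
    # End each non-empty part with exactly one period so the model sees separate sentences
    return " ".join(s.strip().rstrip(".") + "." for s in sentences if s.strip())


print(build_prompt(
    "A chef",
    "flambes a pan of shrimp",
    "in a professional kitchen",
    "The flame whooshes up and oil sizzles",
    "Shot on 35mm film, medium close-up",
))
# A chef flambes a pan of shrimp in a professional kitchen. The flame whooshes up and oil sizzles. Shot on 35mm film, medium close-up.
```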
### Good Prompt Examples

```text
"A chef flambes a pan of shrimp in a professional kitchen. The flame
whooshes up, oil sizzles, and the chef narrates 'perfectly caramelized.'
Shot on 35mm film, warm color grading, medium close-up."

"Two friends laughing at a cafe table. One tells a joke, the other
spits out their coffee. Background chatter and clinking cups.
Handheld camera, natural lighting, indie film aesthetic."

"A thunderstorm rolling over a wheat field. Lightning cracks, thunder
rumbles, rain intensifies. Time-lapse clouds, wide angle, dramatic
contrast."
```
### Audio Control Tips
- Explicitly describe sounds you want: "the door creaks open"
- Specify dialogue in quotes: "she says 'hello there'"
- Control music: "soft piano in the background" or "no background music"
- Layer sounds: "birds chirping, distant traffic, footsteps on gravel"
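When prompts are generated from data, the audio tips above can be applied in code. The following is a minimal sketch. It assumes dialogue is always wrapped in single quotes, as in the examples above, and that music is always stated explicitly, even when you want none. `describe_audio` is hypothetical.

```python
from typing import Optional, Sequence


def _sentence_case(text: str) -> str:
    """Capitalize only the first character, leaving the rest untouched."""
    return text[:1].upper() + text[1:]


def describe_audio(
    sounds: Sequence[str] = (),
    speaker: Optional[str] = None,
    line: Optional[str] = None,
    music: Optional[str] = None,
) -> str:
    """Compose the audio part of a Veo 3 prompt: layered sounds, one quoted line, music."""
    sentences = []
    if sounds:
        # Layer ambient sounds and effects in one comma-separated list
        sentences.append(", ".join(sounds))
    if speaker and line:
        # Quote dialogue so it is read as speech
        sentences.append(f"{speaker} says '{line}'")
    # Say what you want for music, including no music at all
    sentences.append(music if music else "no background music")
    return ". ".join(_sentence_case(s) for s in sentences) + "."


print(describe_audio(
    ["birds chirping", "distant traffic", "footsteps on gravel"],
    speaker="she", line="hello there", music="soft piano in the background",
))
# Birds chirping, distant traffic, footsteps on gravel. She says 'hello there'. Soft piano in the background.
```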
## Handling Rate Limits
Veo 3 has strict rate limits due to compute intensity:
| Tier | Requests/Minute | Requests/Day | Concurrent |
|---|---|---|---|
| Free (AI Studio) | 2 | 50 | 1 |
| Paid (AI Studio) | 5 | 500 | 3 |
| Vertex AI | 10 | 2,000 | 5 |
| Crazyrouter | 8 | 1,000 | 4 |
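The table translates into two numbers. A per-minute limit of R implies at least 60/R seconds between request starts, and the daily cap sets how long full-speed traffic can last before quota runs out. The small sketch below computes both from the table's figures. Those figures may change, so treat them as inputs rather than constants.

```python
# (requests/minute, requests/day, concurrent) copied from the table above
TIERS = {
    "Free (AI Studio)": (2, 50, 1),
    "Paid (AI Studio)": (5, 500, 3),
    "Vertex AI": (10, 2000, 5),
    "Crazyrouter": (8, 1000, 4),
}


def min_interval_seconds(rpm: int) -> float:
    """Smallest even spacing between request starts that stays under an RPM limit."""
    return 60.0 / rpm


def hours_to_daily_cap(rpm: int, rpd: int) -> float:
    """Hours of continuous max-rate traffic before the daily quota is exhausted."""
    return rpd / rpm / 60.0


for name, (rpm, rpd, _concurrent) in TIERS.items():
    print(f"{name}: one request every {min_interval_seconds(rpm):.1f}s, "
          f"daily cap reached after {hours_to_daily_cap(rpm, rpd):.1f}h at full speed")
```

For example, the paid AI Studio tier allows one request every 12 seconds but hits its 500-request daily cap after roughly 1.7 hours at that pace.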
### Retry Logic with Exponential Backoff
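```python
import random
import time


def generate_with_retry(request_fn, max_retries=5):
    """Call request_fn() and retry rate-limit errors with exponential backoff plus jitter.

    request_fn is any zero-argument callable that sends one Veo 3 request,
    for example a lambda wrapping your SDK's generate call.
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except Exception as e:
            # Retry only on HTTP 429 / RESOURCE_EXHAUSTED; re-raise any other error
            if "429" in str(e) or "RESOURCE_EXHAUSTED" in str(e):
                wait = (2 ** attempt) + random.uniform(0, 1)  # ~1s, 2s, 4s, 8s, 16s
                print(f"Rate limited. Waiting {wait:.1f}s (attempt {attempt + 1}/{max_retries})")
                time.sleep(wait)
            else:
                raise
    raise RuntimeError("Max retries exceeded")
```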
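Uncapped exponential backoff grows quickly. With `max_retries=5`, the last wait in `generate_with_retry` is about 16 seconds, but an 8-attempt budget would reach over two minutes. A common refinement is to cap the delay and pick a random wait between zero and that cap ("full jitter"). The following is a minimal sketch of that schedule, offered as an alternative to the fixed `+ random.uniform(0, 1)` jitter. The names and the 30-second cap are assumptions.

```python
import random


def backoff_ceiling(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Upper bound on the wait before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full jitter: a uniformly random wait between 0 and the capped ceiling."""
    return random.uniform(0, backoff_ceiling(attempt, base, cap))


print([backoff_ceiling(a) for a in range(7)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

To use it, replace the `wait = ...` line in the retry loop with `wait = backoff_delay(attempt)`. Full jitter spreads out clients that were throttled together, so they are less likely to retry at the same instant.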
### Queue-Based Architecture for Production
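```python
import asyncio
import time
from collections import deque


class Veo3Queue:
    """Limits Veo 3 calls by concurrency and by requests per rolling 60-second window."""

    def __init__(self, max_concurrent=3, rpm_limit=5):
        self.semaphore = asyncio.Semaphore(max_concurrent)
        self.rpm_limit = rpm_limit
        self.request_times = deque()  # start times of requests in the current window
        self.lock = asyncio.Lock()  # makes check-and-record atomic across tasks

    async def _acquire_rate_slot(self):
        async with self.lock:
            while True:
                now = time.monotonic()
                # Forget requests older than 60 seconds
                while self.request_times and now - self.request_times[0] >= 60:
                    self.request_times.popleft()
                if len(self.request_times) < self.rpm_limit:
                    self.request_times.append(now)
                    return
                # Window is full: sleep until the oldest request ages out
                await asyncio.sleep(60 - (now - self.request_times[0]))

    async def generate(self, prompt, config):
        async with self.semaphore:
            await self._acquire_rate_slot()
            return await self._call_api(prompt, config)

    async def _call_api(self, prompt, config):
        # Hook for your actual Veo 3 call (e.g. an async SDK client); not implemented here
        raise NotImplementedError
```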
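To run a batch through a limiter like `Veo3Queue` and keep results in input order, use `asyncio.gather` with `return_exceptions=True`. It collects successes and failures without one bad prompt cancelling the rest. This self-contained sketch uses `fake_generate` as a hypothetical stand-in for a real Veo 3 call, and a plain semaphore plays the role of the queue's concurrency cap.

```python
import asyncio


async def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a Veo 3 call; fails on empty prompts."""
    await asyncio.sleep(0.01)
    if not prompt:
        raise ValueError("empty prompt")
    return f"video for: {prompt}"


async def run_batch(prompts, max_concurrent=3):
    """Generate every prompt with bounded concurrency; results line up with prompts."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def one(prompt):
        async with semaphore:
            return await fake_generate(prompt)

    # return_exceptions=True puts exceptions in the result list instead of raising
    return await asyncio.gather(*(one(p) for p in prompts), return_exceptions=True)


results = asyncio.run(run_batch(["a beach at sunset", "", "a city at night"]))
for result in results:
    print("FAILED:" if isinstance(result, Exception) else "OK:", result)
```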
## Veo 3 Pricing
| Provider | Per Video (6s, 720p) | Per Video (8s, 1080p) | Monthly (100 videos) |
|---|---|---|---|
| Google AI Studio | $0.35 | $0.70 | $35-70 |
| Vertex AI | $0.30 | $0.60 | $30-60 |
| Crazyrouter | $0.15 | $0.35 | $15-35 |
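The monthly column is per-video price times volume. The small estimator below works in integer cents to avoid floating-point rounding and is seeded with this guide's prices, which may change.

```python
# Per-video prices in US cents: (6s 720p, 8s 1080p), copied from the table above
PRICES_CENTS = {
    "Google AI Studio": (35, 70),
    "Vertex AI": (30, 60),
    "Crazyrouter": (15, 35),
}


def monthly_cost_cents(provider: str, videos_720p: int = 0, videos_1080p: int = 0) -> int:
    """Total cost in cents for a month's mix of 6s/720p and 8s/1080p clips."""
    p720, p1080 = PRICES_CENTS[provider]
    return p720 * videos_720p + p1080 * videos_1080p


def savings_percent(cheaper: int, baseline: int) -> float:
    """Percentage saved by paying `cheaper` instead of `baseline` (both in cents)."""
    return round(100 * (baseline - cheaper) / baseline, 1)


print(monthly_cost_cents("Vertex AI", videos_720p=60, videos_1080p=40) / 100)  # 42.0 dollars
print(savings_percent(15, 35))  # Crazyrouter vs AI Studio, 6s 720p: 57.1
```

Across the table, the Crazyrouter discount ranges from about 41.7% (vs Vertex AI, 1080p) to 57.1% (vs AI Studio, 720p).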
## Veo 3 vs Competitors
| Feature | Veo 3 | Runway Gen-4 Turbo | Kling 2.1 | Sora |
|---|---|---|---|---|
| Native Audio | ✅ | ❌ | ❌ | ✅ |
| Max Duration | 8s | 10s | 10s | 20s |
| Max Resolution | 1080p | 4K | 1080p | 1080p |
| API Available | ✅ | ✅ | ✅ | ❌ |
| Image-to-Video | ✅ | ✅ | ✅ | ✅ |
| Consistency | High | High | Medium | High |
| Speed | 30-60s | 15-30s | 20-45s | 60-120s |
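When choosing a backend for a specific job, the table can also serve as a requirements filter. This minimal sketch encodes only the columns above, with values as listed in this guide (they will drift as models update). 4K is treated as 2160p. `pick_models` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoModel:
    name: str
    native_audio: bool
    max_duration_s: int
    max_resolution_p: int  # vertical pixels; 4K counted as 2160
    api_available: bool


# Values copied from the comparison table above
MODELS = [
    VideoModel("Veo 3", True, 8, 1080, True),
    VideoModel("Runway Gen-4 Turbo", False, 10, 2160, True),
    VideoModel("Kling 2.1", False, 10, 1080, True),
    VideoModel("Sora", True, 20, 1080, False),
]


def pick_models(need_audio=False, min_duration_s=0, min_resolution_p=0, need_api=True):
    """Return names of models from the table that satisfy every requirement."""
    return [
        m.name for m in MODELS
        if (m.native_audio or not need_audio)
        and m.max_duration_s >= min_duration_s
        and m.max_resolution_p >= min_resolution_p
        and (m.api_available or not need_api)
    ]


print(pick_models(need_audio=True))  # ['Veo 3']
```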
## FAQ
### How do I get access to the Veo 3 API?
Create an API key in Google AI Studio (aistudio.google.com; developer docs at ai.google.dev) to start on the free tier, or use Vertex AI for production workloads. Crazyrouter provides access through an OpenAI-compatible endpoint at lower prices.
### Can Veo 3 generate dialogue?
Yes. Veo 3 can generate spoken dialogue synchronized with lip movements. Include dialogue in quotes in your prompt. Quality is best for short phrases (under 10 words per speaker).
### What's the maximum video length?
8 seconds per generation. For longer videos, generate multiple clips and stitch them together. Keeping clips consistent requires careful prompt engineering, or image-to-video generation seeded with the last frame of the previous clip.
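Stitching can be done without re-encoding with ffmpeg's concat demuxer, provided every clip shares the same codec, resolution, and frame rate (normally the case for clips generated with identical settings). The sketch below writes the list file and returns the command. It assumes `ffmpeg` is on your PATH and leaves execution to `subprocess.run`. `build_concat_command` is hypothetical.

```python
from pathlib import Path


def build_concat_command(clips, list_path="clips.txt", output="combined.mp4"):
    """Write an ffmpeg concat list for clips and return the argv that joins them losslessly."""
    lines = []
    for clip in clips:
        # Paths go in single quotes in the list file; an embedded ' becomes '\''
        escaped = str(Path(clip).resolve()).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    Path(list_path).write_text("\n".join(lines) + "\n")
    # -safe 0 permits absolute paths; -c copy joins the streams without re-encoding
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(list_path), "-c", "copy", output]


# Usage: subprocess.run(build_concat_command(["clip1.mp4", "clip2.mp4"]), check=True)
```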
### Is Veo 3 better than Runway Gen-4?
For content that needs audio, yes: Veo 3's native audio generation removes a separate pipeline step. For pure visual quality at higher resolutions, Runway Gen-4 Turbo currently edges ahead with 4K support.
### How much does Veo 3 cost per video?
A 6-second 720p video costs approximately $0.15 through Crazyrouter, compared with $0.30-0.35 directly through Google. Costs scale with duration and resolution.
## Summary
Google Veo 3 was one of the first production-ready video generation APIs with native audio. For developers building video content pipelines, it removes the complexity of separate audio synthesis and synchronization.
Access Veo 3 through Crazyrouter for roughly 40-60% lower per-video costs, with the same quality and an OpenAI-compatible API format that simplifies integration.


