"Google Veo 3 API Guide: Video Generation with Audio for Developers"

Google Veo 3 API Guide: Video Generation with Audio for Developers#

Google Veo 3 is the first AI video generation model to produce synchronized audio alongside video. Instead of stitching together separate video and audio pipelines, developers get complete audiovisual content from a single text prompt.

This guide covers what developers need to integrate Veo 3 into production applications as of May 2026: API access options, prompt engineering, rate limits, pricing, and how it compares with competing models.

What Is Google Veo 3?#

Veo 3 is Google DeepMind's latest video generation model. Key capabilities:

Native audio generation — dialogue, sound effects, ambient audio synchronized with video
Up to 8 seconds of high-quality video at 720p or 1080p
Cinematic quality — realistic lighting, physics, and human motion
Text-to-video — generate from natural language prompts
Image-to-video — animate a reference image

What sets Veo 3 apart from competitors such as Runway Gen-4 and Kling 2.1 is the integrated audio: those models generate silent video that requires separate audio processing. Sora also generates audio, but it has no API (see the comparison table below).

API Access Options#

Method	Endpoint	Pricing	Rate Limits
Google AI Studio	`generativelanguage.googleapis.com`	Free tier + paid	5 req/min
Vertex AI	`us-central1-aiplatform.googleapis.com`	Enterprise pricing	Higher limits
Crazyrouter	`crazyrouter.com/v1`	40-60% cheaper	Pooled limits

Getting Started with Google AI Studio#

python

import google.generativeai as genai

# Authenticate with your API key (placeholder value)
genai.configure(api_key="your-google-api-key")

# Select the Veo 3 model, which generates video with audio
model = genai.GenerativeModel("veo-3")

response = model.generate_content(
    "A barista making a latte in a cozy coffee shop. The espresso machine hisses, "
    "milk steams, and soft jazz plays in the background. Morning light streams through "
    "the window. Cinematic, shallow depth of field.",
    generation_config={
        "response_modalities": ["video"],
        "video_config": {
            "resolution": "720p",
            "duration_seconds": 6,
            "include_audio": True
        }
    }
)

# Get the generated video
video = response.candidates[0].content.parts[0]
with open("output.mp4", "wb") as f:
    f.write(video.data)

Using Vertex AI (Production)#

python

from google.cloud import aiplatform
from google.protobuf import struct_pb2

aiplatform.init(project="your-project", location="us-central1")

endpoint = aiplatform.Endpoint(
    endpoint_name="publishers/google/models/veo-3"
)

instance = struct_pb2.Struct()
instance.update({
    "prompt": "A drone shot flying over a tropical beach at sunset. "
              "Waves crash on the shore, seagulls call overhead. "
              "4K cinematic quality.",
    "video_config": {
        "duration_seconds": 8,
        "resolution": "1080p",
        "fps": 24,
        "include_audio": True,
        "audio_config": {
            "include_dialogue": False,
            "include_sfx": True,
            "include_music": True
        }
    }
})

response = endpoint.predict(instances=[instance])
video_bytes = response.predictions[0]["video"]

Using Crazyrouter (OpenAI-Compatible)#

python

from openai import OpenAI
import base64

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://crazyrouter.com/v1"
)

# Crazyrouter wraps Veo 3 in OpenAI-compatible format
response = client.chat.completions.create(
    model="veo-3",
    messages=[{
        "role": "user",
        "content": "Generate a 6-second video: A cat knocking a glass off a table "
                   "in slow motion. The glass shatters on the floor. Include sound effects."
    }],
    extra_body={
        "video_config": {
            "duration_seconds": 6,
            "resolution": "720p",
            "include_audio": True
        }
    }
)

# Response contains base64 video
video_data = base64.b64decode(response.choices[0].message.content)
with open("cat_video.mp4", "wb") as f:
    f.write(video_data)
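
The example above writes whatever comes back straight to disk. If the message content is ever an error string rather than video data, that produces a corrupt file. A minimal sketch of a defensive decode step follows, assuming (as in the example above) that the response content is a base64 string; `decode_mp4` is a hypothetical helper name, and the check relies only on the fact that MP4 files normally carry an `ftyp` box type at byte offset 4.

```python
import base64
import binascii


def decode_mp4(payload):
    """Decode a base64 string and check that it looks like an MP4 file."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        data = base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("response content is not valid base64") from exc
    # MP4 files normally carry an 'ftyp' box type at byte offset 4
    if data[4:8] != b"ftyp":
        raise ValueError("decoded bytes do not look like an MP4 file")
    return data


# Synthetic payload standing in for a real API response
sample = base64.b64encode(b"\x00\x00\x00\x18ftypmp42" + b"\x00" * 8).decode()
video_bytes = decode_mp4(sample)
print(len(video_bytes))
```

In the Crazyrouter example, `decode_mp4(response.choices[0].message.content)` would take the place of the bare `base64.b64decode` call.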

Prompt Engineering for Veo 3#

Veo 3 responds well to cinematic language. Here's a framework for effective prompts:

Prompt Structure#

code

[Subject] + [Action] + [Setting] + [Audio Description] + [Style/Camera]
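
The template can be turned into a small helper so that every prompt in a pipeline follows the same order. This is only a sketch: `build_prompt` and its parameter names are made up for illustration and are not part of any Veo 3 SDK.

```python
def build_prompt(subject, action, setting, audio, style):
    """Join the five template slots into one prompt string.

    Subject, action, and setting form the opening sentence; the audio
    description and the style/camera notes each get their own sentence.
    """
    parts = [f"{subject} {action} {setting}", audio, style]
    # Drop empty slots, strip stray whitespace and trailing periods
    cleaned = [p.strip().rstrip(".") for p in parts if p and p.strip()]
    return ". ".join(cleaned) + "."


prompt = build_prompt(
    subject="A chef",
    action="flambes a pan of shrimp",
    setting="in a professional kitchen",
    audio="The flame whooshes up and oil sizzles",
    style="Shot on 35mm film, warm color grading, medium close-up",
)
print(prompt)
```

Keeping the audio description in its own slot makes it harder to forget, which matters because audio is the part of the prompt that is specific to Veo 3.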

Good Prompt Examples#

code

"A chef flambes a pan of shrimp in a professional kitchen. The flame 
whooshes up, oil sizzles, and the chef narrates 'perfectly caramelized.' 
Shot on 35mm film, warm color grading, medium close-up."

"Two friends laughing at a cafe table. One tells a joke, the other 
spits out their coffee. Background chatter and clinking cups. 
Handheld camera, natural lighting, indie film aesthetic."

"A thunderstorm rolling over a wheat field. Lightning cracks, thunder 
rumbles, rain intensifies. Time-lapse clouds, wide angle, dramatic 
contrast."

Audio Control Tips#

Explicitly describe sounds you want: "the door creaks open"
Specify dialogue in quotes: "she says 'hello there'"
Control music: "soft piano in the background" or "no background music"
Layer sounds: "birds chirping, distant traffic, footsteps on gravel"
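
The four tips above can be combined into one helper that assembles the audio part of a prompt. The function below is a sketch with hypothetical names (`describe_audio`, `sounds`, `dialogue`, `music`); it only builds a string and makes no claim about how Veo 3 weighs each phrase.

```python
def _capitalize_first(text):
    """Uppercase only the first character, leaving the rest untouched."""
    return text[:1].upper() + text[1:]


def describe_audio(sounds=(), dialogue=None, music=None):
    """Build the audio portion of a prompt.

    sounds   -- sound descriptions to layer, e.g. ["birds chirping"]
    dialogue -- (speaker, line) tuple; the line is wrapped in single quotes
    music    -- music description, or None to ask for no background music
    """
    sentences = []
    if sounds:
        # Layer sounds as one comma-separated sentence
        sentences.append(_capitalize_first(", ".join(sounds)))
    if dialogue:
        speaker, line = dialogue
        sentences.append(f"{speaker} says '{line}'")
    # State the music choice explicitly either way
    sentences.append(_capitalize_first(music) if music else "No background music")
    return ". ".join(sentences) + "."


audio = describe_audio(
    sounds=["birds chirping", "distant traffic", "footsteps on gravel"],
    dialogue=("She", "hello there"),
    music="soft piano in the background",
)
print(audio)
```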

Handling Rate Limits#

Veo 3 has strict rate limits due to compute intensity:

Tier	Requests/Minute	Requests/Day	Concurrent
Free (AI Studio)	2	50	1
Paid (AI Studio)	5	500	3
Vertex AI	10	2,000	5
Crazyrouter	8	1,000	4
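
To see what these limits mean for a batch job, the per-minute and per-day caps can be converted into lower bounds on how long a batch takes to submit. The sketch below copies the numbers from the table; the tier keys and the function name are hypothetical, and the bounds ignore generation latency and the concurrency cap, so real runs will take longer.

```python
import math

# (requests per minute, requests per day), copied from the table above
TIER_LIMITS = {
    "free_ai_studio": (2, 50),
    "paid_ai_studio": (5, 500),
    "vertex_ai": (10, 2000),
    "crazyrouter": (8, 1000),
}


def batch_lower_bounds(num_videos, tier):
    """Return (minutes, days) lower bounds for submitting num_videos requests."""
    per_minute, per_day = TIER_LIMITS[tier]
    minutes = math.ceil(num_videos / per_minute)  # bound from the per-minute cap
    days = math.ceil(num_videos / per_day)        # bound from the per-day cap
    return minutes, days


print(batch_lower_bounds(120, "free_ai_studio"))  # (60, 3)
print(batch_lower_bounds(120, "paid_ai_studio"))  # (24, 1)
```

On the free tier the daily cap dominates: 120 videos need at least three days, even though the per-minute cap alone would allow them in an hour.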

Retry Logic with Exponential Backoff#

python

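import random
import time

def generate_with_retry(model, prompt, max_retries=5, **kwargs):
    """Call model.generate_content, backing off exponentially on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            # Extra keyword arguments (e.g. generation_config) are passed through
            return model.generate_content(prompt, **kwargs)
        except Exception as e:
            # Retry only rate-limit errors; re-raise everything else
            if "429" in str(e) or "RESOURCE_EXHAUSTED" in str(e):
                # 1s, 2s, 4s, ... plus up to 1s of jitter so clients don't retry in lockstep
                wait = (2 ** attempt) + random.uniform(0, 1)
                print(f"Rate limited. Waiting {wait:.1f}s (attempt {attempt + 1})")
                time.sleep(wait)
            else:
                raise
    raise RuntimeError("Max retries exceeded")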
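
It helps to know how long a caller can end up blocked before the function gives up. With `wait = 2 ** attempt + random.uniform(0, 1)`, each wait falls between `2 ** attempt` and `2 ** attempt + 1` seconds. The sketch below works out those bounds for the default of five retries; `backoff_bounds` is a hypothetical helper written for this illustration.

```python
def backoff_bounds(max_retries=5):
    """Return (min_wait, max_wait) in seconds for each retry attempt."""
    # The jitter term adds between 0 and 1 second to the exponential base
    return [(2 ** attempt, 2 ** attempt + 1) for attempt in range(max_retries)]


bounds = backoff_bounds()
total_min = sum(low for low, _ in bounds)
total_max = sum(high for _, high in bounds)

print(bounds)                # [(1, 2), (2, 3), (4, 5), (8, 9), (16, 17)]
print(total_min, total_max)  # 31 36
```

So a request that is rate limited on every attempt spends between 31 and 36 seconds sleeping before the final exception is raised.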

Queue-Based Architecture for Production#

python

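import asyncio
import time
from collections import deque

class Veo3Queue:
    def __init__(self, max_concurrent=3, rpm_limit=5):
        self.semaphore = asyncio.Semaphore(max_concurrent)
        self.rpm_limit = rpm_limit
        self.request_times = deque()  # start times of requests in the last 60s
        self.lock = asyncio.Lock()    # serializes the rate-limit check

    async def _wait_for_slot(self):
        # Block until fewer than rpm_limit requests started in the past 60 seconds
        async with self.lock:
            while True:
                now = time.time()
                # Drop timestamps that have aged out of the 60-second window
                while self.request_times and now - self.request_times[0] >= 60:
                    self.request_times.popleft()
                if len(self.request_times) < self.rpm_limit:
                    self.request_times.append(now)
                    return
                await asyncio.sleep(60 - (now - self.request_times[0]))

    async def generate(self, prompt, config):
        # Reserve a rate-limit slot first, then respect the concurrency cap
        await self._wait_for_slot()
        async with self.semaphore:
            return await self._call_api(prompt, config)

    async def _call_api(self, prompt, config):
        # Replace with the actual Veo 3 request for your provider
        raise NotImplementedError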
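
The sliding-window check inside the queue is easier to reason about, and to unit test, when it is pulled out into a pure function that takes the current time as an argument. The sketch below shows one way to do that; `seconds_until_slot` and its parameters are hypothetical names, not part of the class above.

```python
from collections import deque


def seconds_until_slot(request_times, now, rpm_limit=5, window=60.0):
    """Return how long to wait before another request fits in the window.

    request_times -- deque of request start timestamps, oldest first
                     (expired entries are removed in place)
    now           -- current timestamp in seconds
    """
    # Discard timestamps that have left the sliding window
    while request_times and now - request_times[0] >= window:
        request_times.popleft()
    if len(request_times) < rpm_limit:
        return 0.0
    # The next slot opens when the oldest remaining request ages out
    return window - (now - request_times[0])


times = deque([0.0, 5.0, 10.0, 15.0, 20.0])
print(seconds_until_slot(times, now=30.0))  # 30.0, the window is full until t=60
print(seconds_until_slot(times, now=61.0))  # 0.0, the t=0 request has expired
```

Passing `now` in explicitly means the logic can be tested without sleeping or mocking the clock.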

Veo 3 Pricing#

Provider	Per Video (6s, 720p)	Per Video (8s, 1080p)	Monthly (100 videos)
Google AI Studio	$0.35	$0.70	$35-70
Vertex AI	$0.30	$0.60	$30-60
Crazyrouter	$0.15	$0.35	$15-35
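
The monthly column assumes all 100 videos are of one type. For a mixed workload, the per-video prices can be combined directly. The sketch below copies the prices from the table; the dictionary keys and the function name are hypothetical.

```python
# Per-video prices in USD, copied from the table above
PRICES = {
    "google_ai_studio": {"6s_720p": 0.35, "8s_1080p": 0.70},
    "vertex_ai": {"6s_720p": 0.30, "8s_1080p": 0.60},
    "crazyrouter": {"6s_720p": 0.15, "8s_1080p": 0.35},
}


def monthly_cost(provider, videos_720p=0, videos_1080p=0):
    """Estimate monthly spend in USD for a mix of the two video types."""
    price = PRICES[provider]
    total = videos_720p * price["6s_720p"] + videos_1080p * price["8s_1080p"]
    # Round to cents to hide floating-point noise
    return round(total, 2)


for provider in PRICES:
    print(provider, monthly_cost(provider, videos_720p=60, videos_1080p=40))
```

For 60 short 720p clips and 40 longer 1080p clips, this works out to $49.00 on Google AI Studio, $42.00 on Vertex AI, and $23.00 on Crazyrouter.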

Veo 3 vs Competitors#

Feature	Veo 3	Runway Gen-4 Turbo	Kling 2.1	Sora
Native Audio	✅	❌	❌	✅
Max Duration	8s	10s	10s	20s
Max Resolution	1080p	4K	1080p	1080p
API Available	✅	✅	✅	❌
Image-to-Video	✅	✅	✅	✅
Consistency	High	High	Medium	High
Speed	30-60s	15-30s	20-45s	60-120s

FAQ#

How do I get access to Veo 3 API?#

Sign up at Google AI Studio (ai.google.dev) for free tier access, or use Vertex AI for production workloads. Crazyrouter provides access through an OpenAI-compatible endpoint with lower pricing.

Can Veo 3 generate dialogue?#

Yes. Veo 3 can generate spoken dialogue synchronized with lip movements. Include dialogue in quotes in your prompt. Quality is best for short phrases (under 10 words per speaker).
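
A pipeline can flag dialogue that exceeds this guideline before spending a generation on it. The check below is a sketch: `overlong_dialogue` is a hypothetical helper, it counts whitespace-separated words, and the threshold of 10 comes from the guideline above rather than from any documented limit.

```python
def overlong_dialogue(lines, max_words=10):
    """Return the dialogue lines that contain max_words or more words."""
    return [line for line in lines if len(line.split()) >= max_words]


lines = [
    "perfectly caramelized",
    "I told you a hundred times not to leave the oven on overnight",
]
print(overlong_dialogue(lines))
```

Here only the second line is flagged; splitting it across two speakers or two clips would bring each part under the limit.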

What's the maximum video length?#

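8 seconds per generation. For longer videos, generate multiple clips and stitch them together. Keeping clips consistent takes careful prompt engineering (repeating the same subject, setting, and style descriptions) or image-to-video, using the last frame of one clip as the reference image for the next.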
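
Planning a longer video starts with deciding how many clips are needed and how long each should be. The sketch below splits a target runtime into near-equal clips of at most 8 seconds. `plan_clips` is a hypothetical helper, it works in whole seconds, and it assumes the API accepts any whole-second duration up to the maximum, which this guide does not confirm.

```python
import math

MAX_CLIP_SECONDS = 8  # per-generation limit stated above


def plan_clips(total_seconds, max_clip=MAX_CLIP_SECONDS):
    """Split a target runtime (whole seconds) into clip durations <= max_clip."""
    if total_seconds <= 0:
        return []
    count = math.ceil(total_seconds / max_clip)
    base, extra = divmod(total_seconds, count)
    # Spread the remainder so clip lengths differ by at most one second
    return [base + 1 if i < extra else base for i in range(count)]


print(plan_clips(30))  # [8, 8, 7, 7]
print(plan_clips(20))  # [7, 7, 6]
```

Near-equal clip lengths avoid ending on a very short final clip, which tends to make the cut more noticeable.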

Is Veo 3 better than Runway Gen-4?#

For audio-inclusive content, yes — Veo 3's native audio generation eliminates a separate pipeline step. For pure visual quality at higher resolutions, Runway Gen-4 Turbo currently edges ahead with 4K support.

How much does Veo 3 cost per video?#

A 6-second 720p video costs approximately $0.35 through Google AI Studio or $0.15 through Crazyrouter. Costs scale with duration and resolution.

Summary#

Google Veo 3 is the first production-ready video generation API with native audio. For developers building video content pipelines, it eliminates the complexity of separate audio synthesis and synchronization.

Access Veo 3 through Crazyrouter for 50%+ savings on per-video costs with the same quality and an OpenAI-compatible API format that simplifies integration.

"Google Veo 3 API Guide: Video Generation with Audio for Developers"