"Google Veo 3 API Guide: Video Generation with Audio for Developers"
Crazyrouter Team
May 5, 2026

Google Veo 3 API Guide: Video Generation with Audio for Developers#

Google Veo 3 stands out in AI video generation as the first model to generate synchronized audio alongside video. Instead of stitching together separate video and audio pipelines, you get complete audiovisual content from a single text prompt.

This guide covers the access options, code samples, prompt design, rate limits, and pricing developers need to integrate Veo 3 into production applications as of May 2026.

What Is Google Veo 3?#

Veo 3 is Google DeepMind's latest video generation model. Key capabilities:

  • Native audio generation — dialogue, sound effects, ambient audio synchronized with video
  • Up to 8 seconds of high-quality video at 720p or 1080p
  • Cinematic quality — realistic lighting, physics, and human motion
  • Text-to-video — generate from natural language prompts
  • Image-to-video — animate a reference image

What sets Veo 3 apart from competitors (Runway Gen-4, Kling 2.1, Sora) is the integrated audio. Other models generate silent video that requires separate audio processing.

API Access Options#

| Method | Endpoint | Pricing | Rate Limits |
|---|---|---|---|
| Google AI Studio | generativelanguage.googleapis.com | Free tier + paid | 5 req/min |
| Vertex AI | us-central1-aiplatform.googleapis.com | Enterprise pricing | Higher limits |
| Crazyrouter | crazyrouter.com/v1 | 40-60% cheaper | Pooled limits |

Getting Started with Google AI Studio#

python
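import time

from google import genai
from google.genai import types

# Uses the google-genai SDK (pip install google-genai)
client = genai.Client(api_key="your-google-api-key")

# Video generation is a long-running operation: submit the job, then poll it.
# Veo 3 builds the soundtrack from the audio cues in the prompt.
operation = client.models.generate_videos(
    model="veo-3.0-generate-001",  # check the current model list for the exact Veo 3 ID
    prompt=(
        "A barista making a latte in a cozy coffee shop. The espresso machine hisses, "
        "milk steams, and soft jazz plays in the background. Morning light streams through "
        "the window. Cinematic, shallow depth of field."
    ),
    config=types.GenerateVideosConfig(
        resolution="720p",
        duration_seconds=6,
    ),
)

# Poll until the job finishes
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download and save the generated video (audio is part of the MP4)
generated = operation.response.generated_videos[0]
client.files.download(file=generated.video)
generated.video.save("output.mp4")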

Using Vertex AI (Production)#

python
import time

from google import genai
from google.genai import types

# The google-genai SDK targets Vertex AI when vertexai=True;
# authentication uses Application Default Credentials
client = genai.Client(vertexai=True, project="your-project", location="us-central1")

# Dialogue, sound effects, and music are steered through the prompt text
operation = client.models.generate_videos(
    model="veo-3.0-generate-001",  # check Model Garden for the exact Veo 3 ID
    prompt=(
        "A drone shot flying over a tropical beach at sunset. "
        "Waves crash on the shore, seagulls call overhead. "
        "4K cinematic quality."
    ),
    config=types.GenerateVideosConfig(
        duration_seconds=8,
        resolution="1080p",
        fps=24,
        generate_audio=True,
    ),
)

# Poll the long-running operation until the video is ready
while not operation.done:
    time.sleep(15)
    operation = client.operations.get(operation)

# Without an output_gcs_uri, the video is returned inline as bytes
video_bytes = operation.response.generated_videos[0].video.video_bytes

Using Crazyrouter (OpenAI-Compatible)#

python
from openai import OpenAI
import base64

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://crazyrouter.com/v1"
)

# Crazyrouter wraps Veo 3 in OpenAI-compatible format
response = client.chat.completions.create(
    model="veo-3",
    messages=[{
        "role": "user",
        "content": "Generate a 6-second video: A cat knocking a glass off a table "
                   "in slow motion. The glass shatters on the floor. Include sound effects."
    }],
    extra_body={
        "video_config": {
            "duration_seconds": 6,
            "resolution": "720p",
            "include_audio": True
        }
    }
)

# Response contains base64 video
video_data = base64.b64decode(response.choices[0].message.content)
with open("cat_video.mp4", "wb") as f:
    f.write(video_data)
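
If the message content is ever something other than base64 video (a plain-text error message, for example), the snippet above would write garbage to disk. The following is a minimal sketch of a guard, assuming the response really is a base64-encoded MP4 as described above. `decode_mp4` is a hypothetical helper, and the check relies on the `ftyp` tag that MP4 files normally carry at byte offset 4.

```python
import base64
import binascii


def decode_mp4(payload: str) -> bytes:
    """Decode a base64 string and confirm it looks like an MP4 file."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        data = base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("response content is not valid base64") from exc
    # MP4 files normally start with a box whose type tag 'ftyp' sits at bytes 4-8
    if data[4:8] != b"ftyp":
        raise ValueError("decoded bytes do not look like an MP4 file")
    return data
```

In the example above, `decode_mp4(response.choices[0].message.content)` would replace the bare `base64.b64decode` call.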

Prompt Engineering for Veo 3#

Veo 3 responds well to cinematic language. Here's a framework for effective prompts:

Prompt Structure#

code
[Subject] + [Action] + [Setting] + [Audio Description] + [Style/Camera]
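
One way to keep prompts consistent across a project is to assemble them from the five parts of this structure. This is a rough sketch only; `VeoPrompt` and its field names are hypothetical and not part of any Veo SDK.

```python
from dataclasses import dataclass


@dataclass
class VeoPrompt:
    subject: str
    action: str
    setting: str
    audio: str
    style: str

    def render(self) -> str:
        # Subject, action, and setting form the opening sentence;
        # audio cues and style/camera notes follow as their own sentences.
        scene = f"{self.subject} {self.action} {self.setting}"
        parts = [scene, self.audio, self.style]
        return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())


prompt = VeoPrompt(
    subject="A barista",
    action="pours latte art",
    setting="in a cozy coffee shop",
    audio="The espresso machine hisses and soft jazz plays",
    style="Cinematic, shallow depth of field",
)
print(prompt.render())
# A barista pours latte art in a cozy coffee shop. The espresso machine hisses
# and soft jazz plays. Cinematic, shallow depth of field.
```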

Good Prompt Examples#

code
"A chef flambés a pan of shrimp in a professional kitchen. The flame
whooshes up, oil sizzles, and the chef narrates 'perfectly caramelized.' 
Shot on 35mm film, warm color grading, medium close-up."

"Two friends laughing at a cafe table. One tells a joke, the other 
spits out their coffee. Background chatter and clinking cups. 
Handheld camera, natural lighting, indie film aesthetic."

"A thunderstorm rolling over a wheat field. Lightning cracks, thunder 
rumbles, rain intensifies. Time-lapse clouds, wide angle, dramatic 
contrast."

Audio Control Tips#

  • Explicitly describe sounds you want: "the door creaks open"
  • Specify dialogue in quotes: "she says 'hello there'"
  • Control music: "soft piano in the background" or "no background music"
  • Layer sounds: "birds chirping, distant traffic, footsteps on gravel"
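
The tips above can be folded into a small helper that builds the audio portion of a prompt. This is an illustrative sketch: `audio_description` is a hypothetical function, and defaulting to "No background music." when no music is given is a design choice made here, not documented Veo behavior.

```python
def audio_description(sounds=(), dialogue=None, music=None):
    """Build the audio portion of a prompt from explicit cues."""
    sentences = []
    if sounds:
        # Layered sounds read naturally as a comma-separated list
        sentences.append("Sounds: " + ", ".join(sounds) + ".")
    if dialogue:
        speaker, line = dialogue
        # Spoken lines go in quotes
        sentences.append(f"{speaker} says '{line}'.")
    # Be explicit either way: name the music or rule it out
    sentences.append(f"{music}." if music else "No background music.")
    return " ".join(sentences)


print(audio_description(
    sounds=["birds chirping", "distant traffic", "footsteps on gravel"],
    dialogue=("She", "hello there"),
    music="Soft piano in the background",
))
```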

Handling Rate Limits#

Veo 3 has strict rate limits due to compute intensity:

| Tier | Requests/Minute | Requests/Day | Concurrent |
|---|---|---|---|
| Free (AI Studio) | 2 | 50 | 1 |
| Paid (AI Studio) | 5 | 500 | 3 |
| Vertex AI | 10 | 2,000 | 5 |
| Crazyrouter | 8 | 1,000 | 4 |
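
To see what these limits mean for a batch job, the sketch below estimates the minimum number of days and minutes needed just to submit a given number of requests. The figures are copied from the table; `batch_estimate` is a hypothetical helper, and it ignores generation time and the concurrency cap, so real jobs will take longer.

```python
import math

# (requests/minute, requests/day, concurrent) per tier, from the table above
LIMITS = {
    "Free (AI Studio)": (2, 50, 1),
    "Paid (AI Studio)": (5, 500, 3),
    "Vertex AI": (10, 2000, 5),
    "Crazyrouter": (8, 1000, 4),
}


def batch_estimate(tier, n_videos):
    """Return (days, minutes): lower bounds set by the daily and per-minute caps."""
    rpm, rpd, _ = LIMITS[tier]
    days = math.ceil(n_videos / rpd)      # daily quota spreads the batch over this many days
    minutes = math.ceil(n_videos / rpm)   # per-minute cap sets the minimum submission time
    return days, minutes


print(batch_estimate("Free (AI Studio)", 100))  # (2, 50)
print(batch_estimate("Vertex AI", 100))         # (1, 10)
```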

Retry Logic with Exponential Backoff#

python
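import random
import time

def generate_with_retry(generate, prompt, max_retries=5):
    """Call generate(prompt), backing off exponentially on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return generate(prompt)
        except Exception as e:
            # Rate limits surface as HTTP 429 / RESOURCE_EXHAUSTED
            rate_limited = "429" in str(e) or "RESOURCE_EXHAUSTED" in str(e)
            # Re-raise other errors immediately, and the last rate-limit error
            if not rate_limited or attempt == max_retries - 1:
                raise
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 1s of jitter
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Waiting {wait:.1f}s (attempt {attempt + 1})")
            time.sleep(wait)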
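
To make the backoff concrete, this short sketch lists the base delay `2 ** attempt` for each of the five attempts, before jitter. `backoff_floor` is a hypothetical helper used only for illustration.

```python
def backoff_floor(max_retries=5):
    """Base delay in seconds for each attempt, before the random jitter."""
    return [2 ** attempt for attempt in range(max_retries)]


waits = backoff_floor()
print(waits)       # [1, 2, 4, 8, 16]
print(sum(waits))  # 31
```

Jitter adds up to one second per wait, so with the default of five retries the time spent sleeping is bounded by roughly 31 to 36 seconds.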

Queue-Based Architecture for Production#

python
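import asyncio
import time
from collections import deque

class Veo3Queue:
    def __init__(self, max_concurrent=3, rpm_limit=5):
        self.semaphore = asyncio.Semaphore(max_concurrent)
        self.rpm_limit = rpm_limit
        self.request_times = deque()
        self._lock = asyncio.Lock()  # serializes the rate-limit check

    async def _wait_for_slot(self):
        async with self._lock:
            while True:
                now = time.time()
                # Drop timestamps that have left the 60-second window
                while self.request_times and now - self.request_times[0] >= 60:
                    self.request_times.popleft()
                if len(self.request_times) < self.rpm_limit:
                    # Reserve the slot before releasing the lock
                    self.request_times.append(now)
                    return
                # Window is full: sleep until the oldest request expires
                await asyncio.sleep(60 - (now - self.request_times[0]))

    async def generate(self, prompt, config):
        await self._wait_for_slot()
        async with self.semaphore:
            return await self._call_api(prompt, config)

    async def _call_api(self, prompt, config):
        # Replace with the provider-specific generation call
        raise NotImplementedError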
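
The sliding-window check at the heart of this queue can be isolated as a pure function, which makes it easy to reason about and test without real clocks. This is a sketch; `seconds_until_slot` is a hypothetical name, and the timestamps are plain numbers rather than real request times.

```python
from collections import deque


def seconds_until_slot(request_times, now, rpm_limit=5, window=60.0):
    """Return how long to wait before another request fits in the window."""
    # Drop timestamps that have aged out of the window
    while request_times and now - request_times[0] >= window:
        request_times.popleft()
    if len(request_times) < rpm_limit:
        return 0.0
    # Window is full: the next slot opens when the oldest entry expires
    return window - (now - request_times[0])


times = deque([0.0, 10.0, 20.0, 30.0, 40.0])
print(seconds_until_slot(times, now=45.0))  # 15.0
print(seconds_until_slot(times, now=61.0))  # 0.0
```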

Veo 3 Pricing#

| Provider | Per Video (6s, 720p) | Per Video (8s, 1080p) | Monthly (100 videos) |
|---|---|---|---|
| Google AI Studio | $0.35 | $0.70 | $35-70 |
| Vertex AI | $0.30 | $0.60 | $30-60 |
| Crazyrouter | $0.15 | $0.35 | $15-35 |
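
The monthly column assumes 100 videos of a single size. For a mixed workload, the estimate can be computed directly from the per-video prices. The sketch below uses the table's prices (stored in cents to avoid floating-point rounding); the 60/40 split and the `monthly_cost` helper are illustrative assumptions.

```python
# Per-video prices in US cents, copied from the pricing table above
PRICES_CENTS = {
    "Google AI Studio": {"6s_720p": 35, "8s_1080p": 70},
    "Vertex AI": {"6s_720p": 30, "8s_1080p": 60},
    "Crazyrouter": {"6s_720p": 15, "8s_1080p": 35},
}


def monthly_cost(provider, n_720p, n_1080p):
    """Return the monthly cost in dollars for a mix of the two video sizes."""
    price = PRICES_CENTS[provider]
    cents = n_720p * price["6s_720p"] + n_1080p * price["8s_1080p"]
    return cents / 100


# Example mix: 60 short 720p clips and 40 full-length 1080p clips
for provider in PRICES_CENTS:
    print(f"{provider}: ${monthly_cost(provider, 60, 40):.2f}")
# Google AI Studio: $49.00
# Vertex AI: $42.00
# Crazyrouter: $23.00
```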

Veo 3 vs Competitors#

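| Feature | Veo 3 | Runway Gen-4 Turbo | Kling 2.1 | Sora |
|---|---|---|---|---|
| Native Audio | Yes | No | No | No |
| Max Duration | 8s | 10s | 10s | 20s |
| Max Resolution | 1080p | 4K | 1080p | 1080p |
| API Available | Yes | | | |
| Image-to-Video | Yes | | | |
| Consistency | High | High | Medium | High |
| Speed | 30-60s | 15-30s | 20-45s | 60-120s |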

FAQ#

How do I get access to Veo 3 API?#

Sign up at Google AI Studio (ai.google.dev) for free tier access, or use Vertex AI for production workloads. Crazyrouter provides access through an OpenAI-compatible endpoint with lower pricing.

Can Veo 3 generate dialogue?#

Yes. Veo 3 can generate spoken dialogue synchronized with lip movements. Include dialogue in quotes in your prompt. Quality is best for short phrases (under 10 words per speaker).
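
A quick pre-flight check can flag dialogue that exceeds this guideline before a generation is spent on it. This is a minimal sketch: `overlong_dialogue` is a hypothetical helper, and it expects the spoken lines to be passed in explicitly rather than parsed out of the prompt.

```python
def overlong_dialogue(lines_by_speaker, max_words=10):
    """Return the speakers whose line has max_words or more words."""
    return [
        speaker
        for speaker, line in lines_by_speaker.items()
        if len(line.split()) >= max_words
    ]


dialogue = {
    "chef": "perfectly caramelized",
    "host": "welcome back everyone, today we are cooking shrimp three different ways",
}
print(overlong_dialogue(dialogue))  # ['host']
```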

What's the maximum video length?#

8 seconds per generation. For longer videos, generate multiple clips and stitch them together. To keep clips consistent, either reuse the same subject, setting, and style wording in every prompt, or feed the last frame of one clip into image-to-video as the starting image for the next.
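
As a rough sketch of the stitching step, the code below works out how many 8-second clips a target runtime needs and prepares an ffmpeg concat-demuxer command to join them. The helper names and file names are hypothetical, ffmpeg must be installed separately, and `-c copy` assumes every clip shares the same codec settings.

```python
import math
from pathlib import Path

MAX_CLIP_SECONDS = 8  # per-generation limit stated above


def clips_needed(total_seconds):
    """Number of generations required to cover the target runtime."""
    return math.ceil(total_seconds / MAX_CLIP_SECONDS)


def concat_command(clip_paths, list_file="clips.txt", output="final.mp4"):
    """Write an ffmpeg concat list file and return the command that joins the clips."""
    lines = [f"file '{Path(p).as_posix()}'" for p in clip_paths]
    Path(list_file).write_text("\n".join(lines) + "\n")
    # -safe 0 allows arbitrary paths; -c copy joins without re-encoding
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(list_file), "-c", "copy", output]


print(clips_needed(30))  # 4
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once all clips have been generated.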

Is Veo 3 better than Runway Gen-4?#

For audio-inclusive content, yes — Veo 3's native audio generation eliminates a separate pipeline step. For pure visual quality at higher resolutions, Runway Gen-4 Turbo currently edges ahead with 4K support.

How much does Veo 3 cost per video?#

A 6-second 720p video costs approximately $0.35 through Google AI Studio or $0.15 through Crazyrouter. Costs scale with duration and resolution.

Summary#

Google Veo 3 is the first production-ready video generation API with native audio. For developers building video content pipelines, it eliminates the complexity of separate audio synthesis and synchronization.

Access Veo 3 through Crazyrouter for 50%+ savings on per-video costs with the same quality and an OpenAI-compatible API format that simplifies integration.
