Prompting Veo: Shot Descriptions That Actually Land

Veo rewards structure. Name the camera move first, the subject second, the light last. Here is the five-slot template that cuts rerender cost roughly in half.

By veo4api editorial | Apr 18, 2026 | 4 min read

Veo does not respond to vibes. It responds to structure. The prompts that land on the first take share a specific shape, and once you see it you cannot unsee it. This is a working template you can apply today against Veo 3.1 at $0.40 per second at 1080p, and it should carry over to Veo 4 when the endpoint ships, since the parameter names are expected to stay the same.

The template has five slots, in this order: camera, subject, action, environment, light. Name them in that order in your prompt, and the model tends to build the shot in the same order. Name them out of order and you get averaged mush.

Here is the shape. Camera: what the lens is doing. Subject: who or what is in frame. Action: what they are doing. Environment: where and when. Light: quality, direction, color temperature. That is it. No adjectives hunting for a noun. Every word earns its place.

[Figure: Annotated shot description broken into the five-slot template for Veo prompting]

A worked example. Bad prompt first: "beautiful cinematic shot of a woman walking through a forest, golden hour, stunning light." Veo reads that as four competing directives with no hierarchy. The woman drifts. The forest warps. The light is either too much or not enough. You rerender three times and give up.

Now the structured version: "slow tracking shot from behind, a woman in a wool coat, walking between tall pine trunks, late autumn forest with fog in the middle distance, low sun backlighting the fog into soft gold." About twice the length. Completely different result. The camera is locked. The subject is specific. The action is a single verb. The environment has two details, not ten. The light has direction and color, not a feeling.
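If you build prompts in code, it helps to keep the slots as separate fields and join them at the last moment. A minimal sketch follows. `ShotSlots` and `buildPrompt` are hypothetical names for illustration, not part of any Veo or fal API, and the only things the helper enforces are slot order and non-empty slots.

```typescript
// Hypothetical helper: the slot names mirror the template, not any Veo or fal API.
interface ShotSlots {
  camera: string;
  subject: string;
  action: string;
  environment: string;
  light: string;
}

// Fixed order: camera, subject, action, environment, light.
const SLOT_ORDER: (keyof ShotSlots)[] = [
  "camera",
  "subject",
  "action",
  "environment",
  "light",
];

function buildPrompt(slots: ShotSlots): string {
  return SLOT_ORDER.map((slot) => {
    const value = slots[slot].trim();
    if (value.length === 0) {
      throw new Error(`Empty slot: ${slot}`);
    }
    return value;
  }).join(", ");
}

// The structured prompt from the worked example, split into its slots.
const shotPrompt = buildPrompt({
  camera: "slow tracking shot from behind",
  subject: "a woman in a wool coat",
  action: "walking between tall pine trunks",
  environment: "late autumn forest with fog in the middle distance",
  light: "low sun backlighting the fog into soft gold",
});

console.log(shotPrompt);
```

Keeping the slots separate also makes it cheap to vary one slot, say the light, while holding the other four fixed.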

Here is the call. Swap in your own five-slot prompt and run it.

example.ts

```typescript
import { fal } from "@fal-ai/client";

// or fal-ai/veo4/text-to-video once available
const result = await fal.subscribe("fal-ai/veo3.1/text-to-video", {
  input: {
    prompt: "slow tracking shot from behind, a woman in a wool coat, walking between tall pine trunks, late autumn forest with fog in the middle distance, low sun backlighting the fog into soft gold",
    aspect_ratio: "16:9",
    duration: "8s",
    resolution: "1080p",
    generate_audio: false
  },
  logs: true
});

console.log(result.data.video.url);
```

A few rules to keep the slots tight.

One verb per slot. "Walking, looking around, brushing her hair back" is three shots, not one. Pick one. If you want the others, shoot them as separate clips and cut them together. With an 8-second maximum, Veo is not a scene engine. It is a shot engine.

Two adjectives per noun, tops. "A red weathered metal door" is fine. "A beautifully ornate deeply red weathered old rusted metal door with carved details" is a coin flip on what you get back. Pick the two most load-bearing adjectives and drop the rest.

Specify the lens or do not. If you say "35mm lens" or "wide-angle" the model will honor it roughly. If you leave it out, the model picks one based on the other slots. What you should not do is give conflicting signals, like "close-up" and "wide shot" in the same prompt. Pick a framing and commit.
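Two of these rules are easy to check before you spend money on a render. The sketch below is a rough keyword heuristic, not anything Veo exposes: `lintShot` and the two term lists are hypothetical, and the clause count is only a proxy for "more than one verb". The adjective rule is left out because it needs real part-of-speech tagging.

```typescript
// Hypothetical lint helpers: keyword heuristics, not anything Veo itself exposes.
const TIGHT_FRAMING = ["close up", "close-up", "closeup"];
const WIDE_FRAMING = ["wide shot", "wide angle", "wide-angle"];

function containsAny(text: string, terms: string[]): boolean {
  const lower = text.toLowerCase();
  return terms.some((term) => lower.indexOf(term) !== -1);
}

function lintShot(action: string, fullPrompt: string): string[] {
  const found: string[] = [];

  // Several comma- or "and"-separated clauses in the action slot
  // usually means several verbs, which means several shots.
  const clauses = action
    .split(/,|\band\b/)
    .map((clause) => clause.trim())
    .filter((clause) => clause.length > 0);
  if (clauses.length > 1) {
    found.push(`action slot has ${clauses.length} clauses; keep one verb per shot`);
  }

  // Tight and wide framing in the same prompt are conflicting signals.
  if (containsAny(fullPrompt, TIGHT_FRAMING) && containsAny(fullPrompt, WIDE_FRAMING)) {
    found.push("prompt mixes tight and wide framing; pick one");
  }

  return found;
}

const warnings = lintShot(
  "walking, looking around, brushing her hair back",
  "close-up of a woman, wide shot of a forest"
);
console.log(warnings);
```

Treat the warnings as a nudge to reread the prompt, not as a gate. A phrase like "walking hand in hand and smiling" would trip the clause check even if you decide it is one action.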

[Figure: Three takes of the same prompt showing how structure tightens the output]

Now the light slot, because it is where most prompts fall apart. Bad: "good lighting." Slightly better: "cinematic lighting." Good: "low sun backlight, warm key on the face, cool fill in the shadows." You are giving the model a lighting plan, not a mood. Mood comes out of the plan. If you skip the plan you get whatever the model averages from the rest of the prompt, and that is usually flat.
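One way to force yourself to write a plan instead of a mood is to give the light slot its own fields. This is a sketch under assumptions: `LightPlan`, `lightSlot`, and the list of vague phrases are hypothetical and only there to illustrate the idea.

```typescript
// Hypothetical shape for a lighting plan; the field names are illustrative only.
interface LightPlan {
  back?: string; // backlight: source and direction
  key?: string;  // key light: quality and color temperature
  fill?: string; // fill light: what happens in the shadows
}

// Phrases that describe a mood rather than a plan.
const VAGUE_LIGHT = ["good lighting", "cinematic lighting", "stunning light"];

function lightSlot(plan: LightPlan): string {
  const parts: string[] = [];
  for (const part of [plan.back, plan.key, plan.fill]) {
    if (part !== undefined && part.trim().length > 0) {
      parts.push(part.trim());
    }
  }
  if (parts.length === 0) {
    throw new Error("light plan is empty");
  }
  for (const part of parts) {
    if (VAGUE_LIGHT.indexOf(part.toLowerCase()) !== -1) {
      throw new Error(`"${part}" is a mood, not a plan`);
    }
  }
  return parts.join(", ");
}

const lightText = lightSlot({
  back: "low sun backlight",
  key: "warm key on the face",
  fill: "cool fill in the shadows",
});
console.log(lightText);
```

You do not need all three fields on every shot. A single backlight with a direction and a color is already a plan.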

One more thing. If you want a specific color palette, put it in the environment slot, not the light slot. "Late autumn forest with rust and ochre ground cover" will do more for your color than any "warm tones" note. Veo reads the physical world better than it reads abstract color theory.

This template is not the only way to prompt Veo. It is the way that cuts your rerender count. On a run of 20 clips, you can expect to land maybe eight on the first take with loose prompts. With the five-slot structure, that number goes up to roughly fifteen. At $0.40 per second, an 8-second clip costs $3.20. If retakes land at the same rate as first takes, 20 good takes need about 50 renders with loose prompts (20 / 0.40) and about 27 with structured ones (20 / 0.75). That is the difference between spending about $160 and about $85 to get your 20 good takes. Worth the extra thirty seconds of writing.
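To sanity-check a budget like this, here is the arithmetic as a sketch. It assumes every retake lands at the same rate as a first take, which is a simplification, so the totals only hold under that model. `expectedSpend` is a hypothetical helper, not part of any SDK. The price, clip length, and first-take rates are the figures quoted in this article.

```typescript
// Assumption: every take, including retakes, lands with the same
// probability as a first take.
const PRICE_PER_SECOND = 0.4; // USD, Veo 3.1 at 1080p (figure from this article)
const CLIP_SECONDS = 8;

function expectedSpend(goodTakesNeeded: number, firstTakeRate: number): number {
  if (firstTakeRate <= 0 || firstTakeRate > 1) {
    throw new RangeError("firstTakeRate must be in (0, 1]");
  }
  // On average you render 1 / rate clips for every clip you keep.
  const expectedRenders = goodTakesNeeded / firstTakeRate;
  return expectedRenders * CLIP_SECONDS * PRICE_PER_SECOND;
}

const looseSpend = expectedSpend(20, 8 / 20);       // loose prompts: 8 of 20 land
const structuredSpend = expectedSpend(20, 15 / 20); // five-slot prompts: 15 of 20 land

console.log(looseSpend.toFixed(2), structuredSpend.toFixed(2)); // 160.00 85.33
```

Plug in your own first-take rates once you have a few runs logged. The ratio between the two totals matters more than the absolute numbers.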
