Cinematography Terms Every AI Video Prompter Should Know
A working vocabulary of shots, angles, camera movement, lenses, depth of field, and lighting that AI video models actually respond to. With prompt examples you can copy.
Most AI video prompts fail in a predictable way. The writer describes the scene like they are explaining it to a friend on a phone call. All adjectives and vibes, no spatial language. The model does its best, the result looks generic, and the writer concludes that AI video is overhyped.
This is not an AI problem. It is a vocabulary problem.
Modern video models (Veo 3, Sora 2, Runway Gen-4, and the rest) were trained on a century of film grammar. They understand "low angle close-up on the protagonist, anamorphic 50mm, shallow depth of field, golden hour rim light" with surprising precision. They struggle with "cool shot of a guy looking sad." The difference is not the length of the prompt. It is whether the words map to anything in the training data.
The fix is borrowing the words a cinematographer would use. You do not need to memorize a film school glossary. You need a few dozen terms, organized into six buckets: shot size, angle, movement, lens, depth of field, and lighting. That is what this post is.
Why specific vocabulary outperforms vague description
Every cinematography term works like a coordinate in the model's latent space. When you write "low angle shot," the model has very likely seen a large number of low angle shots and has a clear picture of the visual properties to produce: the camera tilts upward, the subject looks taller and more imposing, the ceiling or sky takes up most of the upper frame. When you write "shot from below to make him look powerful," the model does its best to translate that into the same idea, but you have introduced ambiguity and lost precision.
The same logic applies across every category in this guide. Specific terms compress more visual information into fewer words and produce more consistent results. They also let you iterate. If your output is close but the shot feels too wide, you change "wide shot" to "medium shot" and rerun. With vague description, you do not know what to change because you do not know what each adjective is doing.
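The swap-one-term-and-rerun loop is easier to see when each choice lives in its own slot. The following is a minimal sketch in Python; the slot names and the `render` helper are hypothetical, invented for illustration, and no video model is actually called.

```python
# Hypothetical sketch: keep each cinematography choice in its own slot so that
# one term can be swapped between runs while everything else stays fixed.
base = {
    "shot_size": "wide shot",
    "angle": "eye level",
    "lighting": "golden hour",
    "subject": "a chef working at a steel counter",
}


def render(slots: dict[str, str]) -> str:
    """Join the slots into one prompt string, camera terms first."""
    camera = [slots["shot_size"], slots["angle"], slots["lighting"]]
    return f"{', '.join(camera)}, {slots['subject']}"


# The output felt too wide, so change exactly one slot and rerun.
tighter = {**base, "shot_size": "medium shot"}

print(render(base))
print(render(tighter))
```

Because only one slot differs between the two prompts, any change in the output can be attributed to that one term.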
Shot sizes (how much of the subject is in frame)
Shot sizes describe the framing distance between the camera and the subject. Five terms cover the large majority of the cases you will write for.
Extreme wide shot (EWS): the subject is small in the frame, surrounded by environment. Used to establish a location. Example: "extreme wide shot of a lone figure walking across a salt flat at sunset."
Wide shot (WS) or long shot: the subject is fully visible head to toe, with room around them. Example: "wide shot of a chef working at a steel counter in an industrial kitchen."
Medium shot (MS): the subject is framed from roughly the waist up. Most dialogue and product shots live here.
Close-up (CU): the subject's face fills most of the frame, from chin to top of head. The shot for emotion.
Extreme close-up (ECU): a single feature fills the frame. An eye, a finger pressing a button, a drop of water on a leaf. Used sparingly for impact.
One practical tip: AI models tend to default to medium-ish framing unless you specify. If you want anything else, say it explicitly.
Camera angles (where the camera sits relative to the subject)
Angles change the emotional read of a shot far more than most people realize.
Eye level: the camera sits at the subject's eye height. Neutral, conversational, the default human perspective.
Low angle: the camera looks up at the subject. Makes them feel taller, dominant, heroic, sometimes threatening.
High angle: the camera looks down on the subject. Makes them feel smaller, vulnerable, contained.
Dutch angle (or canted angle): the camera is rotated so the horizon tilts. Used to signal unease, tension, dream logic.
Bird's eye view (or overhead): straight down from directly above. Powerful for choreography, food shots, maps coming alive.
Worm's eye view: straight up from ground level. Dramatic and rare.
You can combine these. "Low angle medium shot" means waist-up framing with the camera tilted upward. "High angle close-up" looks down on a face. Mixing the categories is where prompts start to feel directed instead of described.
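Because shot size and angle are independent choices, the combinations can be listed mechanically. Here is a small sketch using terms from the two lists above; the variable names are illustrative only.

```python
from itertools import product

shot_sizes = ["wide shot", "medium shot", "close-up"]
angles = ["low angle", "eye level", "high angle"]

# Angle first, then shot size, matching phrasing like "low angle medium shot".
combos = [f"{angle} {size}" for angle, size in product(angles, shot_sizes)]

for combo in combos:
    print(combo)
```

Three sizes and three angles already give nine distinct framings, which is a useful menu when a shot feels flat and you are not sure which axis to change.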
Camera movement (what the camera is doing during the shot)
A static frame is not the only option. AI models handle motion remarkably well now, but only if you tell them what kind.
Static: the camera does not move. Sometimes worth saying explicitly to prevent the model from adding drift.
Pan: the camera rotates horizontally while staying in place. Useful for revealing a landscape or following a subject across a room.
Tilt: vertical rotation. Looking up a tall building. Looking down to reveal a body of water below.
Dolly: the camera physically moves toward or away from the subject on a track. "Dolly in" pushes closer. "Dolly out" pulls back. Different from zoom.
Tracking shot: the camera moves alongside the subject, often at the same speed. Used for walking-and-talking, chase sequences, parallax-rich reveals.
Crane (or jib): the camera moves vertically through space, often combined with horizontal motion. Great for sweeping reveals.
Handheld: the camera shakes with operator movement. Documentary feel, intimacy, urgency.
Gimbal or Steadicam: stabilized motion that floats smoothly through space. The two are different rigs, but the look they produce is similar. Modern, polished, expensive-looking.
One thing to watch for: AI models can get over-enthusiastic with motion if you give them ambiguous instructions. "Cinematic camera movement" tends to produce dramatic dolly-and-crane sequences that may not fit the scene. Be specific about what kind of movement and how much.
Lens choices (focal length and its effect)
Focal length changes how space gets compressed in the image. This is the most underused category in AI prompts.
Wide angle (14mm to 35mm): exaggerates space and depth. Things close to the lens look big, things far away look small. Great for cramped interiors that need to feel bigger, and for action.
Normal (around 50mm): roughly matches the perspective of human vision. Neutral, naturalistic, the workhorse focal length.
Short telephoto (85mm to 135mm): compresses space, flatters faces, separates the subject from the background. The classic portrait range.
Telephoto (200mm and up): heavy compression, strong subject isolation, backgrounds turn into soft fields of color. Wildlife, sports, surveillance moods.
Anamorphic: technically a lens type rather than a focal length, but the look is distinct. Widescreen aspect ratio, oval bokeh, characteristic horizontal blue-streak lens flares. Cinematic shorthand.
A simple guideline: if you want intimacy, reach for 85mm to 135mm. If you want immersion or environmental drama, go wider. If you want a Christopher Nolan look, ask for anamorphic explicitly.
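Read as a rule of thumb, the guideline above is a lookup from intent to lens language. A sketch under that reading follows; the intent labels and the `lens_for` function are made up for this example, and the focal lengths are picked from the ranges quoted in this section.

```python
# Intent -> lens phrase, following the guideline in this section.
# The keys are hypothetical labels, not terms any model requires.
LENS_BY_INTENT = {
    "intimacy": "85mm",              # short telephoto: flatters faces
    "immersion": "24mm wide angle",  # inside the 14mm to 35mm range
    "neutral": "50mm",               # the workhorse focal length
    "isolation": "200mm telephoto",  # heavy compression, soft background
    "widescreen": "anamorphic",
}


def lens_for(intent: str) -> str:
    """Return a lens phrase for an intent, falling back to 50mm."""
    return LENS_BY_INTENT.get(intent, "50mm")


print(lens_for("intimacy"))
print(lens_for("something else"))
```

The 50mm fallback is a design choice for this sketch, taken from the description of 50mm as the neutral default.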
Depth of field (what is in focus)
Depth of field controls how much of the image is sharp from front to back. It is a single dial with two ends.
Shallow depth of field: only a thin slice of the image is sharp, usually the subject. Everything else blurs into bokeh. Looks intimate, expensive, modern.
Deep depth of field: most or all of the image is sharp. Looks documentary, observational, classical.
Rack focus: the focus shifts from one subject to another during the shot. Used for narrative reveals.
For most creator-style content, shallow depth of field with the subject sharp and the background blurred is the safe default. For wider environmental shots where the location is part of the story, deep focus reads better.
Lighting (the most underrated lever)
Lighting tells the audience how to feel about a scene before any acting happens. A handful of terms unlock most of the looks people associate with film.
Key light: the main light on the subject.
Fill light: a softer secondary light that reduces shadow contrast on the opposite side.
Back light (or rim light): a light placed behind the subject that creates a glowing edge around the silhouette.
High key: bright, low contrast, lots of fill. Sitcom, beauty, commercials.
Low key: dark, high contrast, deep shadows. Drama, noir, horror.
Practical lighting: the visible light sources are inside the scene. Lamps, candles, neon signs. Reads natural and modern.
Motivated lighting: the off-screen lights appear to come from a plausible on-screen source. Helps the scene feel grounded.
Golden hour: the soft warm light roughly one hour after sunrise or before sunset. Long shadows, glowing skin.
Blue hour: the brief window of cool ambient light after sunset, before full darkness. Quiet, moody, urban.
If you only learn one lighting term, learn rim light. Asking for "warm rim light from behind the subject, soft fill from the front" upgrades any portrait shot dramatically.
Putting it together: prompt examples
Here is a generic vague prompt and three variations using the vocabulary above. Same scene, very different outputs.
Generic: "A guy sitting at a bar looking thoughtful."
Better, intimate: "Medium close-up on a man at a bar, eye level, 85mm, shallow depth of field, soft warm practical lighting from a neon sign behind him, low key, slight handheld drift."
Better, cinematic: "Wide shot of a man alone at a long mahogany bar, low angle, anamorphic 50mm, deep focus, golden hour light spilling through tall windows, slow dolly in."
Better, surreal: "Extreme close-up on a man's eye reflecting bar lights, Dutch angle, 135mm short telephoto, extremely shallow depth of field, rim light from above, the rest of the frame in shadow."
The same character. Three very different films, just from word choices.
Common mistakes to avoid
A few patterns I see often when reviewing prompts.
Over-stacking adjectives: "epic cinematic dramatic hyperdetailed beautiful stunning." These do not stack the way the writer hopes. Pick one or two precise terms and trust them. The model has its own bias toward beauty.
Contradicting yourself: "extremely shallow depth of field, everything in sharp focus." The model will average the contradiction and produce neither.
Asking for impossible camera physics: "low angle bird's eye view." Pick one.
Vague modifiers attached to specific terms: "kind of low angle." Either it is low or it is not. Specificity is the whole point of using the term.
Forgetting the subject: prompts that are 80% camera language and 20% subject description tend to look beautifully framed and oddly empty. Cinematography is in service of a subject, not a replacement for one.
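Two of these mistakes, contradictions and impossible camera physics, are mechanical enough to check before spending a generation. A rough sketch follows; the `CONFLICTS` pairs and the `find_conflicts` helper are assumptions written for this post, and plain substring matching will miss paraphrases.

```python
# Pairs of phrases that should not appear in the same prompt.
# This list is illustrative, not exhaustive.
CONFLICTS = [
    ("shallow depth of field", "everything in sharp focus"),
    ("shallow depth of field", "deep focus"),
    ("low angle", "bird's eye view"),
    ("high key", "low key"),
]


def find_conflicts(prompt: str) -> list[tuple[str, str]]:
    """Return every conflicting pair where both phrases occur in the prompt."""
    text = prompt.lower()
    return [(a, b) for a, b in CONFLICTS if a in text and b in text]


print(find_conflicts("Extremely shallow depth of field, everything in sharp focus"))
print(find_conflicts("low angle bird's eye view of a dancer"))
print(find_conflicts("low angle medium shot, golden hour"))
```

An empty list means no known pair was found, not that the prompt is free of contradictions.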
A working quick-reference
For practical use, here is the shortest possible cheat sheet to keep in a notes app:
- Pick a shot size (wide, medium, close-up, extreme close-up)
- Pick an angle (eye level, low, high, Dutch)
- Pick a movement (static, pan, dolly in, tracking, handheld)
- Pick a lens (wide, 50mm, 85mm, anamorphic)
- Pick a depth of field (shallow, deep)
- Add lighting (golden hour, rim light, low key, etc.)
Roughly one selection from each row. Then describe the actual subject and action in plain language. That structure produces remarkably consistent output across models.
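The cheat sheet maps naturally onto a small data structure: one field per line of the list, plus the subject. The following is a minimal sketch, not any tool's actual format; the `Shot` class, its field names, and the ordering inside `to_prompt` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    subject: str  # who or what, and the action, in plain language
    shot_size: str = ""
    angle: str = ""
    movement: str = ""
    lens: str = ""
    depth_of_field: str = ""
    lighting: str = ""

    def to_prompt(self) -> str:
        """Assemble the prompt: framing and subject first, then the rest."""
        if self.shot_size:
            framing = f"{self.shot_size} of {self.subject}"
        else:
            framing = self.subject
        extras = [self.angle, self.lens, self.depth_of_field,
                  self.lighting, self.movement]
        # Skip any category that was left empty.
        return ", ".join([framing] + [term for term in extras if term])


shot = Shot(
    subject="a man alone at a long mahogany bar",
    shot_size="wide shot",
    angle="low angle",
    movement="slow dolly in",
    lens="anamorphic 50mm",
    depth_of_field="deep focus",
    lighting="golden hour light through tall windows",
)
print(shot.to_prompt())
```

Leaving a field empty simply drops that category, so the same structure works for a two-term prompt and a fully specified one.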
Where to practice
If you want to test these terms without running through paid generations, Reeprompt drafts video prompts you can iterate on quickly, and PromptForge gives you structured JSON output for the more rigorous Veo and Sora workflows. Both are free.
The fastest way to internalize this vocabulary is to take a prompt that did not produce what you wanted, identify which category was vague (usually angle, lens, or lighting), swap in a specific term from the lists above, and rerun. After ten or twenty iterations the vocabulary starts to feel native, and your prompts get shorter rather than longer. Which is the right outcome. The goal is not flowery language. The goal is precision.