Be like water. The brand is the water. The model is the vessel. Each image and video model has its own shape — its own preferred prompt structure, its own preferred density, its own sweet spot well below the published context window. Desklight's per-model prompt adapter reshapes the brand DNA to fit each model's vessel without losing what's in it. Same water — every vessel — every render.
The naive read of a token limit is "the model truncates at the ceiling." That's the wrong mental model. Long before the bytes get cut off, the model stops following them. The published context window is the wall. The wall you actually crash into is somewhere in the middle of the room — and you don't hear the impact, you just get worse output.
This is settled research, not vibes. Liu et al. (Lost in the Middle, TACL 2024) measured a U-shaped performance curve across long-context language models: information placed at the start or the end of a long prompt gets used; information placed in the middle is often missed, even by models advertised for long context. DetailMaster (2025) measured a comparable effect specifically for text-to-image: prompt adherence drops as length grows. The model still accepts the prompt. It just stops doing half of what it says.
The failure mode is consistent across providers. Stack three camera moves into one beat, and the model picks one and ignores the others — but you don't know which. Repeat the same instruction in different words across two paragraphs, and the model splits the difference. Drop a vague style word next to a specific one, and the specific one gets diluted. None of this throws. None of it logs. The bytes the model ignored were never logged anywhere — they just didn't shape the output.
So truncating to fit is the wrong operation. The right operation is reshaping to fit: every section of the brand DNA still flows through, but in the form the model actually uses. The token ceiling is the wall. The sweet spot is where the water stays whole.
The ranges below come from Desklight's own prompt-science testing against each model — calibrated against published guidance where providers share it, empirically verified where they don't. The prompt adapter reads the right range for the active model and reshapes the brand into it before the call ships.
| Model | Sweet spot | What blows past it |
|---|---|---|
| Gemini 3 Pro Image · Nano Banana family (reasoning image, current frontier) | ~200-400 tokens of dense, non-redundant prompt. Beyond ~500 tokens, diminishing returns. Beyond ~800, the thinking step starts producing contradictions. | Long prose prompts with stacked instruction blocks. Defensive enumerations of what should NOT appear. Hex codes injected as color references instead of descriptive language. |
| GPT Image 2 (premium image edit / generate) | ~1,000-1,400 chars of structured 7-element prompt, scene-first. Hard ceiling around 2,000 chars / 500 tokens. | Re-stating the same constraint in three sentences instead of one. Stacking style adjectives that cancel each other ("editorial" + "iPhone-grade"). |
| ByteDance Seedance 2.0 (cinematic text-to-video) | 60-100 words across the seven elements (Subject · Action · Camera · Setting · Style · Lighting · Audio). Above 100, internal contradictions. Above 150, drift is near-guaranteed. | The literal word "fast." Stacked camera moves on one beat. Filler adjectives like "beautiful," "amazing," "stunning" that don't direct the model. |
| Google Veo 3.1 (multi-shot dialogue video) | 50-120 words single-shot. Multi-shot via timestamp prompting (00:00-00:02 blocks each carrying their own beat). Hard token ceiling around 1,024. | Open-ended motion with no resolution clause. Vague camera direction when paired with a specific subject. Trigger words that fire the safety filter mid-render. |
| Kuaishou Kling 2.5 Turbo (speed-optimized image-to-video) | 40-60 words for text-to-video. 15-40 words for image-to-video: motion only, never re-describe what's already in the source frame. | Multiple camera moves on Turbo (single-move only). Re-describing the source image in the I2V prompt. More than five distinct nouns. |
| Lightricks LTX 2.3 (verbose-tolerant cinematic) | 4-8 sentences, prose-form. LTX's larger text connector absorbs more than most; official docs are explicit that longer consistently outperforms shorter here. One of the few models where adding detail doesn't hurt. | Bullet lists. Tag-soup style words doing the work ("epic," "cinematic"). Numerical over-constraint ("3 birds at 45°, pan 2°/sec"). |
| Wan 2.6 (photoreal motion) | 80-120 words, descriptive sentences (not labeled fields). Subject → Scene → Motion → Lighting → Camera → Stylization order. | Stacked camera moves. Tag-soup adjectives. The word "fast" tanks output the same way it tanks Seedance. |
Two patterns repeat. First, the sweet spots are tighter than most prompt builders assume: four of the seven model families top out at 120 words or less, with LTX as the verbose-tolerant exception. Second, the failure modes don't throw. Past the sweet spot you get drift, contradiction, or silent fallthrough, never a clean error that points at the prompt. The adherence curve does its damage in the part of the request you can't see.
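To make the table concrete, here is a minimal sketch of how those ranges could live in a lookup that a prompt can be checked against. It is an illustration, not Desklight's actual code: the model keys, the `SweetSpot` type, and the `measure` and `check` helpers are all hypothetical, and the token count uses a rough four-characters-per-token assumption rather than a real tokenizer.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SweetSpot:
    unit: str  # "tokens", "chars", "words", or "sentences"
    low: int
    high: int


# Ranges copied from the table above. The keys are hypothetical identifiers,
# not real API model names.
SWEET_SPOTS = {
    "gemini-3-pro-image": SweetSpot("tokens", 200, 400),
    "gpt-image-2": SweetSpot("chars", 1000, 1400),
    "seedance-2.0": SweetSpot("words", 60, 100),
    "veo-3.1": SweetSpot("words", 50, 120),
    "kling-2.5-turbo-t2v": SweetSpot("words", 40, 60),
    "kling-2.5-turbo-i2v": SweetSpot("words", 15, 40),
    "ltx-2.3": SweetSpot("sentences", 4, 8),
    "wan-2.6": SweetSpot("words", 80, 120),
}


def measure(prompt: str, unit: str) -> int:
    """Size of a prompt in the unit a given model's range is expressed in."""
    if unit == "chars":
        return len(prompt)
    if unit == "words":
        return len(prompt.split())
    if unit == "sentences":
        return len([s for s in re.split(r"[.!?]+", prompt) if s.strip()])
    if unit == "tokens":
        # Assumption: roughly 4 characters per token. A real tokenizer differs.
        return len(prompt) // 4
    raise ValueError(f"unknown unit: {unit}")


def check(model: str, prompt: str) -> str:
    """Return "under", "ok", or "over" relative to the model's sweet spot."""
    spot = SWEET_SPOTS[model]
    size = measure(prompt, spot.unit)
    if size < spot.low:
        return "under"
    if size > spot.high:
        return "over"
    return "ok"
```

Keeping the unit next to the range matters because the table mixes tokens, characters, words, and sentences; a single "max length" number would hide that.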
Same brand inputs, the same source photo from the model — different sides of the pipeline. Below: the unbranded canvas the photo model handed back, sized to the sweet spot for Gemini. Layered on top: the finished Desklight render — same source frame wrapped in the brand's typography, kicker, accent, and wordmark by the resolver, after the adapter kept the prompt where the model could actually follow it.
One layer between brand DNA and the model — the prompt adapter reads the active model's sweet spot, reshapes the assembled prompt to fit that vessel, and ships it. The brand water flows in the same every time. The vessel changes. The water never spills.
Every Desklight render starts the same way. Allie writes a calendar entry — headline, kicker, copy, photo direction. The brand profile contributes its DNA — voice register, palette, photography lane, mood keywords, never-list. The layout planner picks an archetype and a composition shape. All of that flows into a typed render spec.
The render spec is rich. Verbose, even — by design, because brand DNA is rich, and brands deserve full weight. But the model on the other end has a much smaller adherence range than the spec carries. So between the spec and the model sits the adapter: it knows the active model, looks up the published sweet spot, and reshapes the prompt to fit that exact vessel before any byte goes over the wire.
It is not the same prompt across models. A Gemini render gets the 6-element framework with descriptive color words. A GPT Image 2 render gets the 7-element scene-first structure. A Seedance render gets cinematic sentence-fragments labeled by element. A Kling image-to-video render gets motion-only direction with no static description because the source frame already carries the scene. The brand inputs are constant; the shape the prompt takes is what shifts.
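As a rough sketch of "same inputs, different shape", the snippet below takes one spec and renders it three ways, using only the structures named in this article: labeled elements for Seedance, ordered descriptive sentences for Wan, and motion-only direction for Kling image-to-video. The spec fields, function names, and model keys are hypothetical, and the Gemini and GPT Image 2 frameworks are left out because their elements aren't enumerated here.

```python
# Hypothetical spec fields; a real render spec carries far more.
spec = {
    "subject": "a ceramic mug on a walnut desk",
    "action": "steam rises slowly from the mug",
    "camera": "slow push in",
    "setting": "a quiet studio at dawn",
    "style": "warm editorial photography",
    "lighting": "soft window light from the left",
    "audio": "faint room tone",
}


def sentence(text: str) -> str:
    """Capitalize the first letter and end with exactly one period."""
    text = text.strip().rstrip(".")
    return text[0].upper() + text[1:] + "."


def shape_seedance(s: dict) -> str:
    # Seven elements, each labeled: Subject, Action, Camera, Setting, Style, Lighting, Audio.
    order = ["subject", "action", "camera", "setting", "style", "lighting", "audio"]
    return " ".join(f"{key.capitalize()}: {s[key]}." for key in order)


def shape_wan(s: dict) -> str:
    # Descriptive sentences, no labels: Subject, Scene, Motion, Lighting, Camera, Stylization.
    order = ["subject", "setting", "action", "lighting", "camera", "style"]
    return " ".join(sentence(s[key]) for key in order)


def shape_kling_i2v(s: dict) -> str:
    # Motion only: the source frame already carries subject, setting, and lighting.
    return " ".join(sentence(s[key]) for key in ["action", "camera"])


SHAPERS = {
    "seedance-2.0": shape_seedance,
    "wan-2.6": shape_wan,
    "kling-2.5-turbo-i2v": shape_kling_i2v,
}


def shape_for(model: str, s: dict) -> str:
    return SHAPERS[model](s)
```

With this spec, `shape_for("kling-2.5-turbo-i2v", spec)` returns `"Steam rises slowly from the mug. Slow push in."`: the desk, the studio, and the light never appear, because the source image already shows them.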
Inside the adapter, every section of the assembled prompt carries a priority. The water has structure — what's load-bearing, what's flavor, what's redundant — and the adapter foregrounds the right parts first:
The adapter logs every pass. In production, when the brand DNA had to compress to fit, the log line names exactly what reshaped: adapted 412/425 chars · collapsed=[lead] distilled=[style, composition]. The work is visible — not silent — so when a render comes back unexpected we can read backwards from the log to the prompt to the input. Reshaping is auditable. Drift is not.
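A minimal sketch of priority-driven reshaping with that kind of log line might look like the following. The details are assumptions made for illustration: each section is given three pre-written forms (full, distilled, collapsed), a higher `priority` number means less load-bearing, and the names `Section` and `adapt` are hypothetical. No section is ever dropped; the least load-bearing ones step down to shorter forms first.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    priority: int   # assumption: 0 = most load-bearing, higher = more expendable
    full: str
    distilled: str  # shorter phrasing that keeps the instruction
    collapsed: str  # shortest form that still mentions the section


def adapt(sections: list, budget_chars: int):
    """Step sections down to shorter forms until the prompt fits the budget."""
    level = {s.name: 0 for s in sections}  # 0 = full, 1 = distilled, 2 = collapsed

    def render() -> str:
        return " ".join(
            (s.full, s.distilled, s.collapsed)[level[s.name]] for s in sections
        )

    # Least load-bearing sections give way first.
    order = sorted(sections, key=lambda s: s.priority, reverse=True)
    for step in (1, 2):
        for s in order:
            if len(render()) <= budget_chars:
                break
            level[s.name] = step

    prompt = render()
    distilled = [s.name for s in sections if level[s.name] == 1]
    collapsed = [s.name for s in sections if level[s.name] == 2]
    log = (
        f"adapted {len(prompt)}/{budget_chars} chars · "
        f"collapsed=[{', '.join(collapsed)}] distilled=[{', '.join(distilled)}]"
    )
    return prompt, log
```

The log is built from the same state that produced the prompt, so it cannot disagree with what was actually sent. If even the collapsed forms don't fit, this sketch returns an over-budget prompt and the first number in the log exceeds the second, which is at least visible.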
A four-second clip wants one beat. An eight-second clip can carry a setup and a payoff. A twelve-second clip can hold a small narrative arc. The adapter's video budgets grow with duration so longer clips have headroom for additional beats — capped at each model's published sweet spot ceiling.
Video models reward duration-aware prompting. Published prompt-engineering guides for the major video models make the same observation: each shot adds roughly two seconds to the rendered clip, and each beat of action wants its own line of prompt. A clip with four shots wants four lines; a clip with one beat wants one. Cramming four shots' worth of direction into a four-second clip produces a confused, jump-cut result: the model picks the moves it can fit and skips the rest.
So the adapter's video budgets aren't flat per-model values. Each one is a base scaffolding cost plus a per-second allowance that scales with clip duration, capped at the published sweet spot ceiling. A five-second Seedance clip lands well below the maximum. A ten-second clip has the headroom to carry a setup, a turn, and a resolution. A fifteen-second clip (past most models' rendered duration cap anyway) gets clamped at the ceiling because more words past that point produce more contradictions, not more story.
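The budget rule reads as `min(ceiling, base + per_second × duration)`. Here is a small sketch of it. Only the 100-word Seedance ceiling comes from the table; the base of 30 words and the allowance of 6 words per second are illustrative numbers picked for this example, and `VideoBudget` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoBudget:
    base_words: int        # scaffolding cost, paid regardless of clip length
    words_per_second: int  # extra headroom per second of clip
    ceiling_words: int     # top of the model's sweet spot

    def for_duration(self, seconds: float) -> int:
        raw = self.base_words + self.words_per_second * seconds
        return int(min(raw, self.ceiling_words))


# Ceiling from the table; base and per-second values are assumptions.
SEEDANCE = VideoBudget(base_words=30, words_per_second=6, ceiling_words=100)

for seconds in (5, 10, 15):
    print(seconds, SEEDANCE.for_duration(seconds))
# prints:
# 5 60
# 10 90
# 15 100
```

The fifteen-second case would compute to 120 words uncapped; the clamp holds it at 100, which is the behavior described above.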
The same per-second logic applies to all five video models in active rotation, but the curves are different. Kling Turbo's ceiling is the lowest because its sweet spot is the tightest. Veo's ceiling is the highest because its multi-shot timestamp prompting can carry the most narrative density. LTX accepts longer prose because its text connector is built for it. The adapter reads the right curve for the right model.
The most insidious symptom of an un-adapted prompt isn't an error. It's silence. Three failure modes that the adapter prevents — and that, without it, you'd notice only by squinting at the output:
Provider-side fallthrough. Some image APIs respond to over-budget prompts by returning a generic processing error. Most provider stacks treat that error as transient and fall through to the next model in their chain. The user picked Model A in their workspace settings; Model A choked on the verbose prompt; the chain quietly produced output from Model B. The render came back. The settings UI still says Model A. The actual model that ran is buried in a log entry the user will never read. Same brand, different model, slower latency, different look, and no signal.
Drift in long-running campaigns. If post one in a week-long campaign barely fit the sweet spot, and post two added one more reference image (which the prompt now describes), and post three added a new mood keyword (which gets injected into the Style section), by the time you reach post four the prompt has crept past the ceiling. The model still renders. The output is still on-brand-ish. But the brand's photography lane has slowly bled into stock-photographic territory and you can't point at when it happened.
Wasted spend on the wrong model. When the chain falls through to a more expensive provider — a premium-rail model that costs three to four times what the user picked — the bill arrives without warning. The user thinks they're spending five dollars per render and discovers a hundred-dollar week. The fix isn't to cap the wallet (we already do); it's to make sure the wallet is funding the model the user picked.
The adapter is the fix for all three. The active model gets a prompt shaped to its own sweet spot. The active model runs. The render is consistent. The wallet is charged at the rate the user expects. The settings UI tells the truth.
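The campaign-drift case is the easiest of the three to make concrete. The sketch below walks a campaign's prompts in order and reports the first post that crosses a word ceiling, which is exactly the moment you otherwise can't point at. The function name and the toy prompts are hypothetical; the 100-word ceiling matches the Seedance row in the table.

```python
from typing import List, Optional


def first_over_budget(prompts: List[str], ceiling_words: int) -> Optional[int]:
    """Return the 1-based index of the first prompt past the ceiling, or None."""
    for index, prompt in enumerate(prompts, start=1):
        if len(prompt.split()) > ceiling_words:
            return index
    return None


# Toy campaign: each post adds a few words of description to the last one.
post_one = " ".join(["word"] * 95)
post_two = post_one + " plus reference image"         # 98 words
post_three = post_two + " moody dusk"                 # 100 words
post_four = post_three + " with film grain"           # 103 words

campaign = [post_one, post_two, post_three, post_four]
print(first_over_budget(campaign, ceiling_words=100))  # prints 4
```

Posts one through three all sit at or under the ceiling, so nothing looks wrong until post four, even though every individual addition was small.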
Models are commodity. Prompts shaped to those models are not. The adapter is one of the layers that turns Desklight from "we call image APIs" into "every brand looks like itself, every render, on every model in rotation."
The full architecture — the translation compressor on the brand-DNA-to-spec side, the prompt adapter on the spec-to-model side, validators-as-code on the agent side, exemplar-driven layout on the typography side — is what lets one calendar entry from Allie become a render that feels like the brand wrote it. The adapter is the piece nearest the model, doing the smallest, most surgical work in the chain. Most days you'd never notice it ran. Some days, like the day Gemini's preview models started rejecting our verbose prompts and falling through to a slower, pricier model nobody had asked for, it's the difference between a campaign shipping and a campaign feeling broken.
Every modern image and video model publishes a token ceiling: the hard limit above which the API rejects a request. The sweet spot is the much smaller window inside that ceiling where the model produces its highest-quality output. Below the sweet spot's floor, you miss key details. Above its top, adherence degrades; above the published ceiling, you get errors. Inside the sweet spot, the model produces clean, coherent, brand-consistent results.
Each model is a different vessel — its own preferred structure, its own preferred density, its own sweet spot. A Kling-shaped 60-word motion prompt fed to LTX produces a flat result. An LTX-shaped 200-word prompt fed to Kling Turbo trips its element-overload failure mode. The brand DNA is the water. The vessel changes per model. The prompt adapter reshapes the water to fit each one.
It doesn't truncate cleanly. It drifts. Liu et al. (Lost in the Middle, TACL 2024) measured a U-shaped performance curve: information at the start and end of a long context gets used; information in the middle is often missed, even by models advertised for long context. DetailMaster (2025) showed a comparable effect for text-to-image: adherence drops as length grows. The model still accepts the prompt. It just stops following half of it. You get worse output, not an error, and you don't see the drift because the bytes the model ignored were never logged.
Yes, within reason. A four-second clip wants one beat of action and one camera move. An eight-second clip can carry a setup and a payoff. A twelve-second clip can hold a small narrative arc. The adapter's video budgets grow with duration so longer clips have headroom for additional beats — capped at each model's published sweet spot ceiling.
Because effective context is much smaller than published context. Long prose prompts produce inconsistent results across renders, especially for brand work — past the sweet spot you get internal contradictions, competing camera moves, mismatched lighting, conflicting style cues. The model has more material to disagree with itself about. Tighter, denser, sweet-spot-sized prompts produce more consistent results. The adapter optimizes for adherence, not size.
The prompt adapter ships on every render in the live product today. Same water, every vessel — without anyone watching the prompt size or the model rail. Pay-as-you-go starts with a $5 credit. Your first brand goes live in 12 seconds.