Prompt writing is the single biggest lever in AI image generation. Two users with the same tool, the same model, and the same monthly budget will produce wildly different work depending on how they write their prompts. The good news: this is a learnable skill, and most of it transfers across tools. The bad news: there is no magic phrase. There is only structure, specificity, and iteration.
This guide lays out how to write AI image prompts that actually produce the picture you had in mind — with five worked examples, a checklist, and the common failures that trip everybody up at the start.
The basic structure: subject, style, modifier, quality
Almost every effective prompt follows the same four-part structure, in roughly this order:
- Subject. What the image is of. A person, a scene, an object, a creature.
- Style. The visual register. Photograph, anime, oil painting, 3D render, watercolor.
- Modifiers. Composition, lighting, mood, color palette, framing.
- Quality cues. Sharpness, detail level, technical references.
This is not the only order that works, but it is the one that works most reliably across diffusion-based tools. The first few tokens carry the most weight, so naming the subject early gives the model the most signal.
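If you assemble prompts in a script, a small helper can enforce the four-part order. The sketch below is hypothetical (`build_prompt` is not part of any tool's API). It joins the parts with commas in subject, style, modifiers, quality order, producing plain comma-separated text of the kind most diffusion tools accept.

```python
def build_prompt(subject, style, modifiers=(), quality=()):
    """Join prompt parts in subject -> style -> modifiers -> quality order.

    Hypothetical helper for illustration; empty parts are skipped.
    """
    parts = [subject, style, *modifiers, *quality]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    subject="a young woman with red hair, sitting on a wooden park bench",
    style="oil painting style",
    modifiers=["warm autumn light", "shallow depth of field", "soft color palette"],
    quality=["detailed brush strokes"],
)
print(prompt)
```

Keeping the parts separate makes it easy to swap one of them (say, the style) while holding the rest fixed between runs.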
A short example:
"A young woman with red hair, sitting on a wooden park bench, oil painting style, warm autumn light, shallow depth of field, detailed brush strokes, soft color palette."
That is twenty-eight words. Subject (a young woman with red hair), placement (on a wooden park bench), style (oil painting), lighting (warm autumn light), composition (shallow depth of field), quality (detailed brush strokes), color guidance (soft color palette). Each phrase does one job.
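Counting words and phrases by hand is error-prone, and a few lines of Python do it reliably. A minimal sketch, assuming phrases are separated by commas as in the example prompt (`phrases` and `word_count` are illustrative names):

```python
example = (
    "A young woman with red hair, sitting on a wooden park bench, "
    "oil painting style, warm autumn light, shallow depth of field, "
    "detailed brush strokes, soft color palette."
)


def phrases(prompt):
    """Split a comma-separated prompt into trimmed phrases (trailing period dropped)."""
    return [p.strip() for p in prompt.rstrip(".").split(",") if p.strip()]


def word_count(prompt):
    """Count whitespace-separated words."""
    return len(prompt.split())


print(word_count(example), len(phrases(example)))  # 28 7
```

Seven phrases for seven jobs: if a phrase count climbs well past the number of distinct jobs you meant to assign, some phrases are probably doing the same work twice.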
Positive vs negative prompts
Most diffusion-based generators accept two prompts. The positive prompt is what you want; the negative prompt is what you do not want. Negative prompts are the most under-used lever in prompt writing: most beginners never use them, and most experienced users have a default negative prompt they paste into every generation.
A reasonable default negative prompt for portraits:
"blurry, deformed hands, extra fingers, extra limbs, low contrast, watermark, signature, cropped, out of frame, low quality, jpeg artifacts"
This is not a magic incantation; it is a list of failure modes the model would otherwise produce some of the time. The negative prompt steers the model away from them without changing the positive prompt.
If a tool does not expose a negative prompt — DALL-E via ChatGPT, some smaller services — you fold the avoidance into the positive prompt indirectly ("clean composition, both hands clearly visible, sharp focus").
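Because a default negative prompt gets reused across generations, it can help to keep it as a list and append per-image additions. The helper below is a hypothetical sketch: the name `negative_prompt`, the case-insensitive deduplication, and the example addition "harsh shadows" are illustrative choices, not any tool's feature. The resulting string goes into whatever negative-prompt field your tool exposes.

```python
# The default portrait negative prompt from above, kept as a list.
DEFAULT_NEGATIVE = [
    "blurry", "deformed hands", "extra fingers", "extra limbs", "low contrast",
    "watermark", "signature", "cropped", "out of frame", "low quality",
    "jpeg artifacts",
]


def negative_prompt(extra=(), base=DEFAULT_NEGATIVE):
    """Merge the default terms with per-image additions.

    Duplicates are dropped case-insensitively; first-seen order is kept.
    """
    seen = set()
    terms = []
    for term in [*base, *extra]:
        cleaned = term.strip()
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            terms.append(cleaned)
    return ", ".join(terms)


neg = negative_prompt(extra=["Blurry", "harsh shadows"])
print(neg)
```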
Weight and attention syntax
Some tools — Stable Diffusion forks, Charmloop's generator, Civitai, and most open-weight model wrappers — let you weight individual tokens. The two conventions you will see most often:
- Parentheses for emphasis. (red hair:1.3) or ((red hair)) pushes the model harder toward red hair. Most implementations interpret a number above 1.0 as "more weight" and below 1.0 as "less weight."
- Square brackets for de-emphasis. [background detail] reduces the weight on those tokens without removing them entirely.
Use this sparingly. Weighting every token defeats the purpose; the relative weights stop meaning anything. Use weight when one specific feature keeps getting ignored — a hair color the model defaults away from, a clothing detail the model keeps simplifying.
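To see how the two conventions combine, here is a sketch that computes the effective weight of a single wrapped token. It assumes the convention used by the AUTOMATIC1111 Stable Diffusion web UI, where each pair of parentheses multiplies attention by 1.1, each pair of square brackets divides it by 1.1, and `(text:1.3)` sets the weight explicitly; other tools may use different multipliers. `effective_weight` is a hypothetical helper that handles one kind of wrapper per token, not a full prompt parser.

```python
import re

# Per-bracket multiplier in AUTOMATIC1111-style syntax; other tools may differ.
EMPHASIS = 1.1


def effective_weight(token):
    """Return (text, weight) for one token like '((red hair))' or '(red hair:1.3)'."""
    explicit = re.fullmatch(r"\((.+):([0-9]*\.?[0-9]+)\)", token)
    if explicit:
        # Explicit form: the number is the weight.
        return explicit.group(1).strip(), float(explicit.group(2))
    text, weight = token, 1.0
    # Each enclosing pair of parentheses adds emphasis.
    while len(text) >= 2 and text[0] == "(" and text[-1] == ")":
        text, weight = text[1:-1], weight * EMPHASIS
    # Each enclosing pair of square brackets removes emphasis.
    while len(text) >= 2 and text[0] == "[" and text[-1] == "]":
        text, weight = text[1:-1], weight / EMPHASIS
    return text.strip(), round(weight, 4)


print(effective_weight("(red hair:1.3)"))       # ('red hair', 1.3)
print(effective_weight("((red hair))"))         # ('red hair', 1.21)
print(effective_weight("[background detail]"))  # ('background detail', 0.9091)
```

Under this convention, stacking ((red hair)) lands at roughly 1.21, which is why the explicit numeric form is often easier to read and tune.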
Tools without parenthesis weighting handle emphasis differently. Midjourney uses its own syntax (its --iw flag, for instance, weights an image prompt rather than individual words), and DALL-E does not expose weights at all; in both, prompt ordering and repetition do much of the work. "Red hair" mentioned at the start and again as "vibrant red curls" mid-prompt achieves a similar effect with different syntax.
How style references actually work
When you write "in the style of Greg Rutkowski" or "anime style" or "Pixar 3D render," the model is matching to patterns in its training data. It is not "looking up" the artist. This has practical implications:
- Genuine style names work. Photography terms ("Kodak Portra 400," "Leica M10," "85mm portrait lens"), painting movements ("Art Nouveau," "Impressionist"), and broad style families ("anime," "cyberpunk concept art") are well-represented in training data and produce reliable results.
- Specific artist names are a moving target. Many models have been deliberately trained away from individual living artists' names. Results vary by tool. Civitai and self-hosted Stable Diffusion forks tend to preserve more artist-name signal than the closed proprietary models.
- Trained style add-ons (LoRAs, embeddings, fine-tunes) go further than prompt text. If a tool supports custom LoRAs (Civitai, Charmloop's higher tiers, self-hosted setups), a LoRA trained on a style does what a hundred style words in a prompt cannot. We have a longer guide on LoRAs for the deeper dive.
The honest summary: style words in the prompt get you most of the way. Custom LoRAs or fine-tuned models get you the rest.
Worked examples
Five prompts for five different image types. Each one breaks down the structure and the choices.
Example 1 — Portrait
"Close-up portrait of a Mediterranean woman in her early thirties, dark curly hair, brown eyes, soft natural makeup, beige linen shirt, photograph, Kodak Portra 400, soft window light from the left, shallow depth of field, sharp focus on eyes, detailed skin texture, neutral background."
Subject (a specific person), style (photograph, Kodak Portra 400), composition (close-up, shallow depth of field), lighting (soft window light from the left), quality (sharp focus on eyes, detailed skin texture). Notice the specificity: not "a woman" but "a Mediterranean woman in her early thirties." The model gives back what you give it.
Example 2 — Scene with no person
"A small wooden cabin in a snow-covered pine forest at dusk, warm amber light from the windows, smoke from the chimney, low-angle shot, fresh snowfall, cinematic atmosphere, soft volumetric lighting, painterly digital art, muted blue-and-orange color palette."
Subject (cabin in forest), time of day (dusk), atmospheric cues (warm windows, smoke), composition (low-angle), style (painterly digital art), color guidance (blue-and-orange palette). Scenes benefit from explicit time of day and weather; a generic "forest scene" comes out generic.
Example 3 — Character (consistency-aware)
"A young man with tousled blond hair, blue eyes, freckles across the nose, late twenties, wearing a worn leather jacket over a gray t-shirt, three-quarter portrait, urban background slightly blurred, photograph, golden-hour side lighting, sharp focus, cinematic color grade."
This is the kind of prompt you'd use to define a character for repeated generation. Note the specificity of the face traits — tousled blond hair, blue eyes, freckles — because vague descriptors ("handsome man") produce a different person every run. Lock the traits early; iterate the scene around them. See our guide on making consistent AI characters for the full technique.
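In script form, "lock the traits, iterate the scene" amounts to keeping one fixed trait string and changing only what follows it. A minimal sketch with hypothetical names (`CHARACTER`, `character_prompt`) and an illustrative second scene; it keeps the wording identical across runs but does not by itself guarantee an identical face.

```python
# Locked character traits from Example 3; reuse this exact string every time.
CHARACTER = (
    "a young man with tousled blond hair, blue eyes, freckles across the nose, "
    "late twenties, wearing a worn leather jacket over a gray t-shirt"
)


def character_prompt(scene, style="photograph, sharp focus, cinematic color grade"):
    """Traits first, then the scene that changes from run to run, then style."""
    return f"{CHARACTER}, {scene}, {style}"


prompts = [
    character_prompt("three-quarter portrait, urban background slightly blurred, "
                     "golden-hour side lighting"),
    character_prompt("sitting in a diner booth at night, neon light through the window"),
]
for p in prompts:
    print(p)
```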
Example 4 — Product or object
"A vintage leather camera bag on a wooden desk, top-down view, soft overcast light from a window, beside a brass desk lamp, a notebook with leather cover, and a pair of round wire-frame glasses, photograph, neutral color grade, sharp focus, detailed leather texture, professional product photography."
Product shots benefit from explicit composition (top-down view), explicit context (the items around it), and explicit technical references (professional product photography). Lighting matters more for object work than for figure work — "soft overcast light from a window" reads completely differently from "harsh studio light."
Example 5 — Stylized illustration
"Anime illustration of a young woman riding a bicycle through cherry blossoms at sunset, flowing dark hair, white summer dress, cinematic composition with motion blur on the wheels, soft pastel color palette, painted background in the style of Makoto Shinkai, glowing rim light on the subject, detailed petals scattering in the wind."
Anime and other stylized work benefits from naming the stylistic family ("anime illustration") and citing recognizable directors or movements when the tool supports it. Cherry blossoms, sunset, motion blur — these are concrete details the model can render rather than vague mood words.
Common prompt failures
The patterns that ruin generations, ranked by how often they show up:
- Over-stuffing. Prompts that try to specify everything ("full-body shot, three-quarter portrait, close-up, dynamic composition…") confuse the model. Each modifier dilutes the others. Pick one composition, one style, one lighting setup.
- Contradictions. "Sunny day, dark moody atmosphere, bright cheerful color palette" is internally inconsistent. The model picks one or averages, badly. Read the prompt back to yourself; if any two phrases pull in opposite directions, cut one.
- Vague subjects. "A beautiful woman" generates a generic, vaguely-airbrushed person. "A tall woman in her late twenties with copper hair, freckles, and a slight asymmetric smile" generates a specific person. Specificity wins.
- Quality-modifier stuffing. "Masterpiece, best quality, ultra-detailed, 8k, photorealistic, hyperdetailed, sharp focus, professional photograph, award-winning, trending on ArtStation." This was a common pattern in early Midjourney and Stable Diffusion prompts, and most newer tools no longer reward it. Two or three quality modifiers do the same work as ten.
- Negative prompts longer than positive prompts. If your negative prompt is forty words and your positive prompt is fifteen, the negative is doing more steering than the positive. Trim the negative to the actual common failures, not a wishlist.
- Treating the first generation as the verdict. Each generation starts from random noise, so the first image is one sample, not the result. Generate four to eight images from the same prompt before judging the prompt itself.
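Some of these failures can be caught with crude string checks before you spend a generation. The sketch below is hypothetical: `find_failures` is an illustrative name, the contradiction pairs and quality-term list are samples rather than a standard vocabulary, and plain substring matching will miss rephrasings. Over-stuffed compositions and vague subjects still need a human read.

```python
# Illustrative samples only; extend them with the failures you actually hit.
CONTRADICTIONS = [("sunny", "moody"), ("bright", "dark"), ("sharp", "dreamy")]
QUALITY_TERMS = {
    "masterpiece", "best quality", "ultra-detailed", "8k", "photorealistic",
    "hyperdetailed", "sharp focus", "professional photograph", "award-winning",
    "trending on artstation",
}


def find_failures(positive, negative=""):
    """Return warnings for contradictions, quality stuffing, and an oversized negative."""
    text = positive.lower()
    phrases = [p.strip(" .") for p in text.split(",") if p.strip(" .")]
    warnings = []
    for a, b in CONTRADICTIONS:
        if a in text and b in text:
            warnings.append(f"contradiction: '{a}' vs '{b}'")
    n_quality = sum(1 for p in phrases if p in QUALITY_TERMS)
    if n_quality > 4:  # this guide's checklist suggests two to four
        warnings.append(f"quality-modifier stuffing: {n_quality} quality terms")
    if len(negative.split()) > len(positive.split()):
        warnings.append("negative prompt is longer than the positive prompt")
    return warnings


print(find_failures("Sunny day, dark moody atmosphere, bright cheerful color palette"))
```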
The prompt checklist
A short list to run through before hitting generate:
- Is the subject named in the first eight words?
- Is there exactly one style/medium named?
- Is the composition stated (close-up, full body, top-down, etc.)?
- Is the lighting stated (golden hour, soft studio, etc.)?
- Are there two to four quality modifiers, not ten?
- Does the negative prompt list actual failure modes you have seen, not generic wishlist items?
- Is the prompt free of contradictions (e.g. sunny + moody, sharp + dreamy)?
- Is the prompt twenty to forty words for a standard generation?
If five or more boxes are ticked, run it. If fewer, edit before generating.
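Two of the checklist items can be answered mechanically; the rest are judgment calls you answer yourself. A sketch with hypothetical helper names (`auto_checks`, `verdict`), applying the five-tick rule to Example 2. The manual answers below are illustrative; since Example 2 shows no negative prompt, that item is left unticked.

```python
def auto_checks(prompt, subject):
    """Answer the two checklist items that a script can check on its own."""
    words = prompt.lower().split()
    return {
        "subject in first eight words": subject.lower() in " ".join(words[:8]),
        "twenty to forty words": 20 <= len(words) <= 40,
    }


def verdict(answers, threshold=5):
    """Run at five or more ticks; otherwise edit before generating."""
    ticks = sum(1 for ok in answers.values() if ok)
    return "run" if ticks >= threshold else "edit"


example_2 = (
    "A small wooden cabin in a snow-covered pine forest at dusk, warm amber light "
    "from the windows, smoke from the chimney, low-angle shot, fresh snowfall, "
    "cinematic atmosphere, soft volumetric lighting, painterly digital art, "
    "muted blue-and-orange color palette."
)
answers = {
    **auto_checks(example_2, "wooden cabin"),
    # Manual answers: your own read of the prompt, not computed.
    "one style named": True,
    "composition stated": True,
    "lighting stated": True,
    "two to four quality modifiers": True,
    "negative lists real failures": False,
    "no contradictions": True,
}
print(verdict(answers))
```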
Where Charmloop fits
Charmloop's generator supports the standard positive-and-negative prompt model, weight syntax in parentheses, and per-character style locking through the model system. The catalog page lets you start from an existing character whose visual identity is already defined, then write a prompt that places that character in a specific scene, which removes the "subject definition" problem entirely. See the catalog for the character library, or open the generator to write a prompt against a base model.
For the broader buyer's framework — quality, consistency, cost — the honest guide to choosing an AI image generator covers what to evaluate before settling on a tool. Once you have a tool you like, this prompt structure works across most of them.
What changes next
Prompt syntax is loosening over time, not tightening. The newer models — Flux, SDXL successors, the proprietary frontier models — understand longer, more natural-language prompts and need fewer of the "magic quality modifier" tricks that defined the 2023–2024 generation. The structure in this guide will keep working; the volume of quality modifiers you need will keep shrinking. That is a good direction.