Stable Diffusion Prompt Guide: SDXL and SD3 Formulas for Better AI Images

The best AI images start with the best prompts. That has not changed. What has changed is the model. Stable Diffusion has gone from SD 1.5 to SDXL to SD3 and now SD3.5. Each version understands prompts differently. Old tricks that worked on SD 1.5 can produce garbage on SDXL. This guide covers what works right now across the latest Stable Diffusion models.
What is Stable Diffusion in 2026?
Stable Diffusion is an open-source text-to-image AI model created by Stability AI. It was the first major image model released as open source, meaning anyone can run it on their own hardware. No subscription required. No rate limits. Full control.
The current landscape looks like this:
- SD 1.5 -- The original. Still used for specific fine-tuned models and LoRAs. Lightweight, runs on consumer GPUs.
- SDXL -- The current workhorse. 1024x1024 native resolution. Dramatically better composition, hands, and faces. Two-stage pipeline with a base model and refiner.
- SD3 / SD3.5 -- The newest generation. Uses a completely new architecture (MMDiT). Better text rendering, improved prompt adherence, stronger composition. SD3.5 Large is the flagship, SD3.5 Medium is the efficiency model.
- Stable Diffusion via API -- Stability AI offers its models through the Stability API. No local setup needed.
We also built SizzlePop.ai, which is powered by Stable Diffusion.
How to Get Started
Cloud options (no GPU required):
- Stability AI API -- Official API, pay per generation
- ComfyUI on RunPod -- Full node-based workflow in the cloud
- Civitai -- Community platform with built-in generation
Local setup (requires GPU with 8GB+ VRAM):
- ComfyUI -- Node-based, most flexible, industry standard
- Automatic1111 / Forge -- Traditional web UI, huge extension ecosystem
ComfyUI has become the default for serious users. It is node-based, which means you build visual workflows. Steep learning curve but unmatched power.
The Prompt Formula
We break every Stable Diffusion prompt into three components:
[Core Subject], [Style & Medium], [Finishing Touches]
This structure works across SD 1.5, SDXL, and SD3. The difference is how much detail each model needs.
1. Core Subject
What is in the image? Be specific. Describe the subject, what they are doing, and where they are.
Weak: a dog
Better: a golden retriever puppy
Best: a golden retriever puppy sitting in tall grass, looking up with bright eyes, tongue out
The more specific you are about the subject, the less the model has to guess. Guessing leads to generic output.
2. Style and Medium
This steers the entire visual direction. Without a style, the model defaults to a photographic average of everything it was trained on. That average is boring.
Photography styles: portrait photography, street photography, macro photography, aerial drone shot, fashion editorial, documentary photography
Art styles: oil painting, watercolor, digital illustration, anime, pencil sketch, vector art, pixel art, 3D render
Artist references: in the style of Greg Rutkowski, Alphonse Mucha, Studio Ghibli, Makoto Shinkai, Wes Anderson color palette
Era and mood: 1970s film grain, cyberpunk neon, dark academia, cottagecore, brutalist
Chain multiple style descriptors with commas. Put the most important ones first because SDXL and SD3 weigh early tokens more heavily.
3. Finishing Touches
Fine-tune the technical look of the image. These go at the end of your prompt.
| Category | Examples |
|---|---|
| Lighting | golden hour, studio lighting, dramatic side light, soft diffused light, neon glow, rim lighting |
| Quality | highly detailed, sharp focus, 8k uhd, professional, masterpiece |
| Camera | 85mm lens, shallow depth of field, wide angle, bird's eye view, low angle shot |
| Color | muted tones, high contrast, pastel palette, monochrome, warm color grading |
| Mood | cinematic, ethereal, gritty, serene, intense, nostalgic |
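The three-part formula above can also be assembled programmatically, which is handy when you batch-generate variations. A minimal sketch:

```python
# Assemble [Core Subject], [Style & Medium], [Finishing Touches] into one
# prompt string, keeping the most important tokens first.
def build_prompt(subject: str, style: list[str], finishing: list[str]) -> str:
    parts = [subject] + list(style) + list(finishing)
    return ", ".join(p.strip() for p in parts if p.strip())

prompt = build_prompt(
    "a golden retriever puppy sitting in tall grass",
    ["portrait photography"],
    ["golden hour", "shallow depth of field", "sharp focus"],
)
# -> "a golden retriever puppy sitting in tall grass, portrait photography, golden hour, shallow depth of field, sharp focus"
```

Swapping out the style or finishing lists lets you A/B test visual directions while holding the subject constant.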
SDXL-Specific Tips
SDXL handles prompts differently from SD 1.5. Here is what to know:
- Write longer, more descriptive prompts. SDXL has a dual text encoder (CLIP ViT-L + OpenCLIP ViT-G). It understands more nuance. Short keyword-only prompts produce worse results than with SD 1.5.
- Use the refiner. SDXL's two-stage pipeline (base + refiner) adds fine detail. In ComfyUI, switch from base to refiner at around 80% of steps for the sharpest results.
- Native resolution matters. SDXL was trained at 1024x1024. Generating at other resolutions works but stick to supported aspect ratios: 1024x1024, 1152x896, 896x1152, 1216x832, 832x1216, 1344x768, 768x1344.
- Quality tokens still help. Adding "masterpiece, best quality, highly detailed" still improves SDXL output. It was trained on data that associates these phrases with higher quality images.
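A quick way to honor the supported-resolution list above is to snap any requested size to the nearest entry by aspect ratio:

```python
# Snap a requested size to the closest SDXL-supported resolution by aspect ratio.
SDXL_RESOLUTIONS = [
    (1024, 1024), (1152, 896), (896, 1152), (1216, 832),
    (832, 1216), (1344, 768), (768, 1344),
]

def nearest_sdxl_resolution(width: int, height: int) -> tuple[int, int]:
    """Return the supported resolution whose aspect ratio best matches the request."""
    target = width / height
    return min(SDXL_RESOLUTIONS, key=lambda wh: abs(wh[0] / wh[1] - target))

print(nearest_sdxl_resolution(1920, 1080))  # 16:9 request -> (1344, 768)
```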
SD3 / SD3.5-Specific Tips
SD3 uses a fundamentally different architecture. Prompt strategy shifts accordingly.
- Natural language works better. SD3's T5 text encoder understands full sentences and spatial relationships. "A red ball on top of a blue cube" actually works now.
- Text rendering is real. SD3 can render short text in images. Enclose the text in quotation marks: a storefront sign that reads "OPEN 24 HOURS"
- Less keyword spam needed. SD3 understands context. You do not need to pad prompts with quality boosters as aggressively. Focus on describing what you actually want.
- Spatial descriptions work. "A cat on the left, a dog on the right, a tree in the background" -- SD3 handles compositional instructions that older models ignored.
Negative Prompts
Negative prompts tell the model what to exclude. They are critical for clean output.
Universal Negative Prompt (SDXL)
bad anatomy, bad hands, extra fingers, extra limbs, deformed, blurry, watermark, text, logo, signature, low quality, worst quality, jpeg artifacts, cropped, out of frame
Portrait Negative Prompt
bad anatomy, bad hands, extra fingers, deformed face, asymmetric eyes, crossed eyes, bad teeth, extra limbs, mutation, disfigured, blurry, low quality
SD3 Negative Prompts
SD3 is less dependent on negative prompts than SDXL. Keep them shorter and more targeted. Focus on specific issues you see in outputs rather than dumping a generic list.
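When combining lists like the universal and portrait negatives above, a small helper can merge them without duplicate tokens:

```python
# Merge comma-separated negative prompt lists, dropping duplicate tokens
# while preserving first-seen order.
def merge_negatives(*lists: str) -> str:
    seen, out = set(), []
    for blob in lists:
        for token in blob.split(","):
            t = token.strip()
            if t and t.lower() not in seen:
                seen.add(t.lower())
                out.append(t)
    return ", ".join(out)

universal = "bad anatomy, bad hands, blurry, watermark, low quality"
portrait = "bad anatomy, deformed face, asymmetric eyes, blurry"
print(merge_negatives(universal, portrait))
# -> bad anatomy, bad hands, blurry, watermark, low quality, deformed face, asymmetric eyes
```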
10 Prompt Examples: Bad vs Good
1. Automotive
- Bad: really cool fast car
- Good: Matte black Porsche 911 GT3 RS on a wet mountain road at dawn, motion blur on the wheels, dramatic fog, automotive photography, shot with Sony A7R V, 70-200mm lens, cinematic color grading
2. Portrait
- Bad: a beautiful woman
- Good: Portrait of a woman in her 30s with freckles and auburn hair, wearing a cream linen shirt, soft window light from the left, shallow depth of field, Fujifilm film simulation, natural and relaxed expression
3. Landscape
- Bad: a big island and volcano erupting into the sky
- Good: Volcanic island with an erupting crater sending ash clouds into a crimson sunset sky, dark lava flowing into turquoise ocean, dramatic natural disaster scene, aerial drone perspective, National Geographic style
4. Fantasy Illustration
- Bad: a wizard
- Good: An elderly wizard with a long silver beard, standing in a candlelit tower library, holding a glowing crystal orb, surrounded by floating books, dark fantasy illustration in the style of Alan Lee, warm candlelight, moody atmosphere
5. Product Photography
- Bad: headphones on a table
- Good: Matte black wireless headphones floating against a gradient dark blue background, soft studio lighting with a single highlight reflection, commercial product photography, clean and minimal
6. Food
- Bad: a plate of pasta
- Good: Hand-made tagliatelle with bolognese ragu, freshly grated parmesan, torn basil leaves, on a rustic ceramic plate, overhead shot, soft natural light from a nearby window, food photography, warm color grading
7. Architecture
- Bad: a modern house
- Good: Minimalist concrete and glass residence nestled into a hillside, infinity pool reflecting the twilight sky, warm interior lights visible through floor-to-ceiling windows, architectural photography, 24mm wide angle
8. Sci-Fi
- Bad: spaceship in space
- Good: Massive interstellar cargo ship approaching a space station orbiting a gas giant, visible wear and battle damage on the hull, small shuttle craft in formation, hard sci-fi aesthetic, cinematic wide shot, volumetric lighting from the nearby star
9. Fashion
- Bad: woman in a dress
- Good: Fashion editorial of a model in a flowing emerald silk gown, standing on a windswept cliff at golden hour, hair and fabric caught in the wind, shot from a low angle, Vogue magazine style, dramatic sky
10. Abstract
- Bad: abstract art
- Good: Abstract fluid art with swirling deep ocean blues, metallic gold veins, and white marble textures, macro close-up, resembling satellite imagery of a planet's surface, 8k resolution, rich detail
LoRAs, Embeddings, and Fine-Tuned Models
One of Stable Diffusion's biggest advantages over closed models is customization.
- LoRAs (Low-Rank Adaptation): Small model add-ons that teach the base model a specific style, character, or concept. Find thousands on Civitai. Use them in your prompt with <lora:name:weight> syntax.
- Textual Inversions / Embeddings: Small files that define a concept in the model's latent space. Common ones include quality embeddings like "EasyNegative" that replace long negative prompts.
- Fine-tuned checkpoints: Entire model variants trained for specific styles. Examples: Juggernaut XL (photorealism), DreamShaper (versatile), Animagine XL (anime).
The combination of a good checkpoint + targeted LoRAs + solid prompting is what separates average Stable Diffusion output from professional-grade results.
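As a small illustration, the `<lora:name:weight>` tags mentioned above can be appended programmatically when batching prompts (the LoRA name here is hypothetical):

```python
# Format a LoRA activation tag in the <lora:name:weight> syntax used by
# A1111-style prompts. "fantasy_style" below is a made-up example name.
def lora_tag(name: str, weight: float = 1.0) -> str:
    return f"<lora:{name}:{weight:g}>"

prompt = "portrait of a knight, oil painting " + lora_tag("fantasy_style", 0.8)
# -> "portrait of a knight, oil painting <lora:fantasy_style:0.8>"
```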
Prompt Weighting
Control emphasis on specific parts of your prompt using parentheses and weights.
SDXL / Automatic1111 syntax:
- (word) -- 1.1x emphasis
- ((word)) -- 1.21x emphasis
- (word:1.5) -- 1.5x emphasis
- (word:0.5) -- 0.5x (de-emphasis)
ComfyUI / SD3 syntax: Varies by node, but most support the same (word:weight) format.
Example: A (golden retriever:1.3) puppy playing in (autumn leaves:1.2), soft warm light, (sharp focus:1.1)
Do not go above 1.5. Higher weights cause artifacts and distortion.
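If you build prompts programmatically, that ceiling is easy to enforce. A minimal sketch that wraps a token in the (word:weight) syntax and clamps the weight to the recommended range:

```python
# Wrap a token in A1111-style (word:weight) syntax, clamping the weight to
# the 0.5-1.5 range recommended above.
def weighted(token: str, weight: float) -> str:
    weight = max(0.5, min(weight, 1.5))  # weights above 1.5 tend to distort
    return f"({token}:{weight:g})"

print(weighted("golden retriever", 1.3))  # (golden retriever:1.3)
print(weighted("sharp focus", 2.0))       # clamped -> (sharp focus:1.5)
```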
In Conclusion
Think of yourself as a photographer sending shot instructions to a remote crew. You cannot be there to point and adjust. Your prompt is the only communication you have. Be specific. Be visual. Describe what you see in your head. Use the three-part formula: subject, style, finishing touches. Match your prompting strategy to your model -- SDXL wants detail, SD3 wants natural language. And use negative prompts to clean up what the model gets wrong. The tools are free and open source. The only limit is how well you describe what you want.
For more on AI image generation, check out our Midjourney prompt guide and our roundup of the best text-to-image tools available right now.