
ComfyUI Best Practices: My Production Image Pipeline on an RTX 5090
ComfyUI gives you total control over Stable Diffusion — and total responsibility for getting it right. After months of running a production SDXL pipeline on an RTX 5090, these are the practices that actually moved the needle: where detail really comes from, how to get tack-sharp eyes, why my LoRA kept crashing, and how to build workflows you reuse instead of rebuild every time.
Why ComfyUI Over a One-Click UI
I run my image generation locally on a machine I call La Bestia — an RTX 5090 box that does nothing but render. I moved everything to ComfyUI for one reason: it is the only interface where the pipeline is the artifact. Every node, every connection, every sampler setting is explicit and reproducible. When a render comes out perfect, I can save the exact graph and run it a thousand more times.
That power has a cost. ComfyUI will happily let you build a workflow that wastes VRAM, upscales noise, and produces soft, painted-looking faces — and it won't tell you why. Almost everything below is a lesson I learned by shipping a bad render first and then figuring out what the graph was actually doing.
The mental model that fixed everything:
Detail is created in the native generation pass. Every node after that — upscalers included — can only enlarge or refine the detail that already exists. If the eyes weren't sharp coming out of the sampler, no amount of upscaling will save them.
Best Practice #1: Native Resolution Beats Upscaling for Real Detail
My biggest early mistake was treating upscaling as a detail generator. I'd generate a small image and then run it through a 4x upscaler expecting crisp irises and individual eyelashes. They never came. Upscaling enlarges what is there — it does not invent fine structure that the sampler never drew.
SDXL resolves fine detail during the native pass at its trained resolutions. Generate as much real detail as your VRAM allows first, then upscale to enlarge it cleanly. My production order is:
- Native generation at a valid SDXL resolution (896×1152 for portraits, 1344×768 for wide shots).
- A FaceDetailer pass to re-render the face at higher internal resolution.
- A model upscale (4x-UltraSharp), resized to a net 2x so an 896×1152 portrait lands at ~1792×2304.
Stick to SDXL's native aspect buckets. Feeding it 512×512 or some arbitrary off-ratio size invites duplicated heads, melted hands, and bad composition. The model was trained on aspect-ratio buckets of roughly 1 megapixel each; respect that and half your problems disappear.
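To make the bucket rule concrete, here is a minimal sketch that snaps a requested size to the closest SDXL aspect bucket. The bucket list is an assumption (the commonly cited ~1 megapixel SDXL sizes, not an exhaustive official table), and `nearest_sdxl_bucket` is a hypothetical helper name.

```python
import math

# Commonly cited SDXL resolutions (width, height), all close to 1 megapixel.
# Assumed for illustration; extend or trim to match your checkpoint.
SDXL_BUCKETS = [
    (1024, 1024),
    (1152, 896), (896, 1152),
    (1216, 832), (832, 1216),
    (1344, 768), (768, 1344),
    (1536, 640), (640, 1536),
]


def nearest_sdxl_bucket(width: int, height: int) -> tuple:
    """Return the bucket whose aspect ratio is closest to width/height.

    Ratios are compared in log space so that 2:1 and 1:2 are treated
    as equally far from square.
    """
    target = math.log(width / height)
    return min(SDXL_BUCKETS, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


print(nearest_sdxl_bucket(512, 512))    # square request
print(nearest_sdxl_bucket(1920, 1080))  # 16:9 request
print(nearest_sdxl_bucket(1080, 1350))  # 4:5 portrait request
```

The helper only picks the generation size. Getting back to the size you originally wanted is a job for the upscale and crop steps after the native pass.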
Best Practice #2: Soft Eyes Are a Framing Problem, Not a Resolution Problem
This one cost me a week of chasing the wrong fix. I had two presets with the same output resolution and byte-for-byte identical pipelines: same native size, same FaceDetailer, same upscale chain. One produced tack-sharp eyes. The other produced soft, smeary eyes. For days I blamed “resolution.” I was wrong.
The real cause was framing. A tight, close portrait gives the face a large share of the native pixels, so the sampler — and then FaceDetailer — has a lot of face to work with. A wide or full-body shot spends most of its pixels on the room, the clothes, the background. The face ends up tiny in the native generation, which means there's barely any facial detail for the rest of the graph to refine.
The takeaway:
If you need a wide shot and sharp eyes, you can't rely on output resolution alone. You have to give FaceDetailer enough internal resolution to rebuild the small detected face — which is exactly Best Practice #3.
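The framing effect is easy to see with rough arithmetic. The sketch below estimates how many native pixels the face gets on the same 896×1152 canvas under two framings. The face-height fractions (0.5 and 0.10) and the 0.75 face aspect ratio are hypothetical values chosen for illustration, not measurements.

```python
def face_pixels(frame_h: int, face_height_fraction: float,
                face_aspect: float = 0.75) -> tuple:
    """Approximate the face bounding box (width, height) in native pixels.

    face_height_fraction: share of the frame height the face occupies (assumed).
    face_aspect: face width divided by face height (rough assumption).
    """
    face_h = round(frame_h * face_height_fraction)
    face_w = round(face_h * face_aspect)
    return face_w, face_h


# Same 1152 px tall native canvas, two hypothetical framings.
tight = face_pixels(1152, 0.5)    # close portrait
wide = face_pixels(1152, 0.10)    # full-body shot

for name, (w, h) in {"tight": tight, "wide": wide}.items():
    print(f"{name}: face ~{w}x{h} px, {w * h:,} px total")
```

Under these assumed fractions the tight portrait gives the face roughly 25 times as many native pixels as the full-body shot, even though both renders end at the same output resolution.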
Best Practice #3: Tune FaceDetailer — It's Your Highest-Leverage Node
FaceDetailer (from the Impact Pack) detects the face, crops it, re-renders it at a higher internal resolution with a controlled denoise, and composites it back. For portraits it is the single most important node in the graph. Default settings are fine for tight close-ups and fall apart on anything wider.
The settings that finally gave me tack-sharp eyes with real catchlights, even on open framing:
- guide_size: 768 -> 1024
- max_size: 1280 -> 1536
- denoise: 0.26 -> 0.33
Raising guide_size and max_size forces the detected face to be re-rendered at more internal pixels, so irises, eyelashes, and catchlights are actually drawn rather than upscaled. Nudging denoise up to ~0.33 gives the detailer enough freedom to add structure without drifting off the original likeness.
Go too far on denoise (past ~0.4) and the face starts changing identity and the jaw can square up unnaturally. Like everything in ComfyUI, it's a dial, not a switch — find the value that adds detail without rewriting the face, then lock it in as a standard.
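If the workflow is exported in API format (a dict of node IDs, each with a `class_type` and `inputs`), settings like these can be applied in one place instead of by hand. This is a minimal sketch under that assumption. `tune_face_detailer` and the two-node fragment are hypothetical, and the node IDs in a real export will differ.

```python
import copy

# Values taken from the text above; a starting point, not universal defaults.
FACE_DETAILER_OVERRIDES = {"guide_size": 1024, "max_size": 1536, "denoise": 0.33}


def tune_face_detailer(workflow: dict, overrides: dict = FACE_DETAILER_OVERRIDES) -> dict:
    """Return a copy of an API-format workflow with FaceDetailer inputs overridden.

    The original dict is left untouched so the source preset stays frozen.
    """
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        if node.get("class_type") == "FaceDetailer":
            node["inputs"].update(overrides)
    return patched


# Hypothetical fragment standing in for a real exported workflow.
workflow = {
    "3": {"class_type": "KSampler", "inputs": {"denoise": 1.0}},
    "12": {"class_type": "FaceDetailer",
           "inputs": {"guide_size": 768, "max_size": 1280, "denoise": 0.26}},
}

tuned = tune_face_detailer(workflow)
print(tuned["12"]["inputs"])
```

Matching on `class_type` rather than node ID means the same function works across presets whose graphs were numbered differently.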
Best Practice #4: When a LoRA OOMs Even on a 5090, Use the Distilled Checkpoint
A 24GB-plus card does not make you immune to out-of-memory crashes. When I started experimenting with newer 22B-class video/image models and stacked a LoRA on top, the RTX 5090 ran out of VRAM and the render died mid-pass. The instinct is to force ComfyUI into a high-VRAM mode and push harder. That made it worse.
What actually worked:
- Load the distilled variant of the checkpoint directly — it carries most of the quality at a fraction of the memory footprint.
- Drop the LoRA on the heaviest models. A LoRA stacked on a 22B checkpoint is what tips you over the edge; the distilled base alone fits comfortably.
- Leave ComfyUI in its default VRAM mode. Forcing high-VRAM allocation up front left no headroom for the sampler's working tensors.
- If you must keep the LoRA, cut batch size to 1 and lower the native resolution before upscaling.
The broader lesson: VRAM ceilings are about the peak of your whole graph, not the size of any single model. A checkpoint that loads fine can still OOM the moment the sampler allocates its latent buffers, the VAE decodes, or a second model gets stacked on top.
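One way to reason about this is to total memory per stage of the graph and look at the worst stage, not at any single model. The sketch below is deliberately simplified: it assumes everything listed for a stage is resident at once and ignores offloading. Every number in it is a made-up placeholder, not a measurement.

```python
def peak_vram_gb(stages: dict) -> tuple:
    """Return (stage_name, total_gb) for the stage with the highest total."""
    totals = {name: sum(parts.values()) for name, parts in stages.items()}
    worst = max(totals, key=totals.get)
    return worst, totals[worst]


# Placeholder figures in GB, invented purely to illustrate the shape of the problem.
stages = {
    "load":   {"checkpoint": 20.0},
    "sample": {"checkpoint": 20.0, "lora": 2.0, "working_tensors": 11.0},
    "decode": {"checkpoint": 20.0, "vae": 4.0},
}

stage, total = peak_vram_gb(stages)
print(f"peak is {total:.1f} GB during '{stage}'")
```

In this toy example the checkpoint loads without trouble, and the peak only shows up during sampling once the LoRA and working tensors are added on top.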
Best Practice #5: Understand What Each Knob Actually Controls
A lot of ComfyUI frustration comes from attributing results to the wrong control. Three that trip people up constantly:
- Seed ≠ likeness — The seed varies the composition and pose. It does not control who the subject is. If you're using a character LoRA, the LoRA owns the identity — changing the seed gives you a different shot of the same person, not a different person.
- Token weight is a real lever — When a checkpoint or LoRA drifts a feature (mine kept pushing eye color toward blue/green no matter the prompt), raising the weight of the corrective token — or adding a color match in the FaceDetailer pass — fixes it far more reliably than rewriting the whole prompt.
- CFG is contrast, not quality — High CFG doesn't mean 'better,' it means 'follow the prompt harder,' which on SDXL often means burnt, oversaturated output. I sit around 5–7 for photorealism.
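For the token-weight lever, ComfyUI's prompt syntax lets you wrap a fragment as `(text:weight)`. A minimal sketch of raising the weight on one corrective token follows. The prompt text and the 1.3 weight are hypothetical examples, and `weighted` is an illustrative helper name.

```python
def weighted(token: str, weight: float) -> str:
    """Wrap a prompt fragment in the (text:weight) emphasis syntax."""
    return f"({token}:{weight:g})"


# Hypothetical prompt where the eye color keeps drifting.
base = "portrait photo of a woman, brown eyes, soft window light"
corrected = base.replace("brown eyes", weighted("brown eyes", 1.3))
print(corrected)
# -> portrait photo of a woman, (brown eyes:1.3), soft window light
```

Changing one weighted token keeps the rest of the prompt, and therefore the rest of the composition, as stable as possible while you dial in the correction.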
Best Practice #6: Build Presets, Not One-Off Graphs
The thing that turned ComfyUI from a toy into a production tool for me was treating workflows like code. I keep a small set of named presets, each a frozen graph with locked-in FaceDetailer settings and a known checkpoint. When I need a new look, I clone the closest preset and change only the prompt and framing — I don't rebuild the pipeline from scratch.
Concretely, this is how I keep it sane:
- One reference preset that everything else is cloned from. When I find a better FaceDetailer or upscaler setting, I promote it to the standard and propagate it.
- Workflows live as saved JSON, version-controlled and synced between machines — the graph is the source of truth, not a screenshot.
- The API format matters: the editor graph and the /prompt API format are different. If you want to fire renders programmatically (n8n, a script, an agent), convert the editor graph to API format once and POST it.
- Drive it headlessly when you can. ComfyUI exposes an HTTP API — POST a workflow to /prompt, poll /history for the result. That's how I queue renders from automation instead of clicking.
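The headless flow described in the last two points can be sketched with only the standard library. This assumes a local ComfyUI server at `http://127.0.0.1:8188` and a workflow already exported in API format. `BASE_URL`, the function names, and the `workflow_api.json` filename are all illustrative, and error handling is kept to a bare minimum.

```python
import json
import time
import urllib.request
import uuid

BASE_URL = "http://127.0.0.1:8188"  # assumed local server address; adjust to yours


def build_payload(workflow: dict, client_id: str = "") -> bytes:
    """Wrap an API-format workflow in the JSON body POSTed to /prompt."""
    body = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return json.dumps(body).encode("utf-8")


def queue_prompt(workflow: dict) -> str:
    """POST the workflow to /prompt and return the prompt_id from the response."""
    req = urllib.request.Request(
        f"{BASE_URL}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prompt_id"]


def wait_for_outputs(prompt_id: str, poll_seconds: float = 2.0,
                     timeout: float = 600.0) -> dict:
    """Poll /history/<prompt_id> until an entry appears, then return its outputs."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{BASE_URL}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if prompt_id in history:
            return history[prompt_id]["outputs"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"no history entry for {prompt_id} after {timeout}s")


def render(path: str = "workflow_api.json") -> dict:
    """Load a saved API-format workflow (hypothetical filename) and run it."""
    with open(path, encoding="utf-8") as f:
        return wait_for_outputs(queue_prompt(json.load(f)))
```

Calling `render()` from a script or an automation step queues the saved graph and blocks until the server reports outputs for it. Polling keeps the example short, at the cost of up to `poll_seconds` of extra latency per render.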
Meta note:
The hero image at the top of this post was generated on that exact pipeline — a clean SDXL checkpoint on the 5090, native generation, no character LoRA. Eating my own cooking.
The Short Version
- Generate real detail in the native pass; upscale only to enlarge it.
- Stick to SDXL native aspect ratios — no arbitrary off-ratio sizes.
- Soft eyes usually mean the face was too small in the native frame, not that the output was low-res.
- FaceDetailer is your highest-leverage node — raise guide_size/max_size, nudge denoise to ~0.33.
- If a LoRA OOMs your GPU, use the distilled checkpoint, drop the LoRA, keep default VRAM mode.
- Seed changes the shot, the LoRA owns the identity, CFG is contrast — know what each knob does.
- Freeze your good graphs as presets and clone them; drive renders over the HTTP API.