Prompt: A hyper-realistic image of a man relaxing on a sunny Mediterranean terrace, dressed in breezy coastal resort fashion: lightweight linen button-down shirt, white cuffed chinos, woven leather sandals, vintage sunglasses, and a woven straw fedora. He sits comfortably in a rattan lounge chair surrounded by vibrant ceramic planters, sun-bleached stone flooring, trailing bougainvillea vines, sparkling blue sea in the distant background, sharp daylight and natural shadows, tranquil afternoon ambiance, detailed skin and fabric textures

Generating Qwen images with ControlNet unlocks a powerful way to guide your AI creations using visual structure, lines, and forms drawn or extracted from reference images. Want better control over your AI image generation? Here's how to use Qwen Image with the InstantX Union ControlNet to guide your creations with poses, edges, and depth maps.

With just a simple pose, edge, depth map, or quick sketch, you can shape exactly how your output looks. Whether you're working on precise designs or expressive portraits, this workflow gives you the control you need without the complexity.

Here's what we'll cover:
1. Why InstantX Union beats DiffSynth
2. Getting the workflow set up
3. Required models and custom nodes
4. Step-by-step walkthrough
5. Real examples and troubleshooting

Why Is InstantX Union Better than DiffSynth?

Source: Qwen Image InstantX Union

InstantX Union ControlNet combines four control types (canny, soft edge, depth, and pose) into one model file. Instead of downloading separate models for each control type, you get everything in one package.

Unlike DiffSynth, which makes you load different models for different tasks, InstantX Union lets you switch between control types instantly. Less storage, less setup, same quality.
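To get a feel for what these control inputs actually look like, here's a minimal sketch that derives a canny edge map and a rough soft-edge-style variant from a reference photo. This is just an illustration, not part of the workflow (ComfyUI's preprocessor nodes handle this for you); it assumes opencv-python is installed and a hypothetical reference.jpg exists:

```python
import cv2

# Load the reference photo in grayscale (hypothetical filename).
image = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)

# Canny: crisp, binary line work -- suits architecture and hard-edged designs.
canny_map = cv2.Canny(image, threshold1=100, threshold2=200)

# Rough soft-edge approximation: blur before detecting so the guidance is gentler.
# (ComfyUI's dedicated soft-edge preprocessors use learned models instead.)
blurred = cv2.GaussianBlur(image, (5, 5), 0)
soft_edge_map = cv2.Canny(blurred, threshold1=50, threshold2=150)

cv2.imwrite("control_canny.png", canny_map)
cv2.imwrite("control_soft_edge.png", soft_edge_map)
```

Either output image is the kind of structural guide the Union model consumes; the workflow simply generates these maps internally from your input image.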

Prompt: A cozy Bohemian living room, layered patterned rugs over rustic hardwood floors, low-slung vintage couches with colorful embroidered pillows, eclectic gallery wall art, clusters of hanging macramé planters and trailing greenery, carved wooden coffee table, rattan accent chairs, textured woven blankets, lantern-style ambient lighting, warm earthy color palette, relaxed and inviting atmosphere, artistic and collected look

For Qwen Image users, this means creating complex, high-quality images with simpler setup, better compatibility, and instant access to the most common control modes, all in a unified, user-friendly package.

Download Workflow

Installation guide

  1. Download the workflow file
  2. Open ComfyUI (local or ThinkDiffusion)
  3. Drag the workflow file into the ComfyUI window
  4. If you see red nodes, install missing components:
  • ComfyUI Manager > Install Missing Custom Nodes

Verified to work on the ThinkDiffusion build of September 5, 2025, running ComfyUI v0.3.57 with the qwen_image_fp8_e4m3fn.safetensors model.

Note: We specify the build date because ComfyUI and custom node updates released after this date may change the behavior or outputs of the workflow.

Minimum Machine Size: Ultra

Use the specified machine size or higher to ensure it meets the VRAM and performance requirements of the workflow.

💡
Download the workflow and drag & drop it into your ComfyUI window, whether locally or on ThinkDiffusion. If you're using ThinkDiffusion, the minimum requirement is a Turbo 24GB machine, but we recommend the Ultra 48GB machine.

Custom Nodes

If there are red nodes in the workflow, it means the workflow is missing required custom nodes. Install them so the workflow can run.

  1. Go to the ComfyUI Manager > Click Install Missing Custom Nodes
  2. Review the list of missing custom nodes and click Install for each one.

Required Models

For this guide, you'll need to download these four models.

1. qwen_image_fp8_e4m3fn.safetensors
2. qwen_2.5_vl_7b_fp8_scaled.safetensors
3. qwen_image_vae.safetensors
4. Qwen-Image-InstantX-ControlNet-Union.safetensors
  1. Go to ComfyUI Manager > Click Model Manager
  2. Search for each model above; when you find the exact model you're looking for, click Install, and press Refresh when you're finished.

If Model Manager doesn't have them: use the direct download links (included with the workflow) and upload them through ThinkDiffusion MyFiles > Upload URL, pasting each model's link address. Refer to our docs for more guidance on this.

| Model Name | ThinkDiffusion Upload Directory |
| --- | --- |
| qwen_image_fp8_e4m3fn.safetensors | .../comfyui/models/diffusion_models/ |
| qwen_2.5_vl_7b_fp8_scaled.safetensors | .../comfyui/models/text_encoders/ |
| qwen_image_vae.safetensors | .../comfyui/models/vae/ |
| Qwen-Image-InstantX-ControlNet-Union.safetensors | .../comfyui/models/controlnet/ |
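If you're unsure whether the files landed in the right folders, a quick sanity check like the sketch below can help. It assumes a local install where the models folder lives at ComfyUI/models; adjust MODELS_ROOT to your own path (on ThinkDiffusion, the directories match the table above):

```python
from pathlib import Path

# Adjust to your ComfyUI install location (an assumption, not a fixed path).
MODELS_ROOT = Path("ComfyUI/models")

# The four files from the table above, mapped to their expected subfolders.
EXPECTED = {
    "diffusion_models": "qwen_image_fp8_e4m3fn.safetensors",
    "text_encoders": "qwen_2.5_vl_7b_fp8_scaled.safetensors",
    "vae": "qwen_image_vae.safetensors",
    "controlnet": "Qwen-Image-InstantX-ControlNet-Union.safetensors",
}

for subdir, filename in EXPECTED.items():
    path = MODELS_ROOT / subdir / filename
    status = "OK" if path.is_file() else "MISSING"
    print(f"[{status}] {path}")
```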

Step-by-step Workflow Guide

This workflow is easy to set up and runs well with the default settings. Here are the steps where you may want to pay extra attention.

1. Load Input Image

Load an image of good quality. Any resolution will do.
2. Set a ControlNet

Set your desired ControlNet based on your preferences. If there's a human in the image, you can use pose.
3. Set the Models

Set the exact models as shown in the image.
4. Write a Prompt

Write a detailed description of the kind of new image you want, based on the input image.
5. Check Sampling

Set the sampling parameters as shown in the image.
6. Check Output

Review the generated result. If you'd rather queue the workflow without the browser UI, see the sketch below.
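Once the workflow runs cleanly in the browser, you can also queue it headlessly through ComfyUI's built-in HTTP API. The sketch below assumes a local server at 127.0.0.1:8188 and a workflow exported via Save (API Format) to a hypothetical workflow_api.json; adjust both to your setup:

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:8188"  # assumed local ComfyUI address

# Workflow exported from ComfyUI via "Save (API Format)" (hypothetical filename).
with open("workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

# ComfyUI queues whatever graph you POST to its /prompt endpoint.
payload = json.dumps({"prompt": workflow}).encode("utf-8")
request = urllib.request.Request(
    f"{SERVER}/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    result = json.load(response)
    print("Queued with prompt_id:", result.get("prompt_id"))
```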

💡
I use Canny for crisp, accurate line control in things like architecture or detailed designs. I switch to Soft Edge when I want smoother, more natural guidance for portraits or landscapes. I rely on Depth whenever 3D space and realistic perspective are important, such as for background consistency or lighting. For Pose, I apply it to human figures when I need precise control over body position or gestures, making sure characters look natural and expressive.
💡
I experimented with several ControlNet models beyond the standard four, exploring a range of options to expand the workflow’s capabilities. However, I observed that using these alternative models often leads to unpredictable or suboptimal results, such as unusual visual distortions or unintended effects. For this reason, I recommend exercising caution when integrating non-standard ControlNet models into your workflow and thoroughly testing each model to ensure consistent output quality.
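If you want to preview a depth map outside ComfyUI before committing to a generation, one option is the openly available MiDaS estimator via torch.hub. This is a sketch under assumptions (torch, timm, and opencv-python installed; a hypothetical input.jpg exists); the workflow's own depth preprocessing is the normal route:

```python
import cv2
import torch

# Small MiDaS model from the official intel-isl torch.hub entry (requires timm).
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

# Hypothetical input filename; MiDaS expects an RGB image.
img = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))
    # Resize the prediction back to the input resolution.
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

# Normalize to 0-255 and save as a grayscale depth control image.
depth = prediction.cpu().numpy()
depth = (255 * (depth - depth.min()) / (depth.max() - depth.min())).astype("uint8")
cv2.imwrite("control_depth.png", depth)
```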

Examples

Prompt: A highly-detailed robotic rhinoceros, gleaming chrome armor and neon blue LED accents, imposing mechanical form, walking through a bustling futuristic cyberpunk city at night, surrounded by towering skyscrapers with holographic billboards, atmospheric neon lights, reflective wet streets, flying vehicles in the background, electric mist and digital rain, vibrant but moody color scheme, cinematic composition, inspired by sci-fi and cyberpunk visuals, sharp focus, dynamic lighting, techno-futuristic aesthetic
Prompt: A playful dog with expressive features, set in a dreamlike landscape of floating islands with impossible geometry, lush grass, sparkling waterfalls cascading into the clouds, vibrant pastel skies, hints of rainbow light, whimsical and surreal atmosphere, tranquil and magical mood, high detail, painterly finish
Prompt: A sinister-looking boy with sharp eyes and an intense expression, standing in a shadowy villain lair filled with glowing red and green control panels, massive digital screens flashing ominous warnings, metallic walls lined with exposed wires and pulsing energy conduits, dark atmospheric lighting, mysterious swirling smoke at his feet, futuristic weapons and artifacts scattered around, sleek black and dark purple clothing with high-collar jacket and metallic accents, glowing symbol on his glove, intimidating and clever appearance, cinematic mood, high detail
Prompt: An ancient stone building with massive weathered walls, rough-hewn blocks and primitive mortar, narrow arched doorways, tiny slit windows for defense, heavy wooden gates reinforced with iron, moss and creeping vines covering the crumbling exterior, worn flagstones leading to the entrance, rustic torch sconces, simple geometric carvings, historic atmosphere reminiscent of early medieval or prehistoric architecture, cloudy skies and soft diffused lighting, emphasis on age and durability

Troubleshooting

Red Nodes: Install missing custom nodes through ComfyUI Manager
Out of Memory: Lower the generation resolution or switch to the Ultra machine
Poor Quality: Check the input image resolution and adjust the ControlNet strength
Visible Seams: Lower the strength and ensure a thorough prompt description

If you’re having issues with installation or slow hardware, you can try any of these workflows on a more powerful GPU in your browser with ThinkDiffusion.

Join the ThinkDiffusion Discord Server!
ThinkDiffusion is your Stable Diffusion workspace in the cloud with unrestricted, bleeding-edge open-source AI art tools.