Text to Image
Video Lecture
| Section | Video Links |
| --- | --- |
| Text to Image | ![]() |
Video Timings
00:00 Rebuilding the Text-to-Image Workflow from Scratch
00:15 Loading the Stable Diffusion 1.5 Checkpoint
00:30 Understanding Checkpoint Parameters: Pruned, EMA Only, FP16, Safe Tensors
01:00 Connecting the Model to the K Sampler
01:30 Introducing VAE Decode and Latent Space
01:50 Setting Up Image Saving and Output Prefixes
02:10 Defining Positive and Negative Prompts with CLIP Text Encode
02:50 Connecting Prompts and VAE to the KSampler
03:40 Adding the Empty Latent Image Input to KSampler
04:00 Setting Image Dimensions for Stable Diffusion 1.5
04:30 First Image Generation and Understanding the Seed
05:00 Randomizing Seed and Generating Multiple Images
05:40 Controlling Image Generation with Fixed Seed
06:00 Monitoring Image Generation Queue and Times
06:15 Adjusting 'Steps' for Image Quality and Speed
07:00 Experimenting with Classifier Free Guidance (CFG)
08:00 Exploring Different Sampler and Scheduler Combinations
08:30 Testing Various Prompts and Negative Prompts
09:00 Identifying Model Limitations (Faces, Text) and Strengths (Art Styles)
10:00 Understanding Latent Image Size and Corruption
11:00 Customizing the User Interface Graph Link Render Mode
11:40 Grouping Nodes for a Cleaner Workflow Layout
12:15 Accessing and Re-importing Generated Images
12:45 Workflow Embedding in PNG Images for Sharing
13:00 Creating a Desktop Shortcut and Persistence of Workflows
14:00 Loading Workflows Directly from Web Browsers
Description
We will recreate the basic Text to Image workflow using the v1-5-pruned-emaonly-fp16.safetensors
model, an optimised version of the Stable Diffusion v1.5 model. The filename encodes several properties:
- pruned means that unnecessary parameters have been removed from the model, reducing its size and computational cost.
- emaonly means that the checkpoint contains only the "Exponential Moving Average" (EMA) weights, an averaging technique often used to improve generalization during training; keeping only these weights produces a smaller file suited to inference.
- fp16 means the weights are stored in 16-bit (half-precision) floating point, roughly halving the file size compared with 32-bit weights.
- The .safetensors extension refers to a model serialization format that is faster and safer to load than earlier methods. The earlier format, with extension .ckpt, uses Python's pickle mechanism, which can execute arbitrary code when a file is loaded, making it susceptible to security vulnerabilities.
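To see the safety difference in practice, here is a minimal sketch of inspecting a checkpoint with the safetensors library, which parses raw tensor data and never executes code; the file path is illustrative:

```python
# A minimal sketch: inspecting a .safetensors checkpoint without
# running any embedded code. Assumes the safetensors and torch
# packages are installed; the path below is illustrative.
from safetensors.torch import load_file

# load_file reads tensor data only -- unlike unpickling a .ckpt,
# it cannot trigger arbitrary code execution.
state_dict = load_file("models/checkpoints/v1-5-pruned-emaonly-fp16.safetensors")

# Print a few tensor names, shapes, and dtypes to confirm the load.
for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape), state_dict[name].dtype)
```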
Recreate the basic Text to Image workflow from scratch using the following nodes:
- Load Checkpoint: Loads a checkpoint model (e.g., SD 1.5).
- KSampler: The denoising engine. Uses the prompt, noise, and model to iteratively generate an image in latent space.
- VAE Decode: Variational Autoencoder. Converts the latent image into a visible RGB image.
- Save Image: Saves the final generated image to disk.
- CLIP Text Encode (Positive Prompt): Encodes your main text prompt into a format the model can use.
- CLIP Text Encode (Negative Prompt): Encodes undesired elements (e.g., "blurry, distorted") to help the model avoid them.
- Empty Latent Image: Creates an initial noise image (latent space) at the desired resolution.
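As a sketch of how these nodes wire together, the following Python snippet builds the same graph in ComfyUI's API JSON format and queues it against a local server (assuming a default ComfyUI instance at 127.0.0.1:8188; the node IDs, prompt text, and seed are illustrative):

```python
# A minimal sketch of the text-to-image graph in ComfyUI's API JSON
# format, queued via the /prompt endpoint. Assumes a ComfyUI server
# at the default 127.0.0.1:8188; IDs, prompt, and seed are illustrative.
import json
import urllib.request

workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "v1-5-pruned-emaonly-fp16.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"text": "a breathtaking alpine valley at sunrise",
                     "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"text": "blurry, distorted", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",  # SD 1.5's native resolution
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 42, "steps": 20, "cfg": 8.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
}

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode("utf-8"))
```

Each ["node_id", output_index] pair is a wire in the graph: the KSampler's model input is output 0 of Load Checkpoint, both CLIP Text Encode nodes share its CLIP output (index 1), and VAE Decode takes its VAE output (index 2).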
Some Example Prompts
- a breathtaking alpine valley at sunrise
- a car on a dusty road
- a cat on a skateboard
- a bicycle in amsterdam
- speeding through a city with bright lights. strobe effect
- a person reading a newspaper
- a portrait of a person, in the style of picasso
- modern architectural buildings with clean lines, beautiful gardens with water features, situated on the edge of a cliff, overlooking the fjords
Workflow embedded in image
Using a compatible browser, you can drag this image into ComfyUI to load and run the exact workflow that generated it.
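This works because ComfyUI stores the workflow as JSON in the PNG's text metadata. A minimal sketch of inspecting that metadata with Pillow (the file name is illustrative):

```python
# A minimal sketch: reading the workflow JSON that ComfyUI embeds in a
# generated PNG's text chunks. Assumes Pillow is installed; the file
# name is illustrative.
import json
from PIL import Image

img = Image.open("ComfyUI_00001_.png")

# ComfyUI writes two text chunks: "prompt" (the API-format graph) and
# "workflow" (the full UI graph, including layout and node groups).
workflow_json = img.info.get("workflow")
if workflow_json:
    workflow = json.loads(workflow_json)
    print(f"{len(workflow.get('nodes', []))} nodes in embedded workflow")
else:
    print("No embedded workflow found")
```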