Text to Image
Video Lecture
| Section | Video Links |
| --- | --- |
| Text to Image | ![]() |
Video Timings
00:00 Rebuilding the Text-to-Image Workflow from Scratch
00:15 Loading the Stable Diffusion 1.5 Checkpoint
00:30 Understanding Checkpoint Parameters: Pruned, EMA Only, FP16, Safe Tensors
01:00 Connecting the Model to the K Sampler
01:30 Introducing VAE Decode and Latent Space
01:50 Setting Up Image Saving and Output Prefixes
02:10 Defining Positive and Negative Prompts with CLIP Text Encode
02:50 Connecting Prompts and VAE to the KSampler
03:40 Adding the Empty Latent Image Input to KSampler
04:00 Setting Image Dimensions for Stable Diffusion 1.5
04:30 First Image Generation and Understanding the Seed
05:00 Randomizing Seed and Generating Multiple Images
05:40 Controlling Image Generation with Fixed Seed
06:00 Monitoring Image Generation Queue and Times
06:15 Adjusting 'Steps' for Image Quality and Speed
07:00 Experimenting with Classifier Free Guidance (CFG)
08:00 Exploring Different Sampler and Scheduler Combinations
08:30 Testing Various Prompts and Negative Prompts
09:00 Identifying Model Limitations (Faces, Text) and Strengths (Art Styles)
10:00 Understanding Latent Image Size and Corruption
11:00 Customizing the User Interface Graph Link Render Mode
11:40 Grouping Nodes for a Cleaner Workflow Layout
12:15 Accessing and Re-importing Generated Images
12:45 Workflow Embedding in PNG Images for Sharing
13:00 Creating a Desktop Shortcut and Persistence of Workflows
14:00 Loading Workflows Directly from Web Browsers
Description
We will recreate the basic Text to Image workflow using the v1-5-pruned-emaonly-fp16.safetensors
model, an optimised version of the Stable Diffusion v1.5 model. The filename encodes several properties:
- pruned means that unnecessary parameters have been removed from the model, reducing its size and computational cost.
- emaonly means that the checkpoint contains only the "Exponential Moving Average" (EMA) weights, an averaging technique often used to improve generalization during training; keeping only these weights produces a smaller file suited to inference.
- fp16 means the weights are stored in 16-bit (half-precision) floating point, roughly halving the file size compared with 32-bit weights.
- The .safetensors extension refers to a model serialization format that is faster and safer to load than earlier methods. The earlier format, with extension .ckpt, uses Python's pickle mechanism, which can execute arbitrary code when a file is loaded, making it susceptible to security vulnerabilities.
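To see the safety difference in practice, here is a minimal sketch of inspecting a checkpoint with the safetensors library, which parses raw tensor data and never executes code; the file path is illustrative:

```python
# A minimal sketch: inspecting a .safetensors checkpoint without
# running any embedded code. Assumes the safetensors and torch
# packages are installed; the path below is illustrative.
from safetensors.torch import load_file

# load_file reads tensor data only -- unlike unpickling a .ckpt,
# it cannot trigger arbitrary code execution.
state_dict = load_file("models/checkpoints/v1-5-pruned-emaonly-fp16.safetensors")

# Print a few tensor names, shapes, and dtypes to confirm the load.
for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape), state_dict[name].dtype)
```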
Recreate the basic Text to Image workflow from scratch using the following nodes:
- Load Checkpoint: Loads a checkpoint model (e.g., SD 1.5).
- KSampler: The denoising engine. Uses the prompt, noise, and model to iteratively generate an image in latent space.
- VAE Decode: Variational Autoencoder. Converts the latent image into a visible RGB image.
- Save Image: Saves the final generated image to disk.
- CLIP Text Encode (Positive Prompt): Encodes your main text prompt into a format the model can use.
- CLIP Text Encode (Negative Prompt): Encodes undesired elements (e.g., "blurry, distorted") to help the model avoid them.
- Empty Latent Image: Creates an initial noise image (latent space) at the desired resolution.
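As a sketch of how these nodes wire together, the following Python snippet builds the same graph in ComfyUI's API JSON format and queues it against a local server (assuming a default ComfyUI instance at 127.0.0.1:8188; the node IDs, prompt text, and seed are illustrative):

```python
# A minimal sketch of the text-to-image graph in ComfyUI's API JSON
# format, queued via the /prompt endpoint. Assumes a ComfyUI server
# at the default 127.0.0.1:8188; IDs, prompt, and seed are illustrative.
import json
import urllib.request

workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "v1-5-pruned-emaonly-fp16.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"text": "a breathtaking alpine valley at sunrise",
                     "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"text": "blurry, distorted", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",  # SD 1.5's native resolution
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 42, "steps": 20, "cfg": 8.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
}

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode("utf-8"))
```

Each ["node_id", output_index] pair is a wire in the graph: the KSampler's model input is output 0 of Load Checkpoint, both CLIP Text Encode nodes share its CLIP output (index 1), and VAE Decode takes its VAE output (index 2).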
Some Example Prompts
- a breathtaking alpine valley at sunrise
- a car on a dusty road
- a cat on a skateboard
- a bicycle in amsterdam
- speeding through a city with bright lights. strobe effect
- a person reading a newspaper
- a portrait of a person, in the style of picasso
- modern architectural buildings with clean lines, beautiful gardens with water features, situated on the edge of a cliff, overlooking the fjords
Workflow embedded in image
Using a compatible browser, you can drag this image into ComfyUI to load and run the exact workflow that generated it.
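This works because ComfyUI stores the workflow as JSON in the PNG's text metadata. A minimal sketch of inspecting that metadata with Pillow (the file name is illustrative):

```python
# A minimal sketch: reading the workflow JSON that ComfyUI embeds in a
# generated PNG's text chunks. Assumes Pillow is installed; the file
# name is illustrative.
import json
from PIL import Image

img = Image.open("ComfyUI_00001_.png")

# ComfyUI writes two text chunks: "prompt" (the API-format graph) and
# "workflow" (the full UI graph, including layout and node groups).
workflow_json = img.info.get("workflow")
if workflow_json:
    workflow = json.loads(workflow_json)
    print(f"{len(workflow.get('nodes', []))} nodes in embedded workflow")
else:
    print("No embedded workflow found")
```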