
Stable Video Diffusion


Description

Stable Video Diffusion (SVD) Image-to-Video is a diffusion model that takes a still image as a conditioning frame and generates a video from it.

Downloads: svd.safetensors | svd_xt.safetensors | svd_xt_1_1.safetensors

📂 ComfyUI/
├── 📂 models/
│   ├── 📂 checkpoints/
│   │   ├── svd.safetensors
│   │   ├── svd_xt.safetensors
│   │   └── svd_xt_1_1.safetensors

The base svd.safetensors model (img2vid) was trained to generate 14 frames at 1024x576.

svd_xt.safetensors (img2vid-xt) was trained to generate 25 frames at 1024x576.

svd_xt_1_1.safetensors (img2vid-xt-1.1) is a further fine-tuned version of img2vid-xt.

SVD_img2vid_Conditioning

The SVD_img2vid_Conditioning node controls the motion behavior during image-to-video generation.

The SVD_img2vid_Conditioning node creates an image conditioning for the KSampler, rather than the text conditioning we've used with every other model before this lesson. This image conditioning contains high-dimensional latent tensors that represent visual features of the input image, such as shape, spatial layout, color distribution, and composition, along with motion cues.
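To get a feel for the size of these latent tensors, the sketch below computes their shapes for a given width, height, and frame count. It assumes SVD uses the same kind of VAE latent space as Stable Diffusion (8x spatial downsampling into 4 channels); the function name `svd_latent_shapes` is hypothetical, and other parts of the conditioning (such as the image embedding) are not modeled here.

```python
def svd_latent_shapes(width: int, height: int, frames: int) -> dict:
    """Approximate latent shapes for SVD, assuming an 8x-downsampling, 4-channel VAE."""
    if width % 8 or height % 8:
        raise ValueError("width and height should be multiples of 8")
    if frames < 1:
        raise ValueError("frames must be at least 1")
    lat_h, lat_w = height // 8, width // 8
    return {
        # The encoded still image (assumed to be reused as conditioning for every frame).
        "conditioning_latent": (1, 4, lat_h, lat_w),
        # The latent the sampler denoises: one slice per output frame.
        "video_latent": (frames, 4, lat_h, lat_w),
    }
```

For example, `svd_latent_shapes(1024, 576, 25)` gives a video latent of shape `(25, 4, 72, 128)`, which shows why frame count and resolution both drive memory use.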

For best quality, set the width and height to 1024x576. You can also get results using the portrait orientation, 576x1024.
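When the initial image is not already 16:9 or 9:16, you have to choose an orientation and decide how to crop. The sketch below shows one possible approach, assuming you want the largest centered crop with the target aspect ratio before resizing; both function names are hypothetical helpers, not part of ComfyUI.

```python
def pick_svd_resolution(src_w: int, src_h: int) -> tuple[int, int]:
    """Landscape (or square) sources map to 1024x576, portrait sources to 576x1024."""
    if src_w <= 0 or src_h <= 0:
        raise ValueError("image dimensions must be positive")
    return (1024, 576) if src_w >= src_h else (576, 1024)


def center_crop_box(src_w: int, src_h: int, dst_w: int, dst_h: int) -> tuple[int, int, int, int]:
    """Largest centered box with the target aspect ratio, as (left, top, right, bottom)."""
    # Compare aspect ratios by integer cross-multiplication to avoid float error.
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: trim the left and right sides.
        crop_w = src_h * dst_w // dst_h
        left = (src_w - crop_w) // 2
        return (left, 0, left + crop_w, src_h)
    # Source is taller than (or equal to) the target: trim the top and bottom.
    crop_h = src_w * dst_h // dst_w
    top = (src_h - crop_h) // 2
    return (0, top, src_w, top + crop_h)
```

A square 1000x1000 image, for instance, is treated as landscape and cropped to the box `(0, 219, 1000, 781)` before being resized to 1024x576.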

Set Frames to 14 when using img2vid (svd.safetensors), and to 25 when using img2vid-xt or img2vid-xt-1.1 (svd_xt.safetensors or svd_xt_1_1.safetensors).
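The frame and resolution rules above can be collected into a small lookup table and checked before a run. This is a minimal sketch using only the values stated in this lesson; `SVD_CHECKPOINTS` and `check_settings` are hypothetical names, and the portrait swap is allowed because the lesson notes 576x1024 also works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SvdCheckpoint:
    """Frame count and resolution a checkpoint was trained on."""
    frames: int
    width: int
    height: int


# Native settings per checkpoint file, taken from this lesson.
SVD_CHECKPOINTS = {
    "svd.safetensors": SvdCheckpoint(frames=14, width=1024, height=576),
    "svd_xt.safetensors": SvdCheckpoint(frames=25, width=1024, height=576),
    "svd_xt_1_1.safetensors": SvdCheckpoint(frames=25, width=1024, height=576),
}


def check_settings(ckpt_name: str, frames: int, width: int, height: int) -> list[str]:
    """Return warnings for settings that differ from the checkpoint's training setup."""
    native = SVD_CHECKPOINTS.get(ckpt_name)
    if native is None:
        raise ValueError(f"unknown SVD checkpoint: {ckpt_name}")
    warnings = []
    if frames != native.frames:
        warnings.append(f"{ckpt_name} was trained on {native.frames} frames, got {frames}")
    # Accept both the landscape resolution and its portrait swap.
    allowed = {(native.width, native.height), (native.height, native.width)}
    if (width, height) not in allowed:
        warnings.append(f"{width}x{height} differs from the trained {native.width}x{native.height}")
    return warnings
```

An empty list means the settings match; each string describes one mismatch.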

The Motion Bucket ID defaults to 127, which normally produces balanced results. You can set it anywhere from 1 to 1023. The value refers to a preselected set of discrete "motion buckets" that the model was trained on, and it controls the intensity and complexity of motion in the generated video. Lower numbers make the movement appear more static, while higher numbers make it more dramatic. However, values above 127 tend to produce more unstable results.
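The range and rules of thumb for Motion Bucket ID can be written as a small validator. The labels below simply restate the guidance in this paragraph; they are heuristics, not guarantees about what the model will do, and `describe_motion_bucket` is a hypothetical helper.

```python
MOTION_BUCKET_MIN = 1
MOTION_BUCKET_MAX = 1023
MOTION_BUCKET_DEFAULT = 127


def describe_motion_bucket(value: int) -> str:
    """Validate a Motion Bucket ID and summarize the expected effect (rule of thumb only)."""
    if not MOTION_BUCKET_MIN <= value <= MOTION_BUCKET_MAX:
        raise ValueError(f"motion bucket must be between {MOTION_BUCKET_MIN} and {MOTION_BUCKET_MAX}")
    if value < MOTION_BUCKET_DEFAULT:
        return "calmer: movement tends to look more static"
    if value == MOTION_BUCKET_DEFAULT:
        return "default: usually balanced motion"
    return "stronger: more dramatic motion, but more likely to be unstable"
```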

The Augmentation Level is a noise multiplier that can affect both image quality and camera movement. Higher values add more noise to the conditioning image, which can make the output look messier and the camera movement more chaotic. Try values between 0 and 1.
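To illustrate what "noise multiplier" means, the sketch below scales unit-variance Gaussian noise by the augmentation level and adds it to a float image. It assumes the noise is added to the conditioning image before encoding, which matches our reading of ComfyUI's node but should be treated as an assumption; `augment_conditioning_image` is a hypothetical function.

```python
from typing import Optional

import numpy as np


def augment_conditioning_image(image: np.ndarray, level: float, seed: Optional[int] = None) -> np.ndarray:
    """Return a copy of `image` with Gaussian noise of standard deviation `level` added.

    Assumption: `image` holds float pixel values (e.g. in [0, 1]). The result is
    not clipped, so noisy values may fall outside that range.
    """
    if level < 0:
        raise ValueError("augmentation level must be non-negative")
    if level == 0:
        # Level 0 leaves the conditioning image untouched.
        return image.astype(np.float64, copy=True)
    rng = np.random.default_rng(seed)
    # The level multiplies standard normal noise, so it sets the noise's standard deviation.
    return image + rng.standard_normal(image.shape) * level
```

With a level of 0.1 the added noise has a standard deviation of about 0.1 pixel units; at 1.0 it is as large as the whole [0, 1] pixel range, which helps explain why higher values look messy.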

Motion Bucket ID and Augmentation Level have unpredictable effects, and the same values can behave differently from one initial image to the next. The only reliable approach is to experiment with the values for each image.
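One way to experiment systematically is to sweep a small grid of both values for the same image while keeping the sampler seed fixed, so differences between runs come from the two settings rather than from new random noise. The sketch below only builds the list of settings; `sweep_grid` is hypothetical, the dictionary keys are chosen to mirror the node's input names, and how you queue the runs in ComfyUI is not shown.

```python
from itertools import product


def sweep_grid(motion_buckets, augmentation_levels, seed=42):
    """Build one settings dict per (motion bucket, augmentation level) pair.

    The seed is held constant so that runs differ only in the two swept values.
    """
    runs = []
    for bucket, aug in product(motion_buckets, augmentation_levels):
        if not 1 <= bucket <= 1023:
            raise ValueError(f"motion_bucket_id out of range: {bucket}")
        if aug < 0:
            raise ValueError(f"augmentation_level must be non-negative: {aug}")
        runs.append({"motion_bucket_id": bucket, "augmentation_level": aug, "seed": seed})
    return runs
```

For example, `sweep_grid([64, 127, 192], [0.0, 0.05])` yields six runs to compare side by side.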

Initial Image Workflow
fish coral reef
beach
English Village Tilt Focus
Jet Flying Thru Street
Trucks
a car on a dusty road
Village 2

Models Backup (huggingface)