Lip Sync using Wan 2.2 (S2V)
Video Lecture
| Section | Video Links |
|---|---|
| Wan 2.2 S2V Lip Sync | ![]() |
Description
We will use the GGUF quantised Wan2.2 models.
- umt5-xxl-encoder-Q8_0.gguf (6.04 GB)
- high_noise_model.safetensors (1.23 GB) ⇾ rename to Wan2.2-I2V-A14B-lora-high_noise.safetensors
- wav2vec2_large_english_fp16.safetensors (631 MB)
- Wan2.2-S2V-14B-Q8_0.gguf (19.6 GB)
- wan2.1_vae.safetensors (254 MB)
📂 ComfyUI/
├── 📂 models/
│ ├── 📂 clip/
│ │ └── umt5-xxl-encoder-Q8_0.gguf
│ ├── 📂 loras/
│ │ └── Wan2.2-I2V-A14B-lora-high_noise.safetensors
│ ├── 📂 audio_encoders/
│ │ └── wav2vec2_large_english_fp16.safetensors
│ ├── 📂 unet/
│ │ └── Wan2.2-S2V-14B-Q8_0.gguf
│ └── 📂 vae/
│ └── wan2.1_vae.safetensors
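Before loading a workflow, it can help to confirm that every file sits in the folder shown above. The following is a minimal sketch, not part of ComfyUI: the helper name `find_missing_models` is hypothetical, and the `ComfyUI` root path is an assumption that you should adjust to your own install location.

```python
from pathlib import Path

# Relative paths copied from the folder tree above.
EXPECTED_FILES = [
    "models/clip/umt5-xxl-encoder-Q8_0.gguf",
    "models/loras/Wan2.2-I2V-A14B-lora-high_noise.safetensors",
    "models/audio_encoders/wav2vec2_large_english_fp16.safetensors",
    "models/unet/Wan2.2-S2V-14B-Q8_0.gguf",
    "models/vae/wan2.1_vae.safetensors",
]


def find_missing_models(comfy_root):
    """Return the expected model paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; change it to match your setup.
    missing = find_missing_models("ComfyUI")
    if missing:
        print("Missing model files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All model files found.")
```

A file reported as missing is usually either in the wrong subfolder or still carries its original name (for example, the LoRA before it is renamed).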
Sample Workflows
For the S2V workflow, use the WanSoundImageToVideo and WanSoundImageToVideoExtend nodes.
Download the Example Audio (woman) file and save it into your ComfyUI/input/ folder.
| Initial Image | Input Video | Workflow |
|---|---|---|
| ![]() | ![]() | |
| ![]() | ![]() | |
| ![]() | ![]() | |
| ![]() | ![]() | |
WGET Commands
If you are using Runpod, or a similar hosted GPU service, then you can access your running pod/instance using a terminal.
Wait for the files to finish downloading before running your workflows.
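One rough way to tell whether a download has finished is to compare each file's size on disk with the size listed in the Description section. The sketch below assumes the listed sizes are decimal units (1 GB = 10**9 bytes) and allows a 2% margin for rounding. Both the margin and the helper names are assumptions made for illustration. A size match does not prove the file is intact, it only catches obviously partial downloads.

```python
from pathlib import Path

# Sizes as listed in the Description section, read as decimal units
# (1 GB = 10**9 bytes). That reading is an assumption.
LISTED_SIZES = {
    "models/clip/umt5-xxl-encoder-Q8_0.gguf": 6.04e9,
    "models/loras/Wan2.2-I2V-A14B-lora-high_noise.safetensors": 1.23e9,
    "models/audio_encoders/wav2vec2_large_english_fp16.safetensors": 631e6,
    "models/unet/Wan2.2-S2V-14B-Q8_0.gguf": 19.6e9,
    "models/vae/wan2.1_vae.safetensors": 254e6,
}


def is_probably_complete(actual_bytes, listed_bytes, tolerance=0.02):
    """True if the file is at least (1 - tolerance) of its listed size."""
    return actual_bytes >= listed_bytes * (1 - tolerance)


def incomplete_downloads(comfy_root):
    """Return paths that are missing or noticeably smaller than listed."""
    root = Path(comfy_root)
    flagged = []
    for rel, listed in LISTED_SIZES.items():
        path = root / rel
        if not path.is_file():
            flagged.append(rel)
        elif not is_probably_complete(path.stat().st_size, listed):
            flagged.append(rel)
    return flagged


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; change it to match your setup.
    for rel in incomplete_downloads("ComfyUI"):
        print("Missing or still downloading: " + rel)
```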
Troubleshooting
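Error: Input type (float) and bias type (c10::Half) should be the same
The problem is a dtype mismatch: the audio input is a float (float32) tensor, while the audio encoder's weights are half precision (float16).
We can patch the ComfyUI audio encoder script directly. Open: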
ComfyUI/comfy/audio_encoders/audio_encoders.py
Find:
out, all_layers = self.model(audio.to(self.load_device))
Change to:
self.model = self.model.to(self.load_device)
# Match input dtype to model dtype
audio = audio.to(self.load_device, dtype=next(self.model.parameters()).dtype)
out, all_layers = self.model(audio)
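Warning
Indent the new lines with spaces, not tabs, and match the indentation of the line you are replacing. Python raises a TabError when tabs and spaces are mixed inconsistently.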
- out, all_layers = self.model(audio.to(self.load_device))
+ self.model = self.model.to(self.load_device)
+
+ # Match input dtype to model dtype
+ audio = audio.to(self.load_device, dtype=next(self.model.parameters()).dtype)
+
+ out, all_layers = self.model(audio)
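If you would rather not edit the file by hand, the same change can be applied with a small script. This is a sketch under assumptions: the helper name `patch_source` is hypothetical, the file path is the one given above relative to the folder containing ComfyUI/, and the script only works if the original line appears exactly as shown (other ComfyUI versions may differ). It reuses the original line's indentation, so no tabs are introduced.

```python
from pathlib import Path

# The line to replace and its replacement, as given in the steps above.
OLD_LINE = "out, all_layers = self.model(audio.to(self.load_device))"
NEW_LINES = [
    "self.model = self.model.to(self.load_device)",
    "# Match input dtype to model dtype",
    "audio = audio.to(self.load_device, dtype=next(self.model.parameters()).dtype)",
    "out, all_layers = self.model(audio)",
]


def patch_source(text):
    """Replace OLD_LINE with NEW_LINES, keeping the original indentation.

    Returns (patched_text, changed).
    """
    patched = []
    changed = False
    for line in text.splitlines(keepends=True):
        if line.strip() == OLD_LINE:
            indent = line[: len(line) - len(line.lstrip())]
            ending = line[len(line.rstrip("\r\n")):] or "\n"
            patched.extend(indent + new + ending for new in NEW_LINES)
            changed = True
        else:
            patched.append(line)
    return "".join(patched), changed


if __name__ == "__main__":
    # Assumed to be run from the folder that contains ComfyUI/.
    target = Path("ComfyUI/comfy/audio_encoders/audio_encoders.py")
    if not target.is_file():
        print("File not found: " + str(target))
    else:
        original = target.read_text(encoding="utf-8")
        patched, changed = patch_source(original)
        if changed:
            # Keep a backup next to the original before overwriting it.
            backup = target.with_name(target.name + ".bak")
            backup.write_text(original, encoding="utf-8")
            target.write_text(patched, encoding="utf-8")
            print("Patched. Backup saved as " + backup.name)
        else:
            print("Target line not found; nothing changed.")
```

Running the script a second time leaves the file untouched, because the original line no longer appears once the patch is in place.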












































