
Voice Clone

A custom TTS node that clones a voice from reference audio and speaks the entered text.

Install Voice Clone Custom Node

Install the ComfyUI Voice Clone custom node using ComfyUI Manager.

Or install it from a command prompt or terminal:

  1. Navigate to your ComfyUI/custom_nodes folder.
  2. Run,
    git clone https://github.com/Sean-Bradley/ComfyUI-Voice-Clone.git
    
  3. Navigate to your ComfyUI_windows_portable folder.
  4. Run,
    python_embeded\python -m pip install -r ComfyUI/custom_nodes/ComfyUI-Voice-Clone/requirements.txt
    
  5. Restart ComfyUI

Install Models

All required models can be downloaded from https://huggingface.co/ResembleAI/chatterbox/tree/main

Make sure your folder structure and downloaded files match the layout below.

📂 ComfyUI/
├── 📂 models/
│   ├── 📂 tts/
│   │   └── 📂 chatterbox/
│   │       ├── added_tokens.json
│   │       ├── conds.pt
│   │       ├── merges.txt
│   │       ├── s3gen.safetensors
│   │       ├── s3gen_meanflow.safetensors
│   │       ├── special_tokens_map.json
│   │       ├── t3_turbo_v1.safetensors
│   │       ├── tokenizer_config.json
│   │       ├── ve.safetensors
│   │       └── vocab.json
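A quick way to confirm the layout above is a small script that lists any missing files. The following is a minimal sketch. The file names come from the tree above, but the helper names (`chatterbox_dir`, `missing_model_files`, `download_missing`) are hypothetical. The optional download step assumes the `huggingface_hub` package is installed and that each file sits at the root of the `ResembleAI/chatterbox` repository.

```python
from pathlib import Path

# File names taken from the folder layout shown above.
REQUIRED_FILES = [
    "added_tokens.json", "conds.pt", "merges.txt", "s3gen.safetensors",
    "s3gen_meanflow.safetensors", "special_tokens_map.json",
    "t3_turbo_v1.safetensors", "tokenizer_config.json", "ve.safetensors",
    "vocab.json",
]

def chatterbox_dir(comfyui_root):
    """Folder the models are expected in: ComfyUI/models/tts/chatterbox."""
    return Path(comfyui_root) / "models" / "tts" / "chatterbox"

def missing_model_files(comfyui_root):
    """Return the required file names that are not present yet."""
    folder = chatterbox_dir(comfyui_root)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]

def download_missing(comfyui_root):
    """Optional: fetch missing files (assumes `pip install huggingface_hub`)."""
    from huggingface_hub import hf_hub_download  # imported lazily on purpose
    folder = chatterbox_dir(comfyui_root)
    folder.mkdir(parents=True, exist_ok=True)
    for name in missing_model_files(comfyui_root):
        # Assumption: each file lives at the root of the repository.
        hf_hub_download(repo_id="ResembleAI/chatterbox", filename=name,
                        local_dir=folder)
```

For example, `print(missing_model_files("ComfyUI"))` run from the parent of your ComfyUI folder prints an empty list once everything is in place.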

Sample Workflows

Voice Clone

Drag this link into ComfyUI to see the workflow.

Voice Clone

Voice Replace

Drag this link into ComfyUI to see the workflow.

Voice Replace

Sample Audios

Download (Right Click, Save Audio As...) Description
Audio snippets assembled from So Much for So Little animated cartoon. Copyright © 1949 Warner Bros. Cartoons
Audio snippets assembled from Puss n' Booty animated cartoon. Copyright © 1943 Warner Bros. Cartoons
Audio snippets assembled from Scrap Happy Daffy animated cartoon. Copyright © 1949 Warner Bros. Cartoons
Audio snippets assembled from Night of the Living Dead (1968). Copyright © 1968 Image Ten, Inc.
Audio snippets assembled from Night of the Living Dead (1968). Copyright © 1968 Image Ten, Inc.
Audio snippets assembled from Psycho (1960). Copyright © 1960 Shamley Productions, Inc.
Audio snippets assembled from Psycho (1960). Copyright © 1960 Shamley Productions, Inc.

Settings

Setting Description
temperature Sampling temperature for the text-to-speech decoder. Higher values increase randomness and variety in the generated audio; lower values make outputs more conservative and deterministic. Valid range: 0.15 - 2.0.
top_p Nucleus (top-p) sampling cumulative probability threshold. The decoder samples from the smallest set of tokens whose cumulative probability is ≥ top_p. Setting top_p = 1.0 disables nucleus filtering (i.e., samples from the full distribution). Valid range: 0.0 - 1.0.
repetition_penalty Penalizes repetition during generation. Values > 1.0 discourage repeating the same tokens/frames, reducing looping and redundancy in speech. Valid range: 1.0 - 2.0.
voice_embedding (optional) If provided, this reference audio is used as the audio prompt for voice cloning.
top_k At each generation step, the model predicts a probability for every possible next token (text or acoustic). The next token is sampled only from the k most probable candidates.
normalize Normalize the volume of the audio output.
disable_watermark By default, the audio output is watermarked using PerTh watermarking. Set this to true to disable it.
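The temperature, top_k, top_p and repetition_penalty settings all shape the decoder's next-token distribution. The following is a minimal pure-Python sketch of one sampling step. It assumes the common Hugging Face-style order (repetition penalty, then temperature, then top-k, then top-p) and a CTRL-style repetition penalty. Chatterbox's actual decoder may order or combine these differently. The function names and default values are illustrative only, not the node's.

```python
import math
import random

def apply_repetition_penalty(logits, previous_tokens, penalty):
    """Make already-generated tokens less likely when penalty > 1.0."""
    out = list(logits)
    for tok in set(previous_tokens):
        # Shrink positive logits and push negative ones further down.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

def softmax(logits, temperature):
    """Logits to probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_candidates(probs, k):
    """Token ids of the k most probable tokens, most probable first."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]

def top_p_candidates(probs, candidates, top_p):
    """Smallest prefix of the sorted candidates whose renormalized mass >= top_p."""
    mass = sum(probs[i] for i in candidates)
    kept, cumulative = [], 0.0
    for i in candidates:
        kept.append(i)  # always keeps at least one token
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break
    return kept

def sample_next_token(logits, previous_tokens, temperature=0.8, top_k=50,
                      top_p=0.95, repetition_penalty=1.2, rng=random):
    """One sampling step: penalty -> temperature -> top-k -> top-p -> draw."""
    logits = apply_repetition_penalty(logits, previous_tokens, repetition_penalty)
    probs = softmax(logits, temperature)
    candidates = top_k_candidates(probs, top_k)
    candidates = top_p_candidates(probs, candidates, top_p)
    weights = [probs[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

For example, `sample_next_token([2.0, 1.0, 0.5, -1.0], previous_tokens=[0], rng=random.Random(0))` draws one token id. With `top_k=1`, or a very small `top_p`, the step always returns the single most probable token, which is why low values make output more deterministic.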
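The table does not say how the normalize setting works. One common approach is peak normalization, sketched here with a hypothetical `peak_normalize` helper. It assumes float samples in the range -1.0 to 1.0 and an arbitrary target peak of 0.95. The node itself may instead use RMS or loudness normalization.

```python
def peak_normalize(samples, target_peak=0.95):
    """Scale samples so the loudest one reaches target_peak (hypothetical helper)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Because every sample is multiplied by the same gain, the waveform shape is preserved and only the overall volume changes.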

Paralinguistic tags

[clear throat] [sigh] [shush] [cough] [groan] [sniff] [gasp] [chuckle] [laugh]
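These tags are presumably written inline in the text to be spoken (for example, "That was close [sigh]"). A typo inside the brackets would likely not be recognized as a tag. The following is a small sketch of a pre-flight check that reports bracketed tags not in the list above. The helper name is hypothetical, and the check uses exact, case-sensitive matching because the document does not say whether tags are case-insensitive.

```python
import re

# Tags listed in this document.
SUPPORTED_TAGS = {"clear throat", "sigh", "shush", "cough", "groan",
                  "sniff", "gasp", "chuckle", "laugh"}

# Anything between a single pair of square brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")

def find_unsupported_tags(text):
    """Return bracketed tags in `text` that are not in SUPPORTED_TAGS, in order."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in SUPPORTED_TAGS]
```

For instance, `find_unsupported_tags("Oh [cry] no [clear throat]")` flags only `"cry"`.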

ComfyUI Voice Clone

resemble-ai/chatterbox (GitHub)

List of animated films in the public domain in the United States (Wikipedia)