Addendum
Which Nvidia GPU supports which Data Type
Nvidia GPU | Data Types |
---|---|
50 series (Blackwell) | fp16, bf16, fp8, fp4 |
40 series (Ada) | fp16, bf16, fp8 |
30 series (Ampere) | fp16, bf16 |
20 series (Turing) | fp16 |
10 series (Pascal) and below | fp32 only (slow full precision) |
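If you are unsure which generation your card belongs to, PyTorch can report its compute capability. Below is a minimal sketch, assuming a CUDA build of PyTorch; the architecture mapping in the comments is a rough guide, not an official API.

```python
import torch

# Minimal sketch: report the GPU's compute capability and which
# reduced-precision data types it should support.
# Assumes a CUDA build of PyTorch is installed.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    print(f"{name}: compute capability {major}.{minor}")

    # Rough mapping: Pascal = 6.x, Turing = 7.5, Ampere = 8.0/8.6,
    # Ada = 8.9, Blackwell = 10.x and above.
    print("fp16 tensor cores:", (major, minor) >= (7, 0))
    print("bf16 supported:   ", torch.cuda.is_bf16_supported())
    print("fp8 (sm_89+):     ", (major, minor) >= (8, 9))
else:
    print("No CUDA device found; only CPU fp32 is available.")
```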
FP32 Version of Stable Diffusion 1.5 Pruned EMAOnly
Video: Using ComfyUI with a 10 Series Nvidia GPU
If you are using a 10 Series Nvidia GPU, modern generative AI will not be an enjoyable experience.
Many of the earlier lessons in this course use the Stable Diffusion 1.5 Pruned EMAOnly FP16 model.
If you have a 10 Series Nvidia card, your choice of models will be very limited, since many AI models are released only in FP16 format.
There is an FP32 version of SD1.5 Pruned EMAOnly that you can try instead.
It is twice as large as the FP16 version to download and load into memory, but may be faster to run if your 10 Series card has enough VRAM.
SD1.5 Pruned EMAOnly Version | File Size | Link |
---|---|---|
FP16 | 2.13 GB | Files and versions (huggingface) |
FP32 | 4.27 GB | Files and versions (huggingface) |
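If you only have the FP16 file, you can also up-cast it to FP32 yourself. Below is a minimal sketch, assuming the `safetensors` and `torch` packages; the output file name is hypothetical.

```python
import torch
from safetensors.torch import load_file, save_file

# Minimal sketch: up-cast an FP16 checkpoint to FP32.
# The result is roughly twice the size on disk and in memory.
src = "v1-5-pruned-emaonly-fp16.safetensors"   # file name from Useful Links below
dst = "v1-5-pruned-emaonly-fp32.safetensors"   # hypothetical output name

tensors = load_file(src)
tensors = {k: v.to(torch.float32) for k, v in tensors.items()}
save_file(tensors, dst)
```

Note that up-casting cannot restore precision that was lost when the weights were originally rounded to FP16, so downloading the official FP32 file is preferable when available.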
Why is FP16 not advisable on 10 Series GPUs
10 Series GPUs don't have native FP16 acceleration, so FP16 work has to be emulated, which adds conversion overhead on the GPU.
10 Series GPUs also lack Tensor Cores, the specialized hardware units designed to accelerate FP16 operations.
On 10 Series GPUs, FP16 operations therefore fall back to the standard CUDA cores, which are optimized for FP32.
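You can observe this fallback directly with a quick timing test. Below is a minimal sketch, assuming a CUDA build of PyTorch; on a 10 Series card the FP16 matmul will typically be no faster (and often slower) than FP32, whereas on cards with Tensor Cores it should be clearly faster.

```python
import time
import torch

def time_matmul(dtype, n=2048, iters=20):
    # Minimal timing sketch: average the wall time of an n x n matrix
    # multiply in the given dtype, after one warm-up call.
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    a @ b                                  # warm-up
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print(f"fp32: {time_matmul(torch.float32) * 1000:.2f} ms per matmul")
print(f"fp16: {time_matmul(torch.float16) * 1000:.2f} ms per matmul")
```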
Which 10 Series GPUs can I use
GPU | VRAM | Notes |
---|---|---|
GTX 1050 / Ti | 2–4 GB | Might crash with FP32 due to VRAM limits. |
GTX 1060 3GB | 3 GB | Might crash with FP32 due to VRAM limits. |
GTX 1060 6GB | 6 GB | May just barely run FP32 SD1.5; FP16 helps memory, not speed. |
GTX 1070 | 8 GB | Can run FP32 model more comfortably. FP16 model saves memory but doesn’t improve speed. |
GTX 1080 | 8 GB | Can run FP32 model more comfortably. FP16 model saves memory but doesn’t improve speed. |
GTX 1080 Ti | 11 GB | Runs FP32 fine; FP16 likely slightly slower due to conversion overhead. |
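Before committing to the 4.27 GB FP32 download, you can confirm how much VRAM is actually free on your card. A minimal sketch, assuming a CUDA build of PyTorch:

```python
import torch

# Minimal sketch: report free and total VRAM on the first GPU, in GB.
free, total = torch.cuda.mem_get_info(0)
print(f"free : {free  / 1024**3:.1f} GB")
print(f"total: {total / 1024**3:.1f} GB")
```

Free VRAM is usually less than the total, since the desktop, browser, and other running applications also consume GPU memory.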
Recommended KSampler Latent Image Input Sizes
To get the best results from the KSampler with a particular checkpoint, it is important to match the input latent-image
dimensions to the resolution the checkpoint was trained on.
Below is a table of recommended latent-image input dimensions for some popular checkpoints; a short sketch of the latent tensor the KSampler actually receives follows the table.
Model | Training Image Resolution | Ideal KSampler Input Size | Approx. VRAM Required |
---|---|---|---|
SD 1.5 | 512×512 | 512×512 | ~4 GB |
SD 2.1 | 512×512 or 768×768 | 512×512 or 768×768 | ~6 GB |
SDXL | 1024×1024 | 1024×1024 (Other SDXL Sizes) | ~8–12 GB |
SD 3.5 | ~1024×1024 | 1024×1024 (dynamic sizes) | ~12–16 GB |
FLUX.1 Schnell | ~1024×1024 | 1024×1024 (dynamic sizes) | ~13–33 GB |
FLUX.1 Dev | ~1024×1024 | 1024×1024 (dynamic sizes) | ~23–24 GB (FP16) |
DreamShaper 8 | 512×512 (SD1.5 base) | 1024×1024 | ~4–5 GB |
AbsoluteReality | 512×512 (SD1.5 base) | 1024×1024 | ~8–12 GB |
DreamShaper XL | 1024×1024 (SDXL base) | 1024×1024 (Other SDXL Sizes) | ~10–14 GB |
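The sizes in the table above are pixel dimensions. Internally, ComfyUI's Empty Latent Image node hands the KSampler a tensor at one eighth of that resolution. A minimal sketch of the shape involved, assuming PyTorch; the 4-channel, 8× downscale layout applies to SD1.5, SD2.x, and SDXL, while SD 3.5 and FLUX use 16-channel latents:

```python
import torch

def empty_latent(width, height, batch=1):
    # Minimal sketch: SD1.5 / SD2.x / SDXL latents have 4 channels at
    # 1/8 of the pixel resolution (SD 3.5 and FLUX use 16 channels).
    return torch.zeros(batch, 4, height // 8, width // 8)

print(empty_latent(512, 512).shape)    # torch.Size([1, 4, 64, 64])
print(empty_latent(1216, 832).shape)   # torch.Size([1, 4, 104, 152])
```

This is also why image dimensions should divide cleanly by 8: the width and height must map onto a whole-numbered latent grid.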
Other SDXL Sizes
Dimension | Ratio |
---|---|
1024×1024 | 1:1 |
1152×896 | 9:7 |
896×1152 | 7:9 |
1216×832 | 19:13 |
832×1216 | 13:19 |
1344×768 | 7:4 |
768×1344 | 4:7 |
1536×640 | 12:5 |
640×1536 | 5:12 |
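All of the sizes above share two properties: each dimension is a multiple of 64, and the total pixel count stays close to SDXL's 1024×1024 training budget (about 1.05 megapixels). A minimal sketch that checks a candidate size against both rules; the 10% area tolerance is an assumption for illustration:

```python
from math import gcd

def check_sdxl_size(width, height):
    # Minimal sketch: SDXL sizes should be multiples of 64 and keep the
    # pixel count close to the 1024x1024 training budget.
    ok_multiple = width % 64 == 0 and height % 64 == 0
    ok_area = abs(width * height - 1024 * 1024) / (1024 * 1024) < 0.1
    g = gcd(width, height)
    return ok_multiple, ok_area, f"{width // g}:{height // g}"

print(check_sdxl_size(1216, 832))  # (True, True, '19:13')
```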
Useful Links
v1-5-pruned-emaonly-fp16.safetensors (huggingface)
v1-5-pruned-emaonly.safetensors FP32 (huggingface)
FLUX Schnell FP8 (huggingface)
What’s the Difference Between Single-, Double-, Multi- and Mixed-Precision Computing? (Nvidia Blog)