feat: add support for SD2.x with TINY U-Nets (#939)

akleine 2025-11-09 15:47:37 +01:00 committed by GitHub
parent 0fa3e1a383
commit d2d3944f50
5 changed files with 68 additions and 47 deletions


@@ -1,40 +1,66 @@
# Running distilled models: SSD1B and SDx.x with tiny U-Nets
## Preface
These models feature a reduced U-Net architecture. Unlike standard SDXL models, the SSD-1B U-Net contains only one middle block and fewer attention layers in its up- and down-blocks, resulting in significantly smaller file sizes. Using these models can reduce inference time by more than 33%. For more details, refer to Segmind's paper: https://arxiv.org/abs/2401.02677v1.
Similarly, SD1.x- and SD2.x-style models with a tiny U-Net consist of only 6 U-Net blocks, leading to very small files (approximately 1 GB) and time savings of up to 50%. For more information, see the paper: https://arxiv.org/pdf/2305.15798.pdf.
## SSD1B
Note that not all of these models follow the standard parameter naming conventions. However, several useful SSD-1B models are available online, such as:
* https://huggingface.co/segmind/SSD-1B/resolve/main/SSD-1B-A1111.safetensors
* https://huggingface.co/hassenhamdi/SSD-1B-fp8_e4m3fn/resolve/main/SSD-1B_fp8_e4m3fn.safetensors
Useful LoRAs are also available:
* https://huggingface.co/seungminh/lora-swarovski-SSD-1B/resolve/main/pytorch_lora_weights.safetensors
* https://huggingface.co/kylielee505/mylcmlorassd/resolve/main/pytorch_lora_weights.safetensors
These files can be used out-of-the-box, unlike the models described in the next section.
## SD1.x, SD2.x with tiny U-Nets
These models require conversion before use. You will need a Python script provided by the diffusers team, available on GitHub (fetched in the sketch below):
* https://raw.githubusercontent.com/huggingface/diffusers/refs/heads/main/scripts/convert_diffusers_to_original_stable_diffusion.py
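For example, you can fetch the script and install the Python packages the snippets below rely on; this is a minimal, assumed setup, not the only way to do it:

```bash
# grab the diffusers conversion script
wget https://raw.githubusercontent.com/huggingface/diffusers/refs/heads/main/scripts/convert_diffusers_to_original_stable_diffusion.py
# packages used by the download and conversion steps
pip install torch diffusers safetensors
```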
### SD2.x
NotaAI provides the following model online:
* https://huggingface.co/nota-ai/bk-sdm-v2-tiny
Creating a .safetensors file involves two steps. First, run this short Python script to download the model from Hugging Face:
```python
from diffusers import StableDiffusionPipeline

# downloads the snapshot into the current directory using the Hugging Face cache layout
pipe = StableDiffusionPipeline.from_pretrained("nota-ai/bk-sdm-v2-tiny", cache_dir="./")
```
Second, create the .safetensors file by running the conversion script. Because of `cache_dir="./"`, the snapshot lives under `models--nota-ai--bk-sdm-v2-tiny/snapshots/` in the current directory; the hash directory in the path below may differ on your machine:
```bash
python convert_diffusers_to_original_stable_diffusion.py \
--model_path models--nota-ai--bk-sdm-v2-tiny/snapshots/68277af553777858cd47e133f92e4db47321bc74 \
--checkpoint_path bk-sdm-v2-tiny.safetensors --half --use_safetensors
```
This will generate the file **bk-sdm-v2-tiny.safetensors**, which is now ready for use with sd.cpp.
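As a quick smoke test, the file can be passed straight to the sd.cpp CLI. A sketch of a typical invocation, assuming the stock example CLI binary `sd` built from this repository (prompt and step count are arbitrary):

```bash
./sd -m bk-sdm-v2-tiny.safetensors -p "a lighthouse on a cliff, photo" --steps 20
```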
### SD1.x
Several Tiny SD 1.x models are available online, such as:
* https://huggingface.co/segmind/tiny-sd
* https://huggingface.co/segmind/portrait-finetuned
* https://huggingface.co/nota-ai/bk-sdm-tiny
These models also require conversion, partly because some tensors are stored in a non-contiguous manner. To create a usable checkpoint file, follow these simple steps:
##### Download the model using Python on your computer, for example this way:
```python
import torch
@@ -46,35 +72,22 @@ for param in unet.parameters():
pipe.save_pretrained("segmindtiny-sd", safe_serialization=True)
```
##### Run the conversion script:
```bash
python convert_diffusers_to_original_stable_diffusion.py \
--model_path ./segmindtiny-sd \
--checkpoint_path ./segmind_tiny-sd.ckpt --half
```
The file **segmind_tiny-sd.ckpt** will be generated (with `--half` it stores fp16 weights) and is now ready for use with sd.cpp. You can follow a similar process for the other models mentioned above.
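The resulting checkpoint can be smoke-tested the same way as the SD2.x file above, for example:

```bash
./sd -m segmind_tiny-sd.ckpt -p "portrait photo of a woman" --steps 20
```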
### Another available .ckpt file:
There is another model file available online:
* https://huggingface.co/ClashSAN/small-sd/resolve/main/tinySDdistilled.ckpt
To use this file, you must first adjust its non-contiguous tensors:
```python
import torch
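# a minimal sketch of the adjustment, assuming the downloaded file is
# named tinySDdistilled.ckpt (adapt the paths as needed):
ckpt = torch.load("tinySDdistilled.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)
for name, tensor in state_dict.items():
    if isinstance(tensor, torch.Tensor) and not tensor.is_contiguous():
        state_dict[name] = tensor.contiguous()  # rewrite with contiguous storage
torch.save(ckpt, "tinySDdistilled_fixed.ckpt")
```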


@@ -1788,6 +1788,9 @@ SDVersion ModelLoader::get_sd_version() {
        if (is_inpaint) {
            return VERSION_SD2_INPAINT;
        }
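        // tiny-U-Net distillations drop the middle block, so a missing middle_block.1 identifies them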
        if (!has_middle_block_1) {
            return VERSION_SD2_TINY_UNET;
        }
        return VERSION_SD2;
    }
    return VERSION_COUNT;


@@ -26,6 +26,7 @@ enum SDVersion {
    VERSION_SD1_TINY_UNET,
    VERSION_SD2,
    VERSION_SD2_INPAINT,
    VERSION_SD2_TINY_UNET,
    VERSION_SDXL,
    VERSION_SDXL_INPAINT,
    VERSION_SDXL_PIX2PIX,
@@ -52,7 +53,7 @@ static inline bool sd_version_is_sd1(SDVersion version) {
}
static inline bool sd_version_is_sd2(SDVersion version) {
    if (version == VERSION_SD2 || version == VERSION_SD2_INPAINT || version == VERSION_SD2_TINY_UNET) {
        return true;
    }
    return false;


@@ -23,6 +23,7 @@ const char* model_version_to_str[] = {
    "SD 1.x Tiny UNet",
    "SD 2.x",
    "SD 2.x Inpaint",
    "SD 2.x Tiny UNet",
    "SDXL",
    "SDXL Inpaint",
    "SDXL Instruct-Pix2Pix",


@@ -180,6 +180,7 @@ protected:
    int num_head_channels = -1;  // channels // num_heads
    int context_dim = 768;       // 1024 for VERSION_SD2, 2048 for VERSION_SDXL
    bool use_linear_projection = false;
    bool tiny_unet = false;
public:
    int model_channels = 320;
@@ -208,15 +209,17 @@ public:
            num_head_channels = 64;
            num_heads = -1;
            use_linear_projection = true;
        }
        if (sd_version_is_inpaint(version)) {
            in_channels = 9;
        } else if (sd_version_is_unet_edit(version)) {
            in_channels = 8;
        }
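        // tiny U-Nets (BK-SDM style): one res block per stage and three channel
        // multipliers instead of the standard num_res_blocks = 2 and {1, 2, 4, 4}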
        if (version == VERSION_SD1_TINY_UNET || version == VERSION_SD2_TINY_UNET) {
            num_res_blocks = 1;
            channel_mult = {1, 2, 4};
            tiny_unet = true;
        }
        // dims is always 2
        // use_temporal_attention is always True for SVD
@@ -290,7 +293,7 @@ public:
                    context_dim));
            }
            input_block_chans.push_back(ch);
            if (tiny_unet) {
                input_block_idx++;
            }
        }
@@ -311,7 +314,7 @@ public:
            d_head = num_head_channels;
            n_head = ch / d_head;
        }
        if (!tiny_unet) {
            blocks["middle_block.0"] = std::shared_ptr<GGMLBlock>(get_resblock(ch, time_embed_dim, ch));
            if (version != VERSION_SDXL_SSD1B) {
                blocks["middle_block.1"] = std::shared_ptr<GGMLBlock>(get_attention_layer(ch,
@@ -358,7 +361,7 @@ public:
                }
                if (i > 0 && j == num_res_blocks) {
                    if (tiny_unet) {
                        output_block_idx++;
                        if (output_block_idx == 2) {
                            up_sample_idx = 1;
@@ -495,7 +498,7 @@ public:
                }
                hs.push_back(h);
            }
            if (tiny_unet) {
                input_block_idx++;
            }
            if (i != len_mults - 1) {
@@ -512,7 +515,7 @@ public:
        // [N, 4*model_channels, h/8, w/8]
        // middle_block
        if (!tiny_unet) {
            h = resblock_forward("middle_block.0", ctx, h, emb, num_video_frames);  // [N, 4*model_channels, h/8, w/8]
            if (version != VERSION_SDXL_SSD1B) {
                h = attention_layer_forward("middle_block.1", ctx, h, context, num_video_frames);  // [N, 4*model_channels, h/8, w/8]
@@ -554,7 +557,7 @@ public:
            }
            if (i > 0 && j == num_res_blocks) {
                if (tiny_unet) {
                    output_block_idx++;
                    if (output_block_idx == 2) {
                        up_sample_idx = 1;