diff --git a/README.md b/README.md
index 59174fe..756c125 100644
--- a/README.md
+++ b/README.md
@@ -4,19 +4,33 @@
# stable-diffusion.cpp
-Inference of Stable Diffusion and Flux in pure C/C++
+Diffusion model (SD, Flux, Wan, ...) inference in pure C/C++
+
+***Note that this project is under active development. \
+API and command-line parameters may change frequently.***
## Features
- Plain C/C++ implementation based on [ggml](https://github.com/ggerganov/ggml), working in the same way as [llama.cpp](https://github.com/ggerganov/llama.cpp)
- Super lightweight and without external dependencies
-- SD1.x, SD2.x, SDXL and [SD3/SD3.5](./docs/sd3.md) support
- - !!!The VAE in SDXL encounters NaN issues under FP16, but unfortunately, the ggml_conv_2d only operates under FP16. Hence, a parameter is needed to specify the VAE that has fixed the FP16 NaN issue. You can find it here: [SDXL VAE FP16 Fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors).
-- [Flux-dev/Flux-schnell Support](./docs/flux.md)
-- [FLUX.1-Kontext-dev](./docs/kontext.md)
-- [Chroma](./docs/chroma.md)
-- [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) and [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo) support
-- [PhotoMaker](https://github.com/TencentARC/PhotoMaker) support.
+- Supported models
+ - Image Models
+ - SD1.x, SD2.x, [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo)
+ - SDXL, [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo)
+      - !!! The SDXL VAE produces NaNs under FP16, but ggml_conv_2d only operates under FP16, so you need to pass a VAE with the FP16 NaN issue fixed via the VAE parameter. You can find one here: [SDXL VAE FP16 Fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors).
+ - [SD3/SD3.5](./docs/sd3.md)
+ - [Flux-dev/Flux-schnell](./docs/flux.md)
+ - [Chroma](./docs/chroma.md)
+ - Image Edit Models
+ - [FLUX.1-Kontext-dev](./docs/kontext.md)
+ - Video Models
+ - [Wan2.1/Wan2.2](./docs/wan.md)
+ - [PhotoMaker](https://github.com/TencentARC/PhotoMaker) support.
+ - Control Net support with SD 1.5
+ - LoRA support, same as [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#lora)
+ - Latent Consistency Models support (LCM/LCM-LoRA)
+ - Faster and memory efficient latent decoding with [TAESD](https://github.com/madebyollin/taesd)
+ - Upscale images generated with [ESRGAN](https://github.com/xinntao/Real-ESRGAN)
- 16-bit, 32-bit float support
- 2-bit, 3-bit, 4-bit, 5-bit and 8-bit integer quantization support
- Accelerated memory-efficient CPU inference
@@ -26,15 +40,9 @@ Inference of Stable Diffusion and Flux in pure C/C++
- Can load ckpt, safetensors and diffusers models/checkpoints. Standalone VAEs models
- No need to convert to `.ggml` or `.gguf` anymore!
- Flash Attention for memory usage optimization
-- Original `txt2img` and `img2img` mode
- Negative prompt
- [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) style tokenizer (not all the features, only token weighting for now)
-- LoRA support, same as [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#lora)
-- Latent Consistency Models support (LCM/LCM-LoRA)
-- Faster and memory efficient latent decoding with [TAESD](https://github.com/madebyollin/taesd)
-- Upscale images generated with [ESRGAN](https://github.com/xinntao/Real-ESRGAN)
- VAE tiling processing for reduce memory usage
-- Control Net support with SD 1.5
- Sampling method
- `Euler A`
- `Euler`
@@ -287,8 +295,10 @@ arguments:
If threads <= 0, then threads will be set to the number of CPU physical cores
-m, --model [MODEL] path to full model
--diffusion-model path to the standalone diffusion model
+ --high-noise-diffusion-model path to the standalone high noise diffusion model
--clip_l path to the clip-l text encoder
--clip_g path to the clip-g text encoder
+ --clip_vision path to the clip-vision encoder
--t5xxl path to the t5xxl text encoder
--vae [VAE] path to vae
--taesd [TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
@@ -303,8 +313,9 @@ arguments:
If not specified, the default is the type of the weight file
--tensor-type-rules [EXPRESSION] weight type per tensor pattern (example: "^vae\.=f16,model\.=q8_0")
--lora-model-dir [DIR] lora model directory
- -i, --init-img [IMAGE] path to the input image, required by img2img
+ -i, --init-img [IMAGE] path to the init image, required by img2img
--mask [MASK] path to the mask image, required by img2img with mask
+      --end-img [IMAGE]                  path to the end image, required by flf2v
--control-image [IMAGE] path to image condition, control net
-r, --ref-image [PATH] reference image for Flux Kontext models (can be used multiple times)
-o, --output OUTPUT path to write result image to (default: ./output.png)
@@ -319,6 +330,23 @@ arguments:
--skip-layers LAYERS Layers to skip for SLG steps: (default: [7,8,9])
--skip-layer-start START SLG enabling point: (default: 0.01)
--skip-layer-end END SLG disabling point: (default: 0.2)
                                     SLG will be enabled at step int([STEPS]*[START]) and disabled at int([STEPS]*[END])
+  --scheduler {discrete, karras, exponential, ays, gits} Denoiser sigma scheduler (default: discrete)
+  --sampling-method {euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm, ddim_trailing, tcd}
+                                     sampling method (default: "euler_a")
+  --steps STEPS                      number of sample steps (default: 20)
+  --high-noise-cfg-scale SCALE       (high noise) unconditional guidance scale: (default: 7.0)
+  --high-noise-img-cfg-scale SCALE   (high noise) image guidance scale for inpaint or instruct-pix2pix models: (default: same as --cfg-scale)
+  --high-noise-guidance SCALE        (high noise) distilled guidance scale for models with guidance input (default: 3.5)
+  --high-noise-slg-scale SCALE       (high noise) skip layer guidance (SLG) scale, only for DiT models: (default: 0)
+                                     0 means disabled, a value of 2.5 is nice for sd3.5 medium
+  --high-noise-eta SCALE             (high noise) eta in DDIM, only for DDIM and TCD: (default: 0)
+  --high-noise-skip-layers LAYERS    (high noise) Layers to skip for SLG steps: (default: [7,8,9])
+  --high-noise-skip-layer-start START (high noise) SLG enabling point: (default: 0.01)
+  --high-noise-skip-layer-end END    (high noise) SLG disabling point: (default: 0.2)
+  --high-noise-scheduler {discrete, karras, exponential, ays, gits} (high noise) Denoiser sigma scheduler (default: discrete)
+  --high-noise-sampling-method {euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm, ddim_trailing, tcd}
+                                     (high noise) sampling method (default: "euler_a")
+  --high-noise-steps STEPS           (high noise) number of sample steps (default: 20)
--strength STRENGTH strength for noising/unnoising (default: 0.75)
--style-ratio STYLE-RATIO strength for keeping input identity (default: 20)
@@ -326,14 +354,10 @@ arguments:
1.0 corresponds to full destruction of information in init image
-H, --height H image height, in pixel space (default: 512)
-W, --width W image width, in pixel space (default: 512)
- --sampling-method {euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm, ddim_trailing, tcd}
- sampling method (default: "euler_a")
- --steps STEPS number of sample steps (default: 20)
--rng {std_default, cuda} RNG (default: cuda)
-s SEED, --seed SEED RNG seed (default: 42, use random seed for < 0)
-b, --batch-count COUNT number of images to generate
- --scheduler {discrete, karras, exponential, ays, gits} Denoiser sigma scheduler (default: discrete)
- --clip-skip N ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer (default: -1)
+  --clip-skip N                      ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer (default: -1)
<= 0 represents unspecified, will be 1 for SD1.x, 2 for SD2.x
--vae-tiling process vae in tiles to reduce memory usage
--vae-on-cpu keep vae in cpu (for low vram)
@@ -351,6 +375,8 @@ arguments:
--chroma-disable-dit-mask disable dit mask for chroma
--chroma-enable-t5-mask enable t5 mask for chroma
--chroma-t5-mask-pad PAD_SIZE t5 mask pad size of chroma
+  --video-frames FRAMES              number of video frames to generate (default: 1)
+  --fps FPS                          frames per second of the output video (default: 24)
-v, --verbose print extra info
```
@@ -438,3 +464,5 @@ Thank you to all the people who have already contributed to stable-diffusion.cpp
- [latent-consistency-model](https://github.com/luosiallen/latent-consistency-model)
- [generative-models](https://github.com/Stability-AI/generative-models/)
- [PhotoMaker](https://github.com/TencentARC/PhotoMaker)
+- [Wan2.1](https://github.com/Wan-Video/Wan2.1)
+- [Wan2.2](https://github.com/Wan-Video/Wan2.2)
\ No newline at end of file
diff --git a/assets/wan/Wan2.1_1.3B_t2v.mp4 b/assets/wan/Wan2.1_1.3B_t2v.mp4
new file mode 100644
index 0000000..0356071
Binary files /dev/null and b/assets/wan/Wan2.1_1.3B_t2v.mp4 differ
diff --git a/assets/wan/Wan2.1_14B_flf2v.mp4 b/assets/wan/Wan2.1_14B_flf2v.mp4
new file mode 100644
index 0000000..5576423
Binary files /dev/null and b/assets/wan/Wan2.1_14B_flf2v.mp4 differ
diff --git a/assets/wan/Wan2.1_14B_i2v.mp4 b/assets/wan/Wan2.1_14B_i2v.mp4
new file mode 100644
index 0000000..d111bd0
Binary files /dev/null and b/assets/wan/Wan2.1_14B_i2v.mp4 differ
diff --git a/assets/wan/Wan2.1_14B_t2v.mp4 b/assets/wan/Wan2.1_14B_t2v.mp4
new file mode 100644
index 0000000..1ed98a6
Binary files /dev/null and b/assets/wan/Wan2.1_14B_t2v.mp4 differ
diff --git a/assets/wan/Wan2.2_14B_flf2v.mp4 b/assets/wan/Wan2.2_14B_flf2v.mp4
new file mode 100644
index 0000000..e1aa5a6
Binary files /dev/null and b/assets/wan/Wan2.2_14B_flf2v.mp4 differ
diff --git a/assets/wan/Wan2.2_14B_i2v.mp4 b/assets/wan/Wan2.2_14B_i2v.mp4
new file mode 100644
index 0000000..38b8984
Binary files /dev/null and b/assets/wan/Wan2.2_14B_i2v.mp4 differ
diff --git a/assets/wan/Wan2.2_14B_t2i.png b/assets/wan/Wan2.2_14B_t2i.png
new file mode 100644
index 0000000..9c07688
Binary files /dev/null and b/assets/wan/Wan2.2_14B_t2i.png differ
diff --git a/assets/wan/Wan2.2_14B_t2v.mp4 b/assets/wan/Wan2.2_14B_t2v.mp4
new file mode 100644
index 0000000..1e8135d
Binary files /dev/null and b/assets/wan/Wan2.2_14B_t2v.mp4 differ
diff --git a/assets/wan/Wan2.2_14B_t2v_lora.mp4 b/assets/wan/Wan2.2_14B_t2v_lora.mp4
new file mode 100644
index 0000000..f490c0f
Binary files /dev/null and b/assets/wan/Wan2.2_14B_t2v_lora.mp4 differ
diff --git a/assets/wan/Wan2.2_5B_i2v.mp4 b/assets/wan/Wan2.2_5B_i2v.mp4
new file mode 100644
index 0000000..da3efd1
Binary files /dev/null and b/assets/wan/Wan2.2_5B_i2v.mp4 differ
diff --git a/assets/wan/Wan2.2_5B_t2v.mp4 b/assets/wan/Wan2.2_5B_t2v.mp4
new file mode 100644
index 0000000..f68b8b9
Binary files /dev/null and b/assets/wan/Wan2.2_5B_t2v.mp4 differ
diff --git a/docs/wan.md b/docs/wan.md
new file mode 100644
index 0000000..e975df3
--- /dev/null
+++ b/docs/wan.md
@@ -0,0 +1,141 @@
+# How to Use
+
+## Download weights
+
+- Download Wan
+ - Wan2.1
+ - Wan2.1 T2V 1.3B
+ - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/diffusion_models
+ - Wan2.1 T2V 14B
+ - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/diffusion_models
+ - gguf: https://huggingface.co/city96/Wan2.1-T2V-14B-gguf/tree/main
+ - Wan2.1 I2V 14B 480P
+ - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/diffusion_models
+ - gguf: https://huggingface.co/city96/Wan2.1-I2V-14B-480P-gguf/tree/main
+ - Wan2.1 I2V 14B 720P
+ - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/diffusion_models
+ - gguf: https://huggingface.co/city96/Wan2.1-I2V-14B-720P-gguf/tree/main
+ - Wan2.1 FLF2V 14B 720P
+ - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/diffusion_models
+ - gguf: https://huggingface.co/city96/Wan2.1-FLF2V-14B-720P-gguf/tree/main
+ - Wan2.2
+ - Wan2.2 TI2V 5B
+ - safetensors: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/diffusion_models
+ - gguf: https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/tree/main
+ - Wan2.2 T2V A14B
+ - safetensors: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/diffusion_models
+ - gguf: https://huggingface.co/QuantStack/Wan2.2-T2V-A14B-GGUF/tree/main
+ - Wan2.2 I2V A14B
+ - safetensors: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/diffusion_models
+ - gguf: https://huggingface.co/QuantStack/Wan2.2-I2V-A14B-GGUF/tree/main
+- Download vae
+  - wan_2.1_vae (for all Wan models except Wan2.2 TI2V 5B)
+ - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors
+ - wan_2.2_vae (for Wan2.2 TI2V 5B only)
+ - safetensors: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/split_files/vae/wan2.2_vae.safetensors
+- Download umt5_xxl
+ - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp16.safetensors
+ - gguf: https://huggingface.co/city96/umt5-xxl-encoder-gguf/tree/main
+
+- Download clip_vision_h (for Wan2.1 I2V/FLF2V only)
+ - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors
+
+
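+The examples below wire these files together with `--diffusion-model`, `--vae`, `--t5xxl`, and (for Wan2.1 I2V/FLF2V) `--clip_vision`. As a minimal sketch, a Linux-style Wan2.1 T2V 1.3B invocation could look like this (the binary path and model paths are placeholders; adjust them to your build and download locations):
+
+```
+./build/bin/sd -M vid_gen \
+  --diffusion-model models/wan2.1_t2v_1.3B_fp16.safetensors \
+  --vae models/wan_2.1_vae.safetensors \
+  --t5xxl models/umt5_xxl_fp16.safetensors \
+  -p "a lovely cat" -W 832 -H 480 --video-frames 33
+```
+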
+## Examples
+
+GitHub does not support AVI files, so the example videos below were converted from AVI to MP4.
+
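+If you want to reproduce that conversion, a standard ffmpeg command along these lines works (ffmpeg is a separate tool, not part of this project; the filenames are placeholders):
+
+```
+ffmpeg -i output.avi -c:v libx264 -pix_fmt yuv420p output.mp4
+```
+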
+### Wan2.1 T2V 1.3B
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\wan2.1_t2v_1.3B_fp16.safetensors --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部, 畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 832 -H 480 --diffusion-fa --video-frames 33
+```
+
+
+
+### Wan2.1 T2V 14B
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\wan2.1-t2v-14b-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 832 -H 480 --diffusion-fa --offload-to-cpu --video-frames 33
+```
+
+
+
+
+
+### Wan2.1 I2V 14B
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\wan2.1-i2v-14b-480p-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf --clip_vision ..\..\ComfyUI\models\clip_vision\clip_vision_h.safetensors -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 480 -H 832 --diffusion-fa --video-frames 33 --offload-to-cpu -i ..\assets\cat_with_sd_cpp_42.png
+```
+
+
+
+### Wan2.2 T2V A14B
+
+Wan2.2 A14B checkpoints ship as two experts: a high-noise model for the early denoising steps and a low-noise model for the later ones. Pass the low-noise model with `--diffusion-model` and the high-noise model with `--high-noise-diffusion-model`; the `--high-noise-*` options control the first stage.
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-T2V-A14B-LowNoise-Q8_0.gguf --high-noise-diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-T2V-A14B-HighNoise-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 3.5 --sampling-method euler --steps 10 --high-noise-cfg-scale 3.5 --high-noise-sampling-method euler --high-noise-steps 8 -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 832 -H 480 --diffusion-fa --offload-to-cpu --video-frames 33
+```
+
+
+
+### Wan2.2 I2V A14B
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-I2V-A14B-LowNoise-Q8_0.gguf --high-noise-diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-I2V-A14B-HighNoise-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 3.5 --sampling-method euler --steps 10 --high-noise-cfg-scale 3.5 --high-noise-sampling-method euler --high-noise-steps 8 -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 832 -H 480 --diffusion-fa --offload-to-cpu --video-frames 33 -i ..\assets\cat_with_sd_cpp_42.png
+```
+
+
+
+### Wan2.2 T2I (T2V A14B)
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-T2V-A14B-LowNoise-Q8_0.gguf --high-noise-diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-T2V-A14B-HighNoise-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 3.5 --sampling-method euler --steps 10 --high-noise-cfg-scale 3.5 --high-noise-sampling-method euler --high-noise-steps 8 -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 832 -H 480 --diffusion-fa --offload-to-cpu
+```
+
+
+
+### Wan2.2 T2V A14B with LoRA
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-T2V-A14B-LowNoise-Q8_0.gguf --high-noise-diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-T2V-A14B-HighNoise-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 3.5 --sampling-method euler --steps 4 --high-noise-cfg-scale 3.5 --high-noise-sampling-method euler --high-noise-steps 4 -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 832 -H 480 --diffusion-fa --offload-to-cpu --lora-model-dir ..\..\ComfyUI\models\loras --video-frames 33
+```
+
+
+
+
+
+### Wan2.2 TI2V 5B
+
+#### T2V
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\wan2.2_ti2v_5B_fp16.safetensors --vae ..\..\ComfyUI\models\vae\wan2.2_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 480 -H 832 --diffusion-fa --offload-to-cpu --video-frames 33
+```
+
+
+
+#### I2V
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\wan2.2_ti2v_5B_fp16.safetensors --vae ..\..\ComfyUI\models\vae\wan2.2_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 480 -H 832 --diffusion-fa --offload-to-cpu --video-frames 33 -i ..\assets\cat_with_sd_cpp_42.png
+```
+
+
+
+### Wan2.1 FLF2V 14B
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\wan2.1-flf2v-14b-720p-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf --clip_vision ..\..\ComfyUI\models\clip_vision\clip_vision_h.safetensors -p "glass flower blossom" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 480 -H 832 --diffusion-fa --video-frames 33 --offload-to-cpu --init-img ..\..\ComfyUI\input\start_image.png --end-img ..\..\ComfyUI\input\end_image.png
+```
+
+
+
+
+### Wan2.2 FLF2V A14B
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-I2V-A14B-LowNoise-Q8_0.gguf --high-noise-diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-I2V-A14B-HighNoise-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf --cfg-scale 3.5 --sampling-method euler --steps 10 --high-noise-cfg-scale 3.5 --high-noise-sampling-method euler --high-noise-steps 8 -v -p "glass flower blossom" -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 480 -H 832 --diffusion-fa --video-frames 33 --offload-to-cpu --init-img ..\..\ComfyUI\input\start_image.png --end-img ..\..\ComfyUI\input\end_image.png
+```
+
+
diff --git a/examples/cli/main.cpp b/examples/cli/main.cpp
index 3fb93ec..5c6070b 100644
--- a/examples/cli/main.cpp
+++ b/examples/cli/main.cpp
@@ -262,9 +262,9 @@ void print_usage(int argc, const char* argv[]) {
printf(" --diffusion-fa use flash attention in the diffusion model (for low vram)\n");
printf(" Might lower quality, since it implies converting k and v to f16.\n");
printf(" This might crash if it is not supported by the backend.\n");
- printf(" --diffusion-conv-direct use Conv2d direct in the diffusion model");
+ printf(" --diffusion-conv-direct use Conv2d direct in the diffusion model\n");
printf(" This might crash if it is not supported by the backend.\n");
- printf(" --vae-conv-direct use Conv2d direct in the vae model (should improve the performance)");
+ printf(" --vae-conv-direct use Conv2d direct in the vae model (should improve the performance)\n");
printf(" This might crash if it is not supported by the backend.\n");
printf(" --control-net-cpu keep controlnet in cpu (for low vram)\n");
printf(" --canny apply canny preprocessor (edge detection)\n");