diff --git a/README.md b/README.md
index 59174fe..756c125 100644
--- a/README.md
+++ b/README.md
@@ -4,19 +4,33 @@
 
 # stable-diffusion.cpp
 
-Inference of Stable Diffusion and Flux in pure C/C++
+Diffusion model (SD, Flux, Wan, ...) inference in pure C/C++
+
+***Note that this project is under active development. \
+API and command-line parameters may change frequently.***
 
 ## Features
 
 - Plain C/C++ implementation based on [ggml](https://github.com/ggerganov/ggml), working in the same way as [llama.cpp](https://github.com/ggerganov/llama.cpp)
 - Super lightweight and without external dependencies
-- SD1.x, SD2.x, SDXL and [SD3/SD3.5](./docs/sd3.md) support
-    - !!!The VAE in SDXL encounters NaN issues under FP16, but unfortunately, the ggml_conv_2d only operates under FP16. Hence, a parameter is needed to specify the VAE that has fixed the FP16 NaN issue. You can find it here: [SDXL VAE FP16 Fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors).
-- [Flux-dev/Flux-schnell Support](./docs/flux.md)
-- [FLUX.1-Kontext-dev](./docs/kontext.md)
-- [Chroma](./docs/chroma.md)
-- [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) and [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo) support
-- [PhotoMaker](https://github.com/TencentARC/PhotoMaker) support.
+- Supported models
+    - Image Models
+        - SD1.x, SD2.x, [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo)
+        - SDXL, [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo)
+            - Note: the SDXL VAE produces NaN values under FP16, but ggml_conv_2d only operates under FP16, so a parameter is needed to point at a VAE with the FP16 NaN issue fixed. You can find one here: [SDXL VAE FP16 Fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors).
+        - [SD3/SD3.5](./docs/sd3.md)
+        - [Flux-dev/Flux-schnell](./docs/flux.md)
+        - [Chroma](./docs/chroma.md)
+    - Image Edit Models
+        - [FLUX.1-Kontext-dev](./docs/kontext.md)
+    - Video Models
+        - [Wan2.1/Wan2.2](./docs/wan.md)
+- [PhotoMaker](https://github.com/TencentARC/PhotoMaker) support.
+- ControlNet support with SD 1.5
+- LoRA support, same as [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#lora)
+- Latent Consistency Models support (LCM/LCM-LoRA)
+- Faster and memory-efficient latent decoding with [TAESD](https://github.com/madebyollin/taesd)
+- Upscale images generated with [ESRGAN](https://github.com/xinntao/Real-ESRGAN)
 - 16-bit, 32-bit float support
 - 2-bit, 3-bit, 4-bit, 5-bit and 8-bit integer quantization support
 - Accelerated memory-efficient CPU inference
@@ -26,15 +40,9 @@ Inference of Stable Diffusion and Flux in pure C/C++
 - Can load ckpt, safetensors and diffusers models/checkpoints. Standalone VAE models
     - No need to convert to `.ggml` or `.gguf` anymore!
 - Flash Attention for memory usage optimization
-- Original `txt2img` and `img2img` mode
 - Negative prompt
 - [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) style tokenizer (not all the features, only token weighting for now)
-- LoRA support, same as [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#lora)
-- Latent Consistency Models support (LCM/LCM-LoRA)
-- Faster and memory efficient latent decoding with [TAESD](https://github.com/madebyollin/taesd)
-- Upscale images generated with [ESRGAN](https://github.com/xinntao/Real-ESRGAN)
 - VAE tiling processing to reduce memory usage
-- Control Net support with SD 1.5
 - Sampling method
     - `Euler A`
     - `Euler`
@@ -287,8 +295,10 @@ arguments:
                                         If threads <= 0, then threads will be set to the number of CPU physical cores
   -m, --model [MODEL]                   path to full model
   --diffusion-model                     path to the standalone diffusion model
+  --high-noise-diffusion-model          path to the standalone high noise diffusion model
   --clip_l                              path to the clip-l text encoder
   --clip_g                              path to the clip-g text encoder
+  --clip_vision                         path to the clip-vision encoder
   --t5xxl                               path to the t5xxl text encoder
   --vae [VAE]                           path to vae
   --taesd [TAESD_PATH]                  path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
@@ -303,8 +313,9 @@ arguments:
                                         If not specified, the default is the type of the weight file
   --tensor-type-rules [EXPRESSION]      weight type per tensor pattern (example: "^vae\.=f16,model\.=q8_0")
   --lora-model-dir [DIR]                lora model directory
-  -i, --init-img [IMAGE]                path to the input image, required by img2img
+  -i, --init-img [IMAGE]                path to the init image, required by img2img
   --mask [MASK]                         path to the mask image, required by img2img with mask
+  --end-img [IMAGE]                     path to the end image, required by flf2v
   --control-image [IMAGE]               path to image condition, control net
   -r, --ref-image [PATH]                reference image for Flux Kontext models (can be used multiple times)
   -o, --output OUTPUT                   path to write result image to (default: ./output.png)
@@ -319,6 +330,23 @@ arguments:
   --skip-layers LAYERS                  Layers to skip for SLG steps: (default: [7,8,9])
   --skip-layer-start START              SLG enabling point: (default: 0.01)
   --skip-layer-end END                  SLG disabling point: (default: 0.2)
+  --scheduler {discrete, karras, exponential, ays, gits} Denoiser sigma scheduler (default: discrete)
+  --sampling-method {euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm, ddim_trailing, tcd}
+                                        sampling method (default: "euler_a")
+  --steps STEPS                         number of sample steps (default: 20)
+  --high-noise-cfg-scale SCALE          (high noise) unconditional guidance scale: (default: 7.0)
+  --high-noise-img-cfg-scale SCALE      (high noise) image guidance scale for inpaint or instruct-pix2pix models: (default: same as --cfg-scale)
+  --high-noise-guidance SCALE           (high noise) distilled guidance scale for models with guidance input (default: 3.5)
+  --high-noise-slg-scale SCALE          (high noise) skip layer guidance (SLG) scale, only for DiT models: (default: 0)
+                                        0 means disabled, a value of 2.5 is nice for sd3.5 medium
+  --high-noise-eta SCALE                (high noise) eta in DDIM, only for DDIM and TCD: (default: 0)
+  --high-noise-skip-layers LAYERS       (high noise) Layers to skip for SLG steps: (default: [7,8,9])
+  --high-noise-skip-layer-start START   (high noise) SLG enabling point: (default: 0.01)
+  --high-noise-skip-layer-end END       (high noise) SLG disabling point: (default: 0.2)
+  --high-noise-scheduler {discrete, karras, exponential, ays, gits} (high noise) Denoiser sigma scheduler (default: discrete)
+  --high-noise-sampling-method {euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm, ddim_trailing, tcd}
+                                        (high noise) sampling method (default: "euler_a")
+  --high-noise-steps STEPS              (high noise) number of sample steps (default: 20)
                                         SLG will be enabled at step int([STEPS]*[START]) and disabled at int([STEPS]*[END])
   --strength STRENGTH                   strength for noising/unnoising (default: 0.75)
   --style-ratio STYLE-RATIO             strength for keeping input identity (default: 20)
@@ -326,14 +354,10 @@ arguments:
                                         1.0 corresponds to full destruction of information in init image
   -H, --height H                        image height, in pixel space (default: 512)
   -W, --width W                         image width, in pixel space (default: 512)
-  --sampling-method {euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm, ddim_trailing, tcd}
-                                        sampling method (default: "euler_a")
-  --steps STEPS                         number of sample steps (default: 20)
   --rng {std_default, cuda}             RNG (default: cuda)
   -s SEED, --seed SEED                  RNG seed (default: 42, use random seed for < 0)
   -b, --batch-count COUNT               number of images to generate
-  --scheduler {discrete, karras, exponential, ays, gits} Denoiser sigma scheduler (default: discrete)
   --clip-skip N                         ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer (default: -1)
                                         <= 0 represents unspecified, will be 1 for SD1.x, 2 for SD2.x
   --vae-tiling                          process vae in tiles to reduce memory usage
   --vae-on-cpu                          keep vae in cpu (for low vram)
@@ -351,6 +375,8 @@ arguments:
   --chroma-disable-dit-mask             disable dit mask for chroma
   --chroma-enable-t5-mask               enable t5 mask for chroma
   --chroma-t5-mask-pad PAD_SIZE         t5 mask pad size of chroma
+  --video-frames N                      number of video frames to generate (default: 1)
+  --fps FPS                             frame rate of the output video (default: 24)
   -v, --verbose                         print extra info
 ```
@@ -438,3 +464,5 @@ Thank you to all the people who have already contributed to stable-diffusion.cpp
 - [latent-consistency-model](https://github.com/luosiallen/latent-consistency-model)
 - [generative-models](https://github.com/Stability-AI/generative-models/)
 - [PhotoMaker](https://github.com/TencentARC/PhotoMaker)
+- [Wan2.1](https://github.com/Wan-Video/Wan2.1)
+- [Wan2.2](https://github.com/Wan-Video/Wan2.2)
\ No newline at end of file
diff --git a/assets/wan/Wan2.1_1.3B_t2v.mp4 b/assets/wan/Wan2.1_1.3B_t2v.mp4
new file mode 100644
index 0000000..0356071
Binary files /dev/null and b/assets/wan/Wan2.1_1.3B_t2v.mp4 differ
diff --git a/assets/wan/Wan2.1_14B_flf2v.mp4 b/assets/wan/Wan2.1_14B_flf2v.mp4
new file mode 100644
index 0000000..5576423
Binary files /dev/null and b/assets/wan/Wan2.1_14B_flf2v.mp4 differ
diff --git a/assets/wan/Wan2.1_14B_i2v.mp4 b/assets/wan/Wan2.1_14B_i2v.mp4
new file mode 100644
index 0000000..d111bd0
Binary files /dev/null and b/assets/wan/Wan2.1_14B_i2v.mp4 differ
diff --git a/assets/wan/Wan2.1_14B_t2v.mp4 b/assets/wan/Wan2.1_14B_t2v.mp4
new file mode 100644
index 0000000..1ed98a6
Binary files /dev/null and b/assets/wan/Wan2.1_14B_t2v.mp4 differ
diff --git a/assets/wan/Wan2.2_14B_flf2v.mp4 b/assets/wan/Wan2.2_14B_flf2v.mp4
new file mode 100644
index 0000000..e1aa5a6
Binary files /dev/null and b/assets/wan/Wan2.2_14B_flf2v.mp4 differ
diff --git a/assets/wan/Wan2.2_14B_i2v.mp4 b/assets/wan/Wan2.2_14B_i2v.mp4
new file mode 100644
index 0000000..38b8984
Binary files /dev/null and b/assets/wan/Wan2.2_14B_i2v.mp4 differ
diff --git a/assets/wan/Wan2.2_14B_t2i.png b/assets/wan/Wan2.2_14B_t2i.png
new file mode 100644
index 0000000..9c07688
Binary files /dev/null and b/assets/wan/Wan2.2_14B_t2i.png differ
diff --git a/assets/wan/Wan2.2_14B_t2v.mp4 b/assets/wan/Wan2.2_14B_t2v.mp4
new file mode 100644
index 0000000..1e8135d
Binary files /dev/null and b/assets/wan/Wan2.2_14B_t2v.mp4 differ
diff --git a/assets/wan/Wan2.2_14B_t2v_lora.mp4 b/assets/wan/Wan2.2_14B_t2v_lora.mp4
new file mode 100644
index 0000000..f490c0f
Binary files /dev/null and b/assets/wan/Wan2.2_14B_t2v_lora.mp4 differ
diff --git a/assets/wan/Wan2.2_5B_i2v.mp4 b/assets/wan/Wan2.2_5B_i2v.mp4
new file mode 100644
index 0000000..da3efd1
Binary files /dev/null and b/assets/wan/Wan2.2_5B_i2v.mp4 differ
diff --git a/assets/wan/Wan2.2_5B_t2v.mp4 b/assets/wan/Wan2.2_5B_t2v.mp4
new file mode 100644
index 0000000..f68b8b9
Binary files /dev/null and b/assets/wan/Wan2.2_5B_t2v.mp4 differ
diff --git a/docs/wan.md b/docs/wan.md
new file mode 100644
index 0000000..e975df3
--- /dev/null
+++ b/docs/wan.md
@@ -0,0 +1,141 @@
+# How to Use
+
+## Download weights
+
+- Download Wan
+    - Wan2.1
+        - Wan2.1 T2V 1.3B
+            - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/diffusion_models
+        - Wan2.1 T2V 14B
+            - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/diffusion_models
+            - gguf: https://huggingface.co/city96/Wan2.1-T2V-14B-gguf/tree/main
+        - Wan2.1 I2V 14B 480P
+            - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/diffusion_models
+            - gguf: https://huggingface.co/city96/Wan2.1-I2V-14B-480P-gguf/tree/main
+        - Wan2.1 I2V 14B 720P
+            - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/diffusion_models
+            - gguf: https://huggingface.co/city96/Wan2.1-I2V-14B-720P-gguf/tree/main
+        - Wan2.1 FLF2V 14B 720P
+            - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/diffusion_models
+            - gguf: https://huggingface.co/city96/Wan2.1-FLF2V-14B-720P-gguf/tree/main
+    - Wan2.2
+        - Wan2.2 TI2V 5B
+            - safetensors: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/diffusion_models
+            - gguf: https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/tree/main
+        - Wan2.2 T2V A14B
+            - safetensors: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/diffusion_models
+            - gguf: https://huggingface.co/QuantStack/Wan2.2-T2V-A14B-GGUF/tree/main
+        - Wan2.2 I2V A14B
+            - safetensors: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/diffusion_models
+            - gguf: https://huggingface.co/QuantStack/Wan2.2-I2V-A14B-GGUF/tree/main
+- Download vae
+    - wan_2.1_vae (for all Wan models except Wan2.2 TI2V 5B)
+        - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors
+    - wan_2.2_vae (for Wan2.2 TI2V 5B only)
+        - safetensors: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/split_files/vae/wan2.2_vae.safetensors
+- Download umt5_xxl
+    - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp16.safetensors
+    - gguf: https://huggingface.co/city96/umt5-xxl-encoder-gguf/tree/main
+
+- Download clip_vision_h (for Wan2.1 I2V/FLF2V only)
+    - safetensors: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors
+
+
+## Examples
+
+Since GitHub does not support AVI files, the example videos below were converted from AVI to MP4 before uploading.
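+
+If you want to reproduce the conversion, a minimal sketch (assuming `ffmpeg` is installed; the file names are placeholders for your own output) looks like this:
+
+```
+ffmpeg -i output.avi -c:v libx264 -pix_fmt yuv420p output.mp4
+```
+
+`-pix_fmt yuv420p` keeps the resulting MP4 playable in browsers and most video players.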
+
+### Wan2.1 T2V 1.3B
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\wan2.1_t2v_1.3B_fp16.safetensors --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 832 -H 480 --diffusion-fa --video-frames 33
+```
+
+### Wan2.1 T2V 14B
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\wan2.1-t2v-14b-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 832 -H 480 --diffusion-fa --offload-to-cpu --video-frames 33
+```
+
+### Wan2.1 I2V 14B
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\wan2.1-i2v-14b-480p-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf --clip_vision ..\..\ComfyUI\models\clip_vision\clip_vision_h.safetensors -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 480 -H 832 --diffusion-fa --video-frames 33 --offload-to-cpu -i ..\assets\cat_with_sd_cpp_42.png
+```
+
+### Wan2.2 T2V A14B
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-T2V-A14B-LowNoise-Q8_0.gguf --high-noise-diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-T2V-A14B-HighNoise-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 3.5 --sampling-method euler --steps 10 --high-noise-cfg-scale 3.5 --high-noise-sampling-method euler --high-noise-steps 8 -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 832 -H 480 --diffusion-fa --offload-to-cpu --video-frames 33
+```
+
+### Wan2.2 I2V A14B
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-I2V-A14B-LowNoise-Q8_0.gguf --high-noise-diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-I2V-A14B-HighNoise-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 3.5 --sampling-method euler --steps 10 --high-noise-cfg-scale 3.5 --high-noise-sampling-method euler --high-noise-steps 8 -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 832 -H 480 --diffusion-fa --offload-to-cpu --video-frames 33 -i ..\assets\cat_with_sd_cpp_42.png
+```
+
+### Wan2.2 T2V A14B (T2I)
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-T2V-A14B-LowNoise-Q8_0.gguf --high-noise-diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-T2V-A14B-HighNoise-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 3.5 --sampling-method euler --steps 10 --high-noise-cfg-scale 3.5 --high-noise-sampling-method euler --high-noise-steps 8 -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 832 -H 480 --diffusion-fa --offload-to-cpu
+```
+
+![Wan2.2_14B_t2i](../assets/wan/Wan2.2_14B_t2i.png)
+
+### Wan2.2 T2V A14B with LoRA
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-T2V-A14B-LowNoise-Q8_0.gguf --high-noise-diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-T2V-A14B-HighNoise-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 3.5 --sampling-method euler --steps 4 --high-noise-cfg-scale 3.5 --high-noise-sampling-method euler --high-noise-steps 4 -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 832 -H 480 --diffusion-fa --offload-to-cpu --lora-model-dir ..\..\ComfyUI\models\loras --video-frames 33
+```
+
+### Wan2.2 TI2V 5B
+
+#### T2V
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\wan2.2_ti2v_5B_fp16.safetensors --vae ..\..\ComfyUI\models\vae\wan2.2_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 480 -H 832 --diffusion-fa --offload-to-cpu --video-frames 33
+```
+
+#### I2V
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\wan2.2_ti2v_5B_fp16.safetensors --vae ..\..\ComfyUI\models\vae\wan2.2_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 480 -H 832 --diffusion-fa --offload-to-cpu --video-frames 33 -i ..\assets\cat_with_sd_cpp_42.png
+```
+
+### Wan2.1 FLF2V 14B
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\wan2.1-flf2v-14b-720p-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf --clip_vision ..\..\ComfyUI\models\clip_vision\clip_vision_h.safetensors -p "glass flower blossom" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 480 -H 832 --diffusion-fa --video-frames 33 --offload-to-cpu --init-img ..\..\ComfyUI\input\start_image.png --end-img ..\..\ComfyUI\input\end_image.png
+```
+
+### Wan2.2 FLF2V 14B
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-I2V-A14B-LowNoise-Q8_0.gguf --high-noise-diffusion-model ..\..\ComfyUI\models\diffusion_models\Wan2.2-I2V-A14B-HighNoise-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\wan_2.1_vae.safetensors --t5xxl ..\..\ComfyUI\models\text_encoders\umt5-xxl-encoder-Q8_0.gguf --cfg-scale 3.5 --sampling-method euler --steps 10 --high-noise-cfg-scale 3.5 --high-noise-sampling-method euler --high-noise-steps 8 -v -p "glass flower blossom" -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 480 -H 832 --diffusion-fa --video-frames 33 --offload-to-cpu --init-img ..\..\ComfyUI\input\start_image.png --end-img ..\..\ComfyUI\input\end_image.png
+```
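+
+## Notes
+
+- `--video-frames` defaults to 1, which is why the T2I example above writes a single image instead of a video.
+- None of the examples above set `--fps`, so the output defaults to 24 fps. A minimal sketch for changing it (a hypothetical invocation; the angle-bracket paths are placeholders for the files downloaded above):
+
+```
+.\bin\Release\sd.exe -M vid_gen --diffusion-model <wan_diffusion_model> --vae <wan_vae> --t5xxl <umt5_xxl> -p "a lovely cat" -W 832 -H 480 --diffusion-fa --video-frames 33 --fps 16
+```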
diff --git a/examples/cli/main.cpp b/examples/cli/main.cpp
index 3fb93ec..5c6070b 100644
--- a/examples/cli/main.cpp
+++ b/examples/cli/main.cpp
@@ -262,9 +262,9 @@ void print_usage(int argc, const char* argv[]) {
     printf("  --diffusion-fa                     use flash attention in the diffusion model (for low vram)\n");
     printf("                                     Might lower quality, since it implies converting k and v to f16.\n");
     printf("                                     This might crash if it is not supported by the backend.\n");
-    printf("  --diffusion-conv-direct            use Conv2d direct in the diffusion model");
+    printf("  --diffusion-conv-direct            use Conv2d direct in the diffusion model\n");
     printf("                                     This might crash if it is not supported by the backend.\n");
-    printf("  --vae-conv-direct                  use Conv2d direct in the vae model (should improve the performance)");
+    printf("  --vae-conv-direct                  use Conv2d direct in the vae model (should improve the performance)\n");
     printf("                                     This might crash if it is not supported by the backend.\n");
     printf("  --control-net-cpu                  keep controlnet in cpu (for low vram)\n");
     printf("  --canny                            apply canny preprocessor (edge detection)\n");