83 Commits

Author SHA1 Message Date
stduhpf
8ecdf053ac
feat: add image preview support (#522) 2025-11-10 00:12:02 +08:00
leejet
d05e46ca5e
chore: add .clang-tidy configuration and apply modernize checks (#902) 2025-10-18 23:23:40 +08:00
leejet
0723ee51c9
refactor: optimize option printing (#900) 2025-10-18 17:50:30 +08:00
leejet
90ef5f8246
feat: add auto-resize support for reference images (was Qwen-Image-Edit only) (#898) 2025-10-18 16:37:09 +08:00
leejet
40a6a8710e
fix: resolve precision issues in SDXL VAE under fp16 (#888)
* fix: resolve precision issues in SDXL VAE under fp16

* add --force-sdxl-vae-conv-scale option

* update docs
2025-10-15 23:01:00 +08:00
Daniele
e3702585cb
feat: added prediction argument (#334) 2025-10-15 23:00:10 +08:00
leejet
2e9242e37f
feat: add Qwen Image Edit support (#877)
* add ref latent support for qwen image

* optimize clip_preprocess and fix get_first_stage_encoding

* add qwen2vl vit support

* add qwen image edit support

* fix qwen image edit pipeline

* add mmproj file support

* support dynamic number of Qwen image transformer blocks

* set prompt_template_encode_start_idx every time

* to_add_out precision fix

* to_out.0 precision fix

* update docs
2025-10-13 23:17:18 +08:00
leejet
beb99a2de2
feat: add Qwen Image support (#851)
* add qwen tokenizer

* add qwen2.5 vl support

* mv qwen.hpp -> qwenvl.hpp

* add qwen image model

* add qwen image t2i pipeline

* fix qwen image flash attn

* add qwen image i2i pipline

* change encoding of vocab_qwen.hpp to utf8

* fix get_first_stage_encoding

* apply jeffbolz f32 patch

https://github.com/leejet/stable-diffusion.cpp/pull/851#issuecomment-3335515302

* fix the issue that occurs when using CUDA with k-quants weights

* optimize the handling of the FeedForward precision fix

* to_add_out precision fix

* update docs
2025-10-12 23:23:19 +08:00
Wagner Bruna
aa68b875b9
refactor: deal with default img-cfg-scale at the library level (#869) 2025-10-12 23:17:52 +08:00
Wagner Bruna
5b261b9cee
feat: add a stand-alone upscale mode (#865)
* feat: add a stand-alone upscale mode

* fix prompt option check

* format code

* update README.md

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-10-12 23:10:02 +08:00
leejet
e12d5e0aaf
fix: ensure directory iteration results are sorted by filename (#858) 2025-10-11 00:18:39 +08:00
stduhpf
11f436c483
feat: add support for Flux Controls and Flex.2 (#692) 2025-10-11 00:06:57 +08:00
leejet
fd693ac6a2
refactor: remove unused --normalize-input parameter (#835) 2025-09-18 00:12:53 +08:00
rmatif
8376dfba2a
feat: add sgm_uniform scheduler, simple scheduler, and support for NitroFusion (#675)
* feat: Add timestep shift and two new schedulers

* update readme

* fix spaces

* format code

* simplify SGMUniformSchedule

* simplify shifted_timestep logic

* avoid conflict

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-16 22:42:09 +08:00
leejet
0ebe6fe118
refactor: simplify the logic of pm id image loading (#827) 2025-09-14 22:50:21 +08:00
leejet
52a97b3ac1
feat: add vace support (#819)
* add wan vace t2v support

* add --vace-strength option

* add vace i2v support

* fix the processing of vace_context

* add vace v2v support

* update docs
2025-09-14 16:57:33 +08:00
stduhpf
2c9b1e2594
feat: add VAE encoding tiling support and adaptive overlap (#484)
* implement  tiling vae encode support

* Tiling (vae/upscale): adaptative overlap

* Tiling: fix edge case

* Tiling: fix crash when less than 2 tiles per dim

* remove extra dot

* Tiling: fix edge cases for adaptative overlap

* tiling: fix edge case

* set vae tile size via env var

* vae tiling: refactor again, base on smaller buffer for alignment

* Use bigger tiles for encode (to match compute buffer size)

* Fix edge case when tile is bigger than latent

* non-square VAE tiling (#3)

* refactor tile number calculation

* support non-square tiles

* add env var to change tile overlap

* add safeguards and better error messages for SD_TILE_OVERLAP

* add safeguards and include overlapping factor for SD_TILE_SIZE

* avoid rounding issues when specifying SD_TILE_SIZE as a factor

* lower SD_TILE_OVERLAP limit

* zero-init empty output buffer

* Fix decode latent size

* fix encode

* tile size params instead of env

* Tiled vae parameter validation (#6)

* avoid crash with invalid tile sizes, use 0 for default

* refactor default tile size, limit overlap factor

* remove explicit parameter for relative tile size

* limit encoding tile to latent size

* unify code style and format code

* update docs

* fix get_tile_sizes in decode_first_stage

---------

Co-authored-by: Wagner Bruna <wbruna@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
2025-09-14 16:00:29 +08:00
Wagner Bruna
c607fc3ed4
feat: use Euler sampling by default for SD3 and Flux (#753)
Thank you for your contribution.
2025-09-14 12:34:41 +08:00
Erik Scholz
49d6570c43
feat: add SmoothStep Scheduler (#813) 2025-09-11 23:17:46 +08:00
Markus Hartung
abb115cd02
fix: clarify lora quant support and small fixes (#792) 2025-09-08 22:39:25 +08:00
stduhpf
c587a43c99
feat: support incrementing ref image index (omni-kontext) (#755)
* kontext: support  ref images indices

* lora: support x_embedder

* update help message

* Support for negative indices

* support for OmniControl (offsets at index 0)

* c++11 compat

* add --increase-ref-index option

* simplify the logic and fix some issues

* update README.md

* remove unused variable

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-07 22:35:16 +08:00
leejet
d7f430cd69 docs: update docs and help message 2025-09-07 02:26:44 +08:00
stduhpf
141a4b4113
feat: add flow shift parameter (for SD3 and Wan) (#780)
* Add flow shift parameter (for SD3 and Wan)

* unify code style and fix some issues

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-07 02:16:59 +08:00
stduhpf
21ce9fe2cf
feat: add support for timestep boundary based automatic expert routing in Wan MoE (#779)
* Wan MoE: Automatic expert routing based on timestep boundary

* unify code style and fix some issues

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-09-07 01:44:10 +08:00
leejet
cb1d975e96
feat: add wan2.1/2.2 support (#778)
* add wan vae suppport

* add wan model support

* add umt5 support

* add wan2.1 t2i support

* make flash attn work with wan

* make wan a little faster

* add wan2.1 t2v support

* add wan gguf support

* add offload params to cpu support

* add wan2.1 i2v support

* crop image before resize

* set default fps to 16

* add diff lora support

* fix wan2.1 i2v

* introduce sd_sample_params_t

* add wan2.2 t2v support

* add wan2.2 14B i2v support

* add wan2.2 ti2v support

* add high noise lora support

* sync: update ggml submodule url

* avoid build failure on linux

* avoid build failure

* update ggml

* update ggml

* fix sd_version_is_wan

* update ggml, fix cpu im2col_3d

* fix ggml_nn_attention_ext mask

* add cache support to ggml runner

* fix the issue of illegal memory access

* unify image loading processing

* add wan2.1/2.2 FLF2V support

* fix end_image mask

* update to latest ggml

* add GGUFReader

* update docs
2025-09-06 18:08:03 +08:00
Wagner Bruna
2eb3845df5
fix: typo in the verbose long flag (#783) 2025-09-04 00:49:01 +08:00
stduhpf
4c6475f917
feat: show usage on unknown arg (#767) 2025-09-01 21:38:34 +08:00
Daniele
5b8996f74a
Conv2D direct support (#744)
* Conv2DDirect for VAE stage

* Enable only for Vulkan, reduced duplicated code

* Cmake option to use conv2d direct

* conv2d direct always on for opencl

* conv direct as a flag

* fix merge typo

* Align conv2d behavior to flash attention's

* fix readme

* add conv2d direct for controlnet

* add conv2d direct for esrgan

* clean code, use enable_conv2d_direct/get_all_blocks

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-08-03 01:25:17 +08:00
Wagner Bruna
7eb30d00e5
feat: add missing models and parameters to image metadata (#743)
* feat: add new scheduler types, clip skip and vae to image embedded params

- If a non default scheduler is set, include it in the 'Sampler' tag in the data
embedded into the final image.
- If a custom VAE path is set, include the vae name (without path and extension)
in embedded image params under a `VAE:` tag.
- If a custom Clip skip is set, include that Clip skip value in embedded image
params under a `Clip skip:` tag.

* feat: add separate diffusion and text models to metadata

---------

Co-authored-by: one-lithe-rune <skapusniak@lithe-runes.com>
2025-07-28 22:00:27 +08:00
stduhpf
59080d3ce1
feat: change image dimensions requirement for DiT models (#742) 2025-07-28 21:58:17 +08:00
leejet
0739361bfe fix: avoid macOS build failed 2025-07-13 20:18:10 +08:00
leejet
ca0bd9396e
refactor: update c api (#728) 2025-07-13 18:48:42 +08:00
stduhpf
a772dca27a
feat: add Instruct-Pix2pix/CosXL-Edit support (#679)
* Instruct-p2p support

* support 2 conditionings cfg

* Do not re-encode the exact same image twice

* fixes for 2-cfg

* Fix pix2pix latent inputs + improve inpainting a bit + fix naming

* prepare for other pix2pix-like models

* Support sdxl ip2p

* fix reference image embeddings

* Support 2-cond cfg properly in cli

* fix typo in help

* Support masks for ip2p models

* unify code style

* delete unused code

* use edit mode

* add img_cond

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-07-12 15:36:45 +08:00
Wagner Bruna
6d84a30c66
feat: overriding quant types for specific tensors on model conversion (#724) 2025-07-08 00:11:38 +08:00
leejet
b9e4718fac fix: correct --chroma-enable-t5-mask argument 2025-07-06 11:11:47 +08:00
Wagner Bruna
76c72628b1
fix: fix a few typos on cli help and error messages (#714) 2025-07-04 22:15:41 +08:00
leejet
a28d04dd81 fix: fix the issue in parsing --chroma-disable-dit-mask 2025-06-29 23:52:36 +08:00
leejet
45d0ebb30c style: format code 2025-06-29 23:40:55 +08:00
stduhpf
b1cc40c35c
feat: add Chroma support (#696)
---------

Co-authored-by: Green Sky <Green-Sky@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
2025-06-29 23:36:42 +08:00
stduhpf
c9b5735116
feat: add FLUX.1 Kontext dev support (#707)
* Kontext support
* add edit mode

---------

Co-authored-by: leejet <leejet714@gmail.com>
2025-06-29 10:08:53 +08:00
vmobilis
81556f3136
chore: silence some warnings about precision loss (#620) 2025-03-09 12:22:39 +08:00
leejet
30b3ac8e62 fix: avoid potential dangling pointer problem 2025-03-01 16:58:26 +08:00
yslai
19d876ee30
feat: implement DDIM with the "trailing" timestep spacing and TCD (#568) 2025-02-22 21:34:22 +08:00
lalala
f27f2b2aa2
docs: add missing --mask and --guidance options to print_usage (#572) 2025-02-22 21:32:37 +08:00
vmobilis
d46ed5e184
feat: support JPEG compression (#583) 2025-02-05 16:18:02 +08:00
piallai
b5cc1422da
fix: fix typo for skip layers parameters (#492) 2024-12-28 13:12:08 +08:00
stduhpf
8f4ab9add3
feat: support Inpaint models (#511) 2024-12-28 13:04:49 +08:00
stduhpf
9148b980be
feat: remove type restrictions (#489) 2024-11-30 14:22:15 +08:00
stduhpf
7ce63e740c
feat: flexible model architecture for dit models (Flux & SD3) (#490)
* Refactor: wtype per tensor

* Fix default args

* refactor: fix flux

* Refactor photmaker v2 support

* unet: refactor the refactoring

* Refactor: fix controlnet and tae

* refactor: upscaler

* Refactor: fix runtime type override

* upscaler: use fp16 again

* Refactor: Flexible sd3 arch

* Refactor: Flexible Flux arch

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2024-11-30 14:18:53 +08:00
stduhpf
53b415f787
fix: remove default variables in c headers (#478) 2024-11-24 18:10:25 +08:00