329 Commits

Author SHA1 Message Date
leejet
98d6e71492 fix the issue that occurs when using CUDA with k-quants weights 2025-10-12 15:41:40 +08:00
leejet
6ea2a75929 apply jeffbolz f32 patch
https://github.com/leejet/stable-diffusion.cpp/pull/851#issuecomment-3335515302
2025-10-11 01:04:14 +08:00
leejet
d19d4a5903 Merge branch 'master' into qwen_image 2025-10-11 00:52:20 +08:00
leejet
02af48a97f
chore: fix vulkan ci (#878) master-314-02af48a 2025-10-11 00:40:57 +08:00
leejet
e12d5e0aaf
fix: ensure directory iteration results are sorted by filename (#858) 2025-10-11 00:18:39 +08:00
Serkan Sahin
940a2018e1
chore: fix dockerfile libgomp1 dependency + improvements (#852) 2025-10-11 00:17:45 +08:00
Sharuzzaman Ahmat Raslan
b451728b2f
docs: update README.md (#866) 2025-10-11 00:11:10 +08:00
stduhpf
11f436c483
feat: add support for Flux Controls and Flex.2 (#692) 2025-10-11 00:06:57 +08:00
leejet
94f4f295c1 Merge branch 'master' into qwen_image 2025-09-25 23:13:00 +08:00
leejet
35843c77ea
fix: optimize the handling of embedding weight (#859) master-309-35843c7 2025-09-25 23:09:59 +08:00
leejet
178a415d89 Merge branch 'master' into qwen_image 2025-09-25 22:01:08 +08:00
leejet
6ad46bb700 sync: update ggml 2025-09-25 21:57:43 +08:00
leejet
a3a2b2d721 fix get_first_stage_encoding 2025-09-25 00:41:56 +08:00
leejet
a8d3aa0415 Merge branch 'master' into qwen_image 2025-09-25 00:39:57 +08:00
leejet
1ba30ce005 sync: update ggml 2025-09-25 00:38:38 +08:00
leejet
2abe9451c4
fix: optimize the handling of CLIP embedding weight (#840) master-306-2abe945 2025-09-25 00:28:20 +08:00
Wagner Bruna
f3140eadbb
fix: tensor loading thread count (#854) master-305-f3140ea 2025-09-25 00:26:38 +08:00
Stefan-Olt
98ba155fc6
docs: HipBLAS / ROCm build instruction fix (#843) 2025-09-25 00:03:05 +08:00
Wagner Bruna
513f36d495
docs: include Vulkan compatibility for LoRA quants (#845) 2025-09-25 00:01:10 +08:00
rmatif
1e0d2821bb
fix: correct tensor deduplication logic (#844) master-302-1e0d282 2025-09-24 23:22:40 +08:00
leejet
5af0bb0aee change encoding of vocab_qwen.hpp to utf8 2025-09-22 23:57:59 +08:00
leejet
feb027958f add qwen image i2i pipline 2025-09-22 23:45:29 +08:00
leejet
477911fb20 fix qwen image flash attn 2025-09-22 22:19:03 +08:00
leejet
cf19c6e759 add qwen image t2i pipeline 2025-09-22 21:18:28 +08:00
leejet
d232509b6e add qwen image model 2025-09-21 23:37:13 +08:00
leejet
d8d4c268dc mv qwen.hpp -> qwenvl.hpp 2025-09-21 17:21:56 +08:00
leejet
fe4e73156f add qwen2.5 vl support 2025-09-21 00:31:48 +08:00
leejet
f88daa5114 add qwen tokenizer 2025-09-20 14:05:42 +08:00
leejet
fd693ac6a2
refactor: remove unused --normalize-input parameter (#835) master-301-fd693ac 2025-09-18 00:12:53 +08:00
Wagner Bruna
171b2222a5
fix: avoid segfault for pix2pix models without reference images (#766)
* fix: avoid segfault for pix2pix models with no reference images

* fix: default to empty reference on pix2pix models to avoid segfault

* use resize instead of reserve

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
master-300-171b222
2025-09-18 00:11:38 +08:00
leejet
567f9f14f0 fix: avoid multithreading issues in the model loader master-299-567f9f1 2025-09-18 00:00:15 +08:00
leejet
1e5f207006
chore: fix workflow (#836) master-298-1e5f207 2025-09-17 22:11:55 +08:00
leejet
79426d578e chore: set release tag by commit count 2025-09-16 23:24:36 +08:00
vmobilis
97ad3e7ff9
refactor: simplify DPM++ (2S) Ancestral (#667) master-97ad3e7 2025-09-16 23:05:25 +08:00
Erik Scholz
8909523e92
refactor: move tiling cacl and debug print into the tiling code branch (#833) master-8909523 2025-09-16 22:46:56 +08:00
rmatif
8376dfba2a
feat: add sgm_uniform scheduler, simple scheduler, and support for NitroFusion (#675)
* feat: Add timestep shift and two new schedulers

* update readme

* fix spaces

* format code

* simplify SGMUniformSchedule

* simplify shifted_timestep logic

* avoid conflict

---------

Co-authored-by: leejet <leejet714@gmail.com>
master-8376dfb
2025-09-16 22:42:09 +08:00
leejet
0ebe6fe118
refactor: simplify the logic of pm id image loading (#827) master-0ebe6fe 2025-09-14 22:50:21 +08:00
rmatif
55c2e05d98
feat: optimize tensor loading time (#790)
* opt tensor loading

* fix build failure

* revert the changes

* allow the use of n_threads

* fix lora loading

* optimize lora loading

* add mutex

* use atomic

* fix build

* fix potential duplicate issue

* avoid duplicate lookup of lora tensor

* fix progeress bar

* remove unused remove_duplicates

---------

Co-authored-by: leejet <leejet714@gmail.com>
master-55c2e05
2025-09-14 22:48:35 +08:00
leejet
52a97b3ac1
feat: add vace support (#819)
* add wan vace t2v support

* add --vace-strength option

* add vace i2v support

* fix the processing of vace_context

* add vace v2v support

* update docs
master-52a97b3
2025-09-14 16:57:33 +08:00
stduhpf
2c9b1e2594
feat: add VAE encoding tiling support and adaptive overlap (#484)
* implement  tiling vae encode support

* Tiling (vae/upscale): adaptative overlap

* Tiling: fix edge case

* Tiling: fix crash when less than 2 tiles per dim

* remove extra dot

* Tiling: fix edge cases for adaptative overlap

* tiling: fix edge case

* set vae tile size via env var

* vae tiling: refactor again, base on smaller buffer for alignment

* Use bigger tiles for encode (to match compute buffer size)

* Fix edge case when tile is bigger than latent

* non-square VAE tiling (#3)

* refactor tile number calculation

* support non-square tiles

* add env var to change tile overlap

* add safeguards and better error messages for SD_TILE_OVERLAP

* add safeguards and include overlapping factor for SD_TILE_SIZE

* avoid rounding issues when specifying SD_TILE_SIZE as a factor

* lower SD_TILE_OVERLAP limit

* zero-init empty output buffer

* Fix decode latent size

* fix encode

* tile size params instead of env

* Tiled vae parameter validation (#6)

* avoid crash with invalid tile sizes, use 0 for default

* refactor default tile size, limit overlap factor

* remove explicit parameter for relative tile size

* limit encoding tile to latent size

* unify code style and format code

* update docs

* fix get_tile_sizes in decode_first_stage

---------

Co-authored-by: Wagner Bruna <wbruna@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
master-2c9b1e2
2025-09-14 16:00:29 +08:00
leejet
288e2d63c0 docs: update docs 2025-09-14 14:24:24 +08:00
leejet
dc46993b55
feat: increase work_ctx memory buffer size (#814) master-dc46993 2025-09-14 13:19:20 +08:00
Richard Palethorpe
a6a8569ea0
feat: Add SYCL Dockerfile (#651) 2025-09-14 13:02:59 +08:00
Erik Scholz
9e7befa320
fix: harden for large files (#643) master-9e7befa 2025-09-14 12:44:19 +08:00
Wagner Bruna
c607fc3ed4
feat: use Euler sampling by default for SD3 and Flux (#753)
Thank you for your contribution.
master-c607fc3
2025-09-14 12:34:41 +08:00
Wagner Bruna
b54bec3f18
fix: do not force VAE type to f32 on SDXL (#716)
This seems to be a leftover from the initial SDXL support: it's
not enough to avoid NaN issues, and it's not not needed for the
fixed sdxl-vae-fp16-fix .
master-b54bec3
2025-09-14 12:19:59 +08:00
Wagner Bruna
5869987fe4
fix: make weight override more robust against ggml changes (#760) master-5869987 2025-09-14 12:15:53 +08:00
Wagner Bruna
48956ffb87
feat: reduce CLIP memory usage with no embeddings (#768) master-48956ff 2025-09-14 12:08:00 +08:00
Wagner Bruna
ddc4a18b92
fix: make tiled VAE reuse the compute buffer (#821) master-ddc4a18 2025-09-14 11:41:50 +08:00
leejet
fce6afcc6a
feat: add sd3 flash attn support (#815) master-fce6afc 2025-09-11 23:24:29 +08:00