leejet
6ad46bb700
sync: update ggml
2025-09-25 21:57:43 +08:00
leejet
1ba30ce005
sync: update ggml
2025-09-25 00:38:38 +08:00
leejet
2abe9451c4
fix: optimize the handling of CLIP embedding weight ( #840 )
master-306-2abe945
2025-09-25 00:28:20 +08:00
Wagner Bruna
f3140eadbb
fix: tensor loading thread count ( #854 )
master-305-f3140ea
2025-09-25 00:26:38 +08:00
Stefan-Olt
98ba155fc6
docs: HipBLAS / ROCm build instruction fix ( #843 )
2025-09-25 00:03:05 +08:00
Wagner Bruna
513f36d495
docs: include Vulkan compatibility for LoRA quants ( #845 )
2025-09-25 00:01:10 +08:00
rmatif
1e0d2821bb
fix: correct tensor deduplication logic ( #844 )
master-302-1e0d282
2025-09-24 23:22:40 +08:00
leejet
fd693ac6a2
refactor: remove unused --normalize-input parameter ( #835 )
master-301-fd693ac
2025-09-18 00:12:53 +08:00
Wagner Bruna
171b2222a5
fix: avoid segfault for pix2pix models without reference images ( #766 )
* fix: avoid segfault for pix2pix models with no reference images
* fix: default to empty reference on pix2pix models to avoid segfault
* use resize instead of reserve
* format code
---------
Co-authored-by: leejet <leejet714@gmail.com>
master-300-171b222
2025-09-18 00:11:38 +08:00
leejet
567f9f14f0
fix: avoid multithreading issues in the model loader
master-299-567f9f1
2025-09-18 00:00:15 +08:00
leejet
1e5f207006
chore: fix workflow ( #836 )
master-298-1e5f207
2025-09-17 22:11:55 +08:00
leejet
79426d578e
chore: set release tag by commit count
2025-09-16 23:24:36 +08:00
vmobilis
97ad3e7ff9
refactor: simplify DPM++ (2S) Ancestral ( #667 )
master-97ad3e7
2025-09-16 23:05:25 +08:00
Erik Scholz
8909523e92
refactor: move tiling calc and debug print into the tiling code branch ( #833 )
master-8909523
2025-09-16 22:46:56 +08:00
rmatif
8376dfba2a
feat: add sgm_uniform scheduler, simple scheduler, and support for NitroFusion ( #675 )
* feat: Add timestep shift and two new schedulers
* update readme
* fix spaces
* format code
* simplify SGMUniformSchedule
* simplify shifted_timestep logic
* avoid conflict
---------
Co-authored-by: leejet <leejet714@gmail.com>
master-8376dfb
2025-09-16 22:42:09 +08:00
leejet
0ebe6fe118
refactor: simplify the logic of pm id image loading ( #827 )
master-0ebe6fe
2025-09-14 22:50:21 +08:00
rmatif
55c2e05d98
feat: optimize tensor loading time ( #790 )
* opt tensor loading
* fix build failure
* revert the changes
* allow the use of n_threads
* fix lora loading
* optimize lora loading
* add mutex
* use atomic
* fix build
* fix potential duplicate issue
* avoid duplicate lookup of lora tensor
* fix progress bar
* remove unused remove_duplicates
---------
Co-authored-by: leejet <leejet714@gmail.com>
master-55c2e05
2025-09-14 22:48:35 +08:00
leejet
52a97b3ac1
feat: add vace support ( #819 )
* add wan vace t2v support
* add --vace-strength option
* add vace i2v support
* fix the processing of vace_context
* add vace v2v support
* update docs
master-52a97b3
2025-09-14 16:57:33 +08:00
stduhpf
2c9b1e2594
feat: add VAE encoding tiling support and adaptive overlap ( #484 )
* implement tiling vae encode support
* Tiling (vae/upscale): adaptive overlap
* Tiling: fix edge case
* Tiling: fix crash when less than 2 tiles per dim
* remove extra dot
* Tiling: fix edge cases for adaptive overlap
* tiling: fix edge case
* set vae tile size via env var
* vae tiling: refactor again, based on a smaller buffer for alignment
* Use bigger tiles for encode (to match compute buffer size)
* Fix edge case when tile is bigger than latent
* non-square VAE tiling (#3 )
* refactor tile number calculation
* support non-square tiles
* add env var to change tile overlap
* add safeguards and better error messages for SD_TILE_OVERLAP
* add safeguards and include overlapping factor for SD_TILE_SIZE
* avoid rounding issues when specifying SD_TILE_SIZE as a factor
* lower SD_TILE_OVERLAP limit
* zero-init empty output buffer
* Fix decode latent size
* fix encode
* tile size params instead of env
* Tiled vae parameter validation (#6 )
* avoid crash with invalid tile sizes, use 0 for default
* refactor default tile size, limit overlap factor
* remove explicit parameter for relative tile size
* limit encoding tile to latent size
* unify code style and format code
* update docs
* fix get_tile_sizes in decode_first_stage
---------
Co-authored-by: Wagner Bruna <wbruna@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
master-2c9b1e2
2025-09-14 16:00:29 +08:00
leejet
288e2d63c0
docs: update docs
2025-09-14 14:24:24 +08:00
leejet
dc46993b55
feat: increase work_ctx memory buffer size ( #814 )
master-dc46993
2025-09-14 13:19:20 +08:00
Richard Palethorpe
a6a8569ea0
feat: Add SYCL Dockerfile ( #651 )
2025-09-14 13:02:59 +08:00
Erik Scholz
9e7befa320
fix: harden for large files ( #643 )
master-9e7befa
2025-09-14 12:44:19 +08:00
Wagner Bruna
c607fc3ed4
feat: use Euler sampling by default for SD3 and Flux ( #753 )
Thank you for your contribution.
master-c607fc3
2025-09-14 12:34:41 +08:00
Wagner Bruna
b54bec3f18
fix: do not force VAE type to f32 on SDXL ( #716 )
This seems to be a leftover from the initial SDXL support: it's
not enough to avoid NaN issues, and it's not needed for the
fixed sdxl-vae-fp16-fix.
master-b54bec3
2025-09-14 12:19:59 +08:00
Wagner Bruna
5869987fe4
fix: make weight override more robust against ggml changes ( #760 )
master-5869987
2025-09-14 12:15:53 +08:00
Wagner Bruna
48956ffb87
feat: reduce CLIP memory usage with no embeddings ( #768 )
master-48956ff
2025-09-14 12:08:00 +08:00
Wagner Bruna
ddc4a18b92
fix: make tiled VAE reuse the compute buffer ( #821 )
master-ddc4a18
2025-09-14 11:41:50 +08:00
leejet
fce6afcc6a
feat: add sd3 flash attn support ( #815 )
master-fce6afc
2025-09-11 23:24:29 +08:00
Erik Scholz
49d6570c43
feat: add SmoothStep Scheduler ( #813 )
master-49d6570
2025-09-11 23:17:46 +08:00
clibdev
6bbaf161ad
chore: add install() support in CMakeLists.txt ( #540 )
master-6bbaf16
2025-09-11 22:24:16 +08:00
clibdev
87cdbd5978
feat: use log_printf to print ggml logs ( #545 )
master-87cdbd5
2025-09-11 22:16:05 +08:00
leejet
b017918106
chore: remove sd3 flash attention warn ( #812 )
master-b017918
2025-09-10 22:21:02 +08:00
Wagner Bruna
ac5a215998
fix: use {} for params init instead of memset ( #781 )
master-ac5a215
2025-09-10 21:49:29 +08:00
Wagner Bruna
abb36d66b5
chore: update flash attention warnings ( #805 )
master-abb36d6
2025-09-10 21:38:21 +08:00
Wagner Bruna
ff4fdbb88d
fix: accept NULL in sd_img_gen_params_t::input_id_images_path ( #809 )
master-ff4fdbb
2025-09-10 21:22:55 +08:00
Markus Hartung
abb115cd02
fix: clarify lora quant support and small fixes ( #792 )
master-abb115c
2025-09-08 22:39:25 +08:00
leejet
c648001030
feat: add detailed tensor loading time stat ( #793 )
master-c648001
2025-09-07 22:51:44 +08:00
stduhpf
c587a43c99
feat: support incrementing ref image index (omni-kontext) ( #755 )
* kontext: support ref images indices
* lora: support x_embedder
* update help message
* Support for negative indices
* support for OmniControl (offsets at index 0)
* c++11 compat
* add --increase-ref-index option
* simplify the logic and fix some issues
* update README.md
* remove unused variable
---------
Co-authored-by: leejet <leejet714@gmail.com>
master-c587a43
2025-09-07 22:35:16 +08:00
leejet
f8fe4e7db9
fix: add flash attn support check ( #803 )
master-f8fe4e7
2025-09-07 21:29:06 +08:00
leejet
1c07fb6fb1
docs: update docs/wan.md
2025-09-07 12:07:20 +08:00
leejet
675208dcb6
chore: update to c++17
master-675208d
2025-09-07 12:04:17 +08:00
leejet
d7f430cd69
docs: update docs and help message
master-d7f430c
2025-09-07 02:26:44 +08:00
stduhpf
141a4b4113
feat: add flow shift parameter (for SD3 and Wan) ( #780 )
* Add flow shift parameter (for SD3 and Wan)
* unify code style and fix some issues
---------
Co-authored-by: leejet <leejet714@gmail.com>
master-141a4b4
2025-09-07 02:16:59 +08:00
stduhpf
21ce9fe2cf
feat: add support for timestep boundary based automatic expert routing in Wan MoE ( #779 )
* Wan MoE: Automatic expert routing based on timestep boundary
* unify code style and fix some issues
---------
Co-authored-by: leejet <leejet714@gmail.com>
master-21ce9fe
2025-09-07 01:44:10 +08:00
leejet
cb1d975e96
feat: add wan2.1/2.2 support ( #778 )
* add wan vae support
* add wan model support
* add umt5 support
* add wan2.1 t2i support
* make flash attn work with wan
* make wan a little faster
* add wan2.1 t2v support
* add wan gguf support
* add offload params to cpu support
* add wan2.1 i2v support
* crop image before resize
* set default fps to 16
* add diff lora support
* fix wan2.1 i2v
* introduce sd_sample_params_t
* add wan2.2 t2v support
* add wan2.2 14B i2v support
* add wan2.2 ti2v support
* add high noise lora support
* sync: update ggml submodule url
* avoid build failure on linux
* avoid build failure
* update ggml
* update ggml
* fix sd_version_is_wan
* update ggml, fix cpu im2col_3d
* fix ggml_nn_attention_ext mask
* add cache support to ggml runner
* fix the issue of illegal memory access
* unify image loading processing
* add wan2.1/2.2 FLF2V support
* fix end_image mask
* update to latest ggml
* add GGUFReader
* update docs
master-cb1d975
2025-09-06 18:08:03 +08:00
Wagner Bruna
2eb3845df5
fix: typo in the verbose long flag ( #783 )
master-2eb3845
2025-09-04 00:49:01 +08:00
stduhpf
4c6475f917
feat: show usage on unknown arg ( #767 )
master-4c6475f
2025-09-01 21:38:34 +08:00
SmallAndSoft
f0fa7ddc40
docs: add compile option needed by Ninja ( #770 )
2025-09-01 21:35:25 +08:00
SmallAndSoft
a7c7905c6d
docs: add missing dash to docs/chroma.md ( #771 )
2025-09-01 21:34:34 +08:00