stable-diffusion.cpp

mirror of https://github.com/leejet/stable-diffusion.cpp.git synced 2026-06-23 22:56:42 +00:00

Author	SHA1	Message	Date
Stefan-Olt	98ba155fc6	docs: HipBLAS / ROCm build instruction fix (#843 )	2025-09-25 00:03:05 +08:00
Wagner Bruna	513f36d495	docs: include Vulkan compatibility for LoRA quants (#845 )	2025-09-25 00:01:10 +08:00
rmatif	1e0d2821bb	fix: correct tensor deduplication logic (#844 ) master-302-1e0d282	2025-09-24 23:22:40 +08:00
leejet	fd693ac6a2	refactor: remove unused --normalize-input parameter (#835 ) master-301-fd693ac	2025-09-18 00:12:53 +08:00
Wagner Bruna	171b2222a5	fix: avoid segfault for pix2pix models without reference images (#766 ) * fix: avoid segfault for pix2pix models with no reference images * fix: default to empty reference on pix2pix models to avoid segfault * use resize instead of reserve * format code --------- Co-authored-by: leejet <leejet714@gmail.com> master-300-171b222	2025-09-18 00:11:38 +08:00
leejet	567f9f14f0	fix: avoid multithreading issues in the model loader master-299-567f9f1	2025-09-18 00:00:15 +08:00
leejet	1e5f207006	chore: fix workflow (#836 ) master-298-1e5f207	2025-09-17 22:11:55 +08:00
leejet	79426d578e	chore: set release tag by commit count	2025-09-16 23:24:36 +08:00
vmobilis	97ad3e7ff9	refactor: simplify DPM++ (2S) Ancestral (#667 ) master-97ad3e7	2025-09-16 23:05:25 +08:00
Erik Scholz	8909523e92	refactor: move tiling cacl and debug print into the tiling code branch (#833 ) master-8909523	2025-09-16 22:46:56 +08:00
rmatif	8376dfba2a	feat: add sgm_uniform scheduler, simple scheduler, and support for NitroFusion (#675 ) * feat: Add timestep shift and two new schedulers * update readme * fix spaces * format code * simplify SGMUniformSchedule * simplify shifted_timestep logic * avoid conflict --------- Co-authored-by: leejet <leejet714@gmail.com> master-8376dfb	2025-09-16 22:42:09 +08:00
leejet	0ebe6fe118	refactor: simplify the logic of pm id image loading (#827 ) master-0ebe6fe	2025-09-14 22:50:21 +08:00
rmatif	55c2e05d98	feat: optimize tensor loading time (#790 ) * opt tensor loading * fix build failure * revert the changes * allow the use of n_threads * fix lora loading * optimize lora loading * add mutex * use atomic * fix build * fix potential duplicate issue * avoid duplicate lookup of lora tensor * fix progeress bar * remove unused remove_duplicates --------- Co-authored-by: leejet <leejet714@gmail.com> master-55c2e05	2025-09-14 22:48:35 +08:00
leejet	52a97b3ac1	feat: add vace support (#819 ) * add wan vace t2v support * add --vace-strength option * add vace i2v support * fix the processing of vace_context * add vace v2v support * update docs master-52a97b3	2025-09-14 16:57:33 +08:00
stduhpf	2c9b1e2594	feat: add VAE encoding tiling support and adaptive overlap (#484 ) * implement tiling vae encode support * Tiling (vae/upscale): adaptative overlap * Tiling: fix edge case * Tiling: fix crash when less than 2 tiles per dim * remove extra dot * Tiling: fix edge cases for adaptative overlap * tiling: fix edge case * set vae tile size via env var * vae tiling: refactor again, base on smaller buffer for alignment * Use bigger tiles for encode (to match compute buffer size) * Fix edge case when tile is bigger than latent * non-square VAE tiling (#3) * refactor tile number calculation * support non-square tiles * add env var to change tile overlap * add safeguards and better error messages for SD_TILE_OVERLAP * add safeguards and include overlapping factor for SD_TILE_SIZE * avoid rounding issues when specifying SD_TILE_SIZE as a factor * lower SD_TILE_OVERLAP limit * zero-init empty output buffer * Fix decode latent size * fix encode * tile size params instead of env * Tiled vae parameter validation (#6) * avoid crash with invalid tile sizes, use 0 for default * refactor default tile size, limit overlap factor * remove explicit parameter for relative tile size * limit encoding tile to latent size * unify code style and format code * update docs * fix get_tile_sizes in decode_first_stage --------- Co-authored-by: Wagner Bruna <wbruna@users.noreply.github.com> Co-authored-by: leejet <leejet714@gmail.com> master-2c9b1e2	2025-09-14 16:00:29 +08:00
leejet	288e2d63c0	docs: update docs	2025-09-14 14:24:24 +08:00
leejet	dc46993b55	feat: increase work_ctx memory buffer size (#814 ) master-dc46993	2025-09-14 13:19:20 +08:00
Richard Palethorpe	a6a8569ea0	feat: Add SYCL Dockerfile (#651 )	2025-09-14 13:02:59 +08:00
Erik Scholz	9e7befa320	fix: harden for large files (#643 ) master-9e7befa	2025-09-14 12:44:19 +08:00
Wagner Bruna	c607fc3ed4	feat: use Euler sampling by default for SD3 and Flux (#753 ) Thank you for your contribution. master-c607fc3	2025-09-14 12:34:41 +08:00
Wagner Bruna	b54bec3f18	fix: do not force VAE type to f32 on SDXL (#716 ) This seems to be a leftover from the initial SDXL support: it's not enough to avoid NaN issues, and it's not not needed for the fixed sdxl-vae-fp16-fix . master-b54bec3	2025-09-14 12:19:59 +08:00
Wagner Bruna	5869987fe4	fix: make weight override more robust against ggml changes (#760 ) master-5869987	2025-09-14 12:15:53 +08:00
Wagner Bruna	48956ffb87	feat: reduce CLIP memory usage with no embeddings (#768 ) master-48956ff	2025-09-14 12:08:00 +08:00
Wagner Bruna	ddc4a18b92	fix: make tiled VAE reuse the compute buffer (#821 ) master-ddc4a18	2025-09-14 11:41:50 +08:00
leejet	fce6afcc6a	feat: add sd3 flash attn support (#815 ) master-fce6afc	2025-09-11 23:24:29 +08:00
Erik Scholz	49d6570c43	feat: add SmoothStep Scheduler (#813 ) master-49d6570	2025-09-11 23:17:46 +08:00
clibdev	6bbaf161ad	chore: add install() support in CMakeLists.txt (#540 ) master-6bbaf16	2025-09-11 22:24:16 +08:00
clibdev	87cdbd5978	feat: use log_printf to print ggml logs (#545 ) master-87cdbd5	2025-09-11 22:16:05 +08:00
leejet	b017918106	chore: remove sd3 flash attention warn (#812 ) master-b017918	2025-09-10 22:21:02 +08:00
Wagner Bruna	ac5a215998	fix: use {} for params init instead of memset (#781 ) master-ac5a215	2025-09-10 21:49:29 +08:00
Wagner Bruna	abb36d66b5	chore: update flash attention warnings (#805 ) master-abb36d6	2025-09-10 21:38:21 +08:00
Wagner Bruna	ff4fdbb88d	fix: accept NULL in sd_img_gen_params_t::input_id_images_path (#809 ) master-ff4fdbb	2025-09-10 21:22:55 +08:00
Markus Hartung	abb115cd02	fix: clarify lora quant support and small fixes (#792 ) master-abb115c	2025-09-08 22:39:25 +08:00
leejet	c648001030	feat: add detailed tensor loading time stat (#793 ) master-c648001	2025-09-07 22:51:44 +08:00
stduhpf	c587a43c99	feat: support incrementing ref image index (omni-kontext) (#755 ) * kontext: support ref images indices * lora: support x_embedder * update help message * Support for negative indices * support for OmniControl (offsets at index 0) * c++11 compat * add --increase-ref-index option * simplify the logic and fix some issues * update README.md * remove unused variable --------- Co-authored-by: leejet <leejet714@gmail.com> master-c587a43	2025-09-07 22:35:16 +08:00
leejet	f8fe4e7db9	fix: add flash attn support check (#803 ) master-f8fe4e7	2025-09-07 21:29:06 +08:00
leejet	1c07fb6fb1	docs: update docs/wan.md	2025-09-07 12:07:20 +08:00
leejet	675208dcb6	chore: update to c++17 master-675208d	2025-09-07 12:04:17 +08:00
leejet	d7f430cd69	docs: update docs and help message master-d7f430c	2025-09-07 02:26:44 +08:00
stduhpf	141a4b4113	feat: add flow shift parameter (for SD3 and Wan) (#780 ) * Add flow shift parameter (for SD3 and Wan) * unify code style and fix some issues --------- Co-authored-by: leejet <leejet714@gmail.com> master-141a4b4	2025-09-07 02:16:59 +08:00
stduhpf	21ce9fe2cf	feat: add support for timestep boundary based automatic expert routing in Wan MoE (#779 ) * Wan MoE: Automatic expert routing based on timestep boundary * unify code style and fix some issues --------- Co-authored-by: leejet <leejet714@gmail.com> master-21ce9fe	2025-09-07 01:44:10 +08:00
leejet	cb1d975e96	feat: add wan2.1/2.2 support (#778 ) * add wan vae suppport * add wan model support * add umt5 support * add wan2.1 t2i support * make flash attn work with wan * make wan a little faster * add wan2.1 t2v support * add wan gguf support * add offload params to cpu support * add wan2.1 i2v support * crop image before resize * set default fps to 16 * add diff lora support * fix wan2.1 i2v * introduce sd_sample_params_t * add wan2.2 t2v support * add wan2.2 14B i2v support * add wan2.2 ti2v support * add high noise lora support * sync: update ggml submodule url * avoid build failure on linux * avoid build failure * update ggml * update ggml * fix sd_version_is_wan * update ggml, fix cpu im2col_3d * fix ggml_nn_attention_ext mask * add cache support to ggml runner * fix the issue of illegal memory access * unify image loading processing * add wan2.1/2.2 FLF2V support * fix end_image mask * update to latest ggml * add GGUFReader * update docs master-cb1d975	2025-09-06 18:08:03 +08:00
Wagner Bruna	2eb3845df5	fix: typo in the verbose long flag (#783 ) master-2eb3845	2025-09-04 00:49:01 +08:00
stduhpf	4c6475f917	feat: show usage on unknown arg (#767 ) master-4c6475f	2025-09-01 21:38:34 +08:00
SmallAndSoft	f0fa7ddc40	docs: add compile option needed by Ninja (#770 )	2025-09-01 21:35:25 +08:00
SmallAndSoft	a7c7905c6d	docs: add missing dash to docs/chroma.md (#771 )	2025-09-01 21:34:34 +08:00
Wagner Bruna	eea77cbad9	feat: throttle model loading progress updates (#782 ) Some terminals have slow display latency, so frequent output during model loading can actually slow down the process. Also, since tensor loading times can vary a lot, the progress display now shows the average across past iterations instead of just the last one. master-eea77cb	2025-09-01 21:32:01 +08:00
NekopenDev	0e86d90ee4	chore: add Nvidia 30 series (cuda arch 86) to build master-0e86d90	2025-09-01 21:21:34 +08:00
leejet	5900ef6605	sync: update ggml, make cuda im2col a little faster	2025-08-03 01:29:40 +08:00
Daniele	5b8996f74a	Conv2D direct support (#744 ) * Conv2DDirect for VAE stage * Enable only for Vulkan, reduced duplicated code * Cmake option to use conv2d direct * conv2d direct always on for opencl * conv direct as a flag * fix merge typo * Align conv2d behavior to flash attention's * fix readme * add conv2d direct for controlnet * add conv2d direct for esrgan * clean code, use enable_conv2d_direct/get_all_blocks * format code --------- Co-authored-by: leejet <leejet714@gmail.com> master-5b8996f	2025-08-03 01:25:17 +08:00


354 Commits