stable-diffusion.cpp

mirror of https://github.com/leejet/stable-diffusion.cpp.git synced 2025-12-12 21:38:58 +00:00

Author	SHA1	Message	Date
Wagner Bruna	a3a88fc9b2	fix: avoid crash loading LoRAs with bf16 weights (#1077 )	2025-12-12 22:36:54 +08:00
leejet	8823dc48bc	feat: align the spatial size to the corresponding multiple (#1073 )	2025-12-10 23:15:08 +08:00
leejet	2f0bd31a84	feat: add ovis image support (#1057 )	2025-12-07 12:32:56 +08:00
leejet	689e44c9a8	fix: correct ggml_ext_silu_act (#1056 )	2025-12-06 23:55:28 +08:00
stduhpf	bcc9c0d0b3	feat: handle ggml compute failures without crashing the program (#1003 ) * Feat: handle compute failures more gracefully * fix Unreachable code after return Co-authored-by: idostyle <idostyl3@googlemail.com> * adjust z_image.hpp --------- Co-authored-by: idostyle <idostyl3@googlemail.com> Co-authored-by: leejet <leejet714@gmail.com>	2025-12-04 22:04:27 +08:00
leejet	bc80225336	fix: make the immediate LoRA apply mode work better when using Vulkan (#1021 )	2025-11-30 12:08:25 +08:00
leejet	52b67c538b	feat: add flux2 support (#1016 ) * add flux2 support * rename qwenvl to llm * add Flux2FlowDenoiser * update docs	2025-11-30 11:32:56 +08:00
stduhpf	aa2b8e0ca5	fix: patch 1x1 conv weights at runtime (#986 )	2025-11-19 23:27:23 +08:00
leejet	347710f68f	feat: support applying LoRA at runtime (#969 )	2025-11-13 21:48:44 +08:00
leejet	694f0d9235	refactor: optimize the logic for name conversion and the processing of the LoRA model (#955 )	2025-11-10 00:12:20 +08:00
stduhpf	8ecdf053ac	feat: add image preview support (#522 )	2025-11-10 00:12:02 +08:00
leejet	c2d8ffc22c	fix: compatibility for models with modified tensor shapes (#951 )	2025-11-07 23:04:41 +08:00
leejet	8f6c5c217b	refactor: simplify the model loading logic (#933 ) * remove String2GGMLType * remove preprocess_tensor * fix clip init * simplify the logic for reading weights	2025-11-03 21:21:34 +08:00
leejet	6103d86e2c	refactor: introduce GGMLRunnerContext (#928 ) * introduce GGMLRunnerContext * add Flash Attention enable control through GGMLRunnerContext * add conv2d_direct enable control through GGMLRunnerContext	2025-11-02 02:11:04 +08:00
leejet	dd75fc081c	refactor: unify the naming style of ggml extension functions (#921 )	2025-10-28 23:26:48 +08:00
leejet	9e28be6479	feat: add chroma radiance support (#910 ) * add chroma radiance support * fix ci * simply generate_init_latent * workaround: avoid ggml cuda error * format code * add chroma radiance doc	2025-10-25 23:56:14 +08:00
leejet	d05e46ca5e	chore: add .clang-tidy configuration and apply modernize checks (#902 )	2025-10-18 23:23:40 +08:00
leejet	40a6a8710e	fix: resolve precision issues in SDXL VAE under fp16 (#888 ) * fix: resolve precision issues in SDXL VAE under fp16 * add --force-sdxl-vae-conv-scale option * update docs	2025-10-15 23:01:00 +08:00
leejet	2e9242e37f	feat: add Qwen Image Edit support (#877 ) * add ref latent support for qwen image * optimize clip_preprocess and fix get_first_stage_encoding * add qwen2vl vit support * add qwen image edit support * fix qwen image edit pipeline * add mmproj file support * support dynamic number of Qwen image transformer blocks * set prompt_template_encode_start_idx every time * to_add_out precision fix * to_out.0 precision fix * update docs	2025-10-13 23:17:18 +08:00
Wagner Bruna	5436f6b814	fix: correct canny preprocessor (#861 )	2025-10-13 22:02:35 +08:00
Wagner Bruna	9727c6bb98	fix: resolve VAE tiling problem in Qwen Image (#873 )	2025-10-12 23:45:53 +08:00
leejet	beb99a2de2	feat: add Qwen Image support (#851 ) * add qwen tokenizer * add qwen2.5 vl support * mv qwen.hpp -> qwenvl.hpp * add qwen image model * add qwen image t2i pipeline * fix qwen image flash attn * add qwen image i2i pipline * change encoding of vocab_qwen.hpp to utf8 * fix get_first_stage_encoding * apply jeffbolz f32 patch https://github.com/leejet/stable-diffusion.cpp/pull/851#issuecomment-3335515302 * fix the issue that occurs when using CUDA with k-quants weights * optimize the handling of the FeedForward precision fix * to_add_out precision fix * update docs	2025-10-12 23:23:19 +08:00
stduhpf	11f436c483	feat: add support for Flux Controls and Flex.2 (#692 )	2025-10-11 00:06:57 +08:00
leejet	35843c77ea	fix: optimize the handling of embedding weight (#859 )	2025-09-25 23:09:59 +08:00
leejet	0ebe6fe118	refactor: simplify the logic of pm id image loading (#827 )	2025-09-14 22:50:21 +08:00
leejet	52a97b3ac1	feat: add vace support (#819 ) * add wan vace t2v support * add --vace-strength option * add vace i2v support * fix the processing of vace_context * add vace v2v support * update docs	2025-09-14 16:57:33 +08:00
stduhpf	2c9b1e2594	feat: add VAE encoding tiling support and adaptive overlap (#484 ) * implement tiling vae encode support * Tiling (vae/upscale): adaptative overlap * Tiling: fix edge case * Tiling: fix crash when less than 2 tiles per dim * remove extra dot * Tiling: fix edge cases for adaptative overlap * tiling: fix edge case * set vae tile size via env var * vae tiling: refactor again, base on smaller buffer for alignment * Use bigger tiles for encode (to match compute buffer size) * Fix edge case when tile is bigger than latent * non-square VAE tiling (#3) * refactor tile number calculation * support non-square tiles * add env var to change tile overlap * add safeguards and better error messages for SD_TILE_OVERLAP * add safeguards and include overlapping factor for SD_TILE_SIZE * avoid rounding issues when specifying SD_TILE_SIZE as a factor * lower SD_TILE_OVERLAP limit * zero-init empty output buffer * Fix decode latent size * fix encode * tile size params instead of env * Tiled vae parameter validation (#6) * avoid crash with invalid tile sizes, use 0 for default * refactor default tile size, limit overlap factor * remove explicit parameter for relative tile size * limit encoding tile to latent size * unify code style and format code * update docs * fix get_tile_sizes in decode_first_stage --------- Co-authored-by: Wagner Bruna <wbruna@users.noreply.github.com> Co-authored-by: leejet <leejet714@gmail.com>	2025-09-14 16:00:29 +08:00
clibdev	87cdbd5978	feat: use log_printf to print ggml logs (#545 )	2025-09-11 22:16:05 +08:00
leejet	f8fe4e7db9	fix: add flash attn support check (#803 )	2025-09-07 21:29:06 +08:00
leejet	cb1d975e96	feat: add wan2.1/2.2 support (#778 ) * add wan vae suppport * add wan model support * add umt5 support * add wan2.1 t2i support * make flash attn work with wan * make wan a little faster * add wan2.1 t2v support * add wan gguf support * add offload params to cpu support * add wan2.1 i2v support * crop image before resize * set default fps to 16 * add diff lora support * fix wan2.1 i2v * introduce sd_sample_params_t * add wan2.2 t2v support * add wan2.2 14B i2v support * add wan2.2 ti2v support * add high noise lora support * sync: update ggml submodule url * avoid build failure on linux * avoid build failure * update ggml * update ggml * fix sd_version_is_wan * update ggml, fix cpu im2col_3d * fix ggml_nn_attention_ext mask * add cache support to ggml runner * fix the issue of illegal memory access * unify image loading processing * add wan2.1/2.2 FLF2V support * fix end_image mask * update to latest ggml * add GGUFReader * update docs	2025-09-06 18:08:03 +08:00
Daniele	5b8996f74a	Conv2D direct support (#744 ) * Conv2DDirect for VAE stage * Enable only for Vulkan, reduced duplicated code * Cmake option to use conv2d direct * conv2d direct always on for opencl * conv direct as a flag * fix merge typo * Align conv2d behavior to flash attention's * fix readme * add conv2d direct for controlnet * add conv2d direct for esrgan * clean code, use enable_conv2d_direct/get_all_blocks * format code --------- Co-authored-by: leejet <leejet714@gmail.com>	2025-08-03 01:25:17 +08:00
Wagner Bruna	f7f05fb185	chore: avoid setting GGML_MAX_NAME when building against external ggml (#751 ) An external ggml will most likely have been built with the default GGML_MAX_NAME value (64), which would be inconsistent with the value set by our build (128). That would be an ODR violation, and it could easily cause memory corruption issues due to the different sizeof(struct ggml_tensor) values. For now, when linking against an external ggml, we demand it has been patched with a bigger GGML_MAX_NAME, since we can't check against a value defined only at build time.	2025-08-03 01:24:40 +08:00
leejet	f6b9aa1a43	refector: optimize the usage of tensor_types	2025-07-28 23:18:29 +08:00
leejet	eed97a5e1d	sync: update ggml	2025-07-24 23:04:08 +08:00
Erik Scholz	ab835f7d39	fix: correct head dim check and L_k padding of flash attention (#736 )	2025-07-24 00:57:45 +08:00
leejet	7dac89ad75	refector: reuse some code	2025-07-01 23:33:50 +08:00
stduhpf	ea46fd6948	fix: force zero-initialize output of tiling (#703 )	2025-07-01 23:01:29 +08:00
rmatif	d42fd59464	feat: add OpenCL backend support (#680 )	2025-06-30 23:32:23 +08:00
stduhpf	b1cc40c35c	feat: add Chroma support (#696 ) --------- Co-authored-by: Green Sky <Green-Sky@users.noreply.github.com> Co-authored-by: leejet <leejet714@gmail.com>	2025-06-29 23:36:42 +08:00
stduhpf	69c73789fe	fix: force binary mask for inpaint models (#589 ) Co-authored-by: leejet <leejet714@gmail.com>	2025-02-22 21:29:57 +08:00
stduhpf	1be2491dcf	feat: partial LyCORIS support (tucker decomposition for LoCon + LoHa + LoKr) (#577 )	2025-02-22 21:19:26 +08:00
leejet	dcf91f9e0f	chore: change SD_CUBLAS/SD_USE_CUBLAS to SD_CUDA/SD_USE_CUDA	2024-12-28 13:27:51 +08:00
stduhpf	0d9d6659a7	fix: fix metal build (#513 )	2024-12-28 13:06:17 +08:00
stduhpf	8f4ab9add3	feat: support Inpaint models (#511 )	2024-12-28 13:04:49 +08:00
stduhpf	7ce63e740c	feat: flexible model architecture for dit models (Flux & SD3) (#490 ) * Refactor: wtype per tensor * Fix default args * refactor: fix flux * Refactor photmaker v2 support * unet: refactor the refactoring * Refactor: fix controlnet and tae * refactor: upscaler * Refactor: fix runtime type override * upscaler: use fp16 again * Refactor: Flexible sd3 arch * Refactor: Flexible Flux arch * format code --------- Co-authored-by: leejet <leejet714@gmail.com>	2024-11-30 14:18:53 +08:00
leejet	4570715727	fix: use ggml_nn_attention in vae	2024-11-24 18:21:31 +08:00
leejet	c3eeb669cd	sync: update ggml	2024-11-23 13:29:32 +08:00
Erik Scholz	1c168d98a5	fix: repair flash attention support (#386 ) * repair flash attention in _ext this does not fix the currently broken fa behind the define, which is only used by VAE Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com> * make flash attention in the diffusion model a runtime flag no support for sd3 or video * remove old flash attention option and switch vae over to attn_ext * update docs * format code --------- Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com> Co-authored-by: leejet <leejet714@gmail.com>	2024-11-23 12:39:08 +08:00
bssrdf	2b1bc06477	feat: add PhotoMaker Version 2 support (#358 ) * first attempt at updating to photomaker v2 * continue adding photomaker v2 modules * finishing the last few pieces for photomaker v2; id_embeds need to be done by a manual step and pass as an input file * added a name converter for Photomaker V2; build ok * more debugging underway * failing at cuda mat_mul * updated chunk_half to be more efficient; redo feedforward * fixed a bug: carefully using ggml_view_4d to get chunks of a tensor; strides need to be recalculated or set properly; still failing at soft_max cuda op * redo weight calculation and weightv fixed a bug now Photomaker V2 kinds of working * add python script for face detection (Photomaker V2 needs) * updated readme for photomaker * fixed a bug causing PMV1 crashing; both V1 and V2 work * fixed clean_input_ids for PMV2 * fixed a double counting bug in tokenize_with_trigger_token * updated photomaker readme * removed some commented code * improved reconstructing class word free prompt * changed reading id_embed to raw binary using existing load tensor function; this is more efficient than using model load and also makes it easier to work with sd server * minor clean up --------- Co-authored-by: bssrdf <bssrdf@gmail.com>	2024-11-23 11:50:14 +08:00
leejet	ac54e00760	feat: add sd3.5 support (#445 )	2024-10-24 21:58:03 +08:00

74 Commits