stable-diffusion.cpp

mirror of https://github.com/leejet/stable-diffusion.cpp.git synced 2025-12-12 21:38:58 +00:00

Author	SHA1	Message	Date
Wagner Bruna	199e675cc7	feat: support for --tensor-type-rules on generation modes (#932)	2025-11-16 17:07:32 +08:00
leejet	694f0d9235	refactor: optimize the logic for name conversion and the processing of the LoRA model (#955)	2025-11-10 00:12:20 +08:00
akleine	d2d3944f50	feat: add support for SD2.x with TINY U-Nets (#939)	2025-11-09 22:47:37 +08:00
leejet	8f6c5c217b	refactor: simplify the model loading logic (#933) * remove String2GGMLType * remove preprocess_tensor * fix clip init * simplify the logic for reading weights	2025-11-03 21:21:34 +08:00
leejet	9e28be6479	feat: add chroma radiance support (#910) * add chroma radiance support * fix ci * simply generate_init_latent * workaround: avoid ggml cuda error * format code * add chroma radiance doc	2025-10-25 23:56:14 +08:00
akleine	062490aa7c	feat: add SSD1B and tiny-sd support (#897) * feat: add code and doc for running SSD1B models * Added some more lines to support SD1.x with TINY U-Nets too. * support SSD-1B.safetensors * fix sdv1.5 diffusers format loader --------- Co-authored-by: leejet <leejet714@gmail.com>	2025-10-25 23:35:54 +08:00
leejet	d05e46ca5e	chore: add .clang-tidy configuration and apply modernize checks (#902)	2025-10-18 23:23:40 +08:00
leejet	db6f4791b4	feat: add wtype stat (#899)	2025-10-17 23:40:32 +08:00
leejet	2e9242e37f	feat: add Qwen Image Edit support (#877) * add ref latent support for qwen image * optimize clip_preprocess and fix get_first_stage_encoding * add qwen2vl vit support * add qwen image edit support * fix qwen image edit pipeline * add mmproj file support * support dynamic number of Qwen image transformer blocks * set prompt_template_encode_start_idx every time * to_add_out precision fix * to_out.0 precision fix * update docs	2025-10-13 23:17:18 +08:00
leejet	beb99a2de2	feat: add Qwen Image support (#851) * add qwen tokenizer * add qwen2.5 vl support * mv qwen.hpp -> qwenvl.hpp * add qwen image model * add qwen image t2i pipeline * fix qwen image flash attn * add qwen image i2i pipline * change encoding of vocab_qwen.hpp to utf8 * fix get_first_stage_encoding * apply jeffbolz f32 patch https://github.com/leejet/stable-diffusion.cpp/pull/851#issuecomment-3335515302 * fix the issue that occurs when using CUDA with k-quants weights * optimize the handling of the FeedForward precision fix * to_add_out precision fix * update docs	2025-10-12 23:23:19 +08:00
stduhpf	11f436c483	feat: add support for Flux Controls and Flex.2 (#692)	2025-10-11 00:06:57 +08:00
leejet	2abe9451c4	fix: optimize the handling of CLIP embedding weight (#840)	2025-09-25 00:28:20 +08:00
Wagner Bruna	f3140eadbb	fix: tensor loading thread count (#854)	2025-09-25 00:26:38 +08:00
rmatif	1e0d2821bb	fix: correct tensor deduplication logic (#844)	2025-09-24 23:22:40 +08:00
leejet	567f9f14f0	fix: avoid multithreading issues in the model loader	2025-09-18 00:00:15 +08:00
rmatif	55c2e05d98	feat: optimize tensor loading time (#790) * opt tensor loading * fix build failure * revert the changes * allow the use of n_threads * fix lora loading * optimize lora loading * add mutex * use atomic * fix build * fix potential duplicate issue * avoid duplicate lookup of lora tensor * fix progeress bar * remove unused remove_duplicates --------- Co-authored-by: leejet <leejet714@gmail.com>	2025-09-14 22:48:35 +08:00
Erik Scholz	9e7befa320	fix: harden for large files (#643)	2025-09-14 12:44:19 +08:00
Wagner Bruna	5869987fe4	fix: make weight override more robust against ggml changes (#760)	2025-09-14 12:15:53 +08:00
leejet	c648001030	feat: add detailed tensor loading time stat (#793)	2025-09-07 22:51:44 +08:00
leejet	cb1d975e96	feat: add wan2.1/2.2 support (#778) * add wan vae suppport * add wan model support * add umt5 support * add wan2.1 t2i support * make flash attn work with wan * make wan a little faster * add wan2.1 t2v support * add wan gguf support * add offload params to cpu support * add wan2.1 i2v support * crop image before resize * set default fps to 16 * add diff lora support * fix wan2.1 i2v * introduce sd_sample_params_t * add wan2.2 t2v support * add wan2.2 14B i2v support * add wan2.2 ti2v support * add high noise lora support * sync: update ggml submodule url * avoid build failure on linux * avoid build failure * update ggml * update ggml * fix sd_version_is_wan * update ggml, fix cpu im2col_3d * fix ggml_nn_attention_ext mask * add cache support to ggml runner * fix the issue of illegal memory access * unify image loading processing * add wan2.1/2.2 FLF2V support * fix end_image mask * update to latest ggml * add GGUFReader * update docs	2025-09-06 18:08:03 +08:00
Wagner Bruna	eea77cbad9	feat: throttle model loading progress updates (#782) Some terminals have slow display latency, so frequent output during model loading can actually slow down the process. Also, since tensor loading times can vary a lot, the progress display now shows the average across past iterations instead of just the last one.	2025-09-01 21:32:01 +08:00
leejet	f6b9aa1a43	refector: optimize the usage of tensor_types	2025-07-28 23:18:29 +08:00
leejet	bd1eaef93e	fix: convert f64 to f32 and i64 to i32 when loading weights	2025-07-24 00:59:38 +08:00
stduhpf	a772dca27a	feat: add Instruct-Pix2pix/CosXL-Edit support (#679) * Instruct-p2p support * support 2 conditionings cfg * Do not re-encode the exact same image twice * fixes for 2-cfg * Fix pix2pix latent inputs + improve inpainting a bit + fix naming * prepare for other pix2pix-like models * Support sdxl ip2p * fix reference image embeddings * Support 2-cond cfg properly in cli * fix typo in help * Support masks for ip2p models * unify code style * delete unused code * use edit mode * add img_cond * format code --------- Co-authored-by: leejet <leejet714@gmail.com>	2025-07-12 15:36:45 +08:00
Wagner Bruna	6d84a30c66	feat: overriding quant types for specific tensors on model conversion (#724)	2025-07-08 00:11:38 +08:00
stduhpf	dafc32d0dd	feat: add support for f64/i64 and clip_g diffusers model (#681)	2025-07-06 23:24:55 +08:00
idostyle	225162f270	fix: mark encoder.embed_tokens.weight as unused tensor (#721)	2025-07-06 23:10:10 +08:00
stduhpf	19fbfd8639	feat: override text encoders for unet models (#682)	2025-07-04 22:19:47 +08:00
vmobilis	3bae667f3d	fix: break the line after skipping tensors in VAE (#591)	2025-07-03 22:50:42 +08:00
stduhpf	83ef4e44ce	feat: add T5 with llama.cpp naming convention support (#654)	2025-07-02 23:13:00 +08:00
rmatif	d42fd59464	feat: add OpenCL backend support (#680)	2025-06-30 23:32:23 +08:00
idostyle	d7c7a34712	fix: ModelLoader::load_tensors duplicated check (#623) Introduced in 2b6ec97fe244d03c40aa8d70131d40bb086099b0	2025-03-09 12:23:23 +08:00
stduhpf	85e9a12988	fix: preprocess tensor names in tensor types map (#607) Thank you for your contribution	2025-03-01 11:48:04 +08:00
stduhpf	348a54e34a	feat: use pretty-progress for tensor loading (#516)	2024-12-28 13:14:52 +08:00
stduhpf	8f4ab9add3	feat: support Inpaint models (#511)	2024-12-28 13:04:49 +08:00
stduhpf	7ce63e740c	feat: flexible model architecture for dit models (Flux & SD3) (#490) * Refactor: wtype per tensor * Fix default args * refactor: fix flux * Refactor photmaker v2 support * unet: refactor the refactoring * Refactor: fix controlnet and tae * refactor: upscaler * Refactor: fix runtime type override * upscaler: use fp16 again * Refactor: Flexible sd3 arch * Refactor: Flexible Flux arch * format code --------- Co-authored-by: leejet <leejet714@gmail.com>	2024-11-30 14:18:53 +08:00
leejet	c3eeb669cd	sync: update ggml	2024-11-23 13:29:32 +08:00
Erik Scholz	1c168d98a5	fix: repair flash attention support (#386) * repair flash attention in _ext this does not fix the currently broken fa behind the define, which is only used by VAE Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com> * make flash attention in the diffusion model a runtime flag no support for sd3 or video * remove old flash attention option and switch vae over to attn_ext * update docs * format code --------- Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com> Co-authored-by: leejet <leejet714@gmail.com>	2024-11-23 12:39:08 +08:00
bssrdf	2b1bc06477	feat: add PhotoMaker Version 2 support (#358) * first attempt at updating to photomaker v2 * continue adding photomaker v2 modules * finishing the last few pieces for photomaker v2; id_embeds need to be done by a manual step and pass as an input file * added a name converter for Photomaker V2; build ok * more debugging underway * failing at cuda mat_mul * updated chunk_half to be more efficient; redo feedforward * fixed a bug: carefully using ggml_view_4d to get chunks of a tensor; strides need to be recalculated or set properly; still failing at soft_max cuda op * redo weight calculation and weightv fixed a bug now Photomaker V2 kinds of working * add python script for face detection (Photomaker V2 needs) * updated readme for photomaker * fixed a bug causing PMV1 crashing; both V1 and V2 work * fixed clean_input_ids for PMV2 * fixed a double counting bug in tokenize_with_trigger_token * updated photomaker readme * removed some commented code * improved reconstructing class word free prompt * changed reading id_embed to raw binary using existing load tensor function; this is more efficient than using model load and also makes it easier to work with sd server * minor clean up --------- Co-authored-by: bssrdf <bssrdf@gmail.com>	2024-11-23 11:50:14 +08:00
LostRuins Concedo	8f94efafa3	feat: add support for loading F8_E5M2 weights (#460)	2024-11-23 11:45:11 +08:00
stduhpf	6ea812256e	feat: add flux 1 lite 8B (freepik) support (#474) * Flux Lite (Freepik) support * format code --------- Co-authored-by: leejet <leejet714@gmail.com>	2024-11-23 11:41:30 +08:00
stduhpf	65fa646684	feat: add sd3.5 medium and skip layer guidance support (#451) * mmdit-x * add support for sd3.5 medium * add skip layer guidance support (mmdit only) * ignore slg if slg_scale is zero (optimization) * init out_skip once * slg support for flux (expermiental) * warn if version doesn't support slg * refactor slg cli args * set default slg_scale to 0 (oops) * format code --------- Co-authored-by: leejet <leejet714@gmail.com>	2024-11-23 11:15:31 +08:00
leejet	ac54e00760	feat: add sd3.5 support (#445)	2024-10-24 21:58:03 +08:00
soham	2027b16fda	feat: add vulkan backend support (#291) * Fix includes and init vulkan the same as llama.cpp * Add Windows Vulkan CI * Updated ggml submodule * support epsilon as a parameter for ggml_group_norm --------- Co-authored-by: Cloudwalk <cloudwalk@icculus.org> Co-authored-by: Oleg Skutte <00.00.oleg.00.00@gmail.com> Co-authored-by: leejet <leejet714@gmail.com>	2024-08-27 23:56:09 +08:00
leejet	5c561eab31	feat: do not convert more flux tensors	2024-08-25 16:01:36 +08:00
leejet	1bdc767aaf	feat: force using f32 for some layers	2024-08-25 13:53:16 +08:00
leejet	79c9fe9556	feat: do not convert some tensors	2024-08-25 13:37:37 +08:00
leejet	c837c5d9cc	style: format code	2024-08-25 00:19:37 +08:00
leejet	64d231f384	feat: add flux support (#356) * add flux support * avoid build failures in non-CUDA environments * fix schnell support * add k quants support * add support for applying lora to quantized tensors * add inplace conversion support for f8_e4m3 (#359) in the same way it is done for bf16 like how bf16 converts losslessly to fp32, f8_e4m3 converts losslessly to fp16 * add xlabs flux comfy converted lora support * update docs --------- Co-authored-by: Erik Scholz <Green-Sky@users.noreply.github.com>	2024-08-24 14:29:52 +08:00
leejet	4a6e36edc5	sync: update ggml	2024-07-28 18:30:35 +08:00

74 Commits