* add z-image support
* use flux_latent_rgb_proj for z-image
* fix qwen3 rope type
* add support for qwen3 4b gguf
* add support for diffusers format lora
* fix nan issue that occurs when using CUDA with k-quants weights
* add z-image docs
* add ref latent support for qwen image
* optimize clip_preprocess and fix get_first_stage_encoding
* add qwen2vl vit support
* add qwen image edit support
* fix qwen image edit pipeline
* add mmproj file support
* support dynamic number of Qwen image transformer blocks
* set prompt_template_encode_start_idx every time
* to_add_out precision fix
* to_out.0 precision fix
* update docs
* add wan vace t2v support
* add --vace-strength option
* add vace i2v support
* fix the processing of vace_context
* add vace v2v support
* update docs
* add wan vae support
* add wan model support
* add umt5 support
* add wan2.1 t2i support
* make flash attn work with wan
* make wan a little faster
* add wan2.1 t2v support
* add wan gguf support
* add support for offloading params to CPU
* add wan2.1 i2v support
* crop image before resize
* set default fps to 16
* add diff lora support
* fix wan2.1 i2v
* introduce sd_sample_params_t
* add wan2.2 t2v support
* add wan2.2 14B i2v support
* add wan2.2 ti2v support
* add high noise lora support
* sync: update ggml submodule url
* avoid build failure on linux
* avoid build failure
* update ggml
* update ggml
* fix sd_version_is_wan
* update ggml, fix cpu im2col_3d
* fix ggml_nn_attention_ext mask
* add cache support to ggml runner
* fix the issue of illegal memory access
* unify image loading processing
* add wan2.1/2.2 FLF2V support
* fix end_image mask
* update to latest ggml
* add GGUFReader
* update docs
* add flux support
* avoid build failures in non-CUDA environments
* fix schnell support
* add k quants support
* add support for applying lora to quantized tensors
* add inplace conversion support for f8_e4m3 (#359)
in the same way it is done for bf16:
just as bf16 converts losslessly to fp32,
f8_e4m3 converts losslessly to fp16
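The lossless-conversion claim holds because f8_e4m3's 4 exponent bits (bias 7) and 3 mantissa bits both fit inside fp16's 5 exponent and 10 mantissa bits, so every finite E4M3 value is exactly representable in fp16. A minimal sketch that checks this exhaustively (the `f8_e4m3_to_float` decoder here is a hypothetical helper written for illustration, not code from the repo; it assumes the "fn" E4M3 variant, which has no infinities and a single NaN encoding):

```python
import numpy as np

def f8_e4m3_to_float(b: int) -> float:
    # Decode an 8-bit E4M3(fn) pattern: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
    sign = -1.0 if b & 0x80 else 1.0
    exp = (b >> 3) & 0x0F
    man = b & 0x07
    if exp == 0x0F and man == 0x07:
        return float("nan")  # E4M3fn has no inf; S.1111.111 encodes NaN
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal range
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

# Every finite E4M3 value (max 448, min subnormal 2^-9) round-trips exactly through fp16.
for b in range(256):
    v = f8_e4m3_to_float(b)
    if v == v:  # skip NaN
        assert float(np.float16(v)) == v
```

The same argument is why bf16→fp32 is lossless: bf16 is just fp32 with the low 16 mantissa bits truncated, so widening never loses information.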
* add xlabs flux comfy converted lora support
* update docs
---------
Co-authored-by: Erik Scholz <Green-Sky@users.noreply.github.com>
* first efforts at implementing photomaker; lots more to do
* added PhotoMakerIDEncoder model in SD
* fixed some bugs; now photomaker model weights can be loaded into their tensor buffers
* added input id image loading
* added preprocessing of input ID images
* finished get_num_tensors
* fixed a bug in remove_duplicates
* add a get_learned_condition_with_trigger function to do photomaker stuff
* add a convert_token_to_id function for photomaker to extract trigger word's token id
* making progress; need to implement tokenizer decoder
* making more progress; finishing vision model forward
* debugging vision_model outputs
* corrected clip vision model output
* continue making progress in id fusion process
* finished stacked id embedding; to be tested
* remove garbage file
* debugging graph compute
* more progress; now buffer alloc fails
* fixed wtype issue; only 1 input image is supported because of an issue with the transformer when batch size > 1 (to be investigated)
* added delayed subject conditioning; now photomaker runs and generates images
* fixed stat_merge_step
* added photomaker lora model (to be tested)
* reworked pmid lora
* finished applying pmid lora; to be tested
* finalized pmid lora
* add a few tensor prints; tweak sampling again
* small tweak; still not getting ID faces
* fixed a bug in FuseBlock forward; also remove diag_mask op in for vision transformer; getting better results
* disable pmid lora apply for now; 1 input image seems to work; > 1 does not
* turn pmid lora apply back on
* fixed a decode bug
* fixed a bug in ggml's conv_2d; > 1 input images now work
* add style_ratio as a cli param; reworked encode with trigger for attention weights
* merge commit fixing lora free param buffer error
* change default style ratio to 10%
* added an option to offload vae decoder to CPU for mem-limited gpus
* removing the image normalization step seems to make ID fidelity much higher
* revert default style ratio back to 20%
* added an option for normalizing input ID images; cleaned up debugging code
* more clean up
* fixed bugs; now fails with a CUDA error, likely out of GPU memory
* free pmid model params when required
* photomaker working properly now after merging and adapting to GGMLBlock API
* remove tensor renaming; fixing names in the photomaker model file
* updated README.md to include instructions and notes for running PhotoMaker
* a bit clean up
* remove -DGGML_CUDA_FORCE_MMQ; more clean up and README update
* add input image requirement in README
* bring back freeing pmid lora params buffer; simplify pooled output of CLIPVision
* remove MultiheadAttention2; customized MultiheadAttention
* added a WIN32 get_files_from_dir; turn off PhotoMaker if receiving no input images
* update docs
* fix ci error
* make stable-diffusion.h a pure c header file
This reverts commit 27887b630db6a92f269f0aef8de9bc9832ab50a9.
* fix ci error
* format code
* reuse get_learned_condition
* reuse pad_tokens
* reuse CLIPVisionModel
* reuse LoraModel
* add --clip-on-cpu
* fix lora name conversion for SDXL
---------
Co-authored-by: bssrdf <bssrdf@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
* add controlnet to pipeline
* add cli params
* control strength cli param
* add cli param to keep controlnet on CPU
* add Textual Inversion
* add canny preprocessor
* refactor: change ggml_type_sizef to ggml_row_size
* process hint only once
* ignore case in embedding names
---------
Co-authored-by: leejet <leejet714@gmail.com>