* add z-image support
* use flux_latent_rgb_proj for z-image
* fix qwen3 rope type
* add support for qwen3 4b gguf
* add support for diffusers format lora
* fix nan issue that occurs when using CUDA with k-quants weights
* add z-image docs
* add ref latent support for qwen image
* optimize clip_preprocess and fix get_first_stage_encoding
* add qwen2vl vit support
* add qwen image edit support
* fix qwen image edit pipeline
* add mmproj file support
* support dynamic number of Qwen image transformer blocks
* set prompt_template_encode_start_idx every time
* to_add_out precision fix
* to_out.0 precision fix
* update docs
* add wan vace t2v support
* add --vace-strength option
* add vace i2v support
* fix the processing of vace_context
* add vace v2v support
* update docs
* add wan vae support
* add wan model support
* add umt5 support
* add wan2.1 t2i support
* make flash attn work with wan
* make wan a little faster
* add wan2.1 t2v support
* add wan gguf support
* add support for offloading params to CPU
* add wan2.1 i2v support
* crop image before resize
* set default fps to 16
* add diff lora support
* fix wan2.1 i2v
* introduce sd_sample_params_t
* add wan2.2 t2v support
* add wan2.2 14B i2v support
* add wan2.2 ti2v support
* add high noise lora support
* sync: update ggml submodule url
* avoid build failure on linux
* avoid build failure
* update ggml
* update ggml
* fix sd_version_is_wan
* update ggml, fix cpu im2col_3d
* fix ggml_nn_attention_ext mask
* add cache support to ggml runner
* fix the issue of illegal memory access
* unify image loading processing
* add wan2.1/2.2 FLF2V support
* fix end_image mask
* update to latest ggml
* add GGUFReader
* update docs
* add flux support
* avoid build failures in non-CUDA environments
* fix schnell support
* add k quants support
* add support for applying lora to quantized tensors
* add inplace conversion support for f8_e4m3 (#359)
in the same way it is done for bf16:
just as bf16 converts losslessly to fp32,
f8_e4m3 converts losslessly to fp16
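The lossless-conversion claim holds because f8_e4m3's 4 exponent bits (bias 7) and 3 mantissa bits both fit inside fp16's 5 exponent and 10 mantissa bits, so every finite E4M3 value is exactly representable in fp16. A minimal sketch that checks this exhaustively (the `f8_e4m3_to_float` decoder here is a hypothetical helper written for illustration, not code from the repo; it assumes the "fn" E4M3 variant, which has no infinities and a single NaN encoding):

```python
import numpy as np

def f8_e4m3_to_float(b: int) -> float:
    # Decode an 8-bit E4M3(fn) pattern: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
    sign = -1.0 if b & 0x80 else 1.0
    exp = (b >> 3) & 0x0F
    man = b & 0x07
    if exp == 0x0F and man == 0x07:
        return float("nan")  # E4M3fn has no inf; S.1111.111 encodes NaN
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal range
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

# Every finite E4M3 value (max 448, min subnormal 2^-9) round-trips exactly through fp16.
for b in range(256):
    v = f8_e4m3_to_float(b)
    if v == v:  # skip NaN
        assert float(np.float16(v)) == v
```

The same argument is why bf16→fp32 is lossless: bf16 is just fp32 with the low 16 mantissa bits truncated, so widening never loses information.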
* add xlabs flux comfy converted lora support
* update docs
---------
Co-authored-by: Erik Scholz <Green-Sky@users.noreply.github.com>
* first efforts at implementing photomaker; lots more to do
* added PhotoMakerIDEncoder model in SD
* fixed some bugs; now photomaker model weights can be loaded into their tensor buffers
* added input id image loading
* added preprocessing of input ID images
* finished get_num_tensors
* fixed a bug in remove_duplicates
* add a get_learned_condition_with_trigger function to do photomaker stuff
* add a convert_token_to_id function for photomaker to extract trigger word's token id
* making progress; need to implement tokenizer decoder
* making more progress; finishing vision model forward
* debugging vision_model outputs
* corrected clip vision model output
* continue making progress in id fusion process
* finished stacked id embedding; to be tested
* remove garbage file
* debugging graph compute
* more progress; now buffer alloc fails
* fixed wtype issue; only 1 input image is supported because of an issue with the transformer when batch size > 1 (to be investigated)
* added delayed subject conditioning; now photomaker runs and generates images
* fixed stat_merge_step
* added photomaker lora model (to be tested)
* reworked pmid lora
* finished applying pmid lora; to be tested
* finalized pmid lora
* add a few tensor prints; tweak sampling again
* small tweak; still not getting ID faces
* fixed a bug in FuseBlock forward; also remove diag_mask op in for vision transformer; getting better results
* disable pmid lora apply for now; 1 input image seems to work; > 1 does not
* turn pmid lora apply back on
* fixed a decode bug
* fixed a bug in ggml's conv_2d; > 1 input images now work
* add style_ratio as a cli param; reworked encode with trigger for attention weights
* merge commit fixing lora free param buffer error
* change default style ratio to 10%
* added an option to offload vae decoder to CPU for mem-limited gpus
* removing the image normalization step seems to make ID fidelity much higher
* revert default style ratio back to 20%
* added an option for normalizing input ID images; cleaned up debugging code
* more clean up
* fixed bugs; now fails with a CUDA error, likely out of GPU memory
* free pmid model params when required
* photomaker working properly now after merging and adapting to GGMLBlock API
* remove tensor renaming; fixing names in the photomaker model file
* updated README.md to include instructions and notes for running PhotoMaker
* a bit clean up
* remove -DGGML_CUDA_FORCE_MMQ; more clean up and README update
* add input image requirement in README
* bring back freeing pmid lora params buffer; simplify pooled output of CLIPVision
* remove MultiheadAttention2; customized MultiheadAttention
* added a WIN32 get_files_from_dir; turn off PhotoMaker if receiving no input images
* update docs
* fix ci error
* make stable-diffusion.h a pure c header file
This reverts commit 27887b630db6a92f269f0aef8de9bc9832ab50a9.
* fix ci error
* format code
* reuse get_learned_condition
* reuse pad_tokens
* reuse CLIPVisionModel
* reuse LoraModel
* add --clip-on-cpu
* fix lora name conversion for SDXL
---------
Co-authored-by: bssrdf <bssrdf@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
* add controlnet to pipeline
* add cli params
* control strength cli param
* add cli param to keep controlnet on CPU
* add Textual Inversion
* add canny preprocessor
* refactor: change ggml_type_sizef to ggml_row_size
* process hint only once
* ignore case in embedding names
---------
Co-authored-by: leejet <leejet714@gmail.com>