* global bool
* reworked circular to global flag
* cleaner implementation of tiling support in sd.cpp
* cleaned rope
* working simplified version, but wraps are still needed
* Further cleanup of rope
* resolve flux conflict
* switch to pad op circular only
* Set ggml to most recent
* Temporarily revert ggml
* Update ggml to most recent
* Revert unneeded flux change
* move circular flag to the GGMLRunnerContext
* Pass through circular param in all places where conv is called
* Fix constant and minor cleanup
* Added back --circular option
* Conv2d circular in VAE and various models
* Fix temporal padding for Qwen Image and other VAEs
* Z Image circular tiling
* x- and y-axis seamless only (see the padding sketch after this list)
* First attempt at Chroma seamless x and y
* refactor into pure x and y, almost there
* Fix crash on Chroma
* Refactor into cleaner variable choices
* Removed redundant set_circular_enabled
* Sync ggml
* simplify circular parameter
* format code
* no need to perform circular padding on the CLIP model
* simplify circular_axes setting
* unify function naming
* remove unnecessary member variables
* simplify rope
---------
Co-authored-by: Phylliida <phylliidadev@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
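For context on the circular commits above, here is a minimal sketch of what "circular" conv padding means: out-of-range reads wrap around the image instead of being zero-filled, so a convolution kernel sees the left edge continuing from the right edge and the output tiles seamlessly. This is illustrative only, not the stable-diffusion.cpp implementation; `pad_circular` and its parameters are hypothetical.

```cpp
#include <cstdio>
#include <vector>

// Illustrative circular ("wrap") padding. Each axis can wrap independently,
// matching the x- and y-axis seamless split in the commits above; an axis
// that does not wrap falls back to zero padding.
static std::vector<float> pad_circular(const std::vector<float>& img,
                                       int w, int h, int pad,
                                       bool wrap_x, bool wrap_y) {
    const int W = w + 2 * pad;
    const int H = h + 2 * pad;
    std::vector<float> out(W * H, 0.0f);
    for (int y = 0; y < H; ++y) {
        for (int x = 0; x < W; ++x) {
            int sx = x - pad;
            int sy = y - pad;
            if (wrap_x) sx = ((sx % w) + w) % w;  // wrap column into [0, w)
            if (wrap_y) sy = ((sy % h) + h) % h;  // wrap row into [0, h)
            if (sx < 0 || sx >= w || sy < 0 || sy >= h) {
                continue;                         // non-wrapped axis: zero pad
            }
            out[y * W + x] = img[sy * w + sx];
        }
    }
    return out;
}

int main() {
    const std::vector<float> img = {1, 2, 3, 4};  // 2x2 test image
    const auto padded = pad_circular(img, 2, 2, 1, true, true);
    for (int y = 0; y < 4; ++y) {                 // first row prints 4 3 4 3:
        for (int x = 0; x < 4; ++x) {             // the bottom row wrapped
            printf("%g ", padded[y * 4 + x]);     // above the top edge
        }
        printf("\n");
    }
}
```

Wrapping only one axis gives outputs that tile horizontally or vertically but not both, which is the distinction the x/y refactor commits are drawing.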
* introduce GGMLRunnerContext (the flag-threading pattern is sketched after this list)
* add Flash Attention enable control through GGMLRunnerContext
* add conv2d_direct enable control through GGMLRunnerContext
* repair flash attention in _ext
this does not fix the currently broken FA behind the define, which is only used by the VAE
Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com>
* make flash attention in the diffusion model a runtime flag
no support for SD3 or video
* remove old flash attention option and switch the VAE over to attn_ext
* update docs
* format code
---------
Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
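A rough sketch of the pattern the GGMLRunnerContext commits describe: runtime toggles such as flash attention and conv2d_direct travel with one context object that graph-building call sites consult, replacing a compile-time define. The struct and function names below are hypothetical stand-ins, not the actual stable-diffusion.cpp API.

```cpp
#include <cstdio>

// Hypothetical stand-in for the runner context: runtime flags live in one
// place and are threaded to every attention/conv build site.
struct RunnerContext {
    bool flash_attn_enabled    = false;  // previously a build-time option
    bool conv2d_direct_enabled = false;
    bool circular              = false;  // seamless-tiling padding mode
};

// Build sites read the context instead of a #define or a member variable.
static void build_attention(const RunnerContext& ctx) {
    if (ctx.flash_attn_enabled) {
        printf("building flash-attention path\n");
    } else {
        printf("building vanilla attention path\n");
    }
}

static void build_conv2d(const RunnerContext& ctx) {
    printf("conv2d: direct=%d circular=%d\n",
           ctx.conv2d_direct_enabled, ctx.circular);
}

int main() {
    RunnerContext ctx;
    ctx.flash_attn_enabled = true;  // flipped at runtime, e.g. by a CLI flag
    build_attention(ctx);
    build_conv2d(ctx);
}
```

This is what makes "flash attention in the diffusion model a runtime flag" possible: the decision moves from the preprocessor to a value the runner carries around.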
* add flux support
* avoid build failures in non-CUDA environments
* fix schnell support
* add k quants support
* add support for applying lora to quantized tensors
* add inplace conversion support for f8_e4m3 (#359)
in the same way it is done for bf16: just as bf16 converts losslessly to fp32, f8_e4m3 converts losslessly to fp16 (see the widening sketch after this list)
* add xlabs flux comfy converted lora support
* update docs
---------
Co-authored-by: Erik Scholz <Green-Sky@users.noreply.github.com>
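Why the f8_e4m3 conversion is lossless: fp16's 5 exponent and 10 mantissa bits strictly contain e4m3's 4 and 3 bits, so widening is pure bit manipulation, analogous to widening bf16 to fp32 with a 16-bit shift. Below is a hedged sketch for the E4M3FN variant (which has no infinities); the helper name is hypothetical, not the sd.cpp code.

```cpp
#include <cstdint>
#include <cstdio>

// Widen f8_e4m3 (1 sign / 4 exponent / 3 mantissa bits, bias 7) to fp16
// (1 / 5 / 10 bits, bias 15). Every finite e4m3 value is exactly
// representable in fp16, so no rounding ever happens.
static uint16_t f8_e4m3_to_f16_bits(uint8_t x) {
    const uint16_t sign = (uint16_t)(x & 0x80) << 8;  // sign moves to bit 15
    uint32_t exp = (x >> 3) & 0x0F;                   // 4-bit exponent field
    uint32_t man = x & 0x07;                          // 3-bit mantissa field

    if (exp == 0x0F && man == 0x07) {                 // E4M3FN NaN encoding
        return sign | 0x7E00;                         // quiet NaN in fp16
    }
    if (exp == 0) {
        if (man == 0) {
            return sign;                              // signed zero
        }
        // Subnormal: value = man * 2^-9. Normalize it into an fp16 normal.
        int s = 0;
        while ((man & 0x08) == 0) { man <<= 1; ++s; }
        return sign | (uint16_t)((9 - s) << 10) | (uint16_t)((man & 0x07) << 7);
    }
    // Normal: rebias the exponent (7 -> 15) and left-align the mantissa.
    return sign | (uint16_t)((exp + 8) << 10) | (uint16_t)(man << 7);
}

int main() {
    printf("%04x\n", f8_e4m3_to_f16_bits(0x38));  // 1.0  -> 3c00
    printf("%04x\n", f8_e4m3_to_f16_bits(0x01));  // 2^-9 -> 1800
}
```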
* add ControlNet to the pipeline
* add CLI params
* add control strength CLI param
* add CLI param to keep ControlNet on the CPU
* add Textual Inversion
* add Canny preprocessor
* refactor: change ggml_type_sizef to ggml_row_size
* process hint only once
* ignore the embedding name case (see the lookup sketch after this list)
---------
Co-authored-by: leejet <leejet714@gmail.com>
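One detail from the list above, sketched: ignoring the embedding name case usually means keying the textual-inversion table on a lowercased name so lookups match regardless of capitalization. The names below are illustrative, not the actual sd.cpp code.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdio>
#include <map>
#include <string>

// Key the embedding table on a lowercased name so "MyEmbedding" and
// "myembedding" resolve to the same entry.
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return (char)std::tolower(c); });
    return s;
}

static std::map<std::string, int> embeddings;  // lowercased name -> slot id

static void register_embedding(const std::string& name, int slot) {
    embeddings[to_lower(name)] = slot;
}

static int lookup_embedding(const std::string& name) {
    const auto it = embeddings.find(to_lower(name));
    return it == embeddings.end() ? -1 : it->second;
}

int main() {
    register_embedding("BadDream", 0);
    printf("%d\n", lookup_embedding("baddream"));  // prints 0: case ignored
}
```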