* global bool
* reworked circular to global flag
* cleaner implementation of tiling support in sd cpp
* cleaned rope
* working simplified but still need wraps
* Further clean of rope
* resolve flux conflict
* switch to pad op circular only
* Set ggml to most recent
* Revert ggml temp
* Update ggml to most recent
* Revert unneeded flux change
* move circular flag to the GGMLRunnerContext
* Pass through circular param in all places where conv is called
* fix of constant and minor cleanup
* Added back --circular option
* Conv2d circular in vae and various models
* Fix temporal padding for qwen image and other vaes
* Z Image circular tiling
* x and y axis seamless only
* First attempt at chroma seamless x and y
* refactor into pure x and y, almost there
* Fix crash on chroma
* Refactor into cleaner variable choices
* Removed redundant set_circular_enabled
* Sync ggml
* simplify circular parameter
* format code
* no need to perform circular pad on the clip
* simplify circular_axes setting
* unify function naming
* remove unnecessary member variables
* simplify rope
---------
Co-authored-by: Phylliida <phylliidadev@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
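
The circular (seamless) padding introduced by the commits above can be illustrated with a small standalone sketch. It is not the ggml pad op or the code from this change; `wrap_index` and `circular_pad_2d` are hypothetical names, and the sketch assumes a single-channel, row-major image with a per-axis circular flag, roughly matching the "x and y axis seamless" wording.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: wrap any integer index into the range [0, n).
static int wrap_index(int i, int n) {
    int r = i % n;
    return r < 0 ? r + n : r;
}

// Pad a row-major h x w image by `pad` pixels on every side.
// An axis flagged as circular wraps around; otherwise its border stays zero.
std::vector<float> circular_pad_2d(const std::vector<float>& src, int h, int w, int pad,
                                   bool circular_x, bool circular_y) {
    const int out_h = h + 2 * pad;
    const int out_w = w + 2 * pad;
    std::vector<float> dst(static_cast<size_t>(out_h) * out_w, 0.0f);
    for (int oy = 0; oy < out_h; ++oy) {
        int sy = oy - pad;
        if (sy < 0 || sy >= h) {
            if (!circular_y) continue;  // leave this border row zero-padded
            sy = wrap_index(sy, h);
        }
        for (int ox = 0; ox < out_w; ++ox) {
            int sx = ox - pad;
            if (sx < 0 || sx >= w) {
                if (!circular_x) continue;  // leave this border column zero-padded
                sx = wrap_index(sx, w);
            }
            dst[static_cast<size_t>(oy) * out_w + ox] =
                src[static_cast<size_t>(sy) * w + sx];
        }
    }
    return dst;
}
```

With both flags set, a convolution applied to the padded image sees the opposite edge as its neighbour, which is what makes the output tile without visible seams.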
* add wan vae support
* add wan model support
* add umt5 support
* add wan2.1 t2i support
* make flash attn work with wan
* make wan a little faster
* add wan2.1 t2v support
* add wan gguf support
* add offload params to cpu support
* add wan2.1 i2v support
* crop image before resize
* set default fps to 16
* add diff lora support
* fix wan2.1 i2v
* introduce sd_sample_params_t
* add wan2.2 t2v support
* add wan2.2 14B i2v support
* add wan2.2 ti2v support
* add high noise lora support
* sync: update ggml submodule url
* avoid build failure on linux
* avoid build failure
* update ggml
* update ggml
* fix sd_version_is_wan
* update ggml, fix cpu im2col_3d
* fix ggml_nn_attention_ext mask
* add cache support to ggml runner
* fix the issue of illegal memory access
* unify image loading processing
* add wan2.1/2.2 FLF2V support
* fix end_image mask
* update to latest ggml
* add GGUFReader
* update docs
* add flux support
* avoid build failures in non-CUDA environments
* fix schnell support
* add k quants support
* add support for applying lora to quantized tensors
* add inplace conversion support for f8_e4m3 (#359)
  This works the same way as for bf16: just as bf16 converts losslessly to
  fp32, f8_e4m3 converts losslessly to fp16.
* add xlabs flux comfy converted lora support
* update docs
---------
Co-authored-by: Erik Scholz <Green-Sky@users.noreply.github.com>
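
The lossless f8_e4m3 to fp16 widening mentioned above can be sketched as follows. This is an illustration rather than the code from #359: the function names are hypothetical, and it assumes the E4M3FN encoding (exponent bias 7, no infinities, NaN stored as S.1111.111). Every finite value in that format fits exactly in fp16, because its exponent range lies inside the fp16 normal range and its 3 mantissa bits fit in fp16's 10.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Convert one f8_e4m3 byte (assumed E4M3FN) to the bit pattern of an fp16 value.
uint16_t f8_e4m3_to_f16_bits(uint8_t f8) {
    const uint16_t sign = static_cast<uint16_t>((f8 & 0x80) << 8);
    int exp  = (f8 >> 3) & 0x0F;
    int mant = f8 & 0x07;
    if (exp == 0x0F && mant == 0x07) {
        return static_cast<uint16_t>(sign | 0x7E00);  // NaN
    }
    if (exp == 0) {
        if (mant == 0) return sign;  // signed zero
        // f8 subnormal: renormalize, since fp16 can hold it as a normal number.
        exp = 1;
        while ((mant & 0x08) == 0) {
            mant <<= 1;
            --exp;
        }
        mant &= 0x07;  // drop the now-implicit leading 1
    }
    // Rebias the exponent (7 -> 15) and widen the mantissa (3 -> 10 bits).
    return static_cast<uint16_t>(sign | ((exp + 8) << 10) | (mant << 7));
}

// Expand n f8 values at the start of buf into n fp16 values in place.
// buf must have room for 2 * n bytes. Walking backward ensures that no
// f8 byte is overwritten before it has been read.
void f8_e4m3_to_f16_inplace(uint8_t* buf, size_t n) {
    for (size_t i = n; i-- > 0;) {
        const uint16_t h = f8_e4m3_to_f16_bits(buf[i]);
        std::memcpy(buf + 2 * i, &h, sizeof(h));  // native byte order
    }
}
```

The backward walk is the part that makes the conversion "in place": the output is twice as wide as the input, so a forward pass would clobber unread input bytes.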
Added NVIDIA's new "Align Your Steps" style scheduler in accordance with their
quick start guide. It currently handles SD1.5, SDXL, and SVD, using the noise
levels from their paper to generate the sigma values. It can be selected with
the --schedule ays command line switch. The main.cpp help message and README
are updated to reflect this option, and they now also mention the --color
switch.
---------
Co-authored-by: leejet <leejet714@gmail.com>
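
Generating a sigma schedule of arbitrary length from a short table of reference noise levels can be done by interpolating in log space, which is the approach the AYS quick start guide describes. The sketch below is an assumption about how that step could look, not the code in this change: `loglinear_interp` is a hypothetical name, and the table is passed in by the caller rather than hard-coded.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Resample a table of positive noise levels to num_steps values by
// interpolating linearly in log space over evenly spaced positions.
std::vector<float> loglinear_interp(const std::vector<float>& levels, size_t num_steps) {
    if (levels.empty() || num_steps == 0) return {};
    if (levels.size() == 1 || num_steps == 1) {
        return std::vector<float>(num_steps, levels.front());
    }
    std::vector<float> out(num_steps);
    const double scale = static_cast<double>(levels.size() - 1) /
                         static_cast<double>(num_steps - 1);
    for (size_t i = 0; i < num_steps; ++i) {
        const double pos  = static_cast<double>(i) * scale;  // position in the table
        const size_t lo   = std::min(static_cast<size_t>(pos), levels.size() - 2);
        const double frac = pos - static_cast<double>(lo);
        const double log_sigma = (1.0 - frac) * std::log(levels[lo]) +
                                 frac * std::log(levels[lo + 1]);
        out[i] = static_cast<float>(std::exp(log_sigma));
    }
    return out;
}
```

For example, resampling the made-up table {16, 4, 1} (illustrative values only, not the levels from the paper) to 5 steps yields approximately 16, 8, 4, 2, 1, since each table entry is a constant factor below the previous one.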