* Conv2DDirect for VAE stage
* Enable only for Vulkan, reduced duplicated code
* CMake option to use conv2d direct
* conv2d direct always on for OpenCL
* conv direct as a flag
* fix merge typo
* Align conv2d behavior to flash attention's
* fix readme
* add conv2d direct for controlnet
* add conv2d direct for esrgan
* clean code, use enable_conv2d_direct/get_all_blocks
* format code
---------
Co-authored-by: leejet <leejet714@gmail.com>
* add flux support
* avoid build failures in non-CUDA environments
* fix schnell support
* add k quants support
* add support for applying lora to quantized tensors
* add in-place conversion support for f8_e4m3 (#359)
  in the same way it is done for bf16:
  just as bf16 converts losslessly to fp32,
  f8_e4m3 converts losslessly to fp16
* add xlabs flux comfy converted lora support
* update docs
---------
Co-authored-by: Erik Scholz <Green-Sky@users.noreply.github.com>
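
Note on the f8_e4m3 entry above: the lossless claim can be checked exhaustively, since there are only 256 bit patterns. The following is a minimal sketch, not the code from the commit. It assumes the common "e4m3fn" layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, NaN only at exponent and mantissa all ones); the function names are hypothetical and chosen for illustration.

```python
import math
import struct


def f8_e4m3_to_float(byte):
    """Decode one f8_e4m3 byte (assumed e4m3fn layout) to a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan  # the only NaN encoding in this layout
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias = -6.
        return sign * (mantissa / 8.0) * 2.0 ** -6
    # Normal: implicit leading 1, exponent bias of 7.
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


def round_trip_through_fp16(value):
    """Pack a float as IEEE half precision and unpack it again."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def all_values_survive_fp16():
    """Return True if every non-NaN f8_e4m3 value is exactly representable in fp16."""
    for byte in range(256):
        value = f8_e4m3_to_float(byte)
        if math.isnan(value):
            continue
        if round_trip_through_fp16(value) != value:
            return False
    return True
```

Under the assumed layout, the smallest nonzero magnitude is 2^-9 and the largest is 448, both inside the fp16 normal range, and 3 mantissa bits fit within fp16's 10, which is why `all_values_survive_fp16()` is expected to hold.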
* add controlnet to pipeline
* add cli params
* control strength cli param
* cli param keep controlnet in cpu
* add Textual Inversion
* add canny preprocessor
* refactor: change ggml_type_sizef to ggml_row_size
* process the hint only once
* ignore case in embedding names
---------
Co-authored-by: leejet <leejet714@gmail.com>