22 Commits

Author SHA1 Message Date
fszontagh
064001b524
perf: allocate CPU-offloaded params from runtime device pinned host buffer (#1601) 2026-06-06 16:22:18 +08:00
fszontagh
a7f2e03da4
perf: keep chunk-K residency engaged with runtime LoRA (#1598) 2026-06-03 23:12:00 +08:00
fszontagh
ed74577c40
feat: --stream-layers for streaming weights from CPU during generation (#1576) 2026-06-02 22:35:28 +08:00
Wagner Bruna
02f06370a7
refactor: call CPU backend functions dynamically (#1591)
Co-authored-by: leejet <leejet714@gmail.com>
2026-06-01 23:41:21 +08:00
leejet
20901f6d8e
fix: remove kv padding from flash attention wrapper (#1453) 2026-05-31 23:23:19 +08:00
stduhpf
a397e03488
feat: add Longcat-Image / Longcat-Image-Edit support (#1053)
Co-authored-by: leejet <leejet714@gmail.com>
2026-05-24 02:02:02 +08:00
stduhpf
adaa599a3b
Feat: Temporal tile custom size with overlap (#1510)
* Temporal tile size + overlap

* add --extra-tiling-args support

---------

Co-authored-by: leejet <leejet714@gmail.com>
2026-05-21 23:44:12 +08:00
stduhpf
47d8198b69
feat: add taeltx2_3_wide support (#1535) 2026-05-21 22:34:12 +08:00
leejet
67dda3f897
feat: add ltx2.3 support (#1463)
* add GemmaTokenizer

* add basic ltx2.3 support

* change vocab file encoding

* fix ci

* fix ubuntu build

* add temporal tiling support

* add ltx audio support

* update ggml submodule url

* fix generate_video

* add i2v support

* minify bundled Gemma tokenizer vocab sources

* pass video fps into temporal rope embeddings

* fix av_ca_timestep_scale_multiplier

* add LTX2Scheduler support

* update docs

* fix ci
2026-05-17 16:46:20 +08:00
leejet
36330724bd
feat: add module backend assignment support (#1500)
Co-authored-by: Stéphane du Hamel <stephduh@live.fr>
2026-05-16 20:27:06 +08:00
Wagner Bruna
686856edca
chore: do not report the fake VAE "allocation" as an error (#1494) 2026-05-16 16:08:31 +08:00
leejet
0665a7f8bf
feat: add hidream o1 image support (#1485) 2026-05-15 00:40:21 +08:00
Wagner Bruna
57ff2eb0f4
feat: support for memory-mapping model weights (#1414)
Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
Co-authored-by: Junmo Kim <me@junmo.kim>
Co-authored-by: leejet <leejet714@gmail.com>
2026-05-15 00:30:03 +08:00
leejet
90e87bc846
feat: add max-vram based segmented param offload (#1476) 2026-05-06 21:56:02 +08:00
Wagner Bruna
b8079e253d
feat: transition from compile-time to runtime backend discovery (#1448)
Co-authored-by: Stéphane du Hamel <stephduh@live.fr>
Co-authored-by: Cyberhan123 <255542417@qq.com>
Co-authored-by: leejet <leejet714@gmail.com>
2026-04-29 23:26:57 +08:00
akleine
970c4a3312
chore: replace some NULL with nullptr + use "%zu" for printing some size_t data (#1457) 2026-04-27 22:42:57 +08:00
leejet
f16a110f87
refactor: migrate generation pipeline to sd::Tensor (#1373) 2026-03-30 00:19:25 +08:00
leejet
84cbd88df1
style: remove redundant struct qualifiers for consistent C/C++ type usage (#1349) 2026-03-16 22:17:22 +08:00
leejet
acc3bf1fdc
refactor: optimize the VAE architecture (#1345) 2026-03-15 16:57:42 +08:00
stduhpf
3d33caaef8
fix: make tiling work better when using circular (#1299) 2026-03-08 00:25:07 +08:00
leejet
ba35dd734e
refactor: introduce ggml_ext_zeros_like/ggml_ext_ones_like (#1312) 2026-03-04 00:36:52 +08:00
leejet
28ef93c0e1
refactor: reorganize the file structure (#1266) 2026-02-10 23:13:35 +08:00