feat: support disk params backend (#1651)

2026-06-19 04:37:18 +00:00 · 2026-06-14 14:48:50 +08:00 · 2026-06-14 14:48:50 +08:00 · bdb431ad95
commit bdb431ad95
parent 276025e054
27 changed files with 134 additions and 65 deletions
--- a/docs/backend.md
+++ b/docs/backend.md
@ -3,7 +3,7 @@
 `stable-diffusion.cpp` has two backend assignments:

 - `--backend` selects the runtime backend used to execute model graphs.
- `--params-backend` selects the backend used to allocate model parameters.
+- `--params-backend` selects where model parameters are kept.

 If `--params-backend` is not set, parameters use the same backend as their module runtime backend.

@ -29,6 +29,12 @@ The same syntax is used for parameter placement:
 sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend te=cpu,vae=cpu
 ```

+`--params-backend` also accepts the special value `disk`:
+
+```shell
+sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk
+```
+
 Module names are case-insensitive. Hyphens and underscores in module names are ignored, so `clip_vision`, `clip-vision`, and `clipvision` are equivalent.

 `all=`, `default=`, and `*=` can be used to set the default backend inside a mixed assignment:
@ -64,9 +70,11 @@ The special values `auto`, `default`, and an empty backend name select the defau

 The special value `gpu` selects the first GPU backend, falling back to the first integrated GPU backend.

+The special value `disk` is accepted only by `--params-backend`. `--backend disk` is invalid because `disk` is a parameter residency mode, not a runtime compute backend.
+
 ## Runtime backend vs. parameter backend

-The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated.
+The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated or whether they are reloaded from disk on demand.

 For example:

@ -76,6 +84,16 @@ sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend cpu

 This runs all modules on `cuda0`, but stores parameters in CPU RAM. During execution, parameters are moved to the runtime backend as needed.

+With `disk` instead:
+
+```shell
+sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk
+```
+
+This runs all modules on `cuda0`, reloads parameters from the model file as needed, and releases those parameter buffers after use.
+
+`disk` is never selected implicitly. If `--params-backend` is not set, parameters use the runtime backend.
+
 Per-module assignments can be mixed:

 ```shell
@ -100,6 +118,8 @@ uses one shared CPU backend for both `te` and `vae` runtime execution.

 Runtime and parameter assignments also share the same backend cache. If `--backend diffusion=cuda0` and `--params-backend diffusion=cuda0` resolve to the same device, both use the same backend instance.

+`--params-backend disk` does not create a separate backend instance. Parameters are loaded lazily through the module's runtime backend.
+
 `SDBackendManager` owns the backend instances and frees them when the context or upscaler is destroyed. Model runners receive non-owning runtime and parameter backend pointers and do not free them.

 ## Compatibility flags
@ -113,10 +133,12 @@ The older CPU placement flags are still supported:

 `--clip-on-cpu`, `--vae-on-cpu`, and `--control-net-cpu` affect runtime backend assignment only when `--backend` is not set. They map to `te=cpu`, `vae=cpu`, and `controlnet=cpu`.

-`--offload-to-cpu` affects parameter backend assignment only when `--params-backend` is not set. It is equivalent to:
+`--offload-to-cpu` prepends a CPU default to the parameter assignment before parsing:

 ```shell
--params-backend cpu
+--params-backend '*=cpu'
 ```

+Because this default is inserted first, later explicit `--params-backend` entries can still override it. For example, `--offload-to-cpu --params-backend te=disk` keeps non-TE parameters on CPU and reloads TE parameters from disk.
+
 Explicit `--backend` and `--params-backend` assignments are preferred for new commands.
--- a/docs/performance.md
+++ b/docs/performance.md
@ -21,6 +21,38 @@ and the compute buffer shrink in the debug log:

 Using `--offload-to-cpu` allows you to offload weights to the CPU, saving VRAM without reducing generation speed.

+## Use params backend to reduce VRAM or RAM usage.
+
+`--params-backend` controls where model parameters are kept. If it is not set, parameters use the same backend as `--backend`, so a GPU runtime backend also keeps parameters in VRAM.
+
+Use CPU params to reduce VRAM usage:
+
+```shell
+--backend cuda0 --params-backend cpu
+```
+
+This keeps model weights in system RAM and moves them to the runtime backend when needed. `--offload-to-cpu` is a compatibility shortcut that prepends `*=cpu` to `--params-backend`, so explicit module assignments can still override it:
+
+```shell
+--offload-to-cpu --params-backend te=disk
+```
+
+Use disk params to reduce both VRAM and RAM usage:
+
+```shell
+--backend cuda0 --params-backend disk
+```
+
+This reloads parameters from the model file on demand and releases them after use. It has the lowest memory residency, but can be slower because weights must be read again. `disk` is never selected implicitly; set it explicitly when RAM usage matters more than reload cost.
+
+Per-module assignments can target only the largest modules:
+
+```shell
+--backend cuda0 --params-backend diffusion=disk,te=cpu,vae=cpu
+```
+
+See [backend selection](./backend.md) for full syntax.
+
 ## Use quantization to reduce memory usage.

-[quantization](./quantization_and_gguf.md)
+[quantization](./quantization_and_gguf.md)
--- a/examples/cli/main.cpp
+++ b/examples/cli/main.cpp
@ -746,7 +746,7 @@ int main(int argc, const char* argv[]) {
        vae_decode_only = false;
    }

-    sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(vae_decode_only, true, cli_params.taesd_preview);
+    sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(vae_decode_only, cli_params.taesd_preview);

    SDImageVec results;
    int num_results             = 0;
--- a/examples/common/common.cpp
+++ b/examples/common/common.cpp
@ -421,7 +421,7 @@ ArgOptions SDContextParams::get_options() {
         &backend},
        {"",
         "--params-backend",
-         "parameter backend assignment, e.g. cpu or diffusion=cpu,clip=cpu",
+         "parameter backend assignment, e.g. disk, cpu, or diffusion=disk,clip=cpu",
         &params_backend},
    };

@ -757,7 +757,7 @@ std::string SDContextParams::to_string() const {
    return oss.str();
 }

-sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool free_params_immediately, bool taesd_preview) {
+sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool taesd_preview) {
    embedding_vec.clear();
    embedding_vec.reserve(embedding_map.size());
    for (const auto& kv : embedding_map) {
@ -788,7 +788,6 @@ sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool f
        photo_maker_path.c_str(),
        tensor_type_rules.c_str(),
        vae_decode_only,
-        free_params_immediately,
        n_threads,
        wtype,
        rng_type,
--- a/examples/common/common.h
+++ b/examples/common/common.h
@ -179,7 +179,7 @@ struct SDContextParams {
    bool validate(SDMode mode);
    bool resolve_and_validate(SDMode mode);
    std::string to_string() const;
-    sd_ctx_params_t to_sd_ctx_params_t(bool vae_decode_only, bool free_params_immediately, bool taesd_preview);
+    sd_ctx_params_t to_sd_ctx_params_t(bool vae_decode_only, bool taesd_preview);
 };

 struct SDGenerationParams {
--- a/examples/server/main.cpp
+++ b/examples/server/main.cpp
@ -85,7 +85,7 @@ int main(int argc, const char** argv) {
    LOG_DEBUG("%s", ctx_params.to_string().c_str());
    LOG_DEBUG("%s", default_gen_params.to_string().c_str());

-    sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(false, false, false);
+    sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(false, false);
    SDCtxPtr sd_ctx(new_sd_ctx(&sd_ctx_params));

    if (sd_ctx == nullptr) {
--- a/include/stable-diffusion.h
+++ b/include/stable-diffusion.h
@ -197,7 +197,6 @@ typedef struct {
    const char* photo_maker_path;
    const char* tensor_type_rules;
    bool vae_decode_only;
-    bool free_params_immediately;
    int n_threads;
    enum sd_type_t wtype;
    enum rng_type_t rng_type;
--- a/src/core/ggml_extend_backend.cpp
+++ b/src/core/ggml_extend_backend.cpp
@ -45,6 +45,10 @@ static bool is_default_backend_token(const std::string& name) {
    return lower.empty() || lower == "default" || lower == "auto";
 }

+static bool is_disk_backend_token(const std::string& name) {
+    return lower_copy(trim_copy(name)) == "disk";
+}
+
 static bool parse_backend_module(const std::string& raw_name, SDBackendModule* module) {
    std::string name = lower_copy(trim_copy(raw_name));
    name.erase(std::remove(name.begin(), name.end(), '-'), name.end());
@ -504,6 +508,9 @@ ggml_backend_t SDBackendManager::params_backend(SDBackendModule module) {
    if (name.empty()) {
        return runtime_backend(module);
    }
+    if (is_disk_backend_token(name)) {
+        return runtime_backend(module);
+    }
    return init_cached_backend(name);
 }

@ -515,6 +522,10 @@ bool SDBackendManager::params_backend_is_cpu(SDBackendModule module) {
    return sd_backend_is_cpu(params_backend(module));
 }

+bool SDBackendManager::params_backend_is_disk(SDBackendModule module) const {
+    return is_disk_backend_token(params_assignment_.get(module));
+}
+
 bool SDBackendManager::runtime_backend_supports_host_buffer(SDBackendModule module) {
    ggml_backend_t backend = runtime_backend(module);
    if (backend == nullptr) {
@ -534,7 +545,6 @@ bool SDBackendManager::runtime_backend_supports_host_buffer(SDBackendModule modu

 bool SDBackendManager::init(const char* backend_spec,
                            const char* params_backend_spec,
-                            bool offload_params_to_cpu,
                            bool keep_clip_on_cpu,
                            bool keep_vae_on_cpu,
                            bool keep_control_net_on_cpu,
@ -560,18 +570,20 @@ bool SDBackendManager::init(const char* backend_spec,
        }
    }

-    if (params_assignment_.empty() && offload_params_to_cpu) {
-        params_assignment_.set_default("cpu");
-    }
-
    return validate(error);
 }

 bool SDBackendManager::validate(std::string* error) const {
-    auto validate_name = [&](const std::string& name) -> bool {
+    auto validate_runtime_name = [&](const std::string& name) -> bool {
        if (is_default_backend_token(name)) {
            return true;
        }
+        if (is_disk_backend_token(name)) {
+            if (error != nullptr) {
+                *error = "backend 'disk' is only supported by params_backend";
+            }
+            return false;
+        }
        if (!sd_resolve_backend_name(name).empty()) {
            return true;
        }
@ -580,18 +592,24 @@ bool SDBackendManager::validate(std::string* error) const {
        }
        return false;
    };
+    auto validate_params_name = [&](const std::string& name) -> bool {
+        if (is_disk_backend_token(name)) {
+            return true;
+        }
+        return validate_runtime_name(name);
+    };

-    if (!validate_name(runtime_assignment_.default_name) ||
-        !validate_name(params_assignment_.default_name)) {
+    if (!validate_runtime_name(runtime_assignment_.default_name) ||
+        !validate_params_name(params_assignment_.default_name)) {
        return false;
    }
    for (const auto& kv : runtime_assignment_.module_names) {
-        if (!validate_name(kv.second)) {
+        if (!validate_runtime_name(kv.second)) {
            return false;
        }
    }
    for (const auto& kv : params_assignment_.module_names) {
-        if (!validate_name(kv.second)) {
+        if (!validate_params_name(kv.second)) {
            return false;
        }
    }
--- a/src/core/ggml_extend_backend.h
+++ b/src/core/ggml_extend_backend.h
@ -51,7 +51,6 @@ public:

    bool init(const char* backend_spec,
              const char* params_backend_spec,
-              bool offload_params_to_cpu,
              bool keep_clip_on_cpu,
              bool keep_vae_on_cpu,
              bool keep_control_net_on_cpu,
@ -63,6 +62,7 @@ public:

    bool runtime_backend_is_cpu(SDBackendModule module);
    bool params_backend_is_cpu(SDBackendModule module);
+    bool params_backend_is_disk(SDBackendModule module) const;
    bool runtime_backend_supports_host_buffer(SDBackendModule module);

 private:
--- a/src/model/adapter/lora.hpp
+++ b/src/model/adapter/lora.hpp
@ -101,7 +101,7 @@ struct LoraModel : public GGMLRunner {
        if (model_manager == nullptr ||
            !model_manager->register_param_tensors("LoRA",
                                                   std::move(tensors),
-                                                   ModelManager::ResidencyMode::Resident,
+                                                   ModelManager::ResidencyMode::ParamBackend,
                                                   runtime_backend,
                                                   params_backend) ||
            !model_manager->validate_registered_tensors()) {
--- a/src/model/adapter/pmid.hpp
+++ b/src/model/adapter/pmid.hpp
@ -622,7 +622,7 @@ struct PhotoMakerIDEmbed : public GGMLRunner {
        model_loader.load_tensors(on_new_tensor_cb);
        if (!model_manager->register_param_tensors("PhotoMaker ID embeds",
                                                   tensors,
-                                                   ModelManager::ResidencyMode::Resident,
+                                                   ModelManager::ResidencyMode::ParamBackend,
                                                   runtime_backend,
                                                   params_backend) ||
            !model_manager->validate_registered_tensors()) {
--- a/src/model/diffusion/control.hpp
+++ b/src/model/diffusion/control.hpp
@ -482,7 +482,7 @@ struct ControlNet : public GGMLRunner {
        manager->set_n_threads(n_threads);
        if (!manager->register_param_tensors("ControlNet",
                                             std::move(tensors),
-                                             ModelManager::ResidencyMode::Resident,
+                                             ModelManager::ResidencyMode::ParamBackend,
                                             runtime_backend,
                                             params_backend) ||
            !manager->validate_registered_tensors()) {
--- a/src/model/diffusion/flux.hpp
+++ b/src/model/diffusion/flux.hpp
@ -1609,7 +1609,7 @@ namespace Flux {
            if (!model_manager->register_runner_params("Flux test",
                                                       *flux,
                                                       "model.diffusion_model",
-                                                       ModelManager::ResidencyMode::Resident,
+                                                       ModelManager::ResidencyMode::ParamBackend,
                                                       backend,
                                                       backend) ||
                !model_manager->validate_registered_tensors()) {
--- a/src/model/diffusion/ltxv.hpp
+++ b/src/model/diffusion/ltxv.hpp
@ -2048,7 +2048,7 @@ namespace LTXV {
            if (!model_manager->register_runner_params("LTXAV test",
                                                       *ltxav,
                                                       "model.diffusion_model",
-                                                       ModelManager::ResidencyMode::Resident,
+                                                       ModelManager::ResidencyMode::ParamBackend,
                                                       backend,
                                                       backend) ||
                !model_manager->validate_registered_tensors()) {
--- a/src/model/diffusion/mmdit.hpp
+++ b/src/model/diffusion/mmdit.hpp
@ -1015,7 +1015,7 @@ struct MMDiTRunner : public DiffusionModelRunner {
            if (!model_manager->register_runner_params("MMDiT test",
                                                       *mmdit,
                                                       "model.diffusion_model",
-                                                       ModelManager::ResidencyMode::Resident,
+                                                       ModelManager::ResidencyMode::ParamBackend,
                                                       backend,
                                                       backend) ||
                !model_manager->validate_registered_tensors()) {
--- a/src/model/diffusion/qwen_image.hpp
+++ b/src/model/diffusion/qwen_image.hpp
@ -715,7 +715,7 @@ namespace Qwen {
            if (!model_manager->register_runner_params("Qwen image test",
                                                       *qwen_image,
                                                       "model.diffusion_model",
-                                                       ModelManager::ResidencyMode::Resident,
+                                                       ModelManager::ResidencyMode::ParamBackend,
                                                       backend,
                                                       backend) ||
                !model_manager->validate_registered_tensors()) {
--- a/src/model/diffusion/wan.hpp
+++ b/src/model/diffusion/wan.hpp
@ -1040,7 +1040,7 @@ namespace WAN {
            if (!model_manager->register_runner_params("Wan test",
                                                       *wan,
                                                       "model.diffusion_model",
-                                                       ModelManager::ResidencyMode::Resident,
+                                                       ModelManager::ResidencyMode::ParamBackend,
                                                       backend,
                                                       backend) ||
                !model_manager->validate_registered_tensors()) {
--- a/src/model/diffusion/z_image.hpp
+++ b/src/model/diffusion/z_image.hpp
@ -723,7 +723,7 @@ namespace ZImage {
            if (!model_manager->register_runner_params("ZImage test",
                                                       *z_image,
                                                       "model.diffusion_model",
-                                                       ModelManager::ResidencyMode::Resident,
+                                                       ModelManager::ResidencyMode::ParamBackend,
                                                       backend,
                                                       backend) ||
                !model_manager->validate_registered_tensors()) {
--- a/src/model/te/llm.hpp
+++ b/src/model/te/llm.hpp
@ -2084,7 +2084,7 @@ namespace LLM {
            if (!model_manager->register_runner_params("LLM test",
                                                       *llm,
                                                       "text_encoders.llm",
-                                                       ModelManager::ResidencyMode::Resident,
+                                                       ModelManager::ResidencyMode::ParamBackend,
                                                       backend,
                                                       backend) ||
                !model_manager->validate_registered_tensors()) {
--- a/src/model/te/t5.hpp
+++ b/src/model/te/t5.hpp
@ -592,7 +592,7 @@ struct T5Embedder {
        if (!model_manager->register_runner_params("T5 test",
                                                   *t5,
                                                   "",
-                                                   ModelManager::ResidencyMode::Resident,
+                                                   ModelManager::ResidencyMode::ParamBackend,
                                                   backend,
                                                   backend) ||
            !model_manager->validate_registered_tensors()) {
--- a/src/model/vae/ltx_audio_vae.hpp
+++ b/src/model/vae/ltx_audio_vae.hpp
@ -1082,7 +1082,7 @@ namespace LTXV {

            if (!model_manager->register_runner_params("LTX audio VAE test",
                                                       *ltx_audio_vae,
-                                                       ModelManager::ResidencyMode::Resident,
+                                                       ModelManager::ResidencyMode::ParamBackend,
                                                       backend,
                                                       backend) ||
                !model_manager->validate_registered_tensors()) {
--- a/src/model/vae/ltx_vae.hpp
+++ b/src/model/vae/ltx_vae.hpp
@ -1538,7 +1538,7 @@ struct LTXVideoVAE : public VAE {

        if (!model_manager->register_runner_params("LTX VAE test",
                                                   *vae,
-                                                   ModelManager::ResidencyMode::Resident,
+                                                   ModelManager::ResidencyMode::ParamBackend,
                                                   backend,
                                                   backend) ||
            !model_manager->validate_registered_tensors()) {
--- a/src/model/vae/wan_vae.hpp
+++ b/src/model/vae/wan_vae.hpp
@ -1340,7 +1340,7 @@ namespace WAN {

                if (!model_manager->register_runner_params("Wan VAE test",
                                                           *vae,
-                                                           ModelManager::ResidencyMode::Resident,
+                                                           ModelManager::ResidencyMode::ParamBackend,
                                                           backend,
                                                           backend) ||
                    !model_manager->validate_registered_tensors()) {
--- a/src/model_manager.cpp
+++ b/src/model_manager.cpp
@ -492,7 +492,7 @@ bool ModelManager::mmap_params(const std::vector<TensorState*>& states,
 }

 bool ModelManager::can_mmap_storage(const TensorState& state) const {
-    if (!enable_mmap_ || state.residency_mode != ResidencyMode::Resident) {
+    if (!enable_mmap_ || state.residency_mode != ResidencyMode::ParamBackend) {
        return false;
    }
    if (state.compute_backend == nullptr || state.params_backend == nullptr) {
--- a/src/model_manager.h
+++ b/src/model_manager.h
@ -16,7 +16,7 @@ class ModelManager : public RunnerWeightManager {
 public:
    enum class ResidencyMode {
        Disk,
-        Resident,
+        ParamBackend,
    };

    struct LoraSpec {
@ -33,7 +33,7 @@ private:
        ggml_tensor* tensor = nullptr;
        std::string desc;

-        ResidencyMode residency_mode   = ResidencyMode::Resident;
+        ResidencyMode residency_mode   = ResidencyMode::ParamBackend;
        ggml_backend_t compute_backend = nullptr;
        ggml_backend_t params_backend  = nullptr;
        bool metadata_validated        = false;
--- a/src/stable-diffusion.cpp
+++ b/src/stable-diffusion.cpp
@ -165,7 +165,6 @@ public:
    SDVersion version;
    bool vae_decode_only         = false;
    bool external_vae_is_invalid = false;
-    bool free_params_immediately = false;

    bool circular_x = false;
    bool circular_y = false;
@ -246,7 +245,7 @@ public:
        }
        return model_manager->register_param_tensors(desc,
                                                     std::move(group_tensors),
-                                                     free_params_immediately ? ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::Resident,
+                                                     backend_manager.params_backend_is_disk(module) ? ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::ParamBackend,
                                                     backend_for(module),
                                                     params_backend_for(module),
                                                     params_mem_size);
@ -255,8 +254,7 @@ public:
    bool init_backend(const sd_ctx_params_t* sd_ctx_params) {
        std::string error;
        if (!backend_manager.init(sd_ctx_params->backend,
-                                  sd_ctx_params->params_backend,
-                                  offload_params_to_cpu,
+                                  params_backend_spec.c_str(),
                                  sd_ctx_params->keep_clip_on_cpu,
                                  sd_ctx_params->keep_vae_on_cpu,
                                  sd_ctx_params->keep_control_net_on_cpu,
@ -319,24 +317,21 @@ public:
    }

    bool init(const sd_ctx_params_t* sd_ctx_params) {
-        n_threads               = sd_ctx_params->n_threads;
-        vae_decode_only         = sd_ctx_params->vae_decode_only;
-        free_params_immediately = sd_ctx_params->free_params_immediately;
-        offload_params_to_cpu   = sd_ctx_params->offload_params_to_cpu;
-        enable_mmap             = sd_ctx_params->enable_mmap;
-        max_vram                = sd_ctx_params->max_vram;
-        stream_layers           = sd_ctx_params->stream_layers;
-        backend_spec            = SAFE_STR(sd_ctx_params->backend);
-        params_backend_spec     = SAFE_STR(sd_ctx_params->params_backend);
+        n_threads             = sd_ctx_params->n_threads;
+        vae_decode_only       = sd_ctx_params->vae_decode_only;
+        offload_params_to_cpu = sd_ctx_params->offload_params_to_cpu;
+        enable_mmap           = sd_ctx_params->enable_mmap;
+        max_vram              = sd_ctx_params->max_vram;
+        stream_layers         = sd_ctx_params->stream_layers;
+        backend_spec          = SAFE_STR(sd_ctx_params->backend);
+        params_backend_spec   = SAFE_STR(sd_ctx_params->params_backend);
+        if (offload_params_to_cpu) {
+            params_backend_spec = params_backend_spec.empty() ? "*=cpu" : "*=cpu," + params_backend_spec;
+        }
        if (stream_layers && max_vram == 0.f) {
            LOG_WARN("--stream-layers has no effect without --max-vram set; ignoring");
            stream_layers = false;
        }
-        if (stream_layers && !offload_params_to_cpu && params_backend_spec.empty()) {
-            // Streaming needs CPU-resident params.
-            LOG_WARN("--stream-layers has no effect without --offload-to-cpu (or --params-backend); ignoring");
-            stream_layers = false;
-        }

        bool use_tae         = false;
        bool use_audio_vae   = false;
@ -354,6 +349,10 @@ public:
        if (!init_backend(sd_ctx_params)) {
            return false;
        }
+        if (stream_layers && !backend_manager.params_backend_is_cpu(SDBackendModule::DIFFUSION)) {
+            LOG_WARN("--stream-layers has no effect unless diffusion params backend is cpu; ignoring");
+            stream_layers = false;
+        }
        max_vram = sd::ggml_graph_cut::resolve_max_vram_gib(max_vram, backend_for(SDBackendModule::DIFFUSION));

        model_manager = std::make_shared<ModelManager>();
@ -2644,7 +2643,6 @@ void sd_hires_params_init(sd_hires_params_t* hires_params) {
 void sd_ctx_params_init(sd_ctx_params_t* sd_ctx_params) {
    *sd_ctx_params                         = {};
    sd_ctx_params->vae_decode_only         = true;
-    sd_ctx_params->free_params_immediately = true;
    sd_ctx_params->n_threads               = sd_get_num_physical_cores();
    sd_ctx_params->wtype                   = SD_TYPE_COUNT;
    sd_ctx_params->rng_type                = CUDA_RNG;
@ -2694,7 +2692,6 @@ char* sd_ctx_params_to_str(const sd_ctx_params_t* sd_ctx_params) {
             "photo_maker_path: %s\n"
             "tensor_type_rules: %s\n"
             "vae_decode_only: %s\n"
-             "free_params_immediately: %s\n"
             "n_threads: %d\n"
             "wtype: %s\n"
             "rng_type: %s\n"
@ -2734,7 +2731,6 @@ char* sd_ctx_params_to_str(const sd_ctx_params_t* sd_ctx_params) {
             SAFE_STR(sd_ctx_params->photo_maker_path),
             SAFE_STR(sd_ctx_params->tensor_type_rules),
             BOOL_STR(sd_ctx_params->vae_decode_only),
-             BOOL_STR(sd_ctx_params->free_params_immediately),
             sd_ctx_params->n_threads,
             sd_type_name(sd_ctx_params->wtype),
             sd_rng_type_name(sd_ctx_params->rng_type),
@ -5037,7 +5033,7 @@ static sd::Tensor<float> upscale_ltx_spatial_video_latent(sd_ctx_t* sd_ctx,
    upsampler->get_param_tensors(tensors);
    if (!upsampler_manager->register_param_tensors("LTX latent upsampler",
                                                   std::move(tensors),
-                                                   ModelManager::ResidencyMode::Resident,
+                                                   ModelManager::ResidencyMode::ParamBackend,
                                                   sd_ctx->sd->backend_for(SDBackendModule::UPSCALER),
                                                   sd_ctx->sd->params_backend_for(SDBackendModule::UPSCALER)) ||
        !upsampler_manager->validate_registered_tensors()) {
--- a/src/upscaler.cpp
+++ b/src/upscaler.cpp
@ -43,10 +43,13 @@ bool UpscalerGGML::load_from_file(const std::string& esrgan_path,
                                  int n_threads) {
    ggml_log_set(ggml_log_callback_default, nullptr);

+    std::string effective_params_backend_spec = params_backend_spec;
+    if (offload_params_to_cpu) {
+        effective_params_backend_spec = effective_params_backend_spec.empty() ? "*=cpu" : "*=cpu," + effective_params_backend_spec;
+    }
    std::string error;
    if (!backend_manager.init(backend_spec.c_str(),
-                              params_backend_spec.c_str(),
-                              offload_params_to_cpu,
+                              effective_params_backend_spec.c_str(),
                              false,
                              false,
                              false,
@ -106,7 +109,7 @@ bool UpscalerGGML::load_from_file(const std::string& esrgan_path,
    esrgan_upscaler->get_param_tensors(tensors);
    if (!model_manager->register_param_tensors("ESRGAN",
                                               std::move(tensors),
-                                               ModelManager::ResidencyMode::Resident,
+                                               backend_manager.params_backend_is_disk(SDBackendModule::UPSCALER) ? ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::ParamBackend,
                                               backend_for(SDBackendModule::UPSCALER),
                                               params_backend_for(SDBackendModule::UPSCALER)) ||
        !model_manager->validate_registered_tensors()) {
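
The `--offload-to-cpu` handling added to `src/stable-diffusion.cpp` and `src/upscaler.cpp` can be sketched in isolation. This is a minimal illustration, not the project's code: `effective_params_spec` mirrors the prepend expression from the diff, while `Assignment` and `parse_spec` are hypothetical stand-ins for the real parser. The sketch assumes entries are applied left to right so that a later entry wins, and it omits module-name normalization and validation.

```cpp
#include <map>
#include <sstream>
#include <string>

// Mirrors the prepend step from the diff: --offload-to-cpu contributes a
// leading "*=cpu" entry, and any explicit --params-backend spec follows it.
static std::string effective_params_spec(const std::string& spec, bool offload_to_cpu) {
    if (!offload_to_cpu) {
        return spec;
    }
    return spec.empty() ? std::string("*=cpu") : "*=cpu," + spec;
}

// Hypothetical, simplified assignment table (an assumption for illustration,
// not the project's data structure).
struct Assignment {
    std::string default_name;
    std::map<std::string, std::string> module_names;

    // A module-specific entry takes precedence over the default.
    std::string get(const std::string& module) const {
        auto it = module_names.find(module);
        if (it != module_names.end() && !it->second.empty()) {
            return it->second;
        }
        return default_name;
    }
};

// Hypothetical parser: splits on ',' and applies entries in order, so a later
// entry for the same key replaces an earlier one.
static Assignment parse_spec(const std::string& spec) {
    Assignment assignment;
    std::stringstream stream(spec);
    std::string entry;
    while (std::getline(stream, entry, ',')) {
        if (entry.empty()) {
            continue;
        }
        const std::size_t eq = entry.find('=');
        if (eq == std::string::npos) {
            // A bare name such as "cpu" or "disk" sets the default.
            assignment.default_name = entry;
            continue;
        }
        const std::string key   = entry.substr(0, eq);
        const std::string value = entry.substr(eq + 1);
        if (key == "*" || key == "all" || key == "default") {
            assignment.default_name = value;
        } else {
            assignment.module_names[key] = value;
        }
    }
    return assignment;
}
```

Under these assumptions, `effective_params_spec("te=disk", true)` yields `*=cpu,te=disk`. Parsing that string resolves `te` to `disk` and every other module to `cpu`, which matches the override behavior described in `docs/backend.md`.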