diff --git a/docs/backend.md b/docs/backend.md index 53088b0e..248133bc 100644 --- a/docs/backend.md +++ b/docs/backend.md @@ -3,7 +3,7 @@ `stable-diffusion.cpp` has two backend assignments: - `--backend` selects the runtime backend used to execute model graphs. -- `--params-backend` selects the backend used to allocate model parameters. +- `--params-backend` selects where model parameters are kept. If `--params-backend` is not set, parameters use the same backend as their module runtime backend. @@ -29,6 +29,12 @@ The same syntax is used for parameter placement: sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend te=cpu,vae=cpu ``` +`--params-backend` also accepts the special value `disk`: + +```shell +sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk +``` + Module names are case-insensitive. Hyphens and underscores in module names are ignored, so `clip_vision`, `clip-vision`, and `clipvision` are equivalent. `all=`, `default=`, and `*=` can be used to set the default backend inside a mixed assignment: @@ -64,9 +70,11 @@ The special values `auto`, `default`, and an empty backend name select the defau The special value `gpu` selects the first GPU backend, falling back to the first integrated GPU backend. +The special value `disk` is accepted only by `--params-backend`. `--backend disk` is invalid because `disk` is a parameter residency mode, not a runtime compute backend. + ## Runtime backend vs. parameter backend -The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated. +The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated or whether they are reloaded from disk on demand. For example: @@ -76,6 +84,16 @@ sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend cpu This runs all modules on `cuda0`, but stores parameters in CPU RAM. 
During execution, parameters are moved to the runtime backend as needed. +For example: + +```shell +sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk +``` + +This runs all modules on `cuda0`, reloads parameters from the model file as needed, and releases those parameter buffers after use. + +`disk` is never selected implicitly. If `--params-backend` is not set, parameters use the runtime backend. + Per-module assignments can be mixed: ```shell @@ -100,6 +118,8 @@ uses one shared CPU backend for both `te` and `vae` runtime execution. Runtime and parameter assignments also share the same backend cache. If `--backend diffusion=cuda0` and `--params-backend diffusion=cuda0` resolve to the same device, both use the same backend instance. +`--params-backend disk` does not create a separate backend instance. Parameters are loaded lazily using the module runtime backend. + `SDBackendManager` owns the backend instances and frees them when the context or upscaler is destroyed. Model runners receive non-owning runtime and parameter backend pointers and do not free them. ## Compatibility flags @@ -113,10 +133,12 @@ The older CPU placement flags are still supported: `--clip-on-cpu`, `--vae-on-cpu`, and `--control-net-cpu` affect runtime backend assignment only when `--backend` is not set. They map to `te=cpu`, `vae=cpu`, and `controlnet=cpu`. -`--offload-to-cpu` affects parameter backend assignment only when `--params-backend` is not set. It is equivalent to: +`--offload-to-cpu` prepends a CPU default to the parameter assignment before parsing: ```shell ---params-backend cpu +--params-backend '*=cpu' ``` +Because this default is inserted first, later explicit `--params-backend` entries can still override it, for example `--offload-to-cpu --params-backend te=disk` keeps non-TE parameters on CPU and reloads TE parameters from disk. + Explicit `--backend` and `--params-backend` assignments are preferred for new commands. 
diff --git a/docs/performance.md b/docs/performance.md index 0c4735e0..2f526057 100644 --- a/docs/performance.md +++ b/docs/performance.md @@ -21,6 +21,38 @@ and the compute buffer shrink in the debug log: Using `--offload-to-cpu` allows you to offload weights to the CPU, saving VRAM without reducing generation speed. +## Use params backend to reduce VRAM or RAM usage. + +`--params-backend` controls where model parameters are kept. If it is not set, parameters use the same backend as `--backend`, so a GPU runtime backend also keeps parameters in VRAM. + +Use CPU params to reduce VRAM usage: + +```shell +--backend cuda0 --params-backend cpu +``` + +This keeps model weights in system RAM and moves them to the runtime backend when needed. `--offload-to-cpu` is a compatibility shortcut that prepends `*=cpu` to `--params-backend`, so explicit module assignments can still override it: + +```shell +--offload-to-cpu --params-backend te=disk +``` + +Use disk params to reduce both VRAM and RAM usage: + +```shell +--backend cuda0 --params-backend disk +``` + +This reloads parameters from the model file on demand and releases them after use. It has the lowest memory residency, but can be slower because weights must be read again. `disk` is never selected implicitly; set it explicitly when RAM usage matters more than reload cost. + +Per-module assignments can target only the largest modules: + +```shell +--backend cuda0 --params-backend diffusion=disk,te=cpu,vae=cpu +``` + +See [backend selection](./backend.md) for full syntax. + ## Use quantization to reduce memory usage. 
-[quantization](./quantization_and_gguf.md) \ No newline at end of file +[quantization](./quantization_and_gguf.md) diff --git a/examples/cli/main.cpp b/examples/cli/main.cpp index decee0a9..85901be6 100644 --- a/examples/cli/main.cpp +++ b/examples/cli/main.cpp @@ -746,7 +746,7 @@ int main(int argc, const char* argv[]) { vae_decode_only = false; } - sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(vae_decode_only, true, cli_params.taesd_preview); + sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(vae_decode_only, cli_params.taesd_preview); SDImageVec results; int num_results = 0; diff --git a/examples/common/common.cpp b/examples/common/common.cpp index 3ae5faba..a6e396d8 100644 --- a/examples/common/common.cpp +++ b/examples/common/common.cpp @@ -421,7 +421,7 @@ ArgOptions SDContextParams::get_options() { &backend}, {"", "--params-backend", - "parameter backend assignment, e.g. cpu or diffusion=cpu,clip=cpu", + "parameter backend assignment, e.g. disk, cpu, or diffusion=disk,clip=cpu", &params_backend}, }; @@ -757,7 +757,7 @@ std::string SDContextParams::to_string() const { return oss.str(); } -sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool free_params_immediately, bool taesd_preview) { +sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool taesd_preview) { embedding_vec.clear(); embedding_vec.reserve(embedding_map.size()); for (const auto& kv : embedding_map) { @@ -788,7 +788,6 @@ sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool f photo_maker_path.c_str(), tensor_type_rules.c_str(), vae_decode_only, - free_params_immediately, n_threads, wtype, rng_type, diff --git a/examples/common/common.h b/examples/common/common.h index a90a3313..a6cf17b3 100644 --- a/examples/common/common.h +++ b/examples/common/common.h @@ -179,7 +179,7 @@ struct SDContextParams { bool validate(SDMode mode); bool resolve_and_validate(SDMode mode); std::string to_string() const; -
sd_ctx_params_t to_sd_ctx_params_t(bool vae_decode_only, bool free_params_immediately, bool taesd_preview); + sd_ctx_params_t to_sd_ctx_params_t(bool vae_decode_only, bool taesd_preview); }; struct SDGenerationParams { diff --git a/examples/server/main.cpp b/examples/server/main.cpp index 32d570d6..1d8aa9bd 100644 --- a/examples/server/main.cpp +++ b/examples/server/main.cpp @@ -85,7 +85,7 @@ int main(int argc, const char** argv) { LOG_DEBUG("%s", ctx_params.to_string().c_str()); LOG_DEBUG("%s", default_gen_params.to_string().c_str()); - sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(false, false, false); + sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(false, false); SDCtxPtr sd_ctx(new_sd_ctx(&sd_ctx_params)); if (sd_ctx == nullptr) { diff --git a/include/stable-diffusion.h b/include/stable-diffusion.h index 2175f895..ecd01fd3 100644 --- a/include/stable-diffusion.h +++ b/include/stable-diffusion.h @@ -197,7 +197,6 @@ typedef struct { const char* photo_maker_path; const char* tensor_type_rules; bool vae_decode_only; - bool free_params_immediately; int n_threads; enum sd_type_t wtype; enum rng_type_t rng_type; diff --git a/src/core/ggml_extend_backend.cpp b/src/core/ggml_extend_backend.cpp index d085129d..500e04e2 100644 --- a/src/core/ggml_extend_backend.cpp +++ b/src/core/ggml_extend_backend.cpp @@ -45,6 +45,10 @@ static bool is_default_backend_token(const std::string& name) { return lower.empty() || lower == "default" || lower == "auto"; } +static bool is_disk_backend_token(const std::string& name) { + return lower_copy(trim_copy(name)) == "disk"; +} + static bool parse_backend_module(const std::string& raw_name, SDBackendModule* module) { std::string name = lower_copy(trim_copy(raw_name)); name.erase(std::remove(name.begin(), name.end(), '-'), name.end()); @@ -504,6 +508,9 @@ ggml_backend_t SDBackendManager::params_backend(SDBackendModule module) { if (name.empty()) { return runtime_backend(module); } + if 
(is_disk_backend_token(name)) { + return runtime_backend(module); + } return init_cached_backend(name); } @@ -515,6 +522,10 @@ bool SDBackendManager::params_backend_is_cpu(SDBackendModule module) { return sd_backend_is_cpu(params_backend(module)); } +bool SDBackendManager::params_backend_is_disk(SDBackendModule module) const { + return is_disk_backend_token(params_assignment_.get(module)); +} + bool SDBackendManager::runtime_backend_supports_host_buffer(SDBackendModule module) { ggml_backend_t backend = runtime_backend(module); if (backend == nullptr) { @@ -534,7 +545,6 @@ bool SDBackendManager::runtime_backend_supports_host_buffer(SDBackendModule modu bool SDBackendManager::init(const char* backend_spec, const char* params_backend_spec, - bool offload_params_to_cpu, bool keep_clip_on_cpu, bool keep_vae_on_cpu, bool keep_control_net_on_cpu, @@ -560,18 +570,20 @@ bool SDBackendManager::init(const char* backend_spec, } } - if (params_assignment_.empty() && offload_params_to_cpu) { - params_assignment_.set_default("cpu"); - } - return validate(error); } bool SDBackendManager::validate(std::string* error) const { - auto validate_name = [&](const std::string& name) -> bool { + auto validate_runtime_name = [&](const std::string& name) -> bool { if (is_default_backend_token(name)) { return true; } + if (is_disk_backend_token(name)) { + if (error != nullptr) { + *error = "backend 'disk' is only supported by params_backend"; + } + return false; + } if (!sd_resolve_backend_name(name).empty()) { return true; } @@ -580,18 +592,24 @@ bool SDBackendManager::validate(std::string* error) const { } return false; }; + auto validate_params_name = [&](const std::string& name) -> bool { + if (is_disk_backend_token(name)) { + return true; + } + return validate_runtime_name(name); + }; - if (!validate_name(runtime_assignment_.default_name) || - !validate_name(params_assignment_.default_name)) { + if (!validate_runtime_name(runtime_assignment_.default_name) || + 
!validate_params_name(params_assignment_.default_name)) { return false; } for (const auto& kv : runtime_assignment_.module_names) { - if (!validate_name(kv.second)) { + if (!validate_runtime_name(kv.second)) { return false; } } for (const auto& kv : params_assignment_.module_names) { - if (!validate_name(kv.second)) { + if (!validate_params_name(kv.second)) { return false; } } diff --git a/src/core/ggml_extend_backend.h b/src/core/ggml_extend_backend.h index fc071ffd..a604984f 100644 --- a/src/core/ggml_extend_backend.h +++ b/src/core/ggml_extend_backend.h @@ -51,7 +51,6 @@ public: bool init(const char* backend_spec, const char* params_backend_spec, - bool offload_params_to_cpu, bool keep_clip_on_cpu, bool keep_vae_on_cpu, bool keep_control_net_on_cpu, @@ -63,6 +62,7 @@ public: bool runtime_backend_is_cpu(SDBackendModule module); bool params_backend_is_cpu(SDBackendModule module); + bool params_backend_is_disk(SDBackendModule module) const; bool runtime_backend_supports_host_buffer(SDBackendModule module); private: diff --git a/src/model/adapter/lora.hpp b/src/model/adapter/lora.hpp index 0899688e..0b759175 100644 --- a/src/model/adapter/lora.hpp +++ b/src/model/adapter/lora.hpp @@ -101,7 +101,7 @@ struct LoraModel : public GGMLRunner { if (model_manager == nullptr || !model_manager->register_param_tensors("LoRA", std::move(tensors), - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, runtime_backend, params_backend) || !model_manager->validate_registered_tensors()) { diff --git a/src/model/adapter/pmid.hpp b/src/model/adapter/pmid.hpp index 69191b74..8f7d4dbd 100644 --- a/src/model/adapter/pmid.hpp +++ b/src/model/adapter/pmid.hpp @@ -622,7 +622,7 @@ struct PhotoMakerIDEmbed : public GGMLRunner { model_loader.load_tensors(on_new_tensor_cb); if (!model_manager->register_param_tensors("PhotoMaker ID embeds", tensors, - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, runtime_backend, 
params_backend) || !model_manager->validate_registered_tensors()) { diff --git a/src/model/diffusion/control.hpp b/src/model/diffusion/control.hpp index 7cf9370b..d857fa09 100644 --- a/src/model/diffusion/control.hpp +++ b/src/model/diffusion/control.hpp @@ -482,7 +482,7 @@ struct ControlNet : public GGMLRunner { manager->set_n_threads(n_threads); if (!manager->register_param_tensors("ControlNet", std::move(tensors), - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, runtime_backend, params_backend) || !manager->validate_registered_tensors()) { diff --git a/src/model/diffusion/flux.hpp b/src/model/diffusion/flux.hpp index 7efaf931..d3dfb71e 100644 --- a/src/model/diffusion/flux.hpp +++ b/src/model/diffusion/flux.hpp @@ -1609,7 +1609,7 @@ namespace Flux { if (!model_manager->register_runner_params("Flux test", *flux, "model.diffusion_model", - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, backend, backend) || !model_manager->validate_registered_tensors()) { diff --git a/src/model/diffusion/ltxv.hpp b/src/model/diffusion/ltxv.hpp index 3535821d..b89ff32c 100644 --- a/src/model/diffusion/ltxv.hpp +++ b/src/model/diffusion/ltxv.hpp @@ -2048,7 +2048,7 @@ namespace LTXV { if (!model_manager->register_runner_params("LTXAV test", *ltxav, "model.diffusion_model", - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, backend, backend) || !model_manager->validate_registered_tensors()) { diff --git a/src/model/diffusion/mmdit.hpp b/src/model/diffusion/mmdit.hpp index b73a9fc7..d8e76dfb 100644 --- a/src/model/diffusion/mmdit.hpp +++ b/src/model/diffusion/mmdit.hpp @@ -1015,7 +1015,7 @@ struct MMDiTRunner : public DiffusionModelRunner { if (!model_manager->register_runner_params("MMDiT test", *mmdit, "model.diffusion_model", - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, backend, backend) || !model_manager->validate_registered_tensors()) 
{ diff --git a/src/model/diffusion/qwen_image.hpp b/src/model/diffusion/qwen_image.hpp index 5cee54c5..aecd8bce 100644 --- a/src/model/diffusion/qwen_image.hpp +++ b/src/model/diffusion/qwen_image.hpp @@ -715,7 +715,7 @@ namespace Qwen { if (!model_manager->register_runner_params("Qwen image test", *qwen_image, "model.diffusion_model", - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, backend, backend) || !model_manager->validate_registered_tensors()) { diff --git a/src/model/diffusion/wan.hpp b/src/model/diffusion/wan.hpp index 9e27807f..9a907dcf 100644 --- a/src/model/diffusion/wan.hpp +++ b/src/model/diffusion/wan.hpp @@ -1040,7 +1040,7 @@ namespace WAN { if (!model_manager->register_runner_params("Wan test", *wan, "model.diffusion_model", - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, backend, backend) || !model_manager->validate_registered_tensors()) { diff --git a/src/model/diffusion/z_image.hpp b/src/model/diffusion/z_image.hpp index 936da0f7..d23c2856 100644 --- a/src/model/diffusion/z_image.hpp +++ b/src/model/diffusion/z_image.hpp @@ -723,7 +723,7 @@ namespace ZImage { if (!model_manager->register_runner_params("ZImage test", *z_image, "model.diffusion_model", - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, backend, backend) || !model_manager->validate_registered_tensors()) { diff --git a/src/model/te/llm.hpp b/src/model/te/llm.hpp index 3905d53a..74dc232e 100644 --- a/src/model/te/llm.hpp +++ b/src/model/te/llm.hpp @@ -2084,7 +2084,7 @@ namespace LLM { if (!model_manager->register_runner_params("LLM test", *llm, "text_encoders.llm", - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, backend, backend) || !model_manager->validate_registered_tensors()) { diff --git a/src/model/te/t5.hpp b/src/model/te/t5.hpp index a8d1e869..23da0822 100644 --- a/src/model/te/t5.hpp +++ b/src/model/te/t5.hpp @@ -592,7 +592,7 @@ 
struct T5Embedder { if (!model_manager->register_runner_params("T5 test", *t5, "", - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, backend, backend) || !model_manager->validate_registered_tensors()) { diff --git a/src/model/vae/ltx_audio_vae.hpp b/src/model/vae/ltx_audio_vae.hpp index bd0d18a9..997c57a5 100644 --- a/src/model/vae/ltx_audio_vae.hpp +++ b/src/model/vae/ltx_audio_vae.hpp @@ -1082,7 +1082,7 @@ namespace LTXV { if (!model_manager->register_runner_params("LTX audio VAE test", *ltx_audio_vae, - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, backend, backend) || !model_manager->validate_registered_tensors()) { diff --git a/src/model/vae/ltx_vae.hpp b/src/model/vae/ltx_vae.hpp index 77ce9656..7eeff31b 100644 --- a/src/model/vae/ltx_vae.hpp +++ b/src/model/vae/ltx_vae.hpp @@ -1538,7 +1538,7 @@ struct LTXVideoVAE : public VAE { if (!model_manager->register_runner_params("LTX VAE test", *vae, - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, backend, backend) || !model_manager->validate_registered_tensors()) { diff --git a/src/model/vae/wan_vae.hpp b/src/model/vae/wan_vae.hpp index c8cfaa9d..8a845c7c 100644 --- a/src/model/vae/wan_vae.hpp +++ b/src/model/vae/wan_vae.hpp @@ -1340,7 +1340,7 @@ namespace WAN { if (!model_manager->register_runner_params("Wan VAE test", *vae, - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, backend, backend) || !model_manager->validate_registered_tensors()) { diff --git a/src/model_manager.cpp b/src/model_manager.cpp index 328a478b..d32f0365 100644 --- a/src/model_manager.cpp +++ b/src/model_manager.cpp @@ -492,7 +492,7 @@ bool ModelManager::mmap_params(const std::vector& states, } bool ModelManager::can_mmap_storage(const TensorState& state) const { - if (!enable_mmap_ || state.residency_mode != ResidencyMode::Resident) { + if (!enable_mmap_ || state.residency_mode != 
ResidencyMode::ParamBackend) { return false; } if (state.compute_backend == nullptr || state.params_backend == nullptr) { diff --git a/src/model_manager.h b/src/model_manager.h index e18d4c5d..1a414c15 100644 --- a/src/model_manager.h +++ b/src/model_manager.h @@ -16,7 +16,7 @@ class ModelManager : public RunnerWeightManager { public: enum class ResidencyMode { Disk, - Resident, + ParamBackend, }; struct LoraSpec { @@ -33,7 +33,7 @@ private: ggml_tensor* tensor = nullptr; std::string desc; - ResidencyMode residency_mode = ResidencyMode::Resident; + ResidencyMode residency_mode = ResidencyMode::ParamBackend; ggml_backend_t compute_backend = nullptr; ggml_backend_t params_backend = nullptr; bool metadata_validated = false; diff --git a/src/stable-diffusion.cpp b/src/stable-diffusion.cpp index c071fd29..1b3b5dba 100644 --- a/src/stable-diffusion.cpp +++ b/src/stable-diffusion.cpp @@ -165,7 +165,6 @@ public: SDVersion version; bool vae_decode_only = false; bool external_vae_is_invalid = false; - bool free_params_immediately = false; bool circular_x = false; bool circular_y = false; @@ -246,7 +245,7 @@ public: } return model_manager->register_param_tensors(desc, std::move(group_tensors), - free_params_immediately ? ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::Resident, + backend_manager.params_backend_is_disk(module) ? 
ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::ParamBackend, backend_for(module), params_backend_for(module), params_mem_size); @@ -255,8 +254,7 @@ public: bool init_backend(const sd_ctx_params_t* sd_ctx_params) { std::string error; if (!backend_manager.init(sd_ctx_params->backend, - sd_ctx_params->params_backend, - offload_params_to_cpu, + params_backend_spec.c_str(), sd_ctx_params->keep_clip_on_cpu, sd_ctx_params->keep_vae_on_cpu, sd_ctx_params->keep_control_net_on_cpu, @@ -319,24 +317,21 @@ public: } bool init(const sd_ctx_params_t* sd_ctx_params) { - n_threads = sd_ctx_params->n_threads; - vae_decode_only = sd_ctx_params->vae_decode_only; - free_params_immediately = sd_ctx_params->free_params_immediately; - offload_params_to_cpu = sd_ctx_params->offload_params_to_cpu; - enable_mmap = sd_ctx_params->enable_mmap; - max_vram = sd_ctx_params->max_vram; - stream_layers = sd_ctx_params->stream_layers; - backend_spec = SAFE_STR(sd_ctx_params->backend); - params_backend_spec = SAFE_STR(sd_ctx_params->params_backend); + n_threads = sd_ctx_params->n_threads; + vae_decode_only = sd_ctx_params->vae_decode_only; + offload_params_to_cpu = sd_ctx_params->offload_params_to_cpu; + enable_mmap = sd_ctx_params->enable_mmap; + max_vram = sd_ctx_params->max_vram; + stream_layers = sd_ctx_params->stream_layers; + backend_spec = SAFE_STR(sd_ctx_params->backend); + params_backend_spec = SAFE_STR(sd_ctx_params->params_backend); + if (offload_params_to_cpu) { + params_backend_spec = params_backend_spec.empty() ? "*=cpu" : "*=cpu," + params_backend_spec; + } if (stream_layers && max_vram == 0.f) { LOG_WARN("--stream-layers has no effect without --max-vram set; ignoring"); stream_layers = false; } - if (stream_layers && !offload_params_to_cpu && params_backend_spec.empty()) { - // Streaming needs CPU-resident params. 
- LOG_WARN("--stream-layers has no effect without --offload-to-cpu (or --params-backend); ignoring"); - stream_layers = false; - } bool use_tae = false; bool use_audio_vae = false; @@ -354,6 +349,10 @@ public: if (!init_backend(sd_ctx_params)) { return false; } + if (stream_layers && !backend_manager.params_backend_is_cpu(SDBackendModule::DIFFUSION)) { + LOG_WARN("--stream-layers has no effect unless diffusion params backend is cpu; ignoring"); + stream_layers = false; + } max_vram = sd::ggml_graph_cut::resolve_max_vram_gib(max_vram, backend_for(SDBackendModule::DIFFUSION)); model_manager = std::make_shared(); @@ -2644,7 +2643,6 @@ void sd_hires_params_init(sd_hires_params_t* hires_params) { void sd_ctx_params_init(sd_ctx_params_t* sd_ctx_params) { *sd_ctx_params = {}; sd_ctx_params->vae_decode_only = true; - sd_ctx_params->free_params_immediately = true; sd_ctx_params->n_threads = sd_get_num_physical_cores(); sd_ctx_params->wtype = SD_TYPE_COUNT; sd_ctx_params->rng_type = CUDA_RNG; @@ -2694,7 +2692,6 @@ char* sd_ctx_params_to_str(const sd_ctx_params_t* sd_ctx_params) { "photo_maker_path: %s\n" "tensor_type_rules: %s\n" "vae_decode_only: %s\n" - "free_params_immediately: %s\n" "n_threads: %d\n" "wtype: %s\n" "rng_type: %s\n" @@ -2734,7 +2731,6 @@ char* sd_ctx_params_to_str(const sd_ctx_params_t* sd_ctx_params) { SAFE_STR(sd_ctx_params->photo_maker_path), SAFE_STR(sd_ctx_params->tensor_type_rules), BOOL_STR(sd_ctx_params->vae_decode_only), - BOOL_STR(sd_ctx_params->free_params_immediately), sd_ctx_params->n_threads, sd_type_name(sd_ctx_params->wtype), sd_rng_type_name(sd_ctx_params->rng_type), @@ -5037,7 +5033,7 @@ static sd::Tensor upscale_ltx_spatial_video_latent(sd_ctx_t* sd_ctx, upsampler->get_param_tensors(tensors); if (!upsampler_manager->register_param_tensors("LTX latent upsampler", std::move(tensors), - ModelManager::ResidencyMode::Resident, + ModelManager::ResidencyMode::ParamBackend, sd_ctx->sd->backend_for(SDBackendModule::UPSCALER), 
sd_ctx->sd->params_backend_for(SDBackendModule::UPSCALER)) || !upsampler_manager->validate_registered_tensors()) { diff --git a/src/upscaler.cpp b/src/upscaler.cpp index 0a9182e9..be1bb2f5 100644 --- a/src/upscaler.cpp +++ b/src/upscaler.cpp @@ -43,10 +43,13 @@ bool UpscalerGGML::load_from_file(const std::string& esrgan_path, int n_threads) { ggml_log_set(ggml_log_callback_default, nullptr); + std::string effective_params_backend_spec = params_backend_spec; + if (offload_params_to_cpu) { + effective_params_backend_spec = effective_params_backend_spec.empty() ? "*=cpu" : "*=cpu," + effective_params_backend_spec; + } std::string error; if (!backend_manager.init(backend_spec.c_str(), - params_backend_spec.c_str(), - offload_params_to_cpu, + effective_params_backend_spec.c_str(), false, false, false, @@ -106,7 +109,7 @@ bool UpscalerGGML::load_from_file(const std::string& esrgan_path, esrgan_upscaler->get_param_tensors(tensors); if (!model_manager->register_param_tensors("ESRGAN", std::move(tensors), - ModelManager::ResidencyMode::Resident, + backend_manager.params_backend_is_disk(SDBackendModule::UPSCALER) ? ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::ParamBackend, backend_for(SDBackendModule::UPSCALER), params_backend_for(SDBackendModule::UPSCALER)) || !model_manager->validate_registered_tensors()) {
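
Reviewer note: the two behaviours this change introduces (`--offload-to-cpu` prepending `*=cpu` to the params spec, and `disk` being valid only as a params backend) can be sketched in isolation. This is a minimal sketch, not the project's parser. `effective_params_spec`, `resolve_params_backend`, `lower_trim`, `is_disk_token`, and `runtime_name_is_valid` are hypothetical names. The resolver assumes that later entries override earlier ones, handles only `*=` and bare default names, and skips the module-name normalization and the `all=`/`default=` aliases that the real code supports.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: mirrors the prepend step added in this diff.
// When offloading is requested, "*=cpu" goes first so explicit entries can override it.
static std::string effective_params_spec(const std::string& spec, bool offload_to_cpu) {
    if (!offload_to_cpu) {
        return spec;
    }
    return spec.empty() ? "*=cpu" : "*=cpu," + spec;
}

// Hypothetical helper: trim surrounding whitespace and lower-case the token.
static std::string lower_trim(std::string s) {
    auto not_space = [](unsigned char c) { return !std::isspace(c); };
    s.erase(s.begin(), std::find_if(s.begin(), s.end(), not_space));
    s.erase(std::find_if(s.rbegin(), s.rend(), not_space).base(), s.end());
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Same idea as is_disk_backend_token in the diff: "disk", case-insensitive.
static bool is_disk_token(const std::string& name) {
    return lower_trim(name) == "disk";
}

// Simplified resolver (assumption: the last matching entry wins).
// A bare name or "*=name" sets the default; "module=name" sets one module.
// Returns an empty string when nothing applies, meaning "use the runtime backend".
static std::string resolve_params_backend(const std::string& spec, const std::string& module) {
    std::string default_name;
    std::string module_name;
    bool has_module_entry = false;
    std::stringstream stream(spec);
    std::string entry;
    while (std::getline(stream, entry, ',')) {
        const auto eq = entry.find('=');
        if (eq == std::string::npos) {
            default_name = lower_trim(entry);
            continue;
        }
        const std::string key   = lower_trim(entry.substr(0, eq));
        const std::string value = lower_trim(entry.substr(eq + 1));
        if (key == "*") {
            default_name = value;
        } else if (key == module) {
            module_name      = value;
            has_module_entry = true;
        }
    }
    return has_module_entry ? module_name : default_name;
}

// Simplified runtime check: only models the new rule that "disk" is rejected for --backend.
static bool runtime_name_is_valid(const std::string& name) {
    return !is_disk_token(name);
}
```

Under these assumptions, `resolve_params_backend(effective_params_spec("te=disk", true), "te")` yields `disk`, while the same spec resolved for `vae` yields `cpu`, which matches the `--offload-to-cpu --params-backend te=disk` example in the updated docs.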