feat: support disk params backend (#1651)

leejet 2026-06-14 14:48:50 +08:00 committed by GitHub
parent 276025e054
commit bdb431ad95
27 changed files with 134 additions and 65 deletions

View File

@ -3,7 +3,7 @@
 `stable-diffusion.cpp` has two backend assignments:
 - `--backend` selects the runtime backend used to execute model graphs.
-- `--params-backend` selects the backend used to allocate model parameters.
+- `--params-backend` selects where model parameters are kept.
 If `--params-backend` is not set, parameters use the same backend as their module runtime backend.
@ -29,6 +29,12 @@ The same syntax is used for parameter placement:
 sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend te=cpu,vae=cpu
 ```
+`--params-backend` also accepts the special value `disk`:
+```shell
+sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk
+```
 Module names are case-insensitive. Hyphens and underscores in module names are ignored, so `clip_vision`, `clip-vision`, and `clipvision` are equivalent.
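The normalization rule described above can be sketched in a few lines of C++. This is an illustrative helper, not the project's `parse_backend_module`: the name `normalize_module_name` is hypothetical, and the sketch skips the whitespace trimming that the real parser also performs.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: lower-case the module name, then drop '-' and '_'.
std::string normalize_module_name(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    name.erase(std::remove_if(name.begin(), name.end(),
                              [](char c) { return c == '-' || c == '_'; }),
               name.end());
    return name;
}
```

With this rule, `clip_vision`, `CLIP-Vision`, and `clipvision` all normalize to the same key, so a lookup table only needs one spelling per module.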
 `all=`, `default=`, and `*=` can be used to set the default backend inside a mixed assignment:
@ -64,9 +70,11 @@ The special values `auto`, `default`, and an empty backend name select the defau
 The special value `gpu` selects the first GPU backend, falling back to the first integrated GPU backend.
+The special value `disk` is accepted only by `--params-backend`. `--backend disk` is invalid because `disk` is a parameter residency mode, not a runtime compute backend.
 ## Runtime backend vs. parameter backend
-The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated.
+The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated or whether they are reloaded from disk on demand.
 For example:
@ -76,6 +84,16 @@ sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend cpu
 This runs all modules on `cuda0`, but stores parameters in CPU RAM. During execution, parameters are moved to the runtime backend as needed.
+For example:
+```shell
+sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk
+```
+This runs all modules on `cuda0`, reloads parameters from the model file as needed, and releases those parameter buffers after use.
+`disk` is never selected implicitly. If `--params-backend` is not set, parameters use the runtime backend.
 Per-module assignments can be mixed:
 ```shell
@ -100,6 +118,8 @@ uses one shared CPU backend for both `te` and `vae` runtime execution.
 Runtime and parameter assignments also share the same backend cache. If `--backend diffusion=cuda0` and `--params-backend diffusion=cuda0` resolve to the same device, both use the same backend instance.
+`--params-backend disk` does not create a separate backend instance. Parameters are loaded lazily using the module runtime backend.
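The resolution rule above can be summarized as a small decision function. The following is a sketch under stated assumptions: `ParamPlacement` and `resolve_param_placement` are hypothetical names, backends are represented as plain strings rather than `ggml_backend_t` handles, and only the `ResidencyMode` enumerators are taken from this commit.

```cpp
#include <string>

// Mirrors the enumerators of ModelManager::ResidencyMode after this commit.
enum class ResidencyMode { Disk, ParamBackend };

// Hypothetical result type: which backend holds parameter buffers, and
// whether the weights stay resident or are reloaded from the model file.
struct ParamPlacement {
    std::string backend;
    ResidencyMode residency;
};

ParamPlacement resolve_param_placement(const std::string& runtime_name,
                                       const std::string& params_name) {
    if (params_name == "disk") {
        // No separate backend instance: buffers use the runtime backend.
        return {runtime_name, ResidencyMode::Disk};
    }
    if (params_name.empty()) {
        // Unset params backend falls back to the runtime backend.
        return {runtime_name, ResidencyMode::ParamBackend};
    }
    return {params_name, ResidencyMode::ParamBackend};
}
```

Under these assumptions, `resolve_param_placement("cuda0", "disk")` and `resolve_param_placement("cuda0", "")` both name `cuda0` as the backend; only the residency mode differs.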
 `SDBackendManager` owns the backend instances and frees them when the context or upscaler is destroyed. Model runners receive non-owning runtime and parameter backend pointers and do not free them.
 ## Compatibility flags
@ -113,10 +133,12 @@ The older CPU placement flags are still supported:
 `--clip-on-cpu`, `--vae-on-cpu`, and `--control-net-cpu` affect runtime backend assignment only when `--backend` is not set. They map to `te=cpu`, `vae=cpu`, and `controlnet=cpu`.
-`--offload-to-cpu` affects parameter backend assignment only when `--params-backend` is not set. It is equivalent to:
+`--offload-to-cpu` prepends a CPU default to the parameter assignment before parsing:
 ```shell
---params-backend cpu
+--params-backend '*=cpu'
 ```
+Because this default is inserted first, later explicit `--params-backend` entries can still override it, for example `--offload-to-cpu --params-backend te=disk` keeps non-TE parameters on CPU and reloads TE parameters from disk.
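The prepend-then-override behavior can be illustrated with a toy parser. This is a sketch, not the project's parser: `Assignment`, `parse_assignment`, and `effective_params_spec` are hypothetical names, module names are assumed to be already normalized, and the rule that a later entry replaces an earlier one is an assumption based on the description above. Only the `"*=cpu"` string handling is taken from the commit.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical container for a parsed assignment string.
struct Assignment {
    std::string default_name;                    // backend for unlisted modules
    std::map<std::string, std::string> modules;  // per-module overrides

    std::string get(const std::string& module) const {
        auto it = modules.find(module);
        return it != modules.end() ? it->second : default_name;
    }
};

// Same string handling as the commit: prepend "*=cpu" when offloading.
std::string effective_params_spec(const std::string& spec, bool offload_to_cpu) {
    if (!offload_to_cpu) {
        return spec;
    }
    return spec.empty() ? "*=cpu" : "*=cpu," + spec;
}

// Entries are applied left to right, so later entries win (assumption).
Assignment parse_assignment(const std::string& spec) {
    Assignment result;
    std::stringstream stream(spec);
    std::string entry;
    while (std::getline(stream, entry, ',')) {
        if (entry.empty()) {
            continue;
        }
        const auto eq = entry.find('=');
        if (eq == std::string::npos) {
            result.default_name = entry;  // bare name sets the default
            continue;
        }
        const std::string key   = entry.substr(0, eq);
        const std::string value = entry.substr(eq + 1);
        if (key == "*" || key == "all" || key == "default") {
            result.default_name = value;
        } else {
            result.modules[key] = value;
        }
    }
    return result;
}
```

Parsing `effective_params_spec("te=disk", true)` yields `"*=cpu,te=disk"`, so `te` resolves to `disk` while every other module falls back to the `cpu` default.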
 Explicit `--backend` and `--params-backend` assignments are preferred for new commands.

View File

@ -21,6 +21,38 @@ and the compute buffer shrink in the debug log:
 Using `--offload-to-cpu` allows you to offload weights to the CPU, saving VRAM without reducing generation speed.
+## Use params backend to reduce VRAM or RAM usage.
+`--params-backend` controls where model parameters are kept. If it is not set, parameters use the same backend as `--backend`, so a GPU runtime backend also keeps parameters in VRAM.
+Use CPU params to reduce VRAM usage:
+```shell
+--backend cuda0 --params-backend cpu
+```
+This keeps model weights in system RAM and moves them to the runtime backend when needed. `--offload-to-cpu` is a compatibility shortcut that prepends `*=cpu` to `--params-backend`, so explicit module assignments can still override it:
+```shell
+--offload-to-cpu --params-backend te=disk
+```
+Use disk params to reduce both VRAM and RAM usage:
+```shell
+--backend cuda0 --params-backend disk
+```
+This reloads parameters from the model file on demand and releases them after use. It has the lowest memory residency, but can be slower because weights must be read again. `disk` is never selected implicitly; set it explicitly when RAM usage matters more than reload cost.
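The load-on-demand, release-after-use pattern can be pictured with a small RAII holder. This is only a conceptual sketch: `ScopedParams` is a hypothetical class, it reads raw bytes with `std::ifstream` rather than going through the project's `ModelManager` or any ggml buffer API, and it ignores tensor metadata and type conversion entirely.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical RAII holder: reads a byte range from the model file when
// constructed and frees the buffer when it goes out of scope.
class ScopedParams {
public:
    ScopedParams(const std::string& path, std::size_t offset, std::size_t size)
        : data_(size) {
        std::ifstream in(path, std::ios::binary);
        in.seekg(static_cast<std::streamoff>(offset));
        in.read(data_.data(), static_cast<std::streamsize>(size));
        // Success requires an open file and a full-length read.
        ok_ = in.good() && static_cast<std::size_t>(in.gcount()) == size;
    }

    bool ok() const { return ok_; }
    const std::vector<char>& bytes() const { return data_; }

private:
    std::vector<char> data_;
    bool ok_ = false;
};
```

A caller would construct a `ScopedParams` just before running the module that needs those weights and let it fall out of scope afterwards. Memory stays low because nothing remains resident between uses, and the cost is one file read per use, which is the trade-off described above.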
+Per-module assignments can target only the largest modules:
+```shell
+--backend cuda0 --params-backend diffusion=disk,te=cpu,vae=cpu
+```
+See [backend selection](./backend.md) for full syntax.
 ## Use quantization to reduce memory usage.
 [quantization](./quantization_and_gguf.md)

View File

@ -746,7 +746,7 @@ int main(int argc, const char* argv[]) {
 vae_decode_only = false;
 }
-sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(vae_decode_only, true, cli_params.taesd_preview);
+sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(vae_decode_only, cli_params.taesd_preview);
 SDImageVec results;
 int num_results = 0;

View File

@ -421,7 +421,7 @@ ArgOptions SDContextParams::get_options() {
 &backend},
 {"",
 "--params-backend",
-"parameter backend assignment, e.g. cpu or diffusion=cpu,clip=cpu",
+"parameter backend assignment, e.g. disk, cpu, or diffusion=disk,clip=cpu",
 &params_backend},
 };
@ -757,7 +757,7 @@ std::string SDContextParams::to_string() const {
 return oss.str();
 }
-sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool free_params_immediately, bool taesd_preview) {
+sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool taesd_preview) {
 embedding_vec.clear();
 embedding_vec.reserve(embedding_map.size());
 for (const auto& kv : embedding_map) {
@ -788,7 +788,6 @@ sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool f
 photo_maker_path.c_str(),
 tensor_type_rules.c_str(),
 vae_decode_only,
-free_params_immediately,
 n_threads,
 wtype,
 rng_type,

View File

@ -179,7 +179,7 @@ struct SDContextParams {
 bool validate(SDMode mode);
 bool resolve_and_validate(SDMode mode);
 std::string to_string() const;
-sd_ctx_params_t to_sd_ctx_params_t(bool vae_decode_only, bool free_params_immediately, bool taesd_preview);
+sd_ctx_params_t to_sd_ctx_params_t(bool vae_decode_only, bool taesd_preview);
 };
 struct SDGenerationParams {

View File

@ -85,7 +85,7 @@ int main(int argc, const char** argv) {
 LOG_DEBUG("%s", ctx_params.to_string().c_str());
 LOG_DEBUG("%s", default_gen_params.to_string().c_str());
-sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(false, false, false);
+sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(false, false);
 SDCtxPtr sd_ctx(new_sd_ctx(&sd_ctx_params));
 if (sd_ctx == nullptr) {

View File

@ -197,7 +197,6 @@ typedef struct {
 const char* photo_maker_path;
 const char* tensor_type_rules;
 bool vae_decode_only;
-bool free_params_immediately;
 int n_threads;
 enum sd_type_t wtype;
 enum rng_type_t rng_type;

View File

@ -45,6 +45,10 @@ static bool is_default_backend_token(const std::string& name) {
 return lower.empty() || lower == "default" || lower == "auto";
 }
+static bool is_disk_backend_token(const std::string& name) {
+return lower_copy(trim_copy(name)) == "disk";
+}
 static bool parse_backend_module(const std::string& raw_name, SDBackendModule* module) {
 std::string name = lower_copy(trim_copy(raw_name));
 name.erase(std::remove(name.begin(), name.end(), '-'), name.end());
@ -504,6 +508,9 @@ ggml_backend_t SDBackendManager::params_backend(SDBackendModule module) {
 if (name.empty()) {
 return runtime_backend(module);
 }
+if (is_disk_backend_token(name)) {
+return runtime_backend(module);
+}
 return init_cached_backend(name);
 }
@ -515,6 +522,10 @@ bool SDBackendManager::params_backend_is_cpu(SDBackendModule module) {
 return sd_backend_is_cpu(params_backend(module));
 }
+bool SDBackendManager::params_backend_is_disk(SDBackendModule module) const {
+return is_disk_backend_token(params_assignment_.get(module));
+}
 bool SDBackendManager::runtime_backend_supports_host_buffer(SDBackendModule module) {
 ggml_backend_t backend = runtime_backend(module);
 if (backend == nullptr) {
@ -534,7 +545,6 @@ bool SDBackendManager::runtime_backend_supports_host_buffer(SDBackendModule modu
 bool SDBackendManager::init(const char* backend_spec,
 const char* params_backend_spec,
-bool offload_params_to_cpu,
 bool keep_clip_on_cpu,
 bool keep_vae_on_cpu,
 bool keep_control_net_on_cpu,
@ -560,18 +570,20 @@ bool SDBackendManager::init(const char* backend_spec,
 }
 }
-if (params_assignment_.empty() && offload_params_to_cpu) {
-params_assignment_.set_default("cpu");
-}
 return validate(error);
 }
 bool SDBackendManager::validate(std::string* error) const {
-auto validate_name = [&](const std::string& name) -> bool {
+auto validate_runtime_name = [&](const std::string& name) -> bool {
 if (is_default_backend_token(name)) {
 return true;
 }
+if (is_disk_backend_token(name)) {
+if (error != nullptr) {
+*error = "backend 'disk' is only supported by params_backend";
+}
+return false;
+}
 if (!sd_resolve_backend_name(name).empty()) {
 return true;
 }
@ -580,18 +592,24 @@ bool SDBackendManager::validate(std::string* error) const {
 }
 return false;
 };
+auto validate_params_name = [&](const std::string& name) -> bool {
+if (is_disk_backend_token(name)) {
+return true;
+}
+return validate_runtime_name(name);
+};
-if (!validate_name(runtime_assignment_.default_name) ||
-!validate_name(params_assignment_.default_name)) {
+if (!validate_runtime_name(runtime_assignment_.default_name) ||
+!validate_params_name(params_assignment_.default_name)) {
 return false;
 }
 for (const auto& kv : runtime_assignment_.module_names) {
-if (!validate_name(kv.second)) {
+if (!validate_runtime_name(kv.second)) {
 return false;
 }
 }
 for (const auto& kv : params_assignment_.module_names) {
-if (!validate_name(kv.second)) {
+if (!validate_params_name(kv.second)) {
 return false;
 }
 }

View File

@ -51,7 +51,6 @@ public:
 bool init(const char* backend_spec,
 const char* params_backend_spec,
-bool offload_params_to_cpu,
 bool keep_clip_on_cpu,
 bool keep_vae_on_cpu,
 bool keep_control_net_on_cpu,
@ -63,6 +62,7 @@ public:
 bool runtime_backend_is_cpu(SDBackendModule module);
 bool params_backend_is_cpu(SDBackendModule module);
+bool params_backend_is_disk(SDBackendModule module) const;
 bool runtime_backend_supports_host_buffer(SDBackendModule module);
 private:

View File

@ -101,7 +101,7 @@ struct LoraModel : public GGMLRunner {
 if (model_manager == nullptr ||
 !model_manager->register_param_tensors("LoRA",
 std::move(tensors),
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 runtime_backend,
 params_backend) ||
 !model_manager->validate_registered_tensors()) {

View File

@ -622,7 +622,7 @@ struct PhotoMakerIDEmbed : public GGMLRunner {
 model_loader.load_tensors(on_new_tensor_cb);
 if (!model_manager->register_param_tensors("PhotoMaker ID embeds",
 tensors,
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 runtime_backend,
 params_backend) ||
 !model_manager->validate_registered_tensors()) {

View File

@ -482,7 +482,7 @@ struct ControlNet : public GGMLRunner {
 manager->set_n_threads(n_threads);
 if (!manager->register_param_tensors("ControlNet",
 std::move(tensors),
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 runtime_backend,
 params_backend) ||
 !manager->validate_registered_tensors()) {

View File

@ -1609,7 +1609,7 @@ namespace Flux {
 if (!model_manager->register_runner_params("Flux test",
 *flux,
 "model.diffusion_model",
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 backend,
 backend) ||
 !model_manager->validate_registered_tensors()) {

View File

@ -2048,7 +2048,7 @@ namespace LTXV {
 if (!model_manager->register_runner_params("LTXAV test",
 *ltxav,
 "model.diffusion_model",
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 backend,
 backend) ||
 !model_manager->validate_registered_tensors()) {

View File

@ -1015,7 +1015,7 @@ struct MMDiTRunner : public DiffusionModelRunner {
 if (!model_manager->register_runner_params("MMDiT test",
 *mmdit,
 "model.diffusion_model",
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 backend,
 backend) ||
 !model_manager->validate_registered_tensors()) {

View File

@ -715,7 +715,7 @@ namespace Qwen {
 if (!model_manager->register_runner_params("Qwen image test",
 *qwen_image,
 "model.diffusion_model",
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 backend,
 backend) ||
 !model_manager->validate_registered_tensors()) {

View File

@ -1040,7 +1040,7 @@ namespace WAN {
 if (!model_manager->register_runner_params("Wan test",
 *wan,
 "model.diffusion_model",
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 backend,
 backend) ||
 !model_manager->validate_registered_tensors()) {

View File

@ -723,7 +723,7 @@ namespace ZImage {
 if (!model_manager->register_runner_params("ZImage test",
 *z_image,
 "model.diffusion_model",
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 backend,
 backend) ||
 !model_manager->validate_registered_tensors()) {

View File

@ -2084,7 +2084,7 @@ namespace LLM {
 if (!model_manager->register_runner_params("LLM test",
 *llm,
 "text_encoders.llm",
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 backend,
 backend) ||
 !model_manager->validate_registered_tensors()) {

View File

@ -592,7 +592,7 @@ struct T5Embedder {
 if (!model_manager->register_runner_params("T5 test",
 *t5,
 "",
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 backend,
 backend) ||
 !model_manager->validate_registered_tensors()) {

View File

@ -1082,7 +1082,7 @@ namespace LTXV {
 if (!model_manager->register_runner_params("LTX audio VAE test",
 *ltx_audio_vae,
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 backend,
 backend) ||
 !model_manager->validate_registered_tensors()) {

View File

@ -1538,7 +1538,7 @@ struct LTXVideoVAE : public VAE {
 if (!model_manager->register_runner_params("LTX VAE test",
 *vae,
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 backend,
 backend) ||
 !model_manager->validate_registered_tensors()) {

View File

@ -1340,7 +1340,7 @@ namespace WAN {
 if (!model_manager->register_runner_params("Wan VAE test",
 *vae,
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 backend,
 backend) ||
 !model_manager->validate_registered_tensors()) {

View File

@ -492,7 +492,7 @@ bool ModelManager::mmap_params(const std::vector<TensorState*>& states,
 }
 bool ModelManager::can_mmap_storage(const TensorState& state) const {
-if (!enable_mmap_ || state.residency_mode != ResidencyMode::Resident) {
+if (!enable_mmap_ || state.residency_mode != ResidencyMode::ParamBackend) {
 return false;
 }
 if (state.compute_backend == nullptr || state.params_backend == nullptr) {

View File

@ -16,7 +16,7 @@ class ModelManager : public RunnerWeightManager {
 public:
 enum class ResidencyMode {
 Disk,
-Resident,
+ParamBackend,
 };
 struct LoraSpec {
@ -33,7 +33,7 @@ private:
 ggml_tensor* tensor = nullptr;
 std::string desc;
-ResidencyMode residency_mode = ResidencyMode::Resident;
+ResidencyMode residency_mode = ResidencyMode::ParamBackend;
 ggml_backend_t compute_backend = nullptr;
 ggml_backend_t params_backend = nullptr;
 bool metadata_validated = false;

View File

@ -165,7 +165,6 @@ public:
 SDVersion version;
 bool vae_decode_only = false;
 bool external_vae_is_invalid = false;
-bool free_params_immediately = false;
 bool circular_x = false;
 bool circular_y = false;
@ -246,7 +245,7 @@ public:
 }
 return model_manager->register_param_tensors(desc,
 std::move(group_tensors),
-free_params_immediately ? ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::Resident,
+backend_manager.params_backend_is_disk(module) ? ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::ParamBackend,
 backend_for(module),
 params_backend_for(module),
 params_mem_size);
@ -255,8 +254,7 @@ public:
 bool init_backend(const sd_ctx_params_t* sd_ctx_params) {
 std::string error;
 if (!backend_manager.init(sd_ctx_params->backend,
-sd_ctx_params->params_backend,
-offload_params_to_cpu,
+params_backend_spec.c_str(),
 sd_ctx_params->keep_clip_on_cpu,
 sd_ctx_params->keep_vae_on_cpu,
 sd_ctx_params->keep_control_net_on_cpu,
@ -319,24 +317,21 @@ public:
 }
 bool init(const sd_ctx_params_t* sd_ctx_params) {
 n_threads = sd_ctx_params->n_threads;
 vae_decode_only = sd_ctx_params->vae_decode_only;
-free_params_immediately = sd_ctx_params->free_params_immediately;
 offload_params_to_cpu = sd_ctx_params->offload_params_to_cpu;
 enable_mmap = sd_ctx_params->enable_mmap;
 max_vram = sd_ctx_params->max_vram;
 stream_layers = sd_ctx_params->stream_layers;
 backend_spec = SAFE_STR(sd_ctx_params->backend);
 params_backend_spec = SAFE_STR(sd_ctx_params->params_backend);
+if (offload_params_to_cpu) {
+params_backend_spec = params_backend_spec.empty() ? "*=cpu" : "*=cpu," + params_backend_spec;
+}
 if (stream_layers && max_vram == 0.f) {
 LOG_WARN("--stream-layers has no effect without --max-vram set; ignoring");
 stream_layers = false;
 }
-if (stream_layers && !offload_params_to_cpu && params_backend_spec.empty()) {
-// Streaming needs CPU-resident params.
-LOG_WARN("--stream-layers has no effect without --offload-to-cpu (or --params-backend); ignoring");
-stream_layers = false;
-}
 bool use_tae = false;
 bool use_audio_vae = false;
@ -354,6 +349,10 @@ public:
 if (!init_backend(sd_ctx_params)) {
 return false;
 }
+if (stream_layers && !backend_manager.params_backend_is_cpu(SDBackendModule::DIFFUSION)) {
+LOG_WARN("--stream-layers has no effect unless diffusion params backend is cpu; ignoring");
+stream_layers = false;
+}
 max_vram = sd::ggml_graph_cut::resolve_max_vram_gib(max_vram, backend_for(SDBackendModule::DIFFUSION));
 model_manager = std::make_shared<ModelManager>();
@ -2644,7 +2643,6 @@ void sd_hires_params_init(sd_hires_params_t* hires_params) {
 void sd_ctx_params_init(sd_ctx_params_t* sd_ctx_params) {
 *sd_ctx_params = {};
 sd_ctx_params->vae_decode_only = true;
-sd_ctx_params->free_params_immediately = true;
 sd_ctx_params->n_threads = sd_get_num_physical_cores();
 sd_ctx_params->wtype = SD_TYPE_COUNT;
 sd_ctx_params->rng_type = CUDA_RNG;
@ -2694,7 +2692,6 @@ char* sd_ctx_params_to_str(const sd_ctx_params_t* sd_ctx_params) {
 "photo_maker_path: %s\n"
 "tensor_type_rules: %s\n"
 "vae_decode_only: %s\n"
-"free_params_immediately: %s\n"
 "n_threads: %d\n"
 "wtype: %s\n"
 "rng_type: %s\n"
@ -2734,7 +2731,6 @@ char* sd_ctx_params_to_str(const sd_ctx_params_t* sd_ctx_params) {
 SAFE_STR(sd_ctx_params->photo_maker_path),
 SAFE_STR(sd_ctx_params->tensor_type_rules),
 BOOL_STR(sd_ctx_params->vae_decode_only),
-BOOL_STR(sd_ctx_params->free_params_immediately),
 sd_ctx_params->n_threads,
 sd_type_name(sd_ctx_params->wtype),
 sd_rng_type_name(sd_ctx_params->rng_type),
@ -5037,7 +5033,7 @@ static sd::Tensor<float> upscale_ltx_spatial_video_latent(sd_ctx_t* sd_ctx,
 upsampler->get_param_tensors(tensors);
 if (!upsampler_manager->register_param_tensors("LTX latent upsampler",
 std::move(tensors),
-ModelManager::ResidencyMode::Resident,
+ModelManager::ResidencyMode::ParamBackend,
 sd_ctx->sd->backend_for(SDBackendModule::UPSCALER),
 sd_ctx->sd->params_backend_for(SDBackendModule::UPSCALER)) ||
 !upsampler_manager->validate_registered_tensors()) {

View File

@ -43,10 +43,13 @@ bool UpscalerGGML::load_from_file(const std::string& esrgan_path,
 int n_threads) {
 ggml_log_set(ggml_log_callback_default, nullptr);
+std::string effective_params_backend_spec = params_backend_spec;
+if (offload_params_to_cpu) {
+effective_params_backend_spec = effective_params_backend_spec.empty() ? "*=cpu" : "*=cpu," + effective_params_backend_spec;
+}
 std::string error;
 if (!backend_manager.init(backend_spec.c_str(),
-params_backend_spec.c_str(),
-offload_params_to_cpu,
+effective_params_backend_spec.c_str(),
 false,
 false,
 false,
@ -106,7 +109,7 @@ bool UpscalerGGML::load_from_file(const std::string& esrgan_path,
 esrgan_upscaler->get_param_tensors(tensors);
 if (!model_manager->register_param_tensors("ESRGAN",
 std::move(tensors),
-ModelManager::ResidencyMode::Resident,
+backend_manager.params_backend_is_disk(SDBackendModule::UPSCALER) ? ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::ParamBackend,
 backend_for(SDBackendModule::UPSCALER),
 params_backend_for(SDBackendModule::UPSCALER)) ||
 !model_manager->validate_registered_tensors()) {