mirror of https://github.com/leejet/stable-diffusion.cpp.git (synced 2026-06-19 04:37:18 +00:00)
feat: support disk params backend (#1651)
parent 276025e054
commit bdb431ad95
@ -3,7 +3,7 @@
|
|||||||
`stable-diffusion.cpp` has two backend assignments:
|
`stable-diffusion.cpp` has two backend assignments:
|
||||||
|
|
||||||
- `--backend` selects the runtime backend used to execute model graphs.
|
- `--backend` selects the runtime backend used to execute model graphs.
|
||||||
- `--params-backend` selects the backend used to allocate model parameters.
|
- `--params-backend` selects where model parameters are kept.
|
||||||
|
|
||||||
If `--params-backend` is not set, parameters use the same backend as their module runtime backend.
|
If `--params-backend` is not set, parameters use the same backend as their module runtime backend.
|
||||||
|
|
||||||
@ -29,6 +29,12 @@ The same syntax is used for parameter placement:
|
|||||||
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend te=cpu,vae=cpu
|
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend te=cpu,vae=cpu
|
||||||
```
|
```
|
||||||
|
|
||||||
|
`--params-backend` also accepts the special value `disk`:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk
|
||||||
|
```
|
||||||
|
|
||||||
Module names are case-insensitive. Hyphens and underscores in module names are ignored, so `clip_vision`, `clip-vision`, and `clipvision` are equivalent.
|
Module names are case-insensitive. Hyphens and underscores in module names are ignored, so `clip_vision`, `clip-vision`, and `clipvision` are equivalent.
|
||||||
|
|
||||||
`all=`, `default=`, and `*=` can be used to set the default backend inside a mixed assignment:
|
`all=`, `default=`, and `*=` can be used to set the default backend inside a mixed assignment:
|
||||||
@ -64,9 +70,11 @@ The special values `auto`, `default`, and an empty backend name select the defau
|
|||||||
|
|
||||||
The special value `gpu` selects the first GPU backend, falling back to the first integrated GPU backend.
|
The special value `gpu` selects the first GPU backend, falling back to the first integrated GPU backend.
|
||||||
|
|
||||||
|
The special value `disk` is accepted only by `--params-backend`. `--backend disk` is invalid because `disk` is a parameter residency mode, not a runtime compute backend.
|
||||||
|
|
||||||
## Runtime backend vs. parameter backend
|
## Runtime backend vs. parameter backend
|
||||||
|
|
||||||
The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated.
|
The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated or whether they are reloaded from disk on demand.
|
||||||
|
|
||||||
For example:
|
For example:
|
||||||
|
|
||||||
@ -76,6 +84,16 @@ sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend cpu
|
|||||||
|
|
||||||
This runs all modules on `cuda0`, but stores parameters in CPU RAM. During execution, parameters are moved to the runtime backend as needed.
|
This runs all modules on `cuda0`, but stores parameters in CPU RAM. During execution, parameters are moved to the runtime backend as needed.
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk
|
||||||
|
```
|
||||||
|
|
||||||
|
This runs all modules on `cuda0`, reloads parameters from the model file as needed, and releases those parameter buffers after use.
|
||||||
|
|
||||||
|
`disk` is never selected implicitly. If `--params-backend` is not set, parameters use the runtime backend.
|
||||||
|
|
||||||
Per-module assignments can be mixed:
|
Per-module assignments can be mixed:
|
||||||
|
|
||||||
```shell
|
```shell
|
||||||
@ -100,6 +118,8 @@ uses one shared CPU backend for both `te` and `vae` runtime execution.
|
|||||||
|
|
||||||
Runtime and parameter assignments also share the same backend cache. If `--backend diffusion=cuda0` and `--params-backend diffusion=cuda0` resolve to the same device, both use the same backend instance.
|
Runtime and parameter assignments also share the same backend cache. If `--backend diffusion=cuda0` and `--params-backend diffusion=cuda0` resolve to the same device, both use the same backend instance.
|
||||||
|
|
||||||
|
`--params-backend disk` does not create a separate backend instance. Parameters are loaded lazily using the module runtime backend.
|
||||||
|
|
||||||
`SDBackendManager` owns the backend instances and frees them when the context or upscaler is destroyed. Model runners receive non-owning runtime and parameter backend pointers and do not free them.
|
`SDBackendManager` owns the backend instances and frees them when the context or upscaler is destroyed. Model runners receive non-owning runtime and parameter backend pointers and do not free them.
|
||||||
|
|
||||||
## Compatibility flags
|
## Compatibility flags
|
||||||
@ -113,10 +133,12 @@ The older CPU placement flags are still supported:
|
|||||||
|
|
||||||
`--clip-on-cpu`, `--vae-on-cpu`, and `--control-net-cpu` affect runtime backend assignment only when `--backend` is not set. They map to `te=cpu`, `vae=cpu`, and `controlnet=cpu`.
|
`--clip-on-cpu`, `--vae-on-cpu`, and `--control-net-cpu` affect runtime backend assignment only when `--backend` is not set. They map to `te=cpu`, `vae=cpu`, and `controlnet=cpu`.
|
||||||
|
|
||||||
`--offload-to-cpu` affects parameter backend assignment only when `--params-backend` is not set. It is equivalent to:
|
`--offload-to-cpu` prepends a CPU default to the parameter assignment before parsing:
|
||||||
|
|
||||||
```shell
|
```shell
|
||||||
--params-backend cpu
|
--params-backend '*=cpu'
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Because this default is inserted first, later explicit `--params-backend` entries can still override it. For example, `--offload-to-cpu --params-backend te=disk` keeps non-TE parameters on CPU and reloads TE parameters from disk.
|
||||||
|
|
||||||
Explicit `--backend` and `--params-backend` assignments are preferred for new commands.
|
Explicit `--backend` and `--params-backend` assignments are preferred for new commands.
|
||||||
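The prepend behaviour described for `--offload-to-cpu` can be sketched in isolation. This is a minimal illustration modelled on the spec rewriting that this commit adds to `init()`; the function name `effective_params_backend_spec` is hypothetical and does not exist in the project.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of stable-diffusion.cpp): returns the
// parameter assignment string that would be parsed once --offload-to-cpu
// has been taken into account.
std::string effective_params_backend_spec(bool offload_to_cpu, const std::string& spec) {
    if (!offload_to_cpu) {
        return spec;  // flag not set: the spec is used unchanged
    }
    // The CPU default goes first so that entries later in the spec win.
    return spec.empty() ? std::string("*=cpu") : "*=cpu," + spec;
}
```

With the flag set, `te=disk` becomes `*=cpu,te=disk`, which is why the TE module can still be assigned to `disk` while everything else defaults to CPU.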
|
|||||||
@ -21,6 +21,38 @@ and the compute buffer shrink in the debug log:
|
|||||||
|
|
||||||
Using `--offload-to-cpu` allows you to offload weights to the CPU, saving VRAM without reducing generation speed.
|
Using `--offload-to-cpu` allows you to offload weights to the CPU, saving VRAM without reducing generation speed.
|
||||||
|
|
||||||
|
## Use params backend to reduce VRAM or RAM usage.
|
||||||
|
|
||||||
|
`--params-backend` controls where model parameters are kept. If it is not set, parameters use the same backend as `--backend`, so a GPU runtime backend also keeps parameters in VRAM.
|
||||||
|
|
||||||
|
Use CPU params to reduce VRAM usage:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
--backend cuda0 --params-backend cpu
|
||||||
|
```
|
||||||
|
|
||||||
|
This keeps model weights in system RAM and moves them to the runtime backend when needed. `--offload-to-cpu` is a compatibility shortcut that prepends `*=cpu` to `--params-backend`, so explicit module assignments can still override it:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
--offload-to-cpu --params-backend te=disk
|
||||||
|
```
|
||||||
|
|
||||||
|
Use disk params to reduce both VRAM and RAM usage:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
--backend cuda0 --params-backend disk
|
||||||
|
```
|
||||||
|
|
||||||
|
This reloads parameters from the model file on demand and releases them after use. It has the lowest memory residency, but can be slower because weights must be read again. `disk` is never selected implicitly; set it explicitly when RAM usage matters more than reload cost.
|
||||||
|
|
||||||
|
Per-module assignments can target only the largest modules:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
--backend cuda0 --params-backend diffusion=disk,te=cpu,vae=cpu
|
||||||
|
```
|
||||||
|
|
||||||
|
See [backend selection](./backend.md) for full syntax.
|
||||||
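How a mixed assignment such as `diffusion=disk,te=cpu,vae=cpu` resolves per module can be sketched as follows. This is an assumption-laden illustration, not the project's parser: `Assignment`, `parse_assignment`, and `assigned_name` are hypothetical names, only the field names `default_name` and `module_names` are taken from the diff, and trimming and module-name normalization are left out.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical container for one parsed assignment string.
struct Assignment {
    std::string default_name;                         // backend for unlisted modules
    std::map<std::string, std::string> module_names;  // module -> backend name
};

// Parses "cpu", "*=cpu,te=disk", "diffusion=disk,te=cpu", and so on.
// Entries are applied left to right, so later entries override earlier ones.
Assignment parse_assignment(const std::string& spec) {
    Assignment out;
    std::stringstream ss(spec);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.empty()) {
            continue;
        }
        const std::string::size_type eq = item.find('=');
        if (eq == std::string::npos) {
            out.default_name = item;  // bare value such as "disk" sets the default
            continue;
        }
        const std::string key   = item.substr(0, eq);
        const std::string value = item.substr(eq + 1);
        if (key == "*" || key == "all" || key == "default") {
            out.default_name = value;
        } else {
            out.module_names[key] = value;
        }
    }
    return out;
}

// Returns the backend name assigned to a module, falling back to the default.
// An empty result means "use the module's runtime backend".
std::string assigned_name(const Assignment& a, const std::string& module) {
    const auto it = a.module_names.find(module);
    return it != a.module_names.end() ? it->second : a.default_name;
}
```

Under these assumptions, `*=cpu,te=disk` assigns `disk` to `te` and `cpu` to every other module, while a spec with no default leaves unlisted modules on their runtime backend.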
|
|
||||||
## Use quantization to reduce memory usage.
|
## Use quantization to reduce memory usage.
|
||||||
|
|
||||||
[quantization](./quantization_and_gguf.md)
|
[quantization](./quantization_and_gguf.md)
|
||||||
@ -746,7 +746,7 @@ int main(int argc, const char* argv[]) {
|
|||||||
vae_decode_only = false;
|
vae_decode_only = false;
|
||||||
}
|
}
|
||||||
|
|
||||||
sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(vae_decode_only, true, cli_params.taesd_preview);
|
sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(vae_decode_only, cli_params.taesd_preview);
|
||||||
|
|
||||||
SDImageVec results;
|
SDImageVec results;
|
||||||
int num_results = 0;
|
int num_results = 0;
|
||||||
|
|||||||
@ -421,7 +421,7 @@ ArgOptions SDContextParams::get_options() {
|
|||||||
&backend},
|
&backend},
|
||||||
{"",
|
{"",
|
||||||
"--params-backend",
|
"--params-backend",
|
||||||
"parameter backend assignment, e.g. cpu or diffusion=cpu,clip=cpu",
|
"parameter backend assignment, e.g. disk, cpu, or diffusion=disk,clip=cpu",
|
||||||
&params_backend},
|
&params_backend},
|
||||||
};
|
};
|
||||||
|
|
||||||
@ -757,7 +757,7 @@ std::string SDContextParams::to_string() const {
|
|||||||
return oss.str();
|
return oss.str();
|
||||||
}
|
}
|
||||||
|
|
||||||
sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool free_params_immediately, bool taesd_preview) {
|
sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool taesd_preview) {
|
||||||
embedding_vec.clear();
|
embedding_vec.clear();
|
||||||
embedding_vec.reserve(embedding_map.size());
|
embedding_vec.reserve(embedding_map.size());
|
||||||
for (const auto& kv : embedding_map) {
|
for (const auto& kv : embedding_map) {
|
||||||
@ -788,7 +788,6 @@ sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool f
|
|||||||
photo_maker_path.c_str(),
|
photo_maker_path.c_str(),
|
||||||
tensor_type_rules.c_str(),
|
tensor_type_rules.c_str(),
|
||||||
vae_decode_only,
|
vae_decode_only,
|
||||||
free_params_immediately,
|
|
||||||
n_threads,
|
n_threads,
|
||||||
wtype,
|
wtype,
|
||||||
rng_type,
|
rng_type,
|
||||||
|
|||||||
@ -179,7 +179,7 @@ struct SDContextParams {
|
|||||||
bool validate(SDMode mode);
|
bool validate(SDMode mode);
|
||||||
bool resolve_and_validate(SDMode mode);
|
bool resolve_and_validate(SDMode mode);
|
||||||
std::string to_string() const;
|
std::string to_string() const;
|
||||||
sd_ctx_params_t to_sd_ctx_params_t(bool vae_decode_only, bool free_params_immediately, bool taesd_preview);
|
sd_ctx_params_t to_sd_ctx_params_t(bool vae_decode_only, bool taesd_preview);
|
||||||
};
|
};
|
||||||
|
|
||||||
struct SDGenerationParams {
|
struct SDGenerationParams {
|
||||||
|
|||||||
@ -85,7 +85,7 @@ int main(int argc, const char** argv) {
|
|||||||
LOG_DEBUG("%s", ctx_params.to_string().c_str());
|
LOG_DEBUG("%s", ctx_params.to_string().c_str());
|
||||||
LOG_DEBUG("%s", default_gen_params.to_string().c_str());
|
LOG_DEBUG("%s", default_gen_params.to_string().c_str());
|
||||||
|
|
||||||
sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(false, false, false);
|
sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(false, false);
|
||||||
SDCtxPtr sd_ctx(new_sd_ctx(&sd_ctx_params));
|
SDCtxPtr sd_ctx(new_sd_ctx(&sd_ctx_params));
|
||||||
|
|
||||||
if (sd_ctx == nullptr) {
|
if (sd_ctx == nullptr) {
|
||||||
|
|||||||
@ -197,7 +197,6 @@ typedef struct {
|
|||||||
const char* photo_maker_path;
|
const char* photo_maker_path;
|
||||||
const char* tensor_type_rules;
|
const char* tensor_type_rules;
|
||||||
bool vae_decode_only;
|
bool vae_decode_only;
|
||||||
bool free_params_immediately;
|
|
||||||
int n_threads;
|
int n_threads;
|
||||||
enum sd_type_t wtype;
|
enum sd_type_t wtype;
|
||||||
enum rng_type_t rng_type;
|
enum rng_type_t rng_type;
|
||||||
|
|||||||
@ -45,6 +45,10 @@ static bool is_default_backend_token(const std::string& name) {
|
|||||||
return lower.empty() || lower == "default" || lower == "auto";
|
return lower.empty() || lower == "default" || lower == "auto";
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static bool is_disk_backend_token(const std::string& name) {
|
||||||
|
return lower_copy(trim_copy(name)) == "disk";
|
||||||
|
}
|
||||||
|
|
||||||
static bool parse_backend_module(const std::string& raw_name, SDBackendModule* module) {
|
static bool parse_backend_module(const std::string& raw_name, SDBackendModule* module) {
|
||||||
std::string name = lower_copy(trim_copy(raw_name));
|
std::string name = lower_copy(trim_copy(raw_name));
|
||||||
name.erase(std::remove(name.begin(), name.end(), '-'), name.end());
|
name.erase(std::remove(name.begin(), name.end(), '-'), name.end());
|
||||||
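The module-name normalization that `parse_backend_module` begins here can be sketched as a standalone function. The hunk above shows only the trim, lowercase, and hyphen-removal steps; the underscore removal below follows the documentation's statement that `clip_vision`, `clip-vision`, and `clipvision` are equivalent. `normalize_module_name` is a hypothetical name, and the real code uses the project's `trim_copy` and `lower_copy` helpers instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for the normalization step: trims whitespace,
// lowercases, and drops '-' and '_' so that spelling variants compare equal.
std::string normalize_module_name(std::string name) {
    const auto not_space = [](unsigned char c) { return !std::isspace(c); };
    // Trim leading, then trailing whitespace.
    name.erase(name.begin(), std::find_if(name.begin(), name.end(), not_space));
    name.erase(std::find_if(name.rbegin(), name.rend(), not_space).base(), name.end());
    // Lowercase in place.
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    // Remove separators that the documentation says are ignored.
    name.erase(std::remove_if(name.begin(), name.end(),
                              [](char c) { return c == '-' || c == '_'; }),
               name.end());
    return name;
}
```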
@ -504,6 +508,9 @@ ggml_backend_t SDBackendManager::params_backend(SDBackendModule module) {
|
|||||||
if (name.empty()) {
|
if (name.empty()) {
|
||||||
return runtime_backend(module);
|
return runtime_backend(module);
|
||||||
}
|
}
|
||||||
|
if (is_disk_backend_token(name)) {
|
||||||
|
return runtime_backend(module);
|
||||||
|
}
|
||||||
return init_cached_backend(name);
|
return init_cached_backend(name);
|
||||||
}
|
}
|
||||||
|
|
||||||
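Taken together with the residency selection changed later in this commit, the resolution rule in `params_backend` can be summarized in a small sketch. It works on backend names rather than `ggml_backend_t` handles, and `ParamPlacement` and `resolve_param_placement` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <string>

// Mirrors the two enumerators of ModelManager::ResidencyMode after this commit.
enum class ResidencyMode { Disk, ParamBackend };

// Hypothetical result type: which backend holds the buffers, and how long.
struct ParamPlacement {
    std::string backend_name;
    ResidencyMode mode;
};

// runtime_name: backend that executes the module's graphs.
// params_name:  assignment from --params-backend for that module (may be empty).
ParamPlacement resolve_param_placement(const std::string& runtime_name,
                                       const std::string& params_name) {
    if (params_name.empty()) {
        // Not set: parameters stay resident on the runtime backend.
        return {runtime_name, ResidencyMode::ParamBackend};
    }
    if (params_name == "disk") {
        // No separate backend instance: buffers are created on the runtime
        // backend when needed and released after use.
        return {runtime_name, ResidencyMode::Disk};
    }
    return {params_name, ResidencyMode::ParamBackend};
}
```

The sketch compares against the literal `disk`; the real `is_disk_backend_token` also trims and lowercases the name first.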
@ -515,6 +522,10 @@ bool SDBackendManager::params_backend_is_cpu(SDBackendModule module) {
|
|||||||
return sd_backend_is_cpu(params_backend(module));
|
return sd_backend_is_cpu(params_backend(module));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
bool SDBackendManager::params_backend_is_disk(SDBackendModule module) const {
|
||||||
|
return is_disk_backend_token(params_assignment_.get(module));
|
||||||
|
}
|
||||||
|
|
||||||
bool SDBackendManager::runtime_backend_supports_host_buffer(SDBackendModule module) {
|
bool SDBackendManager::runtime_backend_supports_host_buffer(SDBackendModule module) {
|
||||||
ggml_backend_t backend = runtime_backend(module);
|
ggml_backend_t backend = runtime_backend(module);
|
||||||
if (backend == nullptr) {
|
if (backend == nullptr) {
|
||||||
@ -534,7 +545,6 @@ bool SDBackendManager::runtime_backend_supports_host_buffer(SDBackendModule modu
|
|||||||
|
|
||||||
bool SDBackendManager::init(const char* backend_spec,
|
bool SDBackendManager::init(const char* backend_spec,
|
||||||
const char* params_backend_spec,
|
const char* params_backend_spec,
|
||||||
bool offload_params_to_cpu,
|
|
||||||
bool keep_clip_on_cpu,
|
bool keep_clip_on_cpu,
|
||||||
bool keep_vae_on_cpu,
|
bool keep_vae_on_cpu,
|
||||||
bool keep_control_net_on_cpu,
|
bool keep_control_net_on_cpu,
|
||||||
@ -560,18 +570,20 @@ bool SDBackendManager::init(const char* backend_spec,
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
if (params_assignment_.empty() && offload_params_to_cpu) {
|
|
||||||
params_assignment_.set_default("cpu");
|
|
||||||
}
|
|
||||||
|
|
||||||
return validate(error);
|
return validate(error);
|
||||||
}
|
}
|
||||||
|
|
||||||
bool SDBackendManager::validate(std::string* error) const {
|
bool SDBackendManager::validate(std::string* error) const {
|
||||||
auto validate_name = [&](const std::string& name) -> bool {
|
auto validate_runtime_name = [&](const std::string& name) -> bool {
|
||||||
if (is_default_backend_token(name)) {
|
if (is_default_backend_token(name)) {
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
|
if (is_disk_backend_token(name)) {
|
||||||
|
if (error != nullptr) {
|
||||||
|
*error = "backend 'disk' is only supported by params_backend";
|
||||||
|
}
|
||||||
|
return false;
|
||||||
|
}
|
||||||
if (!sd_resolve_backend_name(name).empty()) {
|
if (!sd_resolve_backend_name(name).empty()) {
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
@ -580,18 +592,24 @@ bool SDBackendManager::validate(std::string* error) const {
|
|||||||
}
|
}
|
||||||
return false;
|
return false;
|
||||||
};
|
};
|
||||||
|
auto validate_params_name = [&](const std::string& name) -> bool {
|
||||||
|
if (is_disk_backend_token(name)) {
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
return validate_runtime_name(name);
|
||||||
|
};
|
||||||
|
|
||||||
if (!validate_name(runtime_assignment_.default_name) ||
|
if (!validate_runtime_name(runtime_assignment_.default_name) ||
|
||||||
!validate_name(params_assignment_.default_name)) {
|
!validate_params_name(params_assignment_.default_name)) {
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
for (const auto& kv : runtime_assignment_.module_names) {
|
for (const auto& kv : runtime_assignment_.module_names) {
|
||||||
if (!validate_name(kv.second)) {
|
if (!validate_runtime_name(kv.second)) {
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
for (const auto& kv : params_assignment_.module_names) {
|
for (const auto& kv : params_assignment_.module_names) {
|
||||||
if (!validate_name(kv.second)) {
|
if (!validate_params_name(kv.second)) {
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
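The split between the two validators can be exercised with a self-contained sketch. It assumes a fixed set of known backend names in place of `sd_resolve_backend_name`; the unknown-backend message is also made up, since that part of `validate` is outside the hunks shown. Only the `disk` error string is copied from the diff.

```cpp
#include <cassert>
#include <set>
#include <string>

bool is_default_token(const std::string& name) {
    return name.empty() || name == "default" || name == "auto";
}

bool is_disk_token(const std::string& name) {
    return name == "disk";
}

// `known` is a hypothetical stand-in for the backends the real code
// discovers through sd_resolve_backend_name().
bool validate_runtime_name(const std::string& name,
                           const std::set<std::string>& known,
                           std::string* error) {
    if (is_default_token(name)) {
        return true;
    }
    if (is_disk_token(name)) {
        if (error != nullptr) {
            *error = "backend 'disk' is only supported by params_backend";
        }
        return false;
    }
    if (known.count(name) > 0) {
        return true;
    }
    if (error != nullptr) {
        *error = "unknown backend '" + name + "'";  // illustrative message only
    }
    return false;
}

// Parameter names accept everything a runtime name does, plus "disk".
bool validate_params_name(const std::string& name,
                          const std::set<std::string>& known,
                          std::string* error) {
    if (is_disk_token(name)) {
        return true;
    }
    return validate_runtime_name(name, known, error);
}
```

Layering the parameter check on top of the runtime check keeps a single place where real backend names are resolved, so `disk` is the only difference between the two.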
|
|||||||
@ -51,7 +51,6 @@ public:
|
|||||||
|
|
||||||
bool init(const char* backend_spec,
|
bool init(const char* backend_spec,
|
||||||
const char* params_backend_spec,
|
const char* params_backend_spec,
|
||||||
bool offload_params_to_cpu,
|
|
||||||
bool keep_clip_on_cpu,
|
bool keep_clip_on_cpu,
|
||||||
bool keep_vae_on_cpu,
|
bool keep_vae_on_cpu,
|
||||||
bool keep_control_net_on_cpu,
|
bool keep_control_net_on_cpu,
|
||||||
@ -63,6 +62,7 @@ public:
|
|||||||
|
|
||||||
bool runtime_backend_is_cpu(SDBackendModule module);
|
bool runtime_backend_is_cpu(SDBackendModule module);
|
||||||
bool params_backend_is_cpu(SDBackendModule module);
|
bool params_backend_is_cpu(SDBackendModule module);
|
||||||
|
bool params_backend_is_disk(SDBackendModule module) const;
|
||||||
bool runtime_backend_supports_host_buffer(SDBackendModule module);
|
bool runtime_backend_supports_host_buffer(SDBackendModule module);
|
||||||
|
|
||||||
private:
|
private:
|
||||||
|
|||||||
@ -101,7 +101,7 @@ struct LoraModel : public GGMLRunner {
|
|||||||
if (model_manager == nullptr ||
|
if (model_manager == nullptr ||
|
||||||
!model_manager->register_param_tensors("LoRA",
|
!model_manager->register_param_tensors("LoRA",
|
||||||
std::move(tensors),
|
std::move(tensors),
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
runtime_backend,
|
runtime_backend,
|
||||||
params_backend) ||
|
params_backend) ||
|
||||||
!model_manager->validate_registered_tensors()) {
|
!model_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -622,7 +622,7 @@ struct PhotoMakerIDEmbed : public GGMLRunner {
|
|||||||
model_loader.load_tensors(on_new_tensor_cb);
|
model_loader.load_tensors(on_new_tensor_cb);
|
||||||
if (!model_manager->register_param_tensors("PhotoMaker ID embeds",
|
if (!model_manager->register_param_tensors("PhotoMaker ID embeds",
|
||||||
tensors,
|
tensors,
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
runtime_backend,
|
runtime_backend,
|
||||||
params_backend) ||
|
params_backend) ||
|
||||||
!model_manager->validate_registered_tensors()) {
|
!model_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -482,7 +482,7 @@ struct ControlNet : public GGMLRunner {
|
|||||||
manager->set_n_threads(n_threads);
|
manager->set_n_threads(n_threads);
|
||||||
if (!manager->register_param_tensors("ControlNet",
|
if (!manager->register_param_tensors("ControlNet",
|
||||||
std::move(tensors),
|
std::move(tensors),
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
runtime_backend,
|
runtime_backend,
|
||||||
params_backend) ||
|
params_backend) ||
|
||||||
!manager->validate_registered_tensors()) {
|
!manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -1609,7 +1609,7 @@ namespace Flux {
|
|||||||
if (!model_manager->register_runner_params("Flux test",
|
if (!model_manager->register_runner_params("Flux test",
|
||||||
*flux,
|
*flux,
|
||||||
"model.diffusion_model",
|
"model.diffusion_model",
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
backend,
|
backend,
|
||||||
backend) ||
|
backend) ||
|
||||||
!model_manager->validate_registered_tensors()) {
|
!model_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -2048,7 +2048,7 @@ namespace LTXV {
|
|||||||
if (!model_manager->register_runner_params("LTXAV test",
|
if (!model_manager->register_runner_params("LTXAV test",
|
||||||
*ltxav,
|
*ltxav,
|
||||||
"model.diffusion_model",
|
"model.diffusion_model",
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
backend,
|
backend,
|
||||||
backend) ||
|
backend) ||
|
||||||
!model_manager->validate_registered_tensors()) {
|
!model_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -1015,7 +1015,7 @@ struct MMDiTRunner : public DiffusionModelRunner {
|
|||||||
if (!model_manager->register_runner_params("MMDiT test",
|
if (!model_manager->register_runner_params("MMDiT test",
|
||||||
*mmdit,
|
*mmdit,
|
||||||
"model.diffusion_model",
|
"model.diffusion_model",
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
backend,
|
backend,
|
||||||
backend) ||
|
backend) ||
|
||||||
!model_manager->validate_registered_tensors()) {
|
!model_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -715,7 +715,7 @@ namespace Qwen {
|
|||||||
if (!model_manager->register_runner_params("Qwen image test",
|
if (!model_manager->register_runner_params("Qwen image test",
|
||||||
*qwen_image,
|
*qwen_image,
|
||||||
"model.diffusion_model",
|
"model.diffusion_model",
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
backend,
|
backend,
|
||||||
backend) ||
|
backend) ||
|
||||||
!model_manager->validate_registered_tensors()) {
|
!model_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -1040,7 +1040,7 @@ namespace WAN {
|
|||||||
if (!model_manager->register_runner_params("Wan test",
|
if (!model_manager->register_runner_params("Wan test",
|
||||||
*wan,
|
*wan,
|
||||||
"model.diffusion_model",
|
"model.diffusion_model",
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
backend,
|
backend,
|
||||||
backend) ||
|
backend) ||
|
||||||
!model_manager->validate_registered_tensors()) {
|
!model_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -723,7 +723,7 @@ namespace ZImage {
|
|||||||
if (!model_manager->register_runner_params("ZImage test",
|
if (!model_manager->register_runner_params("ZImage test",
|
||||||
*z_image,
|
*z_image,
|
||||||
"model.diffusion_model",
|
"model.diffusion_model",
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
backend,
|
backend,
|
||||||
backend) ||
|
backend) ||
|
||||||
!model_manager->validate_registered_tensors()) {
|
!model_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -2084,7 +2084,7 @@ namespace LLM {
|
|||||||
if (!model_manager->register_runner_params("LLM test",
|
if (!model_manager->register_runner_params("LLM test",
|
||||||
*llm,
|
*llm,
|
||||||
"text_encoders.llm",
|
"text_encoders.llm",
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
backend,
|
backend,
|
||||||
backend) ||
|
backend) ||
|
||||||
!model_manager->validate_registered_tensors()) {
|
!model_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -592,7 +592,7 @@ struct T5Embedder {
|
|||||||
if (!model_manager->register_runner_params("T5 test",
|
if (!model_manager->register_runner_params("T5 test",
|
||||||
*t5,
|
*t5,
|
||||||
"",
|
"",
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
backend,
|
backend,
|
||||||
backend) ||
|
backend) ||
|
||||||
!model_manager->validate_registered_tensors()) {
|
!model_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -1082,7 +1082,7 @@ namespace LTXV {
|
|||||||
|
|
||||||
if (!model_manager->register_runner_params("LTX audio VAE test",
|
if (!model_manager->register_runner_params("LTX audio VAE test",
|
||||||
*ltx_audio_vae,
|
*ltx_audio_vae,
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
backend,
|
backend,
|
||||||
backend) ||
|
backend) ||
|
||||||
!model_manager->validate_registered_tensors()) {
|
!model_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -1538,7 +1538,7 @@ struct LTXVideoVAE : public VAE {
|
|||||||
|
|
||||||
if (!model_manager->register_runner_params("LTX VAE test",
|
if (!model_manager->register_runner_params("LTX VAE test",
|
||||||
*vae,
|
*vae,
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
backend,
|
backend,
|
||||||
backend) ||
|
backend) ||
|
||||||
!model_manager->validate_registered_tensors()) {
|
!model_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -1340,7 +1340,7 @@ namespace WAN {
|
|||||||
|
|
||||||
if (!model_manager->register_runner_params("Wan VAE test",
|
if (!model_manager->register_runner_params("Wan VAE test",
|
||||||
*vae,
|
*vae,
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
backend,
|
backend,
|
||||||
backend) ||
|
backend) ||
|
||||||
!model_manager->validate_registered_tensors()) {
|
!model_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -492,7 +492,7 @@ bool ModelManager::mmap_params(const std::vector<TensorState*>& states,
|
|||||||
}
|
}
|
||||||
|
|
||||||
bool ModelManager::can_mmap_storage(const TensorState& state) const {
|
bool ModelManager::can_mmap_storage(const TensorState& state) const {
|
||||||
if (!enable_mmap_ || state.residency_mode != ResidencyMode::Resident) {
|
if (!enable_mmap_ || state.residency_mode != ResidencyMode::ParamBackend) {
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
if (state.compute_backend == nullptr || state.params_backend == nullptr) {
|
if (state.compute_backend == nullptr || state.params_backend == nullptr) {
|
||||||
|
|||||||
@ -16,7 +16,7 @@ class ModelManager : public RunnerWeightManager {
|
|||||||
public:
|
public:
|
||||||
enum class ResidencyMode {
|
enum class ResidencyMode {
|
||||||
Disk,
|
Disk,
|
||||||
Resident,
|
ParamBackend,
|
||||||
};
|
};
|
||||||
|
|
||||||
struct LoraSpec {
|
struct LoraSpec {
|
||||||
@ -33,7 +33,7 @@ private:
|
|||||||
ggml_tensor* tensor = nullptr;
|
ggml_tensor* tensor = nullptr;
|
||||||
std::string desc;
|
std::string desc;
|
||||||
|
|
||||||
ResidencyMode residency_mode = ResidencyMode::Resident;
|
ResidencyMode residency_mode = ResidencyMode::ParamBackend;
|
||||||
ggml_backend_t compute_backend = nullptr;
|
ggml_backend_t compute_backend = nullptr;
|
||||||
ggml_backend_t params_backend = nullptr;
|
ggml_backend_t params_backend = nullptr;
|
||||||
bool metadata_validated = false;
|
bool metadata_validated = false;
|
||||||
|
|||||||
@ -165,7 +165,6 @@ public:
|
|||||||
SDVersion version;
|
SDVersion version;
|
||||||
bool vae_decode_only = false;
|
bool vae_decode_only = false;
|
||||||
bool external_vae_is_invalid = false;
|
bool external_vae_is_invalid = false;
|
||||||
bool free_params_immediately = false;
|
|
||||||
|
|
||||||
bool circular_x = false;
|
bool circular_x = false;
|
||||||
bool circular_y = false;
|
bool circular_y = false;
|
||||||
@ -246,7 +245,7 @@ public:
|
|||||||
}
|
}
|
||||||
return model_manager->register_param_tensors(desc,
|
return model_manager->register_param_tensors(desc,
|
||||||
std::move(group_tensors),
|
std::move(group_tensors),
|
||||||
free_params_immediately ? ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::Resident,
|
backend_manager.params_backend_is_disk(module) ? ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::ParamBackend,
|
||||||
backend_for(module),
|
backend_for(module),
|
||||||
params_backend_for(module),
|
params_backend_for(module),
|
||||||
params_mem_size);
|
params_mem_size);
|
||||||
@ -255,8 +254,7 @@ public:
|
|||||||
bool init_backend(const sd_ctx_params_t* sd_ctx_params) {
|
bool init_backend(const sd_ctx_params_t* sd_ctx_params) {
|
||||||
std::string error;
|
std::string error;
|
||||||
if (!backend_manager.init(sd_ctx_params->backend,
|
if (!backend_manager.init(sd_ctx_params->backend,
|
||||||
sd_ctx_params->params_backend,
|
params_backend_spec.c_str(),
|
||||||
offload_params_to_cpu,
|
|
||||||
sd_ctx_params->keep_clip_on_cpu,
|
sd_ctx_params->keep_clip_on_cpu,
|
||||||
sd_ctx_params->keep_vae_on_cpu,
|
sd_ctx_params->keep_vae_on_cpu,
|
||||||
sd_ctx_params->keep_control_net_on_cpu,
|
sd_ctx_params->keep_control_net_on_cpu,
|
||||||
@ -319,24 +317,21 @@ public:
|
|||||||
}
|
}
|
||||||
|
|
||||||
bool init(const sd_ctx_params_t* sd_ctx_params) {
|
bool init(const sd_ctx_params_t* sd_ctx_params) {
|
||||||
n_threads = sd_ctx_params->n_threads;
|
n_threads = sd_ctx_params->n_threads;
|
||||||
vae_decode_only = sd_ctx_params->vae_decode_only;
|
vae_decode_only = sd_ctx_params->vae_decode_only;
|
||||||
free_params_immediately = sd_ctx_params->free_params_immediately;
|
offload_params_to_cpu = sd_ctx_params->offload_params_to_cpu;
|
||||||
offload_params_to_cpu = sd_ctx_params->offload_params_to_cpu;
|
enable_mmap = sd_ctx_params->enable_mmap;
|
||||||
enable_mmap = sd_ctx_params->enable_mmap;
|
max_vram = sd_ctx_params->max_vram;
|
||||||
max_vram = sd_ctx_params->max_vram;
|
stream_layers = sd_ctx_params->stream_layers;
|
||||||
stream_layers = sd_ctx_params->stream_layers;
|
backend_spec = SAFE_STR(sd_ctx_params->backend);
|
||||||
backend_spec = SAFE_STR(sd_ctx_params->backend);
|
params_backend_spec = SAFE_STR(sd_ctx_params->params_backend);
|
||||||
params_backend_spec = SAFE_STR(sd_ctx_params->params_backend);
|
if (offload_params_to_cpu) {
|
||||||
|
params_backend_spec = params_backend_spec.empty() ? "*=cpu" : "*=cpu," + params_backend_spec;
|
||||||
|
}
|
||||||
if (stream_layers && max_vram == 0.f) {
|
if (stream_layers && max_vram == 0.f) {
|
||||||
LOG_WARN("--stream-layers has no effect without --max-vram set; ignoring");
|
LOG_WARN("--stream-layers has no effect without --max-vram set; ignoring");
|
||||||
stream_layers = false;
|
stream_layers = false;
|
||||||
}
|
}
|
||||||
if (stream_layers && !offload_params_to_cpu && params_backend_spec.empty()) {
|
|
||||||
// Streaming needs CPU-resident params.
|
|
||||||
LOG_WARN("--stream-layers has no effect without --offload-to-cpu (or --params-backend); ignoring");
|
|
||||||
stream_layers = false;
|
|
||||||
}
|
|
||||||
|
|
||||||
bool use_tae = false;
|
bool use_tae = false;
|
||||||
bool use_audio_vae = false;
|
bool use_audio_vae = false;
|
||||||
@ -354,6 +349,10 @@ public:
|
|||||||
if (!init_backend(sd_ctx_params)) {
|
if (!init_backend(sd_ctx_params)) {
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
if (stream_layers && !backend_manager.params_backend_is_cpu(SDBackendModule::DIFFUSION)) {
|
||||||
|
LOG_WARN("--stream-layers has no effect unless diffusion params backend is cpu; ignoring");
|
||||||
|
stream_layers = false;
|
||||||
|
}
|
||||||
max_vram = sd::ggml_graph_cut::resolve_max_vram_gib(max_vram, backend_for(SDBackendModule::DIFFUSION));
|
max_vram = sd::ggml_graph_cut::resolve_max_vram_gib(max_vram, backend_for(SDBackendModule::DIFFUSION));
|
||||||
|
|
||||||
model_manager = std::make_shared<ModelManager>();
|
model_manager = std::make_shared<ModelManager>();
|
||||||
@ -2644,7 +2643,6 @@ void sd_hires_params_init(sd_hires_params_t* hires_params) {
|
|||||||
void sd_ctx_params_init(sd_ctx_params_t* sd_ctx_params) {
|
void sd_ctx_params_init(sd_ctx_params_t* sd_ctx_params) {
|
||||||
*sd_ctx_params = {};
|
*sd_ctx_params = {};
|
||||||
sd_ctx_params->vae_decode_only = true;
|
sd_ctx_params->vae_decode_only = true;
|
||||||
sd_ctx_params->free_params_immediately = true;
|
|
||||||
sd_ctx_params->n_threads = sd_get_num_physical_cores();
|
sd_ctx_params->n_threads = sd_get_num_physical_cores();
|
||||||
sd_ctx_params->wtype = SD_TYPE_COUNT;
|
sd_ctx_params->wtype = SD_TYPE_COUNT;
|
||||||
sd_ctx_params->rng_type = CUDA_RNG;
|
sd_ctx_params->rng_type = CUDA_RNG;
|
||||||
@ -2694,7 +2692,6 @@ char* sd_ctx_params_to_str(const sd_ctx_params_t* sd_ctx_params) {
|
|||||||
"photo_maker_path: %s\n"
|
"photo_maker_path: %s\n"
|
||||||
"tensor_type_rules: %s\n"
|
"tensor_type_rules: %s\n"
|
||||||
"vae_decode_only: %s\n"
|
"vae_decode_only: %s\n"
|
||||||
"free_params_immediately: %s\n"
|
|
||||||
"n_threads: %d\n"
|
"n_threads: %d\n"
|
||||||
"wtype: %s\n"
|
"wtype: %s\n"
|
||||||
"rng_type: %s\n"
|
"rng_type: %s\n"
|
||||||
@ -2734,7 +2731,6 @@ char* sd_ctx_params_to_str(const sd_ctx_params_t* sd_ctx_params) {
|
|||||||
SAFE_STR(sd_ctx_params->photo_maker_path),
|
SAFE_STR(sd_ctx_params->photo_maker_path),
|
||||||
SAFE_STR(sd_ctx_params->tensor_type_rules),
|
SAFE_STR(sd_ctx_params->tensor_type_rules),
|
||||||
BOOL_STR(sd_ctx_params->vae_decode_only),
|
BOOL_STR(sd_ctx_params->vae_decode_only),
|
||||||
BOOL_STR(sd_ctx_params->free_params_immediately),
|
|
||||||
sd_ctx_params->n_threads,
|
sd_ctx_params->n_threads,
|
||||||
sd_type_name(sd_ctx_params->wtype),
|
sd_type_name(sd_ctx_params->wtype),
|
||||||
sd_rng_type_name(sd_ctx_params->rng_type),
|
sd_rng_type_name(sd_ctx_params->rng_type),
|
||||||
@ -5037,7 +5033,7 @@ static sd::Tensor<float> upscale_ltx_spatial_video_latent(sd_ctx_t* sd_ctx,
|
|||||||
upsampler->get_param_tensors(tensors);
|
upsampler->get_param_tensors(tensors);
|
||||||
if (!upsampler_manager->register_param_tensors("LTX latent upsampler",
|
if (!upsampler_manager->register_param_tensors("LTX latent upsampler",
|
||||||
std::move(tensors),
|
std::move(tensors),
|
||||||
ModelManager::ResidencyMode::Resident,
|
ModelManager::ResidencyMode::ParamBackend,
|
||||||
sd_ctx->sd->backend_for(SDBackendModule::UPSCALER),
|
sd_ctx->sd->backend_for(SDBackendModule::UPSCALER),
|
||||||
sd_ctx->sd->params_backend_for(SDBackendModule::UPSCALER)) ||
|
sd_ctx->sd->params_backend_for(SDBackendModule::UPSCALER)) ||
|
||||||
!upsampler_manager->validate_registered_tensors()) {
|
!upsampler_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
@ -43,10 +43,13 @@ bool UpscalerGGML::load_from_file(const std::string& esrgan_path,
|
|||||||
int n_threads) {
|
int n_threads) {
|
||||||
ggml_log_set(ggml_log_callback_default, nullptr);
|
ggml_log_set(ggml_log_callback_default, nullptr);
|
||||||
|
|
||||||
|
std::string effective_params_backend_spec = params_backend_spec;
|
||||||
|
if (offload_params_to_cpu) {
|
||||||
|
effective_params_backend_spec = effective_params_backend_spec.empty() ? "*=cpu" : "*=cpu," + effective_params_backend_spec;
|
||||||
|
}
|
||||||
std::string error;
|
std::string error;
|
||||||
if (!backend_manager.init(backend_spec.c_str(),
|
if (!backend_manager.init(backend_spec.c_str(),
|
||||||
params_backend_spec.c_str(),
|
effective_params_backend_spec.c_str(),
|
||||||
offload_params_to_cpu,
|
|
||||||
false,
|
false,
|
||||||
false,
|
false,
|
||||||
false,
|
false,
|
||||||
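The added lines in this hunk fold `offload_params_to_cpu` into the params-backend spec string instead of passing it to `backend_manager.init` as a separate argument. A minimal sketch of that string composition follows; `compose_params_backend_spec` is a hypothetical helper name used only for illustration, while the expression itself is copied from the diff.

```cpp
#include <string>

// Hypothetical helper: mirrors the expression added in the hunk above.
// When CPU offload is requested, a "*=cpu" default entry is prepended to
// whatever spec the user supplied; otherwise the spec is passed through.
std::string compose_params_backend_spec(const std::string& user_spec,
                                        bool offload_params_to_cpu) {
    if (!offload_params_to_cpu) {
        return user_spec;
    }
    return user_spec.empty() ? "*=cpu" : "*=cpu," + user_spec;
}
```

For example, with offload enabled an empty spec becomes `*=cpu`, and `vae=disk` becomes `*=cpu,vae=disk`. How the backend manager resolves a default entry against later per-module entries is not shown in this diff, so the sketch makes no claim about precedence.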
@ -106,7 +109,7 @@ bool UpscalerGGML::load_from_file(const std::string& esrgan_path,
|
|||||||
esrgan_upscaler->get_param_tensors(tensors);
|
esrgan_upscaler->get_param_tensors(tensors);
|
||||||
if (!model_manager->register_param_tensors("ESRGAN",
|
if (!model_manager->register_param_tensors("ESRGAN",
|
||||||
std::move(tensors),
|
std::move(tensors),
|
||||||
ModelManager::ResidencyMode::Resident,
|
backend_manager.params_backend_is_disk(SDBackendModule::UPSCALER) ? ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::ParamBackend,
|
||||||
backend_for(SDBackendModule::UPSCALER),
|
backend_for(SDBackendModule::UPSCALER),
|
||||||
params_backend_for(SDBackendModule::UPSCALER)) ||
|
params_backend_for(SDBackendModule::UPSCALER)) ||
|
||||||
!model_manager->validate_registered_tensors()) {
|
!model_manager->validate_registered_tensors()) {
|
||||||
|
|||||||
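In the ESRGAN hunk above, the residency mode passed to `register_param_tensors` now depends on whether the upscaler's params backend is `disk`, whereas the LTX latent upsampler hunk switches to `ParamBackend` unconditionally. The selection can be sketched as below. The enum here is a hypothetical stand-in listing only the two values that appear on the new side of the diff; the real `ModelManager::ResidencyMode` may define others.

```cpp
// Hypothetical stand-in for ModelManager::ResidencyMode, restricted to the
// two enumerators used on the new side of the diff.
enum class ResidencyMode { ParamBackend, Disk };

// params_backend_is_disk stands in for
// backend_manager.params_backend_is_disk(SDBackendModule::UPSCALER).
ResidencyMode residency_for(bool params_backend_is_disk) {
    return params_backend_is_disk ? ResidencyMode::Disk
                                  : ResidencyMode::ParamBackend;
}
```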