mirror of
https://github.com/leejet/stable-diffusion.cpp.git
synced 2026-06-19 04:37:18 +00:00
feat: support disk params backend (#1651)
parent 276025e054
commit bdb431ad95
@@ -3,7 +3,7 @@
`stable-diffusion.cpp` has two backend assignments:

- `--backend` selects the runtime backend used to execute model graphs.
- `--params-backend` selects the backend used to allocate model parameters.
- `--params-backend` selects where model parameters are kept.

If `--params-backend` is not set, parameters use the same backend as their module runtime backend.

@@ -29,6 +29,12 @@ The same syntax is used for parameter placement:
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend te=cpu,vae=cpu
```

`--params-backend` also accepts the special value `disk`:

```shell
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk
```

Module names are case-insensitive. Hyphens and underscores in module names are ignored, so `clip_vision`, `clip-vision`, and `clipvision` are equivalent.

`all=`, `default=`, and `*=` can be used to set the default backend inside a mixed assignment:
@@ -64,9 +70,11 @@ The special values `auto`, `default`, and an empty backend name select the defau

The special value `gpu` selects the first GPU backend, falling back to the first integrated GPU backend.

The special value `disk` is accepted only by `--params-backend`. `--backend disk` is invalid because `disk` is a parameter residency mode, not a runtime compute backend.

## Runtime backend vs. parameter backend

The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated.
The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated or whether they are reloaded from disk on demand.

For example:

@@ -76,6 +84,16 @@ sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend cpu

This runs all modules on `cuda0`, but stores parameters in CPU RAM. During execution, parameters are moved to the runtime backend as needed.

For example:

```shell
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk
```

This runs all modules on `cuda0`, reloads parameters from the model file as needed, and releases those parameter buffers after use.

`disk` is never selected implicitly. If `--params-backend` is not set, parameters use the runtime backend.

Per-module assignments can be mixed:

```shell
@@ -100,6 +118,8 @@ uses one shared CPU backend for both `te` and `vae` runtime execution.

Runtime and parameter assignments also share the same backend cache. If `--backend diffusion=cuda0` and `--params-backend diffusion=cuda0` resolve to the same device, both use the same backend instance.

`--params-backend disk` does not create a separate backend instance. Parameters are loaded lazily using the module runtime backend.

`SDBackendManager` owns the backend instances and frees them when the context or upscaler is destroyed. Model runners receive non-owning runtime and parameter backend pointers and do not free them.
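
The sharing and ownership rules above can be illustrated with a small sketch. It is not the project's `SDBackendManager`: `FakeBackend` and `BackendCache` are hypothetical stand-ins (the real handle type is `ggml_backend_t`), and name resolution is reduced to an exact string match.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a real backend handle.
struct FakeBackend {
    std::string name;
};

// Hypothetical manager: owns one instance per backend name and hands out
// non-owning pointers. Instances are freed when the cache is destroyed.
class BackendCache {
public:
    // Returns the cached instance for `name`, creating it on first use.
    FakeBackend* get(const std::string& name) {
        auto it = cache_.find(name);
        if (it == cache_.end()) {
            it = cache_.emplace(name, std::make_unique<FakeBackend>(FakeBackend{name})).first;
        }
        return it->second.get();
    }

    // An empty name or "disk" creates nothing new: the module's runtime
    // backend is returned instead.
    FakeBackend* params_backend(const std::string& params_name,
                                const std::string& runtime_name) {
        if (params_name.empty() || params_name == "disk") {
            return get(runtime_name);
        }
        return get(params_name);
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::string, std::unique_ptr<FakeBackend>> cache_;
};
```

Callers keep the raw pointers only for as long as the cache lives and never delete them, which matches the non-owning contract described for model runners.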

## Compatibility flags
@@ -113,10 +133,12 @@ The older CPU placement flags are still supported:

`--clip-on-cpu`, `--vae-on-cpu`, and `--control-net-cpu` affect runtime backend assignment only when `--backend` is not set. They map to `te=cpu`, `vae=cpu`, and `controlnet=cpu`.

`--offload-to-cpu` affects parameter backend assignment only when `--params-backend` is not set. It is equivalent to:
`--offload-to-cpu` prepends a CPU default to the parameter assignment before parsing:

```shell
--params-backend cpu
--params-backend '*=cpu'
```

Because this default is inserted first, later explicit `--params-backend` entries can still override it. For example, `--offload-to-cpu --params-backend te=disk` keeps non-TE parameters on CPU and reloads TE parameters from disk.
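
A minimal sketch of this prepend-then-override behavior follows. `effective_params_spec` mirrors the string handling in this change; `parse_assignment` and `lookup_backend` are hypothetical, simplified helpers that assume comma-separated entries, treat only `*` as the default key, and let later entries overwrite earlier ones. The real parser also normalizes module names and accepts `all=` and `default=`.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Prepend the CPU default when --offload-to-cpu is set.
inline std::string effective_params_spec(bool offload_to_cpu, const std::string& spec) {
    if (!offload_to_cpu) {
        return spec;
    }
    return spec.empty() ? "*=cpu" : "*=cpu," + spec;
}

struct ParsedAssignment {
    std::string default_backend;                 // set by a bare name or "*="
    std::map<std::string, std::string> modules;  // per-module overrides
};

// Simplified parser (assumption): later entries overwrite earlier ones.
inline ParsedAssignment parse_assignment(const std::string& spec) {
    ParsedAssignment out;
    std::istringstream in(spec);
    std::string entry;
    while (std::getline(in, entry, ',')) {
        if (entry.empty()) {
            continue;
        }
        const auto eq = entry.find('=');
        if (eq == std::string::npos) {
            out.default_backend = entry;
        } else if (entry.substr(0, eq) == "*") {
            out.default_backend = entry.substr(eq + 1);
        } else {
            out.modules[entry.substr(0, eq)] = entry.substr(eq + 1);
        }
    }
    return out;
}

// A per-module entry wins; otherwise the default applies.
inline std::string lookup_backend(const ParsedAssignment& a, const std::string& module) {
    const auto it = a.modules.find(module);
    return it != a.modules.end() ? it->second : a.default_backend;
}
```

With `--offload-to-cpu --params-backend te=disk`, the effective string is `*=cpu,te=disk`, so `te` resolves to `disk` and every other module falls back to `cpu`.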

Explicit `--backend` and `--params-backend` assignments are preferred for new commands.

@@ -21,6 +21,38 @@ and the compute buffer shrink in the debug log:

Using `--offload-to-cpu` allows you to offload weights to the CPU, saving VRAM without reducing generation speed.

## Use params backend to reduce VRAM or RAM usage.

`--params-backend` controls where model parameters are kept. If it is not set, parameters use the same backend as `--backend`, so a GPU runtime backend also keeps parameters in VRAM.

Use CPU params to reduce VRAM usage:

```shell
--backend cuda0 --params-backend cpu
```

This keeps model weights in system RAM and moves them to the runtime backend when needed. `--offload-to-cpu` is a compatibility shortcut that prepends `*=cpu` to `--params-backend`, so explicit module assignments can still override it:

```shell
--offload-to-cpu --params-backend te=disk
```

Use disk params to reduce both VRAM and RAM usage:

```shell
--backend cuda0 --params-backend disk
```

This reloads parameters from the model file on demand and releases them after use. It has the lowest memory residency, but can be slower because weights must be read again. `disk` is never selected implicitly; set it explicitly when RAM usage matters more than reload cost.
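
The trade-off can be sketched as follows. This is an illustration, not the project's `ModelManager`: the `Residency` values follow the enum names in this change, while `Weights`, `with_weights`, and the `loader` callback (a stand-in for reading the model file) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

enum class Residency { Disk, ParamBackend };

// Hypothetical weight holder; `loader` stands in for reading the model file.
class Weights {
public:
    Weights(Residency mode, std::function<std::vector<float>()> loader)
        : mode_(mode), loader_(std::move(loader)) {
        if (mode_ == Residency::ParamBackend) {
            data_ = loader_();  // load once and keep resident
        }
    }

    // Runs `fn` with the weights available. In Disk mode the weights are
    // loaded before the call and released afterwards.
    template <typename Fn>
    void with_weights(Fn&& fn) {
        if (mode_ == Residency::Disk) {
            data_ = loader_();
        }
        fn(data_);
        if (mode_ == Residency::Disk) {
            std::vector<float>().swap(data_);  // free the buffer, not just clear it
        }
    }

    std::size_t resident_floats() const { return data_.size(); }

private:
    Residency mode_;
    std::function<std::vector<float>()> loader_;
    std::vector<float> data_;
};
```

In `Disk` mode nothing stays resident between calls, at the cost of one load per use; in `ParamBackend` mode the load happens once. The sketch does not release the buffer if `fn` throws.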

Per-module assignments can target only the largest modules:

```shell
--backend cuda0 --params-backend diffusion=disk,te=cpu,vae=cpu
```

See [backend selection](./backend.md) for full syntax.
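
Two rules from the backend selection notes apply to assignments like the one above: module names ignore case, hyphens, and underscores, and `disk` is valid only for parameters. A rough sketch of both checks, using hypothetical helpers `normalize_module_name` and `passes_disk_rule` (real backend-name resolution and whitespace trimming are not modeled):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Lower-case a name so comparisons are case-insensitive.
inline std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper: "clip_vision", "Clip-Vision", and "clipvision"
// all normalize to the same key.
inline std::string normalize_module_name(const std::string& raw) {
    std::string name = to_lower(raw);
    name.erase(std::remove_if(name.begin(), name.end(),
                              [](char c) { return c == '-' || c == '_'; }),
               name.end());
    return name;
}

enum class AssignmentKind { Runtime, Params };

// Hypothetical check: "disk" is a residency mode, so only a parameter
// assignment may use it. Any other name passes this particular rule.
inline bool passes_disk_rule(AssignmentKind kind, const std::string& backend) {
    if (to_lower(backend) != "disk") {
        return true;
    }
    return kind == AssignmentKind::Params;
}
```

Under these assumptions, `--params-backend Clip-Vision=disk` would target the same module as `clipvision=disk`, while `--backend disk` would be rejected.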

## Use quantization to reduce memory usage.

[quantization](./quantization_and_gguf.md)
[quantization](./quantization_and_gguf.md)
@@ -746,7 +746,7 @@ int main(int argc, const char* argv[]) {
    vae_decode_only = false;
}

sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(vae_decode_only, true, cli_params.taesd_preview);
sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(vae_decode_only, cli_params.taesd_preview);

SDImageVec results;
int num_results = 0;
@@ -421,7 +421,7 @@ ArgOptions SDContextParams::get_options() {
     &backend},
    {"",
     "--params-backend",
     "parameter backend assignment, e.g. cpu or diffusion=cpu,clip=cpu",
     "parameter backend assignment, e.g. disk, cpu, or diffusion=disk,clip=cpu",
     &params_backend},
};

@@ -757,7 +757,7 @@ std::string SDContextParams::to_string() const {
    return oss.str();
}

sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool free_params_immediately, bool taesd_preview) {
sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool taesd_preview) {
    embedding_vec.clear();
    embedding_vec.reserve(embedding_map.size());
    for (const auto& kv : embedding_map) {
@@ -788,7 +788,6 @@ sd_ctx_params_t SDContextParams::to_sd_ctx_params_t(bool vae_decode_only, bool f
photo_maker_path.c_str(),
tensor_type_rules.c_str(),
vae_decode_only,
free_params_immediately,
n_threads,
wtype,
rng_type,
@@ -179,7 +179,7 @@ struct SDContextParams {
    bool validate(SDMode mode);
    bool resolve_and_validate(SDMode mode);
    std::string to_string() const;
    sd_ctx_params_t to_sd_ctx_params_t(bool vae_decode_only, bool free_params_immediately, bool taesd_preview);
    sd_ctx_params_t to_sd_ctx_params_t(bool vae_decode_only, bool taesd_preview);
};

struct SDGenerationParams {
@@ -85,7 +85,7 @@ int main(int argc, const char** argv) {
LOG_DEBUG("%s", ctx_params.to_string().c_str());
LOG_DEBUG("%s", default_gen_params.to_string().c_str());

sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(false, false, false);
sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(false, false);
SDCtxPtr sd_ctx(new_sd_ctx(&sd_ctx_params));

if (sd_ctx == nullptr) {
@@ -197,7 +197,6 @@ typedef struct {
const char* photo_maker_path;
const char* tensor_type_rules;
bool vae_decode_only;
bool free_params_immediately;
int n_threads;
enum sd_type_t wtype;
enum rng_type_t rng_type;
@@ -45,6 +45,10 @@ static bool is_default_backend_token(const std::string& name) {
    return lower.empty() || lower == "default" || lower == "auto";
}

static bool is_disk_backend_token(const std::string& name) {
    return lower_copy(trim_copy(name)) == "disk";
}

static bool parse_backend_module(const std::string& raw_name, SDBackendModule* module) {
    std::string name = lower_copy(trim_copy(raw_name));
    name.erase(std::remove(name.begin(), name.end(), '-'), name.end());
@@ -504,6 +508,9 @@ ggml_backend_t SDBackendManager::params_backend(SDBackendModule module) {
    if (name.empty()) {
        return runtime_backend(module);
    }
    if (is_disk_backend_token(name)) {
        return runtime_backend(module);
    }
    return init_cached_backend(name);
}

@@ -515,6 +522,10 @@ bool SDBackendManager::params_backend_is_cpu(SDBackendModule module) {
    return sd_backend_is_cpu(params_backend(module));
}

bool SDBackendManager::params_backend_is_disk(SDBackendModule module) const {
    return is_disk_backend_token(params_assignment_.get(module));
}

bool SDBackendManager::runtime_backend_supports_host_buffer(SDBackendModule module) {
    ggml_backend_t backend = runtime_backend(module);
    if (backend == nullptr) {
@@ -534,7 +545,6 @@ bool SDBackendManager::runtime_backend_supports_host_buffer(SDBackendModule modu

bool SDBackendManager::init(const char* backend_spec,
                            const char* params_backend_spec,
                            bool offload_params_to_cpu,
                            bool keep_clip_on_cpu,
                            bool keep_vae_on_cpu,
                            bool keep_control_net_on_cpu,
@@ -560,18 +570,20 @@ bool SDBackendManager::init(const char* backend_spec,
        }
    }

    if (params_assignment_.empty() && offload_params_to_cpu) {
        params_assignment_.set_default("cpu");
    }

    return validate(error);
}

bool SDBackendManager::validate(std::string* error) const {
    auto validate_name = [&](const std::string& name) -> bool {
    auto validate_runtime_name = [&](const std::string& name) -> bool {
        if (is_default_backend_token(name)) {
            return true;
        }
        if (is_disk_backend_token(name)) {
            if (error != nullptr) {
                *error = "backend 'disk' is only supported by params_backend";
            }
            return false;
        }
        if (!sd_resolve_backend_name(name).empty()) {
            return true;
        }
@@ -580,18 +592,24 @@ bool SDBackendManager::validate(std::string* error) const {
        }
        return false;
    };
    auto validate_params_name = [&](const std::string& name) -> bool {
        if (is_disk_backend_token(name)) {
            return true;
        }
        return validate_runtime_name(name);
    };

    if (!validate_name(runtime_assignment_.default_name) ||
        !validate_name(params_assignment_.default_name)) {
    if (!validate_runtime_name(runtime_assignment_.default_name) ||
        !validate_params_name(params_assignment_.default_name)) {
        return false;
    }
    for (const auto& kv : runtime_assignment_.module_names) {
        if (!validate_name(kv.second)) {
        if (!validate_runtime_name(kv.second)) {
            return false;
        }
    }
    for (const auto& kv : params_assignment_.module_names) {
        if (!validate_name(kv.second)) {
        if (!validate_params_name(kv.second)) {
            return false;
        }
    }
@@ -51,7 +51,6 @@ public:

    bool init(const char* backend_spec,
              const char* params_backend_spec,
              bool offload_params_to_cpu,
              bool keep_clip_on_cpu,
              bool keep_vae_on_cpu,
              bool keep_control_net_on_cpu,
@@ -63,6 +62,7 @@ public:

    bool runtime_backend_is_cpu(SDBackendModule module);
    bool params_backend_is_cpu(SDBackendModule module);
    bool params_backend_is_disk(SDBackendModule module) const;
    bool runtime_backend_supports_host_buffer(SDBackendModule module);

private:
@@ -101,7 +101,7 @@ struct LoraModel : public GGMLRunner {
if (model_manager == nullptr ||
    !model_manager->register_param_tensors("LoRA",
        std::move(tensors),
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        runtime_backend,
        params_backend) ||
    !model_manager->validate_registered_tensors()) {
@@ -622,7 +622,7 @@ struct PhotoMakerIDEmbed : public GGMLRunner {
model_loader.load_tensors(on_new_tensor_cb);
if (!model_manager->register_param_tensors("PhotoMaker ID embeds",
        tensors,
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        runtime_backend,
        params_backend) ||
    !model_manager->validate_registered_tensors()) {
@@ -482,7 +482,7 @@ struct ControlNet : public GGMLRunner {
manager->set_n_threads(n_threads);
if (!manager->register_param_tensors("ControlNet",
        std::move(tensors),
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        runtime_backend,
        params_backend) ||
    !manager->validate_registered_tensors()) {
@@ -1609,7 +1609,7 @@ namespace Flux {
if (!model_manager->register_runner_params("Flux test",
        *flux,
        "model.diffusion_model",
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        backend,
        backend) ||
    !model_manager->validate_registered_tensors()) {
@@ -2048,7 +2048,7 @@ namespace LTXV {
if (!model_manager->register_runner_params("LTXAV test",
        *ltxav,
        "model.diffusion_model",
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        backend,
        backend) ||
    !model_manager->validate_registered_tensors()) {
@@ -1015,7 +1015,7 @@ struct MMDiTRunner : public DiffusionModelRunner {
if (!model_manager->register_runner_params("MMDiT test",
        *mmdit,
        "model.diffusion_model",
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        backend,
        backend) ||
    !model_manager->validate_registered_tensors()) {
@@ -715,7 +715,7 @@ namespace Qwen {
if (!model_manager->register_runner_params("Qwen image test",
        *qwen_image,
        "model.diffusion_model",
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        backend,
        backend) ||
    !model_manager->validate_registered_tensors()) {
@@ -1040,7 +1040,7 @@ namespace WAN {
if (!model_manager->register_runner_params("Wan test",
        *wan,
        "model.diffusion_model",
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        backend,
        backend) ||
    !model_manager->validate_registered_tensors()) {
@@ -723,7 +723,7 @@ namespace ZImage {
if (!model_manager->register_runner_params("ZImage test",
        *z_image,
        "model.diffusion_model",
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        backend,
        backend) ||
    !model_manager->validate_registered_tensors()) {
@@ -2084,7 +2084,7 @@ namespace LLM {
if (!model_manager->register_runner_params("LLM test",
        *llm,
        "text_encoders.llm",
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        backend,
        backend) ||
    !model_manager->validate_registered_tensors()) {
@@ -592,7 +592,7 @@ struct T5Embedder {
if (!model_manager->register_runner_params("T5 test",
        *t5,
        "",
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        backend,
        backend) ||
    !model_manager->validate_registered_tensors()) {
@@ -1082,7 +1082,7 @@ namespace LTXV {

if (!model_manager->register_runner_params("LTX audio VAE test",
        *ltx_audio_vae,
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        backend,
        backend) ||
    !model_manager->validate_registered_tensors()) {
@@ -1538,7 +1538,7 @@ struct LTXVideoVAE : public VAE {

if (!model_manager->register_runner_params("LTX VAE test",
        *vae,
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        backend,
        backend) ||
    !model_manager->validate_registered_tensors()) {
@@ -1340,7 +1340,7 @@ namespace WAN {

if (!model_manager->register_runner_params("Wan VAE test",
        *vae,
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        backend,
        backend) ||
    !model_manager->validate_registered_tensors()) {
@@ -492,7 +492,7 @@ bool ModelManager::mmap_params(const std::vector<TensorState*>& states,
}

bool ModelManager::can_mmap_storage(const TensorState& state) const {
    if (!enable_mmap_ || state.residency_mode != ResidencyMode::Resident) {
    if (!enable_mmap_ || state.residency_mode != ResidencyMode::ParamBackend) {
        return false;
    }
    if (state.compute_backend == nullptr || state.params_backend == nullptr) {
@@ -16,7 +16,7 @@ class ModelManager : public RunnerWeightManager {
public:
    enum class ResidencyMode {
        Disk,
        Resident,
        ParamBackend,
    };

    struct LoraSpec {
@@ -33,7 +33,7 @@ private:
ggml_tensor* tensor = nullptr;
std::string desc;

ResidencyMode residency_mode = ResidencyMode::Resident;
ResidencyMode residency_mode = ResidencyMode::ParamBackend;
ggml_backend_t compute_backend = nullptr;
ggml_backend_t params_backend = nullptr;
bool metadata_validated = false;
@@ -165,7 +165,6 @@ public:
SDVersion version;
bool vae_decode_only = false;
bool external_vae_is_invalid = false;
bool free_params_immediately = false;

bool circular_x = false;
bool circular_y = false;
@@ -246,7 +245,7 @@ public:
}
return model_manager->register_param_tensors(desc,
    std::move(group_tensors),
    free_params_immediately ? ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::Resident,
    backend_manager.params_backend_is_disk(module) ? ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::ParamBackend,
    backend_for(module),
    params_backend_for(module),
    params_mem_size);
@@ -255,8 +254,7 @@ public:
bool init_backend(const sd_ctx_params_t* sd_ctx_params) {
    std::string error;
    if (!backend_manager.init(sd_ctx_params->backend,
            sd_ctx_params->params_backend,
            offload_params_to_cpu,
            params_backend_spec.c_str(),
            sd_ctx_params->keep_clip_on_cpu,
            sd_ctx_params->keep_vae_on_cpu,
            sd_ctx_params->keep_control_net_on_cpu,
@@ -319,24 +317,21 @@ public:
}

bool init(const sd_ctx_params_t* sd_ctx_params) {
    n_threads = sd_ctx_params->n_threads;
    vae_decode_only = sd_ctx_params->vae_decode_only;
    free_params_immediately = sd_ctx_params->free_params_immediately;
    offload_params_to_cpu = sd_ctx_params->offload_params_to_cpu;
    enable_mmap = sd_ctx_params->enable_mmap;
    max_vram = sd_ctx_params->max_vram;
    stream_layers = sd_ctx_params->stream_layers;
    backend_spec = SAFE_STR(sd_ctx_params->backend);
    params_backend_spec = SAFE_STR(sd_ctx_params->params_backend);
    n_threads = sd_ctx_params->n_threads;
    vae_decode_only = sd_ctx_params->vae_decode_only;
    offload_params_to_cpu = sd_ctx_params->offload_params_to_cpu;
    enable_mmap = sd_ctx_params->enable_mmap;
    max_vram = sd_ctx_params->max_vram;
    stream_layers = sd_ctx_params->stream_layers;
    backend_spec = SAFE_STR(sd_ctx_params->backend);
    params_backend_spec = SAFE_STR(sd_ctx_params->params_backend);
    if (offload_params_to_cpu) {
        params_backend_spec = params_backend_spec.empty() ? "*=cpu" : "*=cpu," + params_backend_spec;
    }
    if (stream_layers && max_vram == 0.f) {
        LOG_WARN("--stream-layers has no effect without --max-vram set; ignoring");
        stream_layers = false;
    }
    if (stream_layers && !offload_params_to_cpu && params_backend_spec.empty()) {
        // Streaming needs CPU-resident params.
        LOG_WARN("--stream-layers has no effect without --offload-to-cpu (or --params-backend); ignoring");
        stream_layers = false;
    }

    bool use_tae = false;
    bool use_audio_vae = false;
@@ -354,6 +349,10 @@ public:
if (!init_backend(sd_ctx_params)) {
    return false;
}
if (stream_layers && !backend_manager.params_backend_is_cpu(SDBackendModule::DIFFUSION)) {
    LOG_WARN("--stream-layers has no effect unless diffusion params backend is cpu; ignoring");
    stream_layers = false;
}
max_vram = sd::ggml_graph_cut::resolve_max_vram_gib(max_vram, backend_for(SDBackendModule::DIFFUSION));

model_manager = std::make_shared<ModelManager>();
@@ -2644,7 +2643,6 @@ void sd_hires_params_init(sd_hires_params_t* hires_params) {
void sd_ctx_params_init(sd_ctx_params_t* sd_ctx_params) {
    *sd_ctx_params = {};
    sd_ctx_params->vae_decode_only = true;
    sd_ctx_params->free_params_immediately = true;
    sd_ctx_params->n_threads = sd_get_num_physical_cores();
    sd_ctx_params->wtype = SD_TYPE_COUNT;
    sd_ctx_params->rng_type = CUDA_RNG;
@@ -2694,7 +2692,6 @@ char* sd_ctx_params_to_str(const sd_ctx_params_t* sd_ctx_params) {
"photo_maker_path: %s\n"
"tensor_type_rules: %s\n"
"vae_decode_only: %s\n"
"free_params_immediately: %s\n"
"n_threads: %d\n"
"wtype: %s\n"
"rng_type: %s\n"
@@ -2734,7 +2731,6 @@ char* sd_ctx_params_to_str(const sd_ctx_params_t* sd_ctx_params) {
SAFE_STR(sd_ctx_params->photo_maker_path),
SAFE_STR(sd_ctx_params->tensor_type_rules),
BOOL_STR(sd_ctx_params->vae_decode_only),
BOOL_STR(sd_ctx_params->free_params_immediately),
sd_ctx_params->n_threads,
sd_type_name(sd_ctx_params->wtype),
sd_rng_type_name(sd_ctx_params->rng_type),
@@ -5037,7 +5033,7 @@ static sd::Tensor<float> upscale_ltx_spatial_video_latent(sd_ctx_t* sd_ctx,
upsampler->get_param_tensors(tensors);
if (!upsampler_manager->register_param_tensors("LTX latent upsampler",
        std::move(tensors),
        ModelManager::ResidencyMode::Resident,
        ModelManager::ResidencyMode::ParamBackend,
        sd_ctx->sd->backend_for(SDBackendModule::UPSCALER),
        sd_ctx->sd->params_backend_for(SDBackendModule::UPSCALER)) ||
    !upsampler_manager->validate_registered_tensors()) {
@@ -43,10 +43,13 @@ bool UpscalerGGML::load_from_file(const std::string& esrgan_path,
int n_threads) {
    ggml_log_set(ggml_log_callback_default, nullptr);

    std::string effective_params_backend_spec = params_backend_spec;
    if (offload_params_to_cpu) {
        effective_params_backend_spec = effective_params_backend_spec.empty() ? "*=cpu" : "*=cpu," + effective_params_backend_spec;
    }
    std::string error;
    if (!backend_manager.init(backend_spec.c_str(),
            params_backend_spec.c_str(),
            offload_params_to_cpu,
            effective_params_backend_spec.c_str(),
            false,
            false,
            false,
@@ -106,7 +109,7 @@ bool UpscalerGGML::load_from_file(const std::string& esrgan_path,
esrgan_upscaler->get_param_tensors(tensors);
if (!model_manager->register_param_tensors("ESRGAN",
        std::move(tensors),
        ModelManager::ResidencyMode::Resident,
        backend_manager.params_backend_is_disk(SDBackendModule::UPSCALER) ? ModelManager::ResidencyMode::Disk : ModelManager::ResidencyMode::ParamBackend,
        backend_for(SDBackendModule::UPSCALER),
        params_backend_for(SDBackendModule::UPSCALER)) ||
    !model_manager->validate_registered_tensors()) {