# Backend selection
`stable-diffusion.cpp` has two backend assignments:
- `--backend` selects the runtime backend used to execute model graphs.
- `--params-backend` selects where model parameters are kept.

If `--params-backend` is not set, each module's parameters are kept on that module's runtime backend.
## Syntax
A backend assignment can be a single backend name:
```shell
sd-cli -m model.safetensors -p "a cat" --backend cpu
```
This applies to every module that does not have a more specific assignment.
Assignments can also target individual modules:
```shell
sd-cli -m model.safetensors -p "a cat" --backend te=cpu,vae=cuda0,diffusion=vulkan0
```
The same syntax is used for parameter placement:
```shell
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend te=cpu,vae=cpu
```
`--params-backend` also accepts the special value `disk`:
```shell
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk
```
`--max-vram` can target resolved backend/device names:
```shell
sd-cli -m model.safetensors -p "a cat" --backend diffusion=cuda0,vae=vulkan0 --max-vram cuda0=6,vulkan0=2
```
The budget applies to every module running on that backend.
Module names are case-insensitive. Hyphens and underscores in module names are ignored, so `clip_vision`, `clip-vision`, and `clipvision` are equivalent.
`all=`, `default=`, and `*=` can be used to set the default backend inside a mixed assignment:
```shell
sd-cli -m model.safetensors -p "a cat" --backend all=cuda0,te=cpu
```
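
The syntax rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not the project's parser: `normalize_key` and `parse_assignment` are hypothetical names, aliases such as `clip` are left unresolved, and letting a later entry replace an earlier one for the same key is an assumption.

```python
DEFAULT_KEYS = {"all", "default", "*"}

def normalize_key(name):
    """Lowercase a module name and drop hyphens and underscores."""
    return name.strip().lower().replace("-", "").replace("_", "")

def parse_assignment(spec):
    """Parse 'cuda0' or 'all=cuda0,te=cpu' into {module or '*': backend}."""
    result = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if "=" in entry:
            key, backend = entry.split("=", 1)
            key = normalize_key(key)
            if key in DEFAULT_KEYS:
                key = "*"  # all=, default= and *= all set the default
        else:
            # A bare backend name applies to every module that has
            # no more specific entry.
            key, backend = "*", entry
        # Assumption: a later entry for the same key replaces an earlier one.
        result[key] = backend.strip()
    return result
```

With this model, `parse_assignment("all=cuda0,te=cpu")` yields a default of `cuda0` plus a `te` entry of `cpu`, and `Clip-Vision=cpu` is stored under the key `clipvision`.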
## Modules
| Module | Purpose | Accepted names |
| --- | --- | --- |
| `diffusion` | UNet, DiT, MMDiT, Flux, Wan, Qwen Image, and other diffusion models | `diffusion`, `model`, `unet`, `dit` |
| `te` | Text encoders and conditioners | `te`, `clip`, `text`, `textencoder`, `textencoders`, `conditioner`, `cond`, `llm`, `t5`, `t5xxl` |
| `clip_vision` | CLIP vision encoder | `clip_vision`, `clipvision`, `clip-vision`, `vision` |
| `vae` | VAE and TAE | `vae`, `firststage`, `autoencoder`, `tae` |
| `controlnet` | ControlNet | `controlnet`, `control` |
| `photomaker` | PhotoMaker ID encoder and PhotoMaker LoRA | `photomaker`, `photomakerid`, `pmid`, `photo` |
| `upscaler` | ESRGAN upscaler | `upscaler`, `esrgan`, `hires` |

`te` is the preferred module name for text encoders. `clip` is kept as an accepted alias because many existing commands and model names use CLIP terminology.
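
As a rough illustration of how the accepted names in the table could map onto canonical modules, here is a small Python lookup. The names `ALIASES` and `canonical_module` are hypothetical, and raising an error for an unknown name is an assumption about behaviour the table does not specify.

```python
# Accepted names copied from the table above, keyed by canonical module.
ALIASES = {
    "diffusion": ["diffusion", "model", "unet", "dit"],
    "te": ["te", "clip", "text", "textencoder", "textencoders",
           "conditioner", "cond", "llm", "t5", "t5xxl"],
    "clip_vision": ["clip_vision", "clipvision", "clip-vision", "vision"],
    "vae": ["vae", "firststage", "autoencoder", "tae"],
    "controlnet": ["controlnet", "control"],
    "photomaker": ["photomaker", "photomakerid", "pmid", "photo"],
    "upscaler": ["upscaler", "esrgan", "hires"],
}

def _normalize(name):
    """Case-insensitive, with hyphens and underscores ignored."""
    return name.strip().lower().replace("-", "").replace("_", "")

# Flatten the table into one normalized-alias -> module dictionary.
LOOKUP = {
    _normalize(alias): module
    for module, names in ALIASES.items()
    for alias in names
}

def canonical_module(name):
    """Return the canonical module for any accepted spelling."""
    key = _normalize(name)
    if key not in LOOKUP:
        # Assumption: unknown names are rejected rather than ignored.
        raise ValueError(f"unknown module name: {name!r}")
    return LOOKUP[key]
```

Because normalization runs before the lookup, spellings such as `CLIP`, `first_stage`, and `Clip-Vision` resolve to `te`, `vae`, and `clip_vision` respectively.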
## Backend names
Backend names are resolved against the GGML backend device list. Matching is case-insensitive and accepts either an exact name or a unique prefix. Common values include:
- `cpu`
- `cuda0`
- `vulkan0`
- `metal`

The special values `auto`, `default`, and an empty backend name select the default backend. The default preference is GPU, then integrated GPU, then CPU.
The special value `gpu` selects the first GPU backend, falling back to the first integrated GPU backend.
The special value `disk` is accepted only by `--params-backend`. `--backend disk` is invalid because `disk` is a parameter residency mode, not a runtime compute backend.
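
The matching rules can be sketched as follows. This is a simplified model under stated assumptions: `resolve_backend` and the `DEVICES` list are made up for illustration, the `gpu` and `disk` special values are left out because they need device-type and parameter-residency information, and both the exact-match priority and the error on an ambiguous prefix are assumptions.

```python
def resolve_backend(name, devices, default):
    """Resolve a user-supplied backend name against a list of device names."""
    key = name.strip().lower()
    if key in ("", "auto", "default"):
        return default
    # Assumption: an exact case-insensitive match wins over prefix matching.
    for device in devices:
        if device.lower() == key:
            return device
    # Otherwise accept the name only if it is a prefix of exactly one device.
    matches = [d for d in devices if d.lower().startswith(key)]
    if len(matches) == 1:
        return matches[0]
    reason = "ambiguous" if matches else "unknown"
    raise ValueError(f"{reason} backend name: {name!r}")

# Hypothetical device list, for illustration only.
DEVICES = ["CPU", "CUDA0", "CUDA1", "Vulkan0"]
```

In this model `vulk` resolves to `Vulkan0`, while `cuda` is rejected because it is a prefix of both `CUDA0` and `CUDA1`.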
## Runtime backend vs. parameter backend
The runtime backend controls where graph execution runs. The parameter backend controls where model weights are allocated or whether they are reloaded from disk on demand.
For example:
```shell
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend cpu
```
This runs all modules on `cuda0`, but stores parameters in CPU RAM. During execution, parameters are moved to the runtime backend as needed.
With `disk` as the parameter backend:
```shell
sd-cli -m model.safetensors -p "a cat" --backend cuda0 --params-backend disk
```
This runs all modules on `cuda0`, reloads parameters from the model file as needed, and releases those parameter buffers after use.
`disk` is never selected implicitly. If `--params-backend` is not set, parameters use the runtime backend.
Per-module assignments can be mixed:
```shell
sd-cli -m model.safetensors -p "a cat" --backend diffusion=cuda0,te=cpu,vae=cpu --params-backend diffusion=cuda0,te=cpu,vae=cpu
```
This keeps text encoding and VAE execution on CPU while the diffusion model runs on GPU.
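
The fallback rule in this section can be expressed as a small Python function. It is a sketch only: `placement` is a hypothetical helper, its inputs are plain dictionaries that map a module name (or `"*"` for the default) to a backend name, and using `auto` when no runtime backend is given is an assumption.

```python
def placement(module, runtime, params):
    """Return (runtime_backend, params_backend) for one module."""
    # Assumption: with no runtime entry at all, the default backend is used.
    runtime_backend = runtime.get(module, runtime.get("*", "auto"))
    # With no explicit parameter entry, parameters follow the runtime backend.
    # "disk" can only appear here if the caller wrote it explicitly.
    params_backend = params.get(module, params.get("*", runtime_backend))
    return runtime_backend, params_backend
```

For `--backend cuda0 --params-backend te=cpu,vae=cpu`, this gives `("cuda0", "cpu")` for `te` and `("cuda0", "cuda0")` for `diffusion`.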
## Backend sharing and lifetime
Backends are managed by `SDBackendManager`.
Within one manager, backend instances are cached by resolved backend device name. If multiple modules request the same backend, they share the same `ggml_backend_t`.
For example, the following assignment uses one shared CPU backend for both `te` and `vae` runtime execution:
```shell
--backend te=cpu,vae=cpu
```
Runtime and parameter assignments also share the same backend cache. If `--backend diffusion=cuda0` and `--params-backend diffusion=cuda0` resolve to the same device, both use the same backend instance.
`--params-backend disk` does not create a separate backend instance. Parameters are loaded lazily using the module's runtime backend.
`SDBackendManager` owns the backend instances and frees them when the context or upscaler is destroyed. Model runners receive non-owning runtime and parameter backend pointers and do not free them.
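
The sharing and ownership rules can be illustrated with a toy cache in Python. It does not reproduce `SDBackendManager` or its interface: `BackendCache`, `create_backend`, and `free_backend` are hypothetical stand-ins, and a dictionary plays the role of a `ggml_backend_t` handle.

```python
class BackendCache:
    """Caches one backend instance per resolved device name and owns them."""

    def __init__(self, create, free):
        self._create = create
        self._free = free
        self._instances = {}

    def get(self, device_name):
        # Create the backend on first request, then hand out the same one.
        if device_name not in self._instances:
            self._instances[device_name] = self._create(device_name)
        return self._instances[device_name]

    def close(self):
        # Only the cache frees backends; callers hold non-owning references.
        for backend in self._instances.values():
            self._free(backend)
        self._instances.clear()

created, freed = [], []

def create_backend(name):
    created.append(name)
    return {"device": name}  # stand-in for a backend handle

def free_backend(backend):
    freed.append(backend["device"])

cache = BackendCache(create_backend, free_backend)
te_runtime = cache.get("cpu")
vae_runtime = cache.get("cpu")          # same instance as te_runtime
diffusion_runtime = cache.get("cuda0")
diffusion_params = cache.get("cuda0")   # runtime and params share the cache
cache.close()
```

Four requests create only two backends here, and each backend is freed exactly once, by the cache.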
## Compatibility flags
The example CLI/server still accepts these older CPU placement flags as compatibility aliases:
- `--clip-on-cpu`
- `--vae-on-cpu`
- `--control-net-cpu`
- `--offload-to-cpu`

`--clip-on-cpu`, `--vae-on-cpu`, and `--control-net-cpu` are deprecated. The example argument layer translates them by prepending `te=cpu`, `vae=cpu`, and `controlnet=cpu`, respectively, to `--backend` before creating the context.
`--offload-to-cpu` prepends a CPU default to the parameter assignment in the caller before creating the context:
```shell
--params-backend '*=cpu'
```
Because this default is inserted first, later explicit `--params-backend` entries can still override it. For example, `--offload-to-cpu --params-backend te=disk` keeps non-TE parameters on CPU and reloads TE parameters from disk.
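
The prepend-then-override behaviour can be sketched in Python. Both helpers are hypothetical (`prepend_cpu_default`, `lookup`), and the lookup assumes that a module-specific entry beats the default and that a later entry replaces an earlier one for the same key.

```python
def prepend_cpu_default(params_backend):
    """Model --offload-to-cpu: put a CPU default in front of the user's spec."""
    return "*=cpu," + params_backend if params_backend else "*=cpu"

def lookup(spec, module):
    """Return the backend chosen for `module` by an assignment string."""
    table = {}
    for entry in spec.split(","):
        key, sep, value = entry.partition("=")
        if not sep:
            key, value = "*", entry   # bare name sets the default
        elif key in ("all", "default"):
            key = "*"
        table[key] = value            # assumption: later entries win
    return table.get(module, table.get("*"))

spec = prepend_cpu_default("te=disk")   # "*=cpu,te=disk"
te_params = lookup(spec, "te")          # explicit entry overrides the default
vae_params = lookup(spec, "vae")        # falls back to the prepended default
```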
Library callers should set `backend` and `params_backend` directly. The old CPU/offload fields are no longer part of the C API. Explicit `--backend` and `--params-backend` assignments are preferred for new commands.