mirror of
https://github.com/leejet/stable-diffusion.cpp.git
synced 2026-06-09 15:56:39 +00:00
40 lines
2.0 KiB
Markdown
# How to Use

PiD is NVIDIA's Pixel Diffusion Decoder. It replaces the usual VAE decode or decode-then-upscale path with a pixel-space diffusion decoder conditioned on a source latent and text prompt.

In stable-diffusion.cpp, PiD currently runs as an image edit pipeline: provide a reference image with `-r`/`--ref-image`, encode that image with a matching VAE, then let the PiD diffusion model decode/upscale directly to RGB.

## Download weights

- Download PiD
    - safetensors: https://huggingface.co/Comfy-Org/PixelDiT/tree/main/diffusion_models
- Download Gemma 2 2B
    - safetensors: https://huggingface.co/Comfy-Org/PixelDiT/tree/main/text_encoders
- Download the VAE that matches the PiD checkpoint backbone
    - safetensors: https://huggingface.co/nvidia/PiD/tree/main/checkpoints
    - Flux / Z-Image PiD: use the Flux VAE and pass `--vae-format flux`
    - SD3 PiD: use the SD3 VAE and pass `--vae-format sd3`
    - Flux.2 PiD: use the Flux.2 VAE and pass `--vae-format flux2`

Check the official PiD model card before use. At the time of the initial PiD release, the official weights are under the NSCLv1 non-commercial license.

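
The backbone-to-VAE pairing listed above can be captured in a small lookup when invocations are scripted. The following is a minimal sketch, not part of stable-diffusion.cpp: the dictionary keys and the `vae_format_args` helper are hypothetical names chosen for illustration, and only the `--vae-format` values come from the list above.

```python
# Pairings copied from the list above. The keys are hypothetical labels for
# this sketch, not identifiers that sd-cli itself understands.
VAE_FORMAT_BY_BACKBONE = {
    "flux": "flux",      # Flux PiD uses the Flux VAE
    "z-image": "flux",   # Z-Image PiD also uses the Flux VAE
    "sd3": "sd3",        # SD3 PiD uses the SD3 VAE
    "flux2": "flux2",    # Flux.2 PiD uses the Flux.2 VAE
}


def vae_format_args(backbone: str) -> list[str]:
    """Return the `--vae-format` arguments for a PiD checkpoint backbone."""
    key = backbone.strip().lower()
    if key not in VAE_FORMAT_BY_BACKBONE:
        known = ", ".join(sorted(VAE_FORMAT_BY_BACKBONE))
        raise ValueError(f"unknown PiD backbone {backbone!r}; expected one of: {known}")
    return ["--vae-format", VAE_FORMAT_BY_BACKBONE[key]]


print(vae_format_args("Z-Image"))  # ['--vae-format', 'flux']
```

Failing loudly on an unknown backbone is deliberate here, since a mismatched VAE format is easy to miss until the output looks wrong.
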
## Examples

```
.\bin\Release\sd-cli.exe --diffusion-model ..\..\ComfyUI\models\diffusion_models\pid_flux1_512_to_2048_4step_bf16.safetensors --llm "..\..\ComfyUI\models\text_encoders\gemma_2_2b_it_elm_bf16.safetensors" --vae ..\..\ComfyUI\models\vae\ae.sft --vae-format flux --cfg-scale 1.0 -p "a lovely cat" -r ..\assets\ernie_image\turbo_example.png --diffusion-fa -v --steps 4 -H 2048 -W 2048 --rng cpu
```

Before:

<img width="256" alt="ERNIE-Image Turbo example" src="../assets/ernie_image/turbo_example.png" />

After:

<img width="1024" alt="PiD example" src="../assets/pid/example.png" />

## Notes

- `-r`/`--ref-image` is required. PiD uses the first reference image as the source latent condition.
- `--vae-format` should match the VAE latent layout used by the PiD checkpoint. This is important when using standalone VAE files because the PiD diffusion checkpoint alone does not identify the VAE format.
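
The two notes above lend themselves to a quick pre-flight check before a long run. This is a rough sketch under stated assumptions: `check_pid_args` is a hypothetical helper, it only looks for space-separated flags as written in the example command, and the accepted set contains only the `--vae-format` values mentioned in this document (sd-cli may accept others).

```python
# Values mentioned in this document; sd-cli itself may accept more.
DOCUMENTED_VAE_FORMATS = {"flux", "sd3", "flux2"}


def check_pid_args(argv):
    """Return a list of likely problems in an sd-cli argument list for PiD."""
    problems = []

    # PiD needs a reference image as its source latent condition.
    if "-r" not in argv and "--ref-image" not in argv:
        problems.append("missing -r/--ref-image")

    has_format = "--vae-format" in argv
    # A standalone VAE file does not tell the PiD checkpoint which layout it uses.
    if "--vae" in argv and not has_format:
        problems.append("--vae given without --vae-format")

    if has_format:
        idx = argv.index("--vae-format")
        value = argv[idx + 1] if idx + 1 < len(argv) else None
        if value not in DOCUMENTED_VAE_FORMATS:
            problems.append(f"unrecognized --vae-format value: {value!r}")

    return problems


print(check_pid_args(["sd-cli", "--vae", "ae.sft", "-p", "a lovely cat"]))
```

An empty list means none of these checks fired; it does not guarantee that the chosen VAE actually matches the checkpoint backbone.
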