diff --git a/examples/cli/README.md b/examples/cli/README.md
index 3df91eeb..e8a14098 100644
--- a/examples/cli/README.md
+++ b/examples/cli/README.md
@@ -1,204 +1,9 @@
-# Run
+# Usage
 
-```
-usage: ./bin/sd-cli [options]
+For detailed command-line arguments, run:
 
-CLI Options:
-  -o, --output                             path to write result image to. you can use printf-style %d format specifiers for image
-                                           sequences (default: ./output.png) (eg. output_%03d.png). Single-file video outputs
-                                           support .avi, .webm, and animated .webp
-  --image                                  path to the image to inspect (for metadata mode)
-  --metadata-format                        metadata output format, one of [text, json] (default: text)
-  --preview-path                           path to write preview image to (default: ./preview.png). Multi-frame previews support
-                                           .avi, .webm, and animated .webp
-  --preview-interval                       interval in denoising steps between consecutive updates of the image preview file
-                                           (default is 1, meaning updating at every step)
-  --output-begin-idx                       starting index for output image sequence, must be non-negative (default 0 if specified
-                                           %d in output path, 1 otherwise)
-  --canny                                  apply canny preprocessor (edge detection)
-  --convert-name                           convert tensor name (for convert mode)
-  -v, --verbose                            print extra info
-  --color                                  colors the logging tags according to level
-  --taesd-preview-only                     prevents usage of taesd for decoding the final image. (for use with --preview tae)
-  --preview-noisy                          enables previewing noisy inputs of the models rather than the denoised outputs
-  --metadata-raw                           include raw hex previews for unparsed metadata payloads
-  --metadata-brief                         truncate long metadata text values in text output
-  --metadata-all                           include structural/container entries such as IHDR, IDAT, and non-metadata JPEG segments
-  -M, --mode                               run mode, one of [img_gen, vid_gen, upscale, convert, metadata], default: img_gen
-  --preview                                preview method. must be one of the following [none, proj, tae, vae] (default is none)
-  -h, --help                               show this help message and exit
-
-Context Options:
-  -m, --model                              path to full model
-  --clip_l                                 path to the clip-l text encoder
-  --clip_g                                 path to the clip-g text encoder
-  --clip_vision                            path to the clip-vision encoder
-  --t5xxl                                  path to the t5xxl text encoder
-  --llm                                    path to the llm text encoder. For example: (qwenvl2.5 for qwen-image,
-                                           mistral-small3.2 for flux2, ...)
-  --llm_vision                             path to the llm vit
-  --qwen2vl                                alias of --llm. Deprecated.
-  --qwen2vl_vision                         alias of --llm_vision. Deprecated.
-  --diffusion-model                        path to the standalone diffusion model
-  --high-noise-diffusion-model             path to the standalone high noise diffusion model
-  --uncond-diffusion-model                 path to the standalone unconditional diffusion model, currently used by
-                                           Ideogram4 CFG
-  --vae                                    path to standalone vae model
-  --taesd                                  path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
-  --tae                                    alias of --taesd
-  --control-net                            path to control net model
-  --embd-dir                               embeddings directory
-  --lora-model-dir                         lora model directory
-  --hires-upscalers-dir                    highres fix upscaler model directory
-  --tensor-type-rules                      weight type per tensor pattern (example: "^vae\.=f16,model\.=q8_0")
-  --photo-maker                            path to PHOTOMAKER model
-  --upscale-model                          path to esrgan model.
-  -t, --threads                            number of threads to use during computation (default: -1). If threads <= 0,
-                                           then threads will be set to the number of CPU physical cores
-  --chroma-t5-mask-pad                     t5 mask pad size of chroma
-  --max-vram                               maximum VRAM budget in GiB for graph-cut segmented execution. 0 disables
-                                           graph splitting; a negative value auto-detects free VRAM, sparing the
-                                           specified value (e.g. -0.5 will keep at least 0.5 GiB free)
-  --force-sdxl-vae-conv-scale              force use of conv scale on sdxl vae
-  --offload-to-cpu                         place the weights in RAM to save VRAM, and automatically load them into VRAM
-                                           when needed
-  --mmap                                   whether to memory-map model
-  --control-net-cpu                        deprecated; use --backend controlnet=cpu
-  --clip-on-cpu                            deprecated; use --backend te=cpu
-  --vae-on-cpu                             deprecated; use --backend vae=cpu
-  --fa                                     use flash attention
-  --diffusion-fa                           use flash attention in the diffusion model only
-  --diffusion-conv-direct                  use ggml_conv2d_direct in the diffusion model
-  --vae-conv-direct                        use ggml_conv2d_direct in the vae model
-  --circular                               enable circular padding for convolutions
-  --circularx                              enable circular RoPE wrapping on x-axis (width) only
-  --circulary                              enable circular RoPE wrapping on y-axis (height) only
-  --chroma-disable-dit-mask                disable dit mask for chroma
-  --qwen-image-zero-cond-t                 enable zero_cond_t for qwen image
-  --chroma-enable-t5-mask                  enable t5 mask for chroma
-  --type                                   weight type (examples: f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_K, q3_K,
-                                           q4_K). If not specified, the default is the type of the weight file
-  --rng                                    RNG, one of [std_default, cuda, cpu], default: cuda(sd-webui), cpu(comfyui)
-  --sampler-rng                            sampler RNG, one of [std_default, cuda, cpu]. If not specified, use --rng
-  --prediction                             prediction type override, one of [eps, v, edm_v, sd3_flow, flux_flow,
-                                           flux2_flow]
-  --lora-apply-mode                        the way to apply LoRA, one of [auto, immediately, at_runtime], default is
-                                           auto. In auto mode, if the model weights contain any quantized parameters,
-                                           the at_runtime mode will be used; otherwise, immediately will be used.The
-                                           immediately mode may have precision and compatibility issues with quantized
-                                           parameters, but it usually offers faster inference speed and, in some cases,
-                                           lower memory usage. The at_runtime mode, on the other hand, is exactly the
-                                           opposite.
-
-Generation Options:
-  -p, --prompt                             the prompt to render
-  -n, --negative-prompt                    the negative prompt (default: "")
-  -i, --init-img                           path to the init image
-  --end-img                                path to the end image, required by flf2v
-  --mask                                   path to the mask image
-  --control-image                          path to control image, control net
-  --control-video                          path to control video frames, It must be a directory path. The video frames
-                                           inside should be stored as images in lexicographical (character) order. For
-                                           example, if the control video path is `frames`, the directory contain images
-                                           such as 00.png, 01.png, ... etc.
-  --pm-id-images-dir                       path to PHOTOMAKER input id images dir
-  --pm-id-embed-path                       path to PHOTOMAKER v2 id embed
-  --hires-upscaler                         highres fix upscaler, Lanczos, Nearest, Latent, Latent (nearest), Latent
-                                           (nearest-exact), Latent (antialiased), Latent (bicubic), Latent (bicubic
-                                           antialiased), or a model name under --hires-upscalers-dir (default: Latent)
-  --extra-sample-args                      extra sampler/scheduler/guidance args, key=value list. APG supports apg_eta,
-                                           apg_momentum, apg_norm_threshold, apg_norm_threshold_smoothing; SLG supports
-                                           slg_uncond; lcm supports noise_clip_std, noise_scale_start, noise_scale_end;
-                                           ltx2 supports max_shift, base_shift, stretch, terminal; euler_ge supports gamma
-  --extra-tiling-args                      extra VAE tiling args, key=value list. LTX video VAE supports
-                                           temporal_tile_frames (default: 4), temporal_tile_overlap (default: 1)
-  -H, --height                             image height, in pixel space (default: 512)
-  -W, --width                              image width, in pixel space (default: 512)
-  --steps                                  number of sample steps (default: 20)
-  --high-noise-steps                       (high noise) number of sample steps (default: -1 = auto)
-  --clip-skip                              ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer
-                                           (default: -1). <= 0 represents unspecified, will be 1 for SD1.x, 2 for SD2.x
-  -b, --batch-count                        batch count
-  --video-frames                           video frames (default: 1)
-  --fps                                    fps (default: 24)
-  --timestep-shift                         shift timestep for NitroFusion models (default: 0). recommended N for
-                                           NitroSD-Realism around 250 and 500 for NitroSD-Vibrant
-  --upscale-repeats                        Run the ESRGAN upscaler this many times (default: 1)
-  --upscale-tile-size                      tile size for ESRGAN upscaling (default: 128)
-  --hires-width                            highres fix target width, 0 to use --hires-scale (default: 0)
-  --hires-height                           highres fix target height, 0 to use --hires-scale (default: 0)
-  --hires-steps                            highres fix second pass sample steps, 0 to reuse --steps (default: 0)
-  --hires-upscale-tile-size                highres fix upscaler tile size, reserved for model-backed upscalers (default:
-                                           128)
-  --cfg-scale                              unconditional guidance scale: (default: 7.0)
-  --img-cfg-scale                          image guidance scale for inpaint or image edit models: (default: same as
-                                           --cfg-scale)
-  --guidance                               distilled guidance scale for models with guidance input (default: 3.5)
-  --slg-scale                              skip layer guidance (SLG) scale, only for DiT models: (default: 0). 0 means
-                                           disabled, a value of 2.5 is nice for sd3.5 medium
-  --skip-layer-start                       SLG enabling point (default: 0.01)
-  --skip-layer-end                         SLG disabling point (default: 0.2)
-  --eta                                    noise multiplier (default: 0 for ddim_trailing, tcd, res_multistep and
-                                           res_2s; 1 for euler_a, er_sde and dpm++2s_a)
-  --flow-shift                             shift value for Flow models like SD3.x or WAN (default: auto)
-  --high-noise-cfg-scale                   (high noise) unconditional guidance scale: (default: 7.0)
-  --high-noise-img-cfg-scale               (high noise) image guidance scale for inpaint or image edit models (default:
-                                           same as --cfg-scale)
-  --high-noise-guidance                    (high noise) distilled guidance scale for models with guidance input
-                                           (default: 3.5)
-  --high-noise-slg-scale                   (high noise) skip layer guidance (SLG) scale, only for DiT models: (default:
-                                           0)
-  --high-noise-skip-layer-start            (high noise) SLG enabling point (default: 0.01)
-  --high-noise-skip-layer-end              (high noise) SLG disabling point (default: 0.2)
-  --high-noise-eta                         (high noise) noise multiplier (default: 0 for ddim_trailing, tcd,
-                                           res_multistep and res_2s; 1 for euler_a, er_sde and dpm++2s_a)
-  --strength                               strength for noising/unnoising (default: 0.75)
-  --pm-style-strength
-  --control-strength                       strength to apply Control Net (default: 0.9). 1.0 corresponds to full
-                                           destruction of information in init image
-  --moe-boundary                           timestep boundary for Wan2.2 MoE model. (default: 0.875). Only enabled if
-                                           `--high-noise-steps` is set to -1
-  --vace-strength                          wan vace strength
-  --vae-tile-overlap                       tile overlap for vae tiling, in fraction of tile size (default: 0.5)
-  --hires-scale                            highres fix scale when target size is not set (default: 2.0)
-  --hires-denoising-strength               highres fix second pass denoising strength (default: 0.7)
-  --increase-ref-index                     automatically increase the indices of references images based on the order
-                                           they are listed (starting with 1).
-  --disable-auto-resize-ref-image          disable auto resize of ref images
-  --disable-image-metadata                 do not embed generation metadata on image files
-  --vae-tiling                             process vae in tiles to reduce memory usage
-  --temporal-tiling                        enable temporal tiling for LTX video VAE decode
-  --hires                                  enable highres fix
-  -s, --seed                               RNG seed (default: 42, use random seed for < 0)
-  --sampling-method                        sampling method, one of [euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m,
-                                           dpm++2mv2, ipndm, ipndm_v, lcm, ddim_trailing, tcd, res_multistep, res_2s,
-                                           er_sde, euler_cfg_pp, euler_a_cfg_pp] (default: euler for Flux/SD3/Wan, euler_a otherwise)
-  --high-noise-sampling-method             (high noise) sampling method, one of [euler, euler_a, heun, dpm2, dpm++2s_a,
-                                           dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm, ddim_trailing, tcd, res_multistep,
-                                           res_2s, er_sde, euler_cfg_pp, euler_a_cfg_pp] default: euler for Flux/SD3/Wan, euler_a otherwise
-  --scheduler                              denoiser sigma scheduler, one of [discrete, karras, exponential, ays, gits,
-                                           smoothstep, sgm_uniform, simple, kl_optimal, lcm, bong_tangent, ltx2], default:
-                                           model-specific
-  --sigmas                                 custom sigma values for the sampler, comma-separated (e.g.,
-                                           "14.61,7.8,3.5,0.0").
-  --hires-sigmas                           custom sigma values for the highres fix second pass, comma-separated (e.g.,
-                                           "0.85,0.725,0.421875,0.0").
-  --skip-layers                            layers to skip for SLG steps (default: [7,8,9])
-  --high-noise-skip-layers                 (high noise) layers to skip for SLG steps (default: [7,8,9])
-  -r, --ref-image                          reference image for Flux Kontext models (can be used multiple times)
-  --cache-mode                             caching method: 'easycache' (DiT), 'ucache' (UNET),
-                                           'dbcache'/'taylorseer'/'cache-dit' (DiT block-level), 'spectrum' (UNET/DiT
-                                           Chebyshev+Taylor forecasting)
-  --cache-option                           named cache params (key=value format, comma-separated). easycache/ucache:
-                                           threshold=,start=,end=,decay=,relative=,reset=; dbcache/taylorseer/cache-dit:
-                                           Fn=,Bn=,threshold=,warmup=; spectrum: w=,m=,lam=,window=,flex=,warmup=,stop=.
-                                           Examples: "threshold=0.25" or "threshold=1.5,reset=0"
-  --scm-mask                               SCM steps mask for cache-dit: comma-separated 0/1 (e.g.,
-                                           "1,1,1,0,0,1,0,0,1,0") - 1=compute, 0=can cache
-  --scm-policy                             SCM policy: 'dynamic' (default) or 'static'
-  --vae-tile-size                          tile size for vae tiling, format [X]x[Y] (default: 32x32)
-  --vae-relative-tile-size                 relative tile size for vae tiling, format [X]x[Y], in fraction of image size
-                                           if < 1, in number of tiles per dim if >=1 (overrides --vae-tile-size)
+```bash
+./bin/sd-cli -h
 ```
 
 Metadata mode inspects PNG/JPEG container metadata without loading any model:
diff --git a/examples/server/README.md b/examples/server/README.md
index 63e38977..c24ed083 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -117,188 +117,10 @@ In this case, the server will load and serve the specified `index.html` file ins
 * using a custom UI
 * avoiding rebuilding the binary after frontend modifications
 
-# Run
+# Usage
 
-```
-usage: ./bin/sd-server [options]
-
-Svr Options:
-  -l, --listen-ip                          server listen ip (default: 127.0.0.1)
-  --serve-html-path                        path to HTML file to serve at root (optional)
-  --listen-port                            server listen port (default: 1234)
-  -v, --verbose                            print extra info
-  --color                                  colors the logging tags according to level
-  -h, --help                               show this help message and exit
-
-Context Options:
-  -m, --model                              path to full model
-  --clip_l                                 path to the clip-l text encoder
-  --clip_g                                 path to the clip-g text encoder
-  --clip_vision                            path to the clip-vision encoder
-  --t5xxl                                  path to the t5xxl text encoder
-  --llm                                    path to the llm text encoder. For example: (qwenvl2.5 for qwen-image,
-                                           mistral-small3.2 for flux2, ...)
-  --llm_vision                             path to the llm vit
-  --qwen2vl                                alias of --llm. Deprecated.
-  --qwen2vl_vision                         alias of --llm_vision. Deprecated.
-  --diffusion-model                        path to the standalone diffusion model
-  --high-noise-diffusion-model             path to the standalone high noise diffusion model
-  --uncond-diffusion-model                 path to the standalone unconditional diffusion model, currently used by
-                                           Ideogram4 CFG
-  --vae                                    path to standalone vae model
-  --taesd                                  path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
-  --tae                                    alias of --taesd
-  --control-net                            path to control net model
-  --embd-dir                               embeddings directory
-  --lora-model-dir                         lora model directory
-  --hires-upscalers-dir                    highres fix upscaler model directory
-  --tensor-type-rules                      weight type per tensor pattern (example: "^vae\.=f16,model\.=q8_0")
-  --photo-maker                            path to PHOTOMAKER model
-  --upscale-model                          path to esrgan model.
-  -t, --threads                            number of threads to use during computation (default: -1). If threads <= 0,
-                                           then threads will be set to the number of CPU physical cores
-  --chroma-t5-mask-pad                     t5 mask pad size of chroma
-  --max-vram                               maximum VRAM budget in GiB for graph-cut segmented execution. 0 disables
-                                           graph splitting; a negative value auto-detects free VRAM, sparing the
-                                           specified value (e.g. -0.5 will keep at least 0.5 GiB free)
-  --force-sdxl-vae-conv-scale              force use of conv scale on sdxl vae
-  --offload-to-cpu                         place the weights in RAM to save VRAM, and automatically load them into VRAM
-                                           when needed
-  --mmap                                   whether to memory-map model
-  --control-net-cpu                        deprecated; use --backend controlnet=cpu
-  --clip-on-cpu                            deprecated; use --backend te=cpu
-  --vae-on-cpu                             deprecated; use --backend vae=cpu
-  --fa                                     use flash attention
-  --diffusion-fa                           use flash attention in the diffusion model only
-  --diffusion-conv-direct                  use ggml_conv2d_direct in the diffusion model
-  --vae-conv-direct                        use ggml_conv2d_direct in the vae model
-  --circular                               enable circular padding for convolutions
-  --circularx                              enable circular RoPE wrapping on x-axis (width) only
-  --circulary                              enable circular RoPE wrapping on y-axis (height) only
-  --chroma-disable-dit-mask                disable dit mask for chroma
-  --qwen-image-zero-cond-t                 enable zero_cond_t for qwen image
-  --chroma-enable-t5-mask                  enable t5 mask for chroma
-  --type                                   weight type (examples: f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_K, q3_K,
-                                           q4_K). If not specified, the default is the type of the weight file
-  --rng                                    RNG, one of [std_default, cuda, cpu], default: cuda(sd-webui), cpu(comfyui)
-  --sampler-rng                            sampler RNG, one of [std_default, cuda, cpu]. If not specified, use --rng
-  --prediction                             prediction type override, one of [eps, v, edm_v, sd3_flow, flux_flow,
-                                           flux2_flow]
-  --lora-apply-mode                        the way to apply LoRA, one of [auto, immediately, at_runtime], default is
-                                           auto. In auto mode, if the model weights contain any quantized parameters,
-                                           the at_runtime mode will be used; otherwise, immediately will be used.The
-                                           immediately mode may have precision and compatibility issues with quantized
-                                           parameters, but it usually offers faster inference speed and, in some cases,
-                                           lower memory usage. The at_runtime mode, on the other hand, is exactly the
-                                           opposite.
-
-Default Generation Options:
-  -p, --prompt                             the prompt to render
-  -n, --negative-prompt                    the negative prompt (default: "")
-  -i, --init-img                           path to the init image
-  --end-img                                path to the end image, required by flf2v
-  --mask                                   path to the mask image
-  --control-image                          path to control image, control net
-  --control-video                          path to control video frames, It must be a directory path. The video frames
-                                           inside should be stored as images in lexicographical (character) order. For
-                                           example, if the control video path is `frames`, the directory contain images
-                                           such as 00.png, 01.png, ... etc.
-  --pm-id-images-dir                       path to PHOTOMAKER input id images dir
-  --pm-id-embed-path                       path to PHOTOMAKER v2 id embed
-  --hires-upscaler                         highres fix upscaler, Lanczos, Nearest, Latent, Latent (nearest), Latent
-                                           (nearest-exact), Latent (antialiased), Latent (bicubic), Latent (bicubic
-                                           antialiased), or a model name under --hires-upscalers-dir (default: Latent)
-  --extra-sample-args                      extra sampler/scheduler/guidance args, key=value list. APG supports apg_eta,
-                                           apg_momentum, apg_norm_threshold, apg_norm_threshold_smoothing; SLG supports
-                                           slg_uncond; lcm supports noise_clip_std, noise_scale_start, noise_scale_end;
-                                           ltx2 supports max_shift, base_shift, stretch, terminal; euler_ge supports gamma
-  --extra-tiling-args                      extra VAE tiling args, key=value list. LTX video VAE supports
-                                           temporal_tile_frames (default: 4), temporal_tile_overlap (default: 1)
-  -H, --height                             image height, in pixel space (default: 512)
-  -W, --width                              image width, in pixel space (default: 512)
-  --steps                                  number of sample steps (default: 20)
-  --high-noise-steps                       (high noise) number of sample steps (default: -1 = auto)
-  --clip-skip                              ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer
-                                           (default: -1). <= 0 represents unspecified, will be 1 for SD1.x, 2 for SD2.x
-  -b, --batch-count                        batch count
-  --video-frames                           video frames (default: 1)
-  --fps                                    fps (default: 24)
-  --timestep-shift                         shift timestep for NitroFusion models (default: 0). recommended N for
-                                           NitroSD-Realism around 250 and 500 for NitroSD-Vibrant
-  --upscale-repeats                        Run the ESRGAN upscaler this many times (default: 1)
-  --upscale-tile-size                      tile size for ESRGAN upscaling (default: 128)
-  --hires-width                            highres fix target width, 0 to use --hires-scale (default: 0)
-  --hires-height                           highres fix target height, 0 to use --hires-scale (default: 0)
-  --hires-steps                            highres fix second pass sample steps, 0 to reuse --steps (default: 0)
-  --hires-upscale-tile-size                highres fix upscaler tile size, reserved for model-backed upscalers (default:
-                                           128)
-  --cfg-scale                              unconditional guidance scale: (default: 7.0)
-  --img-cfg-scale                          image guidance scale for inpaint or image edit models: (default: same as
-                                           --cfg-scale)
-  --guidance                               distilled guidance scale for models with guidance input (default: 3.5)
-  --slg-scale                              skip layer guidance (SLG) scale, only for DiT models: (default: 0). 0 means
-                                           disabled, a value of 2.5 is nice for sd3.5 medium
-  --skip-layer-start                       SLG enabling point (default: 0.01)
-  --skip-layer-end                         SLG disabling point (default: 0.2)
-  --eta                                    noise multiplier (default: 0 for ddim_trailing, tcd, res_multistep and
-                                           res_2s; 1 for euler_a, er_sde and dpm++2s_a)
-  --flow-shift                             shift value for Flow models like SD3.x or WAN (default: auto)
-  --high-noise-cfg-scale                   (high noise) unconditional guidance scale: (default: 7.0)
-  --high-noise-img-cfg-scale               (high noise) image guidance scale for inpaint or image edit models (default:
-                                           same as --cfg-scale)
-  --high-noise-guidance                    (high noise) distilled guidance scale for models with guidance input
-                                           (default: 3.5)
-  --high-noise-slg-scale                   (high noise) skip layer guidance (SLG) scale, only for DiT models: (default:
-                                           0)
-  --high-noise-skip-layer-start            (high noise) SLG enabling point (default: 0.01)
-  --high-noise-skip-layer-end              (high noise) SLG disabling point (default: 0.2)
-  --high-noise-eta                         (high noise) noise multiplier (default: 0 for ddim_trailing, tcd,
-                                           res_multistep and res_2s; 1 for euler_a, er_sde and dpm++2s_a)
-  --strength                               strength for noising/unnoising (default: 0.75)
-  --pm-style-strength
-  --control-strength                       strength to apply Control Net (default: 0.9). 1.0 corresponds to full
-                                           destruction of information in init image
-  --moe-boundary                           timestep boundary for Wan2.2 MoE model. (default: 0.875). Only enabled if
-                                           `--high-noise-steps` is set to -1
-  --vace-strength                          wan vace strength
-  --vae-tile-overlap                       tile overlap for vae tiling, in fraction of tile size (default: 0.5)
-  --hires-scale                            highres fix scale when target size is not set (default: 2.0)
-  --hires-denoising-strength               highres fix second pass denoising strength (default: 0.7)
-  --increase-ref-index                     automatically increase the indices of references images based on the order
-                                           they are listed (starting with 1).
-  --disable-auto-resize-ref-image          disable auto resize of ref images
-  --disable-image-metadata                 do not embed generation metadata on image files
-  --vae-tiling                             process vae in tiles to reduce memory usage
-  --temporal-tiling                        enable temporal tiling for LTX video VAE decode
-  --hires                                  enable highres fix
-  -s, --seed                               RNG seed (default: 42, use random seed for < 0)
-  --sampling-method                        sampling method, one of [euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m,
-                                           dpm++2mv2, ipndm, ipndm_v, lcm, ddim_trailing, tcd, res_multistep, res_2s,
-                                           er_sde, euler_cfg_pp, euler_a_cfg_pp] (default: euler for Flux/SD3/Wan, euler_a otherwise)
-  --high-noise-sampling-method             (high noise) sampling method, one of [euler, euler_a, heun, dpm2, dpm++2s_a,
-                                           dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm, ddim_trailing, tcd, res_multistep,
-                                           res_2s, er_sde, euler_cfg_pp, euler_a_cfg_pp] default: euler for Flux/SD3/Wan, euler_a otherwise
-  --scheduler                              denoiser sigma scheduler, one of [discrete, karras, exponential, ays, gits,
-                                           smoothstep, sgm_uniform, simple, kl_optimal, lcm, bong_tangent, ltx2], default:
-                                           model-specific
-  --sigmas                                 custom sigma values for the sampler, comma-separated (e.g.,
-                                           "14.61,7.8,3.5,0.0").
-  --hires-sigmas                           custom sigma values for the highres fix second pass, comma-separated (e.g.,
-                                           "0.85,0.725,0.421875,0.0").
-  --skip-layers                            layers to skip for SLG steps (default: [7,8,9])
-  --high-noise-skip-layers                 (high noise) layers to skip for SLG steps (default: [7,8,9])
-  -r, --ref-image                          reference image for Flux Kontext models (can be used multiple times)
-  --cache-mode                             caching method: 'easycache' (DiT), 'ucache' (UNET),
-                                           'dbcache'/'taylorseer'/'cache-dit' (DiT block-level), 'spectrum' (UNET/DiT
-                                           Chebyshev+Taylor forecasting)
-  --cache-option                           named cache params (key=value format, comma-separated). easycache/ucache:
-                                           threshold=,start=,end=,decay=,relative=,reset=; dbcache/taylorseer/cache-dit:
-                                           Fn=,Bn=,threshold=,warmup=; spectrum: w=,m=,lam=,window=,flex=,warmup=,stop=.
-                                           Examples: "threshold=0.25" or "threshold=1.5,reset=0"
-  --scm-mask                               SCM steps mask for cache-dit: comma-separated 0/1 (e.g.,
-                                           "1,1,1,0,0,1,0,0,1,0") - 1=compute, 0=can cache
-  --scm-policy                             SCM policy: 'dynamic' (default) or 'static'
-  --vae-tile-size                          tile size for vae tiling, format [X]x[Y] (default: 32x32)
-  --vae-relative-tile-size                 relative tile size for vae tiling, format [X]x[Y], in fraction of image size
-                                           if < 1, in number of tiles per dim if >=1 (overrides --vae-tile-size)
+For detailed command-line arguments, run:
+
+```bash
+./bin/sd-server -h
 ```