# Run

```
usage: ./bin/sd-server [options]

Server Options:
  -l, --listen-ip                    server listen ip (default: 127.0.0.1)
      --listen-port                  server listen port (default: 1234)
  -v, --verbose                      print extra info
      --color                        colors the logging tags according to level
  -h, --help                         show this help message and exit

Context Options:
  -m, --model                        path to full model
      --clip_l                       path to the clip-l text encoder
      --clip_g                       path to the clip-g text encoder
      --clip_vision                  path to the clip-vision encoder
      --t5xxl                        path to the t5xxl text encoder
      --llm                          path to the llm text encoder
                                     (e.g. qwenvl2.5 for qwen-image, mistral-small3.2 for flux2, ...)
      --llm_vision                   path to the llm vit
      --qwen2vl                      alias of --llm (deprecated)
      --qwen2vl_vision               alias of --llm_vision (deprecated)
      --diffusion-model              path to the standalone diffusion model
      --high-noise-diffusion-model   path to the standalone high-noise diffusion model
      --vae                          path to standalone vae model
      --taesd                        path to taesd; uses Tiny AutoEncoder for fast decoding (low quality)
      --control-net                  path to control net model
      --embd-dir                     embeddings directory
      --lora-model-dir               lora model directory
      --tensor-type-rules            weight type per tensor pattern (example: "^vae\.=f16,model\.=q8_0")
      --photo-maker                  path to PHOTOMAKER model
      --upscale-model                path to esrgan model
  -t, --threads                      number of threads to use during computation (default: -1);
                                     if threads <= 0, threads is set to the number of physical CPU cores
      --chroma-t5-mask-pad           t5 mask pad size for chroma
      --vae-tile-overlap             tile overlap for vae tiling, as a fraction of tile size (default: 0.5)
      --flow-shift                   shift value for flow models like SD3.x or WAN (default: auto)
      --vae-tiling                   process vae in tiles to reduce memory usage
      --force-sdxl-vae-conv-scale    force use of conv scale on sdxl vae
      --offload-to-cpu               keep the weights in RAM to save VRAM, loading them into VRAM
                                     automatically when needed
      --control-net-cpu              keep controlnet on cpu (for low vram)
      --clip-on-cpu                  keep clip on cpu (for low vram)
      --vae-on-cpu                   keep vae on cpu (for low vram)
      --diffusion-fa                 use flash attention in the diffusion model
      --diffusion-conv-direct        use ggml_conv2d_direct in the diffusion model
      --vae-conv-direct              use ggml_conv2d_direct in the vae model
      --chroma-disable-dit-mask      disable dit mask for chroma
      --chroma-enable-t5-mask        enable t5 mask for chroma
      --type                         weight type (examples: f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0,
                                     q2_K, q3_K, q4_K); if not specified, defaults to the type of
                                     the weight file
      --rng                          RNG, one of [std_default, cuda, cpu]; default: cuda (sd-webui),
                                     cpu (comfyui)
      --sampler-rng                  sampler RNG, one of [std_default, cuda, cpu]; if not specified,
                                     uses --rng
      --prediction                   prediction type override, one of [eps, v, edm_v, sd3_flow,
                                     flux_flow, flux2_flow]
      --lora-apply-mode              how to apply LoRA, one of [auto, immediately, at_runtime];
                                     default: auto. In auto mode, at_runtime is used if the model
                                     weights contain any quantized parameters; otherwise immediately
                                     is used. The immediately mode may have precision and
                                     compatibility issues with quantized parameters, but it usually
                                     offers faster inference and, in some cases, lower memory usage.
                                     The at_runtime mode is exactly the opposite.
      --vae-tile-size                tile size for vae tiling, format [X]x[Y] (default: 32x32)
      --vae-relative-tile-size       relative tile size for vae tiling, format [X]x[Y]; a fraction
                                     of image size if < 1, number of tiles per dim if >= 1
                                     (overrides --vae-tile-size)
```
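Putting a few of these options together, a typical low-VRAM invocation might look like the sketch below. The model file names are placeholders for illustration only; substitute the paths to your own checkpoint and VAE files.

```shell
# Hypothetical example: serve a model on all interfaces with VAE tiling
# and CPU offloading enabled to reduce VRAM usage.
# The .safetensors paths are placeholders, not files shipped with the project.
./bin/sd-server \
  --listen-ip 0.0.0.0 \
  --listen-port 1234 \
  -m models/my_checkpoint.safetensors \
  --vae models/my_vae.safetensors \
  --vae-tiling \
  --offload-to-cpu \
  -t -1 \
  -v
```

With `--listen-ip 0.0.0.0` the server accepts connections from other machines; keep the default `127.0.0.1` if you only need local access. `-t -1` lets the server pick a thread count equal to the number of physical CPU cores.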