## Offload weights to the CPU to save VRAM without reducing generation speed.

Pass `--offload-to-cpu` to offload model weights to the CPU. This saves VRAM without reducing generation speed.
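As a minimal sketch of how the flag might be added to an existing invocation, the helper below appends `--offload-to-cpu` to whatever command you already run. The function name `run_with_offload` is hypothetical and only for illustration; the flag itself is the only part taken from this document.

```shell
# Hypothetical helper: runs the given program with its usual arguments
# and appends --offload-to-cpu at the end.
run_with_offload() {
  bin="$1"   # first argument: path to the executable
  shift      # remaining arguments: whatever you normally pass
  "$bin" "$@" --offload-to-cpu
}
```

Call it with your own executable and arguments, for example `run_with_offload <path-to-binary> <your usual arguments>`. Substituting `echo` for the binary prints the final argument list without running anything.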

## Use quantization to reduce memory usage.

See [quantization](./quantization_and_gguf.md) for details.