## Offload weights to the CPU to save VRAM without reducing generation speed.

Pass `--offload-to-cpu` to offload the weights to the CPU. This saves VRAM without reducing generation speed.

## Use quantization to reduce memory usage.

See [quantization](./quantization_and_gguf.md).