# vLLM

⭐ 76.2k stars

| Field      | Value                                     |
| ---------- | ----------------------------------------- |
| Repository | vllm-project/vllm                         |
| Category   | infra                                     |
| Difficulty | advanced                                  |
| Status     | active                                    |
| Tags       | serving, inference, gpu, high-throughput  |
| Website    | https://vllm.ai                           |
## Review
A high-throughput LLM inference and serving engine built around PagedAttention, which stores the KV cache in fixed-size blocks to minimize GPU memory fragmentation and raise batch sizes. It is the de facto standard for self-hosting open-weight models at scale, and the best fit for teams that need maximum throughput per GPU dollar.
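For a feel of the API, here is a minimal offline-inference sketch using vLLM's Python entry point; the model name is a placeholder, and any open-weight checkpoint you have access to works:

```python
# Minimal vLLM offline-inference sketch.
# The model name is an assumption; swap in any open-weight checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

# generate() batches prompts internally; PagedAttention manages the KV cache.
outputs = llm.generate(["Summarize PagedAttention in one sentence."], params)
print(outputs[0].outputs[0].text)
```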
## Use Cases
- model-serving
- inference
- deployment
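For the serving and deployment path, vLLM also ships an OpenAI-compatible HTTP server (started with `vllm serve <model>`). A sketch of querying it with the standard `openai` client, assuming a default local setup on port 8000 and the same placeholder model name as above:

```python
# Sketch: query a locally running vLLM server started with `vllm serve <model>`.
# Host, port, and model name are assumptions for a default local deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # key unused locally
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What is PagedAttention?"}],
)
print(resp.choices[0].message.content)
```

Because the server speaks the OpenAI wire format, existing OpenAI-based tooling can usually point at a vLLM deployment by changing only the base URL.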