vLLM

⭐ 76.2k stars
Repository: vllm-project/vllm
Category: infra
Difficulty: advanced
Status: active
Tags: serving, inference, gpu, high-throughput
Website: https://vllm.ai

Review

High-throughput LLM inference engine with PagedAttention for efficient GPU memory management. The de facto standard for self-hosting open-weight models at scale. Best for teams deploying open models that need maximum throughput per GPU dollar.
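To make the Python API concrete, here is a minimal sketch of offline batch inference with vLLM's `LLM` class; the model name, prompt, and sampling settings are placeholder assumptions, not recommendations from this review:

```python
from vllm import LLM, SamplingParams

# Placeholder model; any Hugging Face model supported by vLLM works.
llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = ["The capital of France is"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Each RequestOutput holds the prompt and one or more completions.
    print(output.prompt, output.outputs[0].text)
```

Batching is handled by the engine itself: you pass a list of prompts and vLLM schedules them across the GPU using PagedAttention's block-based KV-cache management.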

Use Cases

  • model-serving
  • inference
  • deployment
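For the model-serving and deployment use cases above, vLLM also ships an OpenAI-compatible HTTP server (started with `vllm serve <model>`). A minimal sketch of querying it with the official `openai` Python client, assuming the server runs on its default port 8000 and `<model-name>` is a placeholder:

```python
from openai import OpenAI

# Assumes a server started with: vllm serve <model-name>
# The base URL and dummy API key follow vLLM's OpenAI-compatible defaults.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="<model-name>",  # must match the model the server was launched with
    messages=[{"role": "user", "content": "Summarize PagedAttention in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```

Because the endpoint mirrors the OpenAI API, existing client code can usually be pointed at a self-hosted vLLM server by changing only the base URL and model name.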
