# vLLM

⭐ 76.2k stars

| Field      | Value                                     |
| ---------- | ----------------------------------------- |
| Repository | vllm-project/vllm                         |
| Category   | infra                                     |
| Difficulty | advanced                                  |
| Status     | active                                    |
| Tags       | serving, inference, gpu, high-throughput  |
| Website    | https://vllm.ai                           |
## Review
A high-throughput LLM inference and serving engine built around PagedAttention, which stores the KV cache in fixed-size blocks to minimize GPU memory fragmentation and raise batch sizes. It is the de facto standard for self-hosting open-weight models at scale, and the best fit for teams that need maximum throughput per GPU dollar.
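For a feel of the API, here is a minimal offline-inference sketch using vLLM's Python entry point; the model name is a placeholder, and any open-weight checkpoint you have access to works:

```python
# Minimal vLLM offline-inference sketch.
# The model name is an assumption; swap in any open-weight checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

# generate() batches prompts internally; PagedAttention manages the KV cache.
outputs = llm.generate(["Summarize PagedAttention in one sentence."], params)
print(outputs[0].outputs[0].text)
```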
## Use Cases
- model-serving
- inference
- deployment
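For the serving and deployment path, vLLM also ships an OpenAI-compatible HTTP server (started with `vllm serve <model>`). A sketch of querying it with the standard `openai` client, assuming a default local setup on port 8000 and the same placeholder model name as above:

```python
# Sketch: query a locally running vLLM server started with `vllm serve <model>`.
# Host, port, and model name are assumptions for a default local deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # key unused locally
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What is PagedAttention?"}],
)
print(resp.choices[0].message.content)
```

Because the server speaks the OpenAI wire format, existing OpenAI-based tooling can usually point at a vLLM deployment by changing only the base URL.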