
Unstructured

⭐ 14.4k stars
Repository: Unstructured-IO/unstructured
Category: skill
Difficulty: intermediate
Status: active
Tags: document-parsing, pdf, etl, rag, data-prep
Website: https://unstructured.io

Review

A Python library for extracting and preprocessing content from PDFs, HTML, images, and other document types. It handles messy real-world documents through OCR, table extraction, and configurable chunking strategies, which makes it best suited to RAG pipelines that must ingest many different document formats.
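Chunking is the step that turns parsed documents into retrieval-sized pieces for RAG. In the library itself, documents are usually partitioned with `partition` from `unstructured.partition.auto` and the resulting elements chunked with `chunk_by_title` from `unstructured.chunking.title`. The sketch below does not call the library. It is a hypothetical, standard-library-only illustration of the title-based chunking idea, assuming a simplified `Element` type with only `category` and `text` fields. The function name `chunk_by_title_sketch` is made up for illustration. It ignores metadata and does not split an element that on its own exceeds `max_characters`.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for the typed elements a partitioner
# returns; only a category label and the element's text are modeled.
@dataclass
class Element:
    category: str  # e.g. "Title" or "NarrativeText"
    text: str


def chunk_by_title_sketch(elements, max_characters=500):
    """Group element texts into chunks joined by blank lines.

    A new chunk starts at every "Title" element, and also whenever adding
    the next element would push the joined chunk past max_characters.
    An element longer than max_characters on its own is kept whole.
    """
    chunks = []
    current = []  # texts in the chunk being built
    length = 0  # length of "\n\n".join(current)
    for el in elements:
        sep = 2 if current else 0  # length of the "\n\n" joiner
        too_long = length + sep + len(el.text) > max_characters
        if current and (el.category == "Title" or too_long):
            chunks.append("\n\n".join(current))
            current, length, sep = [], 0, 0
        current.append(el.text)
        length += sep + len(el.text)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Usage example with hand-written elements (no real document is parsed).
elements = [
    Element("Title", "Introduction"),
    Element("NarrativeText", "Unstructured turns raw files into typed elements."),
    Element("NarrativeText", "Each element carries text and metadata."),
    Element("Title", "Tables"),
    Element("NarrativeText", "Tables can be kept as HTML."),
]
chunks = chunk_by_title_sketch(elements, max_characters=200)
for chunk in chunks:
    print(repr(chunk))
```

Starting a chunk at each title keeps a section's heading together with its body text, so a retrieved chunk carries its own context; the character cap keeps chunks within an embedding model's input budget.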

Use Cases

  • document-processing
  • rag
  • etl
