Unstructured
⭐ 14.4k stars
| Repository | Unstructured-IO/unstructured |
| Category | skill |
| Difficulty | intermediate |
| Status | active |
| Tags | document-parsing pdf etl rag data-prep |
| Website | https://unstructured.io |
Review
Library for extracting and preprocessing content from PDFs, HTML, images, and other document types. Handles messy real-world documents through OCR, table extraction, and chunking strategies. Best suited to RAG pipelines that need to ingest diverse document formats.
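As a rough illustration of the ingest step described above, the sketch below partitions a file and groups the resulting elements into chunks using the library's `partition` and `chunk_by_title` functions. The helper names `load_chunks` and `to_records`, the `max_characters=1000` default, and the record schema are assumptions made for this example, not part of the library.

```python
def load_chunks(path, max_characters=1000):
    """Partition a document and group its elements into section-sized chunks."""
    # Imported lazily so this module still loads where unstructured is not installed.
    from unstructured.partition.auto import partition
    from unstructured.chunking.title import chunk_by_title

    # partition() detects the file type and returns a list of document elements.
    elements = partition(filename=path)
    # chunk_by_title() starts a new chunk at each title and caps the chunk size.
    return chunk_by_title(elements, max_characters=max_characters)


def to_records(chunks, source):
    """Flatten chunks into plain dicts for embedding (hypothetical schema).

    Only relies on each chunk exposing a .text attribute; empty chunks are
    skipped, and "position" keeps the chunk's index in the original sequence.
    """
    return [
        {"source": source, "position": i, "text": chunk.text.strip()}
        for i, chunk in enumerate(chunks)
        if chunk.text and chunk.text.strip()
    ]
```

A typical call would be `to_records(load_chunks("report.pdf"), "report.pdf")`, where `report.pdf` is a placeholder path. Depending on the input format, extra dependencies (for example OCR tooling for scanned PDFs and images) may need to be installed separately.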
Use Cases
- document-processing
- rag
- etl