MarkItDown
Convert virtually any file format into clean, LLM-ready Markdown — PDFs, Office documents, images, audio, and more.
Type: Open Source (MIT)
Stack Layer: Applications
Language: Python
Stars: 122k+
What it is
MarkItDown is a lightweight Python utility from Microsoft that converts almost any document format (PDFs, Word, PowerPoint, Excel, images, HTML, audio) into clean Markdown output. It preserves structural information (headings, lists, tables, links) while discarding visual formatting, producing output that modern LLMs understand natively. With 122k+ GitHub stars, it has become a standard pre-processing step in AI document pipelines.
The project exists because Markdown sits at the sweet spot between plain text and rich markup that LLMs handle well. It’s intentionally a pre-processing utility, not a standalone document converter: the output is optimized for feeding into an LLM context or RAG pipeline rather than for human reading.
Get started
GitHub ↗
Source, pip install, and format support matrix.
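As a rough illustration of the pre-processing step described above, the sketch below wraps MarkItDown's Python API in a small helper. It assumes the package is installed with pip install markitdown (some formats may need optional extras; check the format support matrix). The helper name file_to_markdown, its converter parameter, and the file name in the comment are hypothetical, introduced only for this example.

```python
from pathlib import Path
from typing import Any, Optional, Union


def file_to_markdown(path: Union[str, Path], converter: Optional[Any] = None) -> str:
    """Return the Markdown text that a MarkItDown-style converter produces for one file.

    `converter` is a hypothetical hook for this sketch: pass an existing
    MarkItDown instance to reuse it across many files, or leave it as None
    to create one on demand.
    """
    if converter is None:
        # Third-party dependency, imported lazily: pip install markitdown
        from markitdown import MarkItDown

        converter = MarkItDown()
    # convert() returns a result object; text_content holds the Markdown string.
    result = converter.convert(str(path))
    return result.text_content


# Example call (hypothetical file name):
# markdown = file_to_markdown("quarterly-report.docx")
```

The returned string can then be chunked for a RAG index or placed directly into an LLM prompt. Accepting an optional converter is a design choice for this sketch, not something the library requires: it avoids rebuilding the converter for every file in a batch.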
Related tools
Linkwarden
Self-hosted bookmark manager that preserves full page content against link rot.
Scrapling
Python web scraping framework for extracting content from live pages.