Open Source · Applications Layer

MarkItDown

Convert virtually any file format into clean, LLM-ready Markdown — PDFs, Office documents, images, audio, and more.

Type: Open Source (MIT)
Stack Layer: Applications
Language: Python
Stars: 122k+

What it is

MarkItDown is a lightweight Python utility from Microsoft that converts most common document formats (PDF, Word, PowerPoint, Excel, images, HTML, audio) into clean Markdown. It preserves structural information such as headings, lists, tables, and links while discarding visual formatting, so the output is something modern LLMs read natively. With 122k+ GitHub stars, it is widely used as a pre-processing step in AI document pipelines.

The project exists because Markdown sits in a sweet spot: richer than plain text, lighter than full markup, and a format LLMs handle well. It is intentionally a pre-processing utility, not a standalone document converter: the output is optimized for an LLM context window or RAG pipeline rather than for human reading.

Use this when you need to feed arbitrary documents into an LLM pipeline without losing structural context, for example turning a 50-page PDF into a form Claude or GPT can actually reason over.
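As a rough illustration of that pre-processing role, the sketch below converts a file with MarkItDown's Python API and then splits the resulting Markdown on headings so each section can be fed to an LLM or indexed separately. It assumes the `markitdown` package is installed (some formats may need optional extras). `convert_to_markdown` and `split_by_heading` are hypothetical helper names, not part of the library, and the heading-based split is one possible chunking strategy rather than anything the project prescribes.

```python
import re


def convert_to_markdown(path: str) -> str:
    """Convert a file to Markdown text (assumes `pip install markitdown`)."""
    # Imported lazily so the rest of this sketch runs without the package.
    from markitdown import MarkItDown

    result = MarkItDown().convert(path)
    return result.text_content


def split_by_heading(markdown: str) -> list[tuple[str, str]]:
    """Hypothetical helper: split Markdown into (heading, body) sections.

    Text before the first heading is returned with an empty heading.
    Headings inside fenced code blocks are not special-cased here.
    """
    sections = []
    heading, body = "", []
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            # Close the previous section before starting a new one.
            if heading or any(s.strip() for s in body):
                sections.append((heading, "\n".join(body).strip()))
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    if heading or any(s.strip() for s in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections


# Stand-in for converter output; with a real file you would call
# convert_to_markdown("report.pdf") instead (the path is illustrative).
sample = "# Report\n\nIntro text.\n\n## Results\n\n| a | b |\n|---|---|\n| 1 | 2 |\n"
chunks = split_by_heading(sample)
for title, text in chunks:
    print(f"[{title}] {len(text)} chars")
```

Because headings and tables survive conversion, each chunk keeps its own title and any table stays intact inside its section, which is the structural context that plain-text extraction would lose.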

Get started

GitHub: source, pip install, and format support matrix.

Linkwarden

Self-hosted bookmark manager that preserves full page content against link rot.

Scrapling

Python web scraping framework for extracting content from live pages.