llmfood converts Docusaurus static HTML builds into clean Markdown files for large language models (LLMs), following the llms.txt convention. It discovers every page in the build, resolves content that is normally rendered client-side, and writes a Markdown file for each page along with a structured index.
Key Features
- Automatic Page Discovery: llmfood scans through the Docusaurus build directory to identify and process all HTML pages, ensuring comprehensive documentation coverage.
- Client-Side Content Resolution: It resolves content that is rendered client-side and is therefore missing from the static HTML files. This covers GitHub code references, remotely fetched content, and Mermaid diagrams.
- HTML to Markdown Conversion: Each detected HTML page is transformed into Markdown, stripping away unnecessary Docusaurus elements such as breadcrumbs, pagination, and footers, producing streamlined documents for further use.
- Structured Index Generation: The tool automatically creates an llms.txt file, a structured index that links to all converted Markdown files.
- Custom File Creation: Users can specify custom aggregated Markdown files, such as llms-full.txt, that compile documentation according to user-defined URL patterns.
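To make the index step concrete, here is a minimal sketch of how an llms.txt-style index could be assembled from a list of converted pages. It is not llmfood's internal code: the function name buildLlmsTxt and the page shape (section, title, path) are assumptions made for illustration, and the option names simply echo the configuration examples in this document. The output follows the general llms.txt layout of a title, a short blockquote summary, and one linked list per section.

```javascript
// Illustrative sketch only: not llmfood's implementation.
// `buildLlmsTxt` and the page fields (section, title, path) are hypothetical.
function buildLlmsTxt({
  siteTitle,
  siteDescription,
  baseUrl,
  pages,
  sectionOrder = [],
  sectionLabels = {},
}) {
  // Group pages by their section key.
  const bySection = new Map();
  for (const page of pages) {
    if (!bySection.has(page.section)) bySection.set(page.section, []);
    bySection.get(page.section).push(page);
  }

  // Sections listed in sectionOrder come first; any others follow alphabetically.
  const ordered = [
    ...sectionOrder.filter((s) => bySection.has(s)),
    ...[...bySection.keys()].filter((s) => !sectionOrder.includes(s)).sort(),
  ];

  // Title, blockquote summary, then one heading and link list per section.
  const lines = [`# ${siteTitle}`, "", `> ${siteDescription}`, ""];
  for (const section of ordered) {
    lines.push(`## ${sectionLabels[section] ?? section}`, "");
    for (const page of bySection.get(section)) {
      lines.push(`- [${page.title}](${baseUrl}${page.path})`);
    }
    lines.push("");
  }
  return lines.join("\n");
}
```

Sections without an entry in sectionLabels fall back to their raw key, so a partially labelled configuration still produces a complete index.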
Installation and Usage
The recommended way to use llmfood in a Docusaurus project is to add it as a plugin. The plugin then runs automatically after the docusaurus build command. Here is an example configuration:
// docusaurus.config.js
module.exports = {
  plugins: [
    [
      "llmfood/docusaurus",
      {
        sectionOrder: ["guides", "api", "concepts"],
        sectionLabels: { guides: "Guides", api: "API Reference" },
        customFiles: [
          {
            filename: "llms-full.txt",
            title: "Full Documentation",
            description: "Complete documentation in a single file",
            includePatterns: [/.*/],
          },
        ],
      },
    ],
  ],
};
For standalone usage, import the main function and call it directly with the configuration options, as shown below:
import { generateLlmsMarkdown } from "llmfood";

await generateLlmsMarkdown({
  baseUrl: "https://docs.example.com",
  buildDir: "./build",
  siteTitle: "My Docs",
  siteDescription: "Documentation for my project",
  docsDir: "./docs",
  sectionOrder: ["guides", "api", "concepts"],
  sectionLabels: { guides: "Guides", api: "API Reference" },
  // Regex literal must be closed: this matches any path containing "/blog/".
  ignorePatterns: [/\/blog\//],
  customFiles: [
    {
      filename: "llms-full.txt",
      title: "Full Documentation",
      description: "Complete documentation in a single file",
      includePatterns: [/.*/],
    },
  ],
});
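The includePatterns and ignorePatterns options above are arrays of regular expressions. How llmfood applies them internally is not documented here, so the following is only a sketch of one plausible reading: a path is kept when at least one include pattern matches and no ignore pattern matches. The helper name selectPaths and the idea of matching against URL paths are assumptions.

```javascript
// Illustrative sketch only: llmfood's real matching rules may differ.
// `selectPaths` is a hypothetical helper, not part of the llmfood API.
function selectPaths(
  paths,
  { includePatterns = [/.*/], ignorePatterns = [] } = {}
) {
  return paths.filter(
    (p) =>
      // Keep the path if any include pattern matches it...
      includePatterns.some((re) => re.test(p)) &&
      // ...and no ignore pattern matches it.
      !ignorePatterns.some((re) => re.test(p))
  );
}

// Example: drop blog pages, keep everything else.
const kept = selectPaths(
  ["/guides/intro", "/blog/post", "/api/client"],
  { ignorePatterns: [/\/blog\//] }
);
```

Patterns used this way should avoid the g flag, because RegExp.prototype.test keeps state in lastIndex between calls on a global regex and can skip matches.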
Additionally, llmfood offers the option to convert raw HTML strings to Markdown directly, enabling integration into existing workflows:
import { htmlToMarkdown } from "llmfood";
const markdown = htmlToMarkdown(docusaurusHtmlString);
Supported Elements
llmfood handles Docusaurus-specific elements such as code blocks, admonitions, and tabs, preserving their structure during conversion. Supported elements include:
- Prism code blocks
- KaTeX math
- YouTube embeds
- Mermaid diagrams
In summary, llmfood converts Docusaurus builds to Markdown and resolves client-side content along the way, producing well-structured output that LLMs can consume directly.