llmfood - Transform Docusaurus HTML builds into LLM-optimized Markdown.

llmfood


Pitch

llmfood converts a Docusaurus static HTML build into Markdown files tailored for LLMs. Following the llms.txt convention, it discovers every page, resolves content that Docusaurus renders only in the browser, and writes clean Markdown. Large language models can then read the documentation without the site's HTML markup and navigation chrome.

Description

llmfood converts Docusaurus HTML builds into clean, LLM-friendly Markdown files that follow the llms.txt convention. It is aimed at developers who want their documentation to be easy for Large Language Models (LLMs) to consume.

Key Features

Automatic Page Discovery: llmfood scans the Docusaurus build directory and processes every HTML page it finds, so the whole site is covered.
Client-Side Content Resolution: Some content is rendered only in the browser, so it is missing or incomplete in the static HTML files. llmfood resolves this content for GitHub code references, remote content, and Mermaid diagrams, and includes it in the output.
HTML to Markdown Conversion: Each discovered HTML page is converted to Markdown. Docusaurus chrome such as breadcrumbs, pagination, and footers is removed, leaving only the page content.
Structured Index Generation: llmfood generates an llms.txt file, a structured index that links to every converted Markdown file.
Custom File Creation: You can define custom aggregated Markdown files, such as llms-full.txt, that combine the pages matching user-defined URL patterns.
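The llms.txt convention (llmstxt.org) describes an index made of an H1 site title, a blockquote summary, and H2 sections containing Markdown link lists. The function below is a minimal sketch of how discovered pages could be grouped into such an index using options like sectionOrder and sectionLabels. The function name and the page shape ({ section, title, url, description }) are hypothetical, and this is not llmfood's actual implementation.

```javascript
// Hypothetical sketch: render an llms.txt index from already-discovered pages.
// Not llmfood's API; the page shape { section, title, url, description } is assumed.
function buildLlmsTxt({ siteTitle, siteDescription, sectionOrder = [], sectionLabels = {}, pages }) {
  // Group pages by their section key, preserving discovery order within a section
  const groups = new Map();
  for (const page of pages) {
    if (!groups.has(page.section)) groups.set(page.section, []);
    groups.get(page.section).push(page);
  }

  // Sections named in sectionOrder come first (skipping empty ones);
  // any remaining sections follow alphabetically (an assumed policy)
  const rest = [...groups.keys()].filter((s) => !sectionOrder.includes(s)).sort();
  const ordered = [...sectionOrder.filter((s) => groups.has(s)), ...rest];

  // H1 title, blockquote summary, then one H2 link list per section
  const lines = [`# ${siteTitle}`, "", `> ${siteDescription}`];
  for (const section of ordered) {
    // Fall back to the raw section key when no label is configured
    lines.push("", `## ${sectionLabels[section] ?? section}`, "");
    for (const { title, url, description } of groups.get(section)) {
      lines.push(description ? `- [${title}](${url}): ${description}` : `- [${title}](${url})`);
    }
  }
  return lines.join("\n") + "\n";
}
```

With sectionOrder ["guides", "api", "concepts"], a "guides" page is listed under "## Guides" before any "## API Reference" entries. Sections that have no pages are omitted. The document does not say how llmfood orders sections that are missing from sectionOrder, so the alphabetical fallback here is only one reasonable choice.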

Installation and Usage

To use llmfood in a Docusaurus project, add it as a plugin. The plugin runs automatically after the docusaurus build command finishes. Here is an example configuration:

// docusaurus.config.js
module.exports = {
  plugins: [
    [
      "llmfood/docusaurus",
      {
        sectionOrder: ["guides", "api", "concepts"],
        sectionLabels: { guides: "Guides", api: "API Reference" },
        customFiles: [
          {
            filename: "llms-full.txt",
            title: "Full Documentation",
            description: "Complete documentation in a single file",
            includePatterns: [/.*/],
          },
        ],
      },
    ],
  ],
};
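Docusaurus plugins can run code after docusaurus build through the postBuild lifecycle hook, which receives the resolved siteConfig and the build output directory (outDir). The sketch below shows how a wrapper plugin could forward those values to a generator. Several parts are assumptions and do not describe llmfood's actual plugin: the plugin name, the option mapping (siteConfig.url to baseUrl, siteConfig.tagline to siteDescription), and the injected generate parameter. A real plugin module receives only (context, options); the third parameter exists only to keep this sketch free of dependencies.

```javascript
// Hypothetical sketch of a Docusaurus plugin that runs a generator after the build.
// `generate` stands in for a function such as llmfood's generateLlmsMarkdown and is
// injected only so the sketch is self-contained; real plugins get (context, options).
function llmsPlugin(context, options, generate) {
  return {
    name: "llms-markdown-sketch", // hypothetical plugin name
    // Docusaurus calls postBuild once the production build has been written to outDir
    async postBuild({ siteConfig, outDir }) {
      await generate({
        baseUrl: siteConfig.url, // assumed mapping: the site's absolute URL
        buildDir: outDir,
        siteTitle: siteConfig.title,
        siteDescription: siteConfig.tagline, // assumed mapping
        ...options, // user options such as sectionOrder and customFiles
      });
    },
  };
}
```

Spreading the user options last lets explicit plugin options override the values derived from siteConfig. This is a design choice of the sketch, not documented llmfood behavior.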

For standalone use, import the generateLlmsMarkdown function and call it directly with your configuration:

// Top-level await requires an ES module (e.g. a .mjs file)
import { generateLlmsMarkdown } from "llmfood";

await generateLlmsMarkdown({
  baseUrl: "https://docs.example.com",
  buildDir: "./build",
  siteTitle: "My Docs",
  siteDescription: "Documentation for my project",
  docsDir: "./docs",
  sectionOrder: ["guides", "api", "concepts"],
  sectionLabels: { guides: "Guides", api: "API Reference" },
  ignorePatterns: [/\/blog\//], // skip anything under /blog/
  customFiles: [
    {
      filename: "llms-full.txt",
      title: "Full Documentation",
      description: "Complete documentation in a single file",
      includePatterns: [/.*/],
    },
  ],
});
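The customFiles and ignorePatterns options select pages by URL pattern. As a rough illustration, an aggregated file such as llms-full.txt could be built in three steps: drop pages that match any ignore pattern, keep pages that match any include pattern, and concatenate their Markdown under a title and summary. The function name, the page shape ({ path, markdown }), and applying ignorePatterns before includePatterns are assumptions for this sketch. It does not describe how llmfood itself combines the two options.

```javascript
// Hypothetical sketch: assemble an aggregated file (e.g. llms-full.txt) from
// converted pages. The page shape { path, markdown } is assumed.
function buildCustomFile(pages, { title, description, includePatterns }, ignorePatterns = []) {
  // Exclude ignored pages first, then keep pages matching any include pattern
  const selected = pages.filter(
    (page) =>
      !ignorePatterns.some((re) => re.test(page.path)) &&
      includePatterns.some((re) => re.test(page.path)),
  );

  // Title and summary header, followed by each page's Markdown in input order
  const parts = [`# ${title}`, "", `> ${description}`];
  for (const page of selected) {
    parts.push("", page.markdown.trim());
  }
  return parts.join("\n") + "\n";
}
```

The patterns in this sketch have no g flag. RegExp.prototype.test on a global regex keeps lastIndex between calls and would skip matches unpredictably.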

llmfood also exports htmlToMarkdown, which converts a raw Docusaurus HTML string directly to Markdown for use in existing pipelines:

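import { readFile } from "node:fs/promises";
import { htmlToMarkdown } from "llmfood";
const docusaurusHtmlString = await readFile("./build/docs/intro/index.html", "utf8"); // example path
const markdown = htmlToMarkdown(docusaurusHtmlString);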
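Any HTML-to-Markdown pass has to turn highlighted code back into fenced blocks. The hypothetical fenceCodeBlock helper below assumes the class attribute and plain-text code have already been extracted from a Prism pre element. It reads the language from Prism's language-xxx class. CommonMark lets a backtick run at least as long as the opening fence close the block, so the helper picks a fence longer than any backtick run inside the code. This is an illustration, not llmfood's API.

```javascript
// Hypothetical sketch: emit a fenced Markdown code block for a Prism-highlighted <pre>.
// Assumes className and the plain-text code were already pulled out of the DOM.
function fenceCodeBlock(className, code) {
  // Prism marks the language with a "language-xxx" class; default to no info string
  const lang = /(?:^|\s)language-([\w+#-]+)/.exec(className)?.[1] ?? "";
  // Choose a fence longer than any backtick run in the code so it cannot close early
  const longestRun = Math.max(0, ...(code.match(/`+/g) ?? []).map((run) => run.length));
  const fence = "`".repeat(Math.max(3, longestRun + 1));
  // Drop one trailing newline so the closing fence sits directly under the code
  return `${fence}${lang}\n${code.replace(/\n$/, "")}\n${fence}`;
}
```

For example, code that itself contains a triple-backtick line is wrapped in a four-backtick fence, so the inner fence is kept as content.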

Supported Elements

llmfood handles Docusaurus-specific elements such as code blocks, admonitions, and tabs, and preserves their structure during conversion. Supported elements include:

Prism code blocks
KaTeX math
YouTube embeds
Mermaid diagrams
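For KaTeX math, KaTeX's default output embeds the original TeX source in a MathML annotation element with encoding="application/x-tex". A converter can recover the exact formula from it instead of reconstructing it from the rendered spans. The sketch below illustrates the idea. The function names are hypothetical, the regex stands in for proper DOM traversal, only a few common HTML entities are decoded, and the $...$ / $$...$$ output delimiters are an assumed convention.

```javascript
// Hypothetical sketch: recover TeX source from KaTeX-rendered HTML.
// KaTeX's default (HTML + MathML) output keeps the TeX in a MathML annotation.
// A regex keeps the sketch dependency-free; a real converter would walk a DOM.
const TEX_ANNOTATION = /<annotation encoding="application\/x-tex">([\s\S]*?)<\/annotation>/;

function decodeEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
}

function katexToMarkdown(html, displayMode = false) {
  const match = TEX_ANNOTATION.exec(html);
  if (!match) return null; // not KaTeX output, or rendered without MathML
  const tex = decodeEntities(match[1]).trim();
  // Inline math uses $...$, display math uses $$...$$ on separate lines
  return displayMode ? `$$\n${tex}\n$$` : `$${tex}$`;
}
```

Sites that configure KaTeX to emit HTML only lose the annotation. In that case this sketch returns null rather than guessing.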

In short, llmfood converts a Docusaurus build into well-structured Markdown for LLMs. Its output also includes the client-side content that a plain HTML-to-Markdown pass over the static files would miss.
