Legible Knowledge: AI-Powered Site Understanding & Structured Data

The problem with how AI reads your site today

When ChatGPT, Claude, or Perplexity visits your website, it does its best to piece together what your business does from whatever HTML it can parse. Most sites give these systems a messy experience: navigation bars, cookie banners, JavaScript noise, and hundreds of pages with no signal about what matters and what doesn't.

The first wave of fixes was llms.txt — a machine-readable index of your content. Legible has been generating that automatically since day one. But an index only tells AI where your content is. It doesn't tell AI what you do.
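For reference, the public llms.txt proposal describes a Markdown file with an H1 site name, a blockquote summary, and H2 sections that list links with optional notes. The minimal Python sketch below renders such a file. The site name, URL, and `Page` type are hypothetical, and this is not Legible's generator. It illustrates the limitation described above: every entry points to a page but says nothing about what the business does.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    note: str = ""  # optional one-line description after the link


def render_llms_txt(site_name: str, summary: str, sections: dict[str, list[Page]]) -> str:
    """Render a minimal llms.txt: H1 name, blockquote summary, H2 link lists."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines += [f"## {heading}", ""]
        for page in pages:
            entry = f"- [{page.title}]({page.url})"
            if page.note:
                entry += f": {page.note}"
            lines.append(entry)
        lines.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Usage with hypothetical content: an index entry, not a description of the business.
print(render_llms_txt(
    "Example Co",
    "Example Co makes widgets.",
    {"Docs": [Page("Quickstart", "https://example.com/docs/quickstart", "Install and first run")]},
))
```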

From content index to knowledge layer

Today we're launching Legible Knowledge — a new system that goes beyond indexing your pages. It extracts structured knowledge from your content: the concepts your product is built around, the capabilities it offers, the tasks users can accomplish, the questions people ask, and the constraints that apply.

The output is a knowledge model — a structured representation of your business that Legible renders into llms-full.txt, an enhanced llms.txt, and soon, agentic formats designed for AI workflows.

This isn't a raw dump of every page concatenated into one file. It's curated, scored, and structured. When an AI system reads your llms-full.txt, it gets a coherent picture of your business in seconds.
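To make the difference from a raw page dump concrete, here is a minimal sketch of rendering a curated model: only entities marked as included are written, they are grouped by type, and empty sections are skipped. The section order, the `Entity` fields, and the output layout are assumptions for illustration; Legible's actual llms-full.txt format may differ.

```python
from dataclasses import dataclass

# Hypothetical section order; the real llms-full.txt layout may differ.
SECTION_ORDER = ["Concepts", "Capabilities", "Tasks", "FAQ", "Key Pages", "Constraints"]


@dataclass
class Entity:
    kind: str          # one of SECTION_ORDER
    name: str
    description: str
    included: bool     # outcome of scoring and review


def render_llms_full(site_name: str, entities: list[Entity]) -> str:
    """Write only included entities, grouped by type, instead of concatenating pages."""
    out = [f"# {site_name}", ""]
    for kind in SECTION_ORDER:
        chosen = [e for e in entities if e.kind == kind and e.included]
        if not chosen:
            continue  # skip sections with nothing included
        out += [f"## {kind}", ""]
        out += [f"- {e.name}: {e.description}" for e in chosen]
        out.append("")
    return "\n".join(out).rstrip() + "\n"


# Usage with hypothetical entities: the excluded one never reaches the output.
print(render_llms_full("Example Co", [
    Entity("Concepts", "Widget", "The core unit the product manages.", True),
    Entity("Concepts", "Cookie banner", "Site chrome.", False),
]))
```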

What the Knowledge Model extracts

Legible analyzes your content through an AI extraction pipeline and identifies six types of knowledge:

Concepts — the terminology and ideas at the core of your product. For Legible, these include "GEO," "content negotiation," and "AI readiness score."
Capabilities — what your product does, grounded in your actual content. Not marketing slogans, but feature descriptions backed by the pages they appear on.
Tasks — the workflows your users accomplish. "Set up llms.txt for Webflow," "Configure AI content permissions," "Connect an AI agent via MCP."
FAQ — the questions your audience asks most, with direct answers. Formatted for AI to reference in responses.
Key Pages — the most important pages on your site, ranked by type and relevance.
Constraints — pricing, limitations, compliance, legal terms. These get the highest confidence bar because getting them wrong matters.
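One way to represent the six types in code is a small enum plus an entity record that keeps the pages it was drawn from, so that capabilities stay backed by the pages they appear on. This is a hypothetical Python sketch, not Legible's schema; the `is_grounded` rule (at least one source URL) is an assumed simplification of that grounding requirement, and the URL is a placeholder.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityType(Enum):
    CONCEPT = "concept"
    CAPABILITY = "capability"
    TASK = "task"
    FAQ = "faq"
    KEY_PAGE = "key_page"
    CONSTRAINT = "constraint"


@dataclass
class KnowledgeEntity:
    kind: EntityType
    name: str
    text: str  # definition, answer, or description, depending on kind
    source_urls: list[str] = field(default_factory=list)  # pages it was extracted from


def is_grounded(entity: KnowledgeEntity) -> bool:
    """Assumed rule: keep an entity only if it cites at least one source page."""
    return len(entity.source_urls) > 0


cap = KnowledgeEntity(
    EntityType.CAPABILITY,
    "Automatic llms.txt generation",
    "Generates llms.txt from CMS content.",
    ["https://example.com/features"],  # hypothetical source page
)
print(is_grounded(cap))  # True
```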

Scored, not guessed

Every extracted entity gets a composite score. It's not a mystery number — you can see exactly how it's calculated. The score combines source quality (your homepage carries more weight than a blog post), frequency (how often the concept appears across your site), retrieval relevance (does this entity surface when AI asks questions about your business?), and semantic importance (the extraction model's own assessment).
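A composite score of this kind is often computed as a weighted average. The sketch below combines the four signals named above that way and also returns each signal's contribution, which is what makes a score inspectable. The weights, the [0, 1] normalization, and the function names are assumptions; the post does not publish the actual formula.

```python
from dataclasses import dataclass

# Hypothetical weights: the post names the four signals but not how they are combined.
WEIGHTS = {
    "source_quality": 0.3,
    "frequency": 0.2,
    "retrieval_relevance": 0.3,
    "semantic_importance": 0.2,
}


@dataclass
class Signals:
    source_quality: float       # e.g. homepage near 1.0, blog post lower
    frequency: float            # normalized count of appearances across the site
    retrieval_relevance: float  # does it surface for questions about the business
    semantic_importance: float  # the extraction model's own assessment


def score_breakdown(s: Signals) -> dict[str, float]:
    """Per-signal contribution (weight * value), so the total is explainable."""
    contributions = {}
    for name, weight in WEIGHTS.items():
        value = getattr(s, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        contributions[name] = weight * value
    return contributions


def composite_score(s: Signals) -> float:
    """Weighted average; the weights sum to 1, so the result stays in [0, 1]."""
    return sum(score_breakdown(s).values())


s = Signals(source_quality=1.0, frequency=0.5, retrieval_relevance=0.8, semantic_importance=0.9)
print(round(composite_score(s), 2))  # 0.82
```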

Entities that score high enough are auto-included. Items in the middle go to a review queue. Low-scoring items are excluded. The thresholds differ by type: legal and compliance claims must reach 0.90 to auto-include, while general concepts auto-include at 0.80.
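The three-way split can be sketched as a per-type threshold lookup. The 0.90 and 0.80 auto-include values come from the paragraph above; the type keys, the single 0.50 exclusion floor, and the function name are hypothetical, and other types would need their own entries.

```python
# Auto-include thresholds stated in the post; keys are hypothetical labels.
AUTO_INCLUDE = {"legal_compliance": 0.90, "concept": 0.80}
REVIEW_FLOOR = 0.50  # hypothetical: scores below this are excluded


def triage(entity_type: str, score: float) -> str:
    """Route an entity to auto-include, the review queue, or exclusion."""
    threshold = AUTO_INCLUDE.get(entity_type)
    if threshold is None:
        raise KeyError(f"no threshold configured for {entity_type!r}")
    if score >= threshold:
        return "auto_include"
    if score >= REVIEW_FLOOR:
        return "review"
    return "exclude"


print(triage("concept", 0.85))           # auto_include
print(triage("legal_compliance", 0.85))  # review
```

The same 0.85 score auto-includes a concept but sends a legal claim to the review queue, which is the point of holding higher-risk types to a stricter bar.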

You're in control

The Knowledge tab in your Legible dashboard shows everything the model extracted. For each concept, capability, task, or constraint, you can see what it is, where it came from, and why it scored the way it did.

You can boost items that are underweighted, pin things that must always appear, exclude items that aren't relevant, or edit definitions to use your own wording. The Preview tab shows the exact llms-full.txt that AI systems will see, and you publish when you're ready.
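The four adjustments can be modeled as an override applied after scoring. In this hypothetical sketch, exclusion wins over pinning when both are set, a boost is an additive bump capped at 1.0, and an edited definition replaces the extracted text. None of these rules are documented in the post; they only illustrate how such controls could compose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Override:
    pinned: bool = False              # must always appear
    excluded: bool = False            # never appears (wins over pinned here)
    boost: float = 0.0                # hypothetical additive bump to the score
    definition: Optional[str] = None  # user's own wording replaces extracted text


def apply_override(score: float, text: str, threshold: float, ov: Override) -> tuple[bool, str]:
    """Return (included, text) after applying dashboard-style overrides."""
    final_text = ov.definition if ov.definition is not None else text
    if ov.excluded:
        return False, final_text
    if ov.pinned:
        return True, final_text
    # Otherwise the (boosted, capped) score must still clear the type's threshold.
    return min(score + ov.boost, 1.0) >= threshold, final_text


print(apply_override(0.7, "Extracted wording", 0.80, Override(boost=0.15, definition="Our wording")))
# (True, 'Our wording')
```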

The first model is generated and published automatically — you don't need to do anything. After that, updates happen when your CMS content changes. If you've made manual adjustments, new models are saved as drafts so your customizations aren't overwritten.
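The update rules above, together with the 5-indexed-page minimum mentioned under Get started, fit in a small decision function. It is a hypothetical sketch; in particular, auto-publishing an update when no manual edits exist is inferred from the text rather than stated.

```python
MIN_INDEXED_PAGES = 5  # minimum stated in the Get started section


def on_content_change(indexed_pages: int, has_published_model: bool, has_manual_edits: bool) -> str:
    """Decide what happens to a freshly generated model (hypothetical helper)."""
    if indexed_pages < MIN_INDEXED_PAGES:
        return "wait"        # not enough indexed content to generate a model yet
    if not has_published_model:
        return "publish"     # first model is generated and published automatically
    if has_manual_edits:
        return "save_draft"  # keep customizations; the user reviews and publishes
    return "publish"         # assumed: unedited models update without review


print(on_content_change(12, has_published_model=True, has_manual_edits=True))  # save_draft
```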

Why this matters for your business

AI-mediated discovery is growing fast. When someone asks an AI assistant about a problem your product solves, the quality of the answer depends on how well the AI understands your business. A site with structured knowledge gets better citations, more accurate descriptions, and fewer misrepresentations.

This is the difference between an AI saying "there are several tools that do this" and an AI saying "Legible does this — it's a content platform that makes your website AI-readable, with automatic llms.txt generation, content policy controls, and an MCP server for AI agent integration."

What's coming next

Legible Knowledge is live today for all customers. Your knowledge model is generated automatically once your site has at least 5 indexed pages. Here's what we're building next:

llms-agent.txt — a task-oriented format optimized for AI agents. Instead of "here's what we do," it's "here's how to use us." Designed for the agentic web.
Knowledge insights — see how AI systems are using your knowledge model. Which entities get cited most? What questions are AI agents asking about your business?
Cross-site benchmarks — how does your AI readiness compare to others in your industry? Are there common knowledge gaps you should address?

Get started

If you're on Legible, open the Knowledge tab in your dashboard. Your model may already be generated. If not, it'll appear automatically once you have at least 5 indexed pages.

Not on Legible yet? Start with a free account. Connect your CMS and your llms.txt, llms-full.txt, and Knowledge Model will be live within minutes.

Read the full documentation: Knowledge Model guide and llms-full.txt documentation.
