Legible Knowledge Model: Turn Your Website Content into AI Understanding

Why this matters

Most AI tools dump your raw pages into a file and call it a day. Legible extracts structured knowledge from your content — the concepts your product is built on, the capabilities it offers, the tasks users accomplish, and the constraints that apply.

The Knowledge Model is the engine behind llms-full.txt, and it's fully reviewable. You can see exactly what AI systems will understand about your business, and adjust it.

The problem with raw content dumps

When most tools generate llms-full.txt, they concatenate your pages into one giant file. The result is a wall of text — blog posts mixed with legal pages mixed with product descriptions. There's no structure, no ranking, and no distinction between your pricing page and a two-year-old blog post about an industry trend.

An AI system reading that file has to figure out on its own what's important, what your product does, and what terminology matters. It often gets this wrong: it misses capabilities mentioned on your features page, conflates blog opinions with official product claims, or gives equal weight to everything.

What the Knowledge Model does instead

Legible's Knowledge Model is an intermediate representation of your site — a structured understanding built from your content. It extracts and organizes six types of knowledge:

Concepts: The terminology and ideas your product is built around. For a project management tool, this might be 'sprint,' 'backlog,' 'velocity.' For a fintech company, 'instant settlement,' 'multi-currency account.'
Capabilities: What your product actually does. Not marketing claims, but grounded descriptions of features, backed by the pages they appear on.
Tasks: The workflows your users accomplish. 'Set up a new project,' 'Export a financial report,' 'Invite a team member.' These are what AI agents look for when helping users.
Key Pages: The most important pages on your site, ranked by type (homepage, product pages, docs) and relevance. Not every page is equally important, and the model reflects that.
FAQ: The questions people ask most, extracted from your content and FAQ pages. Formatted as Q&A pairs that AI can reference directly.
Constraints: Pricing tiers, technical limitations, compliance statements, legal requirements. These carry the highest confidence threshold because getting them wrong is costly.
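
To make the six types concrete, here is a minimal Python sketch of how such entities could be represented. It is an illustration only, not Legible's actual schema: the names EntityKind, Entity, source_pages, and group_by_kind are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityKind(Enum):
    """The six knowledge types described above (identifiers are hypothetical)."""
    CONCEPT = "concept"
    CAPABILITY = "capability"
    TASK = "task"
    KEY_PAGE = "key_page"
    FAQ = "faq"
    CONSTRAINT = "constraint"


@dataclass
class Entity:
    kind: EntityKind
    name: str
    definition: str
    # Pages the entity was extracted from, i.e. the evidence that grounds it.
    source_pages: list[str] = field(default_factory=list)


def group_by_kind(entities: list[Entity]) -> dict[EntityKind, list[Entity]]:
    """Bucket entities by type; every kind gets a key, even if its list is empty."""
    grouped: dict[EntityKind, list[Entity]] = {kind: [] for kind in EntityKind}
    for entity in entities:
        grouped[entity.kind].append(entity)
    return grouped
```

Keeping the source pages on each entity is what would let a capability be "backed by the pages it appears on" rather than stated as a bare claim.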

How scoring works

Every extracted entity gets a composite score that determines whether it's included in the final output. This isn't a black box — you can see and influence every score in the dashboard.

The score combines four signals:

Source strength (35% weight): Where did this knowledge come from? Your homepage carries more weight than a blog post. Product pages outrank legal disclaimers.
Frequency (25% weight): How often does this concept appear across your site? Something mentioned on 8 pages is probably more central to your business than something mentioned once.
Retrieval relevance (25% weight): When an AI system asks 'what does this company do?', does this entity come back as a top result? This measures how well the entity answers the questions AI systems actually ask.
Semantic importance (15% weight): The AI extraction model's own assessment of how important this entity is in the context of your site.
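
The four weights above add up to 100%, so the composite can be read as a weighted sum. The Python sketch below assumes each signal has already been normalized to the range 0 to 1, which the text does not state; the names WEIGHTS and composite_score are hypothetical.

```python
# Weights taken from the list above; they sum to 1.0.
WEIGHTS = {
    "source_strength": 0.35,
    "frequency": 0.25,
    "retrieval_relevance": 0.25,
    "semantic_importance": 0.15,
}


def composite_score(signals: dict[str, float]) -> float:
    """Weighted sum of the four signals (assumed normalized to [0, 1])."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


# Illustrative values only: an entity found on the homepage, mentioned on
# a moderate number of pages, with good retrieval and semantic scores.
example = {
    "source_strength": 1.0,
    "frequency": 0.5,
    "retrieval_relevance": 0.8,
    "semantic_importance": 0.6,
}
print(round(composite_score(example), 3))
```

For these made-up inputs the sum is 0.35 + 0.125 + 0.2 + 0.09, or about 0.765. Because source strength carries the largest weight, an entity from the homepage starts well ahead of one that appears only in a blog post.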

Manual overrides

On top of the calculated score, you can apply manual overrides: boost entities you think are underweighted, pin items that must always appear, or exclude things that shouldn't be there.
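
One way overrides could interact with the calculated score is sketched below in Python. The Override type, the additive boost, and the rule that an exclusion wins over a pin are all assumptions made for illustration; the text does not specify how Legible resolves them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Override:
    boost: float = 0.0      # hypothetical additive bump applied to the score
    pinned: bool = False    # must always appear
    excluded: bool = False  # must never appear


def is_included(score: float, threshold: float,
                override: Optional[Override] = None) -> bool:
    """Decide inclusion from the calculated score plus any manual override."""
    if override is None:
        return score >= threshold
    if override.excluded:
        return False  # assumption: exclusion takes precedence over a pin
    if override.pinned:
        return True   # pinned items bypass the score entirely
    # A boost raises the score, capped at 1.0, before the threshold check.
    return min(1.0, score + override.boost) >= threshold
```

With a threshold of 0.6, an entity scoring 0.5 would be left out on its own, included with a boost of 0.15, and included regardless of score once pinned.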

Confidence thresholds

Not all errors are equal. A slightly wrong concept definition is annoying. A wrong compliance claim could be a liability. That's why Legible uses different confidence thresholds for different entity types:

Concepts and Capabilities: Auto-included above the base threshold. Items just below go to your review queue.
Tasks and Use Cases: Slightly lower bar because task descriptions are more flexible.
FAQ Items: Higher bar because Q&A pairs are often cited directly by AI systems.
Constraints and Legal: The highest bar. Claims about pricing, compliance, or legal terms require strong evidence from authoritative source pages.
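
The ordering of these bars can be sketched as a small lookup plus a triage function. In the Python below, every number is hypothetical: only the ordering (tasks lowest, then concepts and capabilities, then FAQ, then constraints) comes from the text. The sketch also assumes a review band of the same width just below each threshold, for every type.

```python
# Hypothetical thresholds; only their relative order reflects the text.
INCLUDE_THRESHOLD = {
    "task": 0.55,
    "concept": 0.60,
    "capability": 0.60,
    "faq": 0.70,
    "constraint": 0.80,
}
REVIEW_BAND = 0.10  # assumed width of the band just below each threshold


def triage(kind: str, score: float) -> str:
    """Return 'include', 'review', or 'drop' for an entity of the given kind."""
    threshold = INCLUDE_THRESHOLD[kind]
    if score >= threshold:
        return "include"
    if score >= threshold - REVIEW_BAND:
        return "review"  # lands in the review queue
    return "drop"


for kind in ("task", "concept", "constraint"):
    print(kind, triage(kind, 0.58))
```

The same score of 0.58 is included as a task, queued for review as a concept, and dropped as a constraint, which is the point of type-specific bars: the cost of a wrong claim sets how much evidence is required.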

Reviewing entities

Items that fall in the 'review' band appear in your dashboard with an amber indicator. You can approve, edit, or exclude them.

The review dashboard

The Knowledge tab in your Legible dashboard shows everything the model extracted. You'll see tabs for Concepts, Capabilities, Tasks, Key Pages, Constraints, and a Preview of the rendered output.

For each entity, you can see its name, definition, score, and which pages it was extracted from. Click the score to see a full breakdown of how it was calculated. Use the actions to adjust:

Boost: Increase the score if the model is underweighting something important.
Pin: Force-include an item regardless of its score. Useful for things you know should always be in the output.
Exclude: Remove an item that isn't relevant or is incorrect.
Edit: Override the name or definition with your own wording.

Publishing your model

The Preview tab shows the rendered llms-full.txt exactly as AI systems will see it. When you're happy with the model, hit Publish to push it live.

Automatic generation and updates

The first knowledge model is generated automatically when your site has enough content — at least 5 indexed pages with embeddings. You don't need to click anything. The first model is auto-published so your llms-full.txt starts serving immediately.

When your CMS content changes, Legible regenerates the model. Small changes (blog posts, FAQ edits) trigger partial updates. Large changes (homepage, pricing, product pages) trigger broader regeneration. A weekly full rebuild ensures everything stays consistent.

After the first auto-publish, subsequent models are saved as drafts so you can review changes before they go live. The exception is small updates: if you haven't made any manual edits (boosts, pins, exclusions), they auto-publish without interrupting you.
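
The generation and publishing rules in this section can be summarized as a few small decisions. The Python sketch below is one reading of them, not Legible's implementation: the function names and string labels are hypothetical, and it assumes that a "small update" means one whose changes trigger only a partial regeneration.

```python
MIN_INDEXED_PAGES = 5  # from the text: at least 5 indexed pages with embeddings

# Page types the text names as large changes; the labels are hypothetical.
LARGE_CHANGE_TYPES = {"homepage", "pricing", "product"}


def can_generate_first_model(indexed_pages_with_embeddings: int) -> bool:
    """The first model is generated once the site has enough content."""
    return indexed_pages_with_embeddings >= MIN_INDEXED_PAGES


def regeneration_scope(changed_page_types: set[str]) -> str:
    """Any large-change page type triggers a broad regeneration."""
    return "broad" if changed_page_types & LARGE_CHANGE_TYPES else "partial"


def publish_mode(is_first_model: bool, has_manual_edits: bool, scope: str) -> str:
    """Return 'auto_publish' or 'draft' for a newly generated model."""
    if is_first_model:
        return "auto_publish"  # the first model always goes live
    if not has_manual_edits and scope == "partial":
        return "auto_publish"  # small update, nothing manual to protect
    return "draft"             # everything else waits for review
```

Under this reading, a blog edit on a site with no manual overrides goes live on its own, while the same edit on a site with pins or exclusions is held as a draft so your adjustments are never silently overwritten.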

What makes this different

Other tools generate llms-full.txt by concatenating your pages. That gives AI a data dump. Legible gives AI structured knowledge — ranked, scored, and organized by what matters.

The result: AI systems that reference your site give more accurate answers, cite the right pages, and understand your product the way you want it understood. And you have full visibility and control over what they see.