Legible is easiest to understand when you see it as one knowledge layer with multiple outputs. Website pages, FAQs, and uploaded documents all become part of the same content system, and that system can then power AI discovery, clean Markdown delivery, and chatbot retrieval.
This guide is the high-level map that explains how Legible connects GEO, content operations, and AI assistants.
The Big Picture
- Different content sources flow into one shared Legible knowledge layer.
- That shared layer powers both AI discovery and chatbot retrieval.
- Customers do not need one system for GEO and another system for support AI.
```
Public website pages ----\
Manual FAQ items --------+--> Legible content layer --> Clean AI delivery --> llms.txt / ai-sitemap / Markdown
Uploaded documents ------/

Legible content layer --> chunking + retrieval prep --> Content Chat / Intercom / Zendesk / custom assistants
```
Content Sources
Legible can work with three main knowledge sources. Public website content is the canonical external source. FAQ items let teams add direct answers inside the product. Uploaded documents let teams bring in supporting material that may not belong on the public site.
- Website pages: the public source of truth for articles, docs, product pages, and guides.
- FAQ items: quick-answer content managed directly in Legible.
- Uploaded documents: PDFs, Word docs, Markdown, and text files that improve context.
The Legible Content Layer
Once content enters Legible, it is normalized into a consistent internal representation. That is what makes it possible for the same content to support discovery files, clean Markdown, and retrieval-ready chunks without every team rebuilding the same work separately.
This is the center of the system. It is where content stops being just website HTML or uploaded files and becomes structured AI-ready knowledge.
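One way to picture that normalized representation is as a single record shape shared by all three sources. This is an illustrative sketch only; the field names and `KnowledgeItem` type below are assumptions, not Legible's actual internal schema.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeItem:
    # Illustrative shape only -- field names are assumptions,
    # not Legible's actual internal schema.
    source: str          # "website", "faq", or "upload"
    title: str
    markdown: str        # normalized body, stripped of layout HTML
    tags: list[str] = field(default_factory=list)

page = KnowledgeItem(source="website", title="Pricing", markdown="# Pricing\n...")
faq = KnowledgeItem(source="faq", title="Do you offer refunds?",
                    markdown="Yes, within 30 days.")

# Once everything shares one shape, discovery files and retrieval
# chunks can be generated from the same list of items.
knowledge_layer = [page, faq]
```

The point of the sketch is the last line: downstream outputs iterate over one list, regardless of where each item originated.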
AI Discovery Outputs
One branch of the architecture is about making content discoverable and legible to external AI systems. That includes `llms.txt`, `ai-sitemap.json`, hosted Markdown pages, and HTML discovery tags in proxy-free setups.
- `llms.txt` helps AI systems discover important content.
- `ai-sitemap.json` gives a machine-friendly map of AI-readable content.
- Markdown pages reduce token waste and strip layout noise.
- Discovery tags connect the main website to hosted AI-readable endpoints when proxy mode is not available.
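For reference, an `llms.txt` file is itself plain Markdown: an H1 title, a short blockquote summary, and sections of annotated links. A minimal sketch (the site name, URLs, and descriptions below are placeholders, not real content):

```
# Example Co

> Example Co builds project-management software. The AI-readable
> docs and guides below are the best entry points for AI systems.

## Docs

- [Getting started](https://example.com/docs/start.md): setup and first project
- [API reference](https://example.com/docs/api.md): endpoints and authentication

## Optional

- [Changelog](https://example.com/changelog.md)
```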
Retrieval And Chunking Outputs
The other branch of the architecture is about retrieval. Legible chunks content, preserves heading context, and prepares it for semantic search. Internally, embeddings and retrieval logic help find the right passages. Externally, customers can consume the result through the AI Export API.
- Chunking turns large documents into retrievable passages.
- Heading-aware structure helps answers stay grounded in the right section.
- The AI Export API exposes documents and chunks to downstream assistants.
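The heading-aware chunking idea can be sketched in a few lines: split a Markdown document at headings, keep the path of headings above each passage, and flush a chunk when it grows too large. This is a simplified illustration of the technique, not Legible's actual pipeline.

```python
import re

def chunk_markdown(text, max_chars=500):
    """Split a Markdown document into heading-aware chunks.

    Each chunk carries the path of headings above it, so a retrieved
    passage stays grounded in its section context. Illustrative
    sketch only -- not Legible's actual chunking logic.
    """
    chunks = []
    heading_path = []   # e.g. ["Billing", "Refunds"]
    buffer = []

    def flush():
        body = "\n".join(buffer).strip()
        if body:
            chunks.append({"headings": list(heading_path), "text": body})
        buffer.clear()

    for line in text.splitlines():
        match = re.match(r"^(#+)\s+(.*)", line)
        if match:
            # New heading: close the current chunk and update the path.
            flush()
            level = len(match.group(1))
            heading_path[:] = heading_path[: level - 1]
            heading_path.append(match.group(2))
        else:
            buffer.append(line)
            if sum(len(l) for l in buffer) > max_chars:
                flush()
    flush()
    return chunks

doc = """# Billing
## Refunds
Refunds are issued within 5 business days.
## Invoices
Invoices are emailed monthly."""

for chunk in chunk_markdown(doc):
    print(" > ".join(chunk["headings"]), "->", chunk["text"])
```

Running this on the sample document yields two chunks, each tagged with its full heading path ("Billing > Refunds" and "Billing > Invoices"), which is what lets an assistant cite the right section.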
Where Content Chat Fits
Content Chat is the easiest place to see the architecture in action. It uses the same retrieval-ready knowledge layer customers can later connect to Intercom, Zendesk, or a custom application.
That is why many teams start there: it lets them validate content quality before choosing the final deployment surface.
Where Intercom, Zendesk, And Custom Bots Fit
Intercom, Zendesk, and custom assistants all sit downstream of the same Legible knowledge layer. The difference is not the content foundation. The difference is the delivery channel and how much control the customer wants over the assistant experience.
- Intercom: customer-facing support inside Intercom workflows.
- Zendesk: service and support operations grounded in current content.
- Custom API: full control over the application, model orchestration, and UI.
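As a sketch of the custom-API path: a downstream assistant fetches retrieval-ready chunks and folds them into a grounded prompt. The endpoint URL and response field names below are assumptions for illustration, not the documented shape of the AI Export API.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint -- stands in for the real AI Export API.
EXPORT_URL = "https://api.example.com/v1/export/chunks"

def fetch_chunks(query: str) -> list[dict]:
    """Fetch retrieval-ready chunks for a query (hypothetical API shape)."""
    url = f"{EXPORT_URL}?q={urllib.parse.quote(query)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["chunks"]

def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Compose a prompt that keeps the model grounded in retrieved content."""
    context = "\n\n".join(f"[{c['heading']}]\n{c['text']}" for c in chunks)
    return (
        "Answer using only the context below. If the answer is not "
        "in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Example with inline chunks (fetch_chunks needs a live endpoint):
chunks = [{"heading": "Billing > Refunds",
           "text": "Refunds are issued within 5 days."}]
print(build_grounded_prompt("How do refunds work?", chunks))
```

The design point is the separation: retrieval quality lives in the shared knowledge layer, while the custom application only decides how to present grounded answers.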
What You Get From This Setup
- One content improvement can benefit both AI discoverability and chatbot quality.
- Teams can add missing knowledge through FAQs and uploaded docs without waiting for a full website release.
- Customers avoid building separate pipelines for SEO/GEO, support AI, and retrieval infrastructure.
- Legible gives a single operational layer instead of a patchwork of CMS plugins, manual exports, and custom RAG code.
