# Top questions about GEO, AI search, and Legible

A practical reference built around the most common questions we see about AI visibility, llms.txt, clean content delivery, chatbots, and retrieval-ready websites.

Last updated March 23, 2026

# GEO Basics

What is GEO?

GEO stands for Generative Engine Optimization. It is the practice of making your content easier for AI systems like ChatGPT, Claude, Perplexity, and Google AI experiences to discover, understand, cite, and reuse in answers. Where SEO focuses on ranking in search results, GEO focuses on becoming part of the answer itself. This means structuring content so AI systems can extract reliable facts, attribute them to your brand, and present them confidently to users. GEO encompasses clean content delivery, machine-readable metadata, discovery files like llms.txt and ai-sitemap.json, and analytics that show how AI systems interact with your content over time.

How is GEO different from SEO?

SEO is mainly about ranking in search results and winning clicks from a list of ten blue links. GEO is about helping your content become part of the synthesized answer that AI systems generate. In practice, strong SEO still helps because well-structured, authoritative content tends to perform better in both paradigms. But GEO adds layers that SEO does not cover: AI-readable formatting like clean Markdown delivery, clearer semantic structure, better source and entity signals, discoverability files that guide AI crawlers, and publisher-level controls over how AI systems may use your content. Teams that invest in both SEO and GEO tend to see the strongest results because they are visible in traditional search and in AI-generated answers.

Why does GEO matter now?

More and more discovery is happening inside AI-generated answers rather than traditional blue-link search results. Research from SparkToro and Datos shows that 58% of Google searches now end without a click, and AI answer engines like ChatGPT, Perplexity, and Google AI Overviews are accelerating that shift. If your content is hard for these systems to parse, summarize, or trust, your brand may not appear in the answers users see. At the same time, AI-referred traffic converts at significantly higher rates than standard organic traffic, which means the visitors who do arrive through AI citations are often more valuable. The sooner teams invest in GEO, the stronger their position as AI-mediated discovery becomes the norm.

How do I optimize for AI search?

Start with clear, source-of-truth content that directly answers the questions your audience asks. Then make it easier for AI systems to consume by improving semantic structure, adding structured data using Schema.org vocabularies, creating discoverability files like llms.txt and ai-sitemap.json, delivering clean Markdown instead of cluttered HTML, and keeping content fresh with regular updates. You should also monitor which AI crawlers are visiting your site and how often, so you can measure whether your GEO efforts are producing results. Legible automates most of this stack — content conversion, discovery file generation, crawler analytics, and permission controls — so teams can move faster without managing the full technical implementation manually.

What is answer engine optimization?

Answer engine optimization, sometimes abbreviated AEO, is closely related to GEO. Both focus on helping your content become useful inside direct answers rather than just search result listings. In practice, teams tend to use AEO when talking about optimizing for specific answer formats and user intent patterns, and GEO when talking about broader AI visibility across multiple systems and channels. The techniques overlap heavily: clear structure, authoritative sourcing, entity-rich content, and machine-readable delivery all matter in both paradigms. Whether you call it AEO or GEO, the core principle is the same — make your content easy for AI systems to find, understand, trust, and cite.

Are AI Overviews and AI Mode changing SEO?

Yes, and the impact is growing. Google AI Overviews push more discovery into summarized answers that appear above traditional search results, and AI Mode takes this further by generating full conversational responses. This means brands need to think beyond rankings alone. Traditional SEO still matters for visibility, but teams also need content that AI systems can trust, cite, and reuse inside generated experiences. The key difference is that AI Overviews typically cite only two to seven sources per answer instead of listing ten results, so the bar for being included is higher. Content that is well-structured, semantically clear, and delivered in formats AI systems prefer — like clean Markdown — has a better chance of becoming one of those cited sources.

# AI Discovery

How do I get cited by ChatGPT or Perplexity?

There is no single switch that guarantees citations, but you can significantly improve your odds by following a few key practices. First, publish original, well-structured content that answers specific questions clearly and authoritatively. Second, expose clear source pages with strong entity signals so AI systems can identify who published the content and why it is trustworthy. Third, reduce noise around your main content by removing unnecessary navigation, scripts, and boilerplate that inflate token usage without adding value. Fourth, make discovery signals easier for AI systems to follow by publishing llms.txt, ai-sitemap.json, and structured data. Legible helps with all of these steps by converting your content into clean Markdown, generating discovery files automatically, and giving you analytics to track which AI systems are reading your content.

What is llms.txt?

llms.txt is a machine-readable file, typically served at the root of your domain, that helps AI systems understand the important content on your site. Think of it as a curated guide for large language models, pointing them to the pages, documents, and answers that matter most. Unlike a traditional sitemap that lists every URL, llms.txt is selective — it highlights the content you most want AI systems to consume and cite. The format typically includes a brief site description followed by categorized links to key resources. Legible generates and maintains llms.txt automatically from your indexed content, keeping it in sync as you publish new pages or update existing ones, so your AI discovery layer stays current without manual maintenance.
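The llmstxt.org proposal specifies llms.txt as plain Markdown: an H1 with the site name, a blockquote summary, then H2 sections of annotated links. A minimal illustrative sketch — the site name, URLs, and descriptions below are placeholders, not real entries:

```markdown
# Example Co

> Example Co builds widgets. This file highlights the pages most useful
> to AI systems reading our site.

## Documentation

- [Getting started](https://example.com/docs/start): Setup guide for new users
- [API reference](https://example.com/docs/api): Full endpoint documentation

## Company

- [About](https://example.com/about): Who we are and what we publish
```

A generator keeps this current automatically, but knowing the shape helps when reviewing the output.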

Do I need llms.txt for AI visibility?

Not every AI system depends on llms.txt today, but it is a strong and increasingly common discovery signal that more AI crawlers are beginning to respect. It helps you present the best parts of your content more clearly, which is especially valuable when your site contains a lot of navigation, templates, or mixed page types that can confuse AI systems. Even for systems that do not explicitly read llms.txt, the discipline of curating your most important content into a clear, machine-readable format tends to improve your overall GEO posture. For most teams, the effort of setting up llms.txt is minimal compared to the potential upside, and Legible generates it automatically so there is no ongoing maintenance burden.

What is ai-sitemap.json?

ai-sitemap.json is an AI-focused sitemap format that flags content specifically meant for AI consumption. It complements traditional XML sitemaps by giving AI systems a cleaner, more targeted list of pages or documents to use for understanding and retrieval. While a standard sitemap might include thousands of URLs including category pages, tag archives, and other low-value pages, ai-sitemap.json focuses on the content that matters most for AI citation and reuse. It typically includes metadata about each entry such as content type, last update time, and topic categorization, helping AI crawlers prioritize what to read and index. Legible generates ai-sitemap.json automatically alongside llms.txt as part of your AI discovery layer.
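There is no single ratified schema for ai-sitemap.json yet, so treat the shape below as a hypothetical sketch that illustrates the metadata described above (content type, last update time, topic categorization). Every field name and URL is an assumption for illustration:

```json
{
  "version": "1.0",
  "site": "https://example.com",
  "entries": [
    {
      "url": "https://example.com/docs/getting-started",
      "type": "documentation",
      "topics": ["setup", "onboarding"],
      "lastModified": "2026-03-01"
    }
  ]
}
```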

Does robots.txt control AI crawlers?

robots.txt can help communicate crawl preferences to many bots, including some AI crawlers, but support varies significantly by vendor and use case. Some major AI systems respect robots.txt directives, while others may not check it at all or interpret it differently than traditional search engine crawlers. It is best used as part of a broader policy approach rather than as your only control mechanism. For more granular control, teams should consider combining robots.txt with other signals such as Content-Signal headers, per-page metadata, and structured permission declarations. Legible helps teams express these preferences more clearly through coordinated policy signals across multiple layers, giving you a more reliable way to communicate your content access preferences to AI systems.
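As a concrete example of the robots.txt layer: the crawler tokens below are ones their vendors publish (CCBot for Common Crawl, Google-Extended for Google's AI training use). Compliance is voluntary, so this is one signal among several rather than an enforcement mechanism:

```text
# Allow crawling by default.
User-agent: *
Allow: /

# Opt specific AI crawlers out entirely.
User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```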

Can I block AI training but still allow citations?

Yes, many teams want exactly this — they are happy for AI systems to read and cite their content, but they do not want it used for model training. The challenge is that today's standards are still evolving and there is no single universally enforced mechanism for expressing this distinction. Some AI providers respect specific robots.txt directives or meta tags for training opt-out, while others have their own proprietary mechanisms. Legible helps express these preferences more clearly through coordinated policy signals, discovery files, and content-level metadata. By combining multiple signals — Content-Signal headers, structured data, and crawler-specific directives — you communicate your intent more reliably than relying on any single mechanism alone.
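OpenAI is one vendor that documents this split directly: GPTBot is its model-training crawler, while OAI-SearchBot powers search and citations. A robots.txt sketch expressing "cite but don't train" for that one vendor — other providers may not offer an equivalent pair of crawlers:

```text
# Block model-training crawls...
User-agent: GPTBot
Disallow: /

# ...but allow the search/citation crawler.
User-agent: OAI-SearchBot
Allow: /
```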

Does llms.txt improve citations today?

It can help with discovery and clarity, but it should not be treated as a magic ranking factor for AI citations. Public analysis so far suggests llms.txt is useful as a low-friction signal that helps AI systems find and prioritize your most important content. However, the bigger drivers of citation quality are still content quality, source authority, semantic structure, and how easy the page is for AI systems to consume cleanly. Think of llms.txt as one important piece of a broader GEO strategy rather than a standalone solution. Teams that combine llms.txt with clean Markdown delivery, structured data, and strong source content tend to see the best results in terms of AI visibility and citation frequency.

How do I show up in Google AI Overviews?

There is no separate submission process for Google AI Overviews — Google draws from its existing index to generate these responses. The best approach is to publish strong source content that directly answers common questions, keep your technical SEO healthy with fast page loads and proper markup, use clear structure and entity signals so Google can understand what your content is about, and make the page easier for machines to understand through clean formatting and structured data. GEO helps because it improves how reusable your content is when Google or another system needs a reliable source. Pages that are semantically clear, well-attributed, and delivered in formats AI can process efficiently are more likely to be selected for AI Overview citations.

Can AI search reduce website clicks?

Yes, especially for informational queries where AI can synthesize a complete answer without requiring the user to visit any website. Research shows that more searches now end in summaries or direct answers, and this trend is accelerating as AI capabilities improve. That is why it matters to be the source that gets cited and trusted inside those answers, not just the one that might get clicked. When your brand appears as a cited source in an AI-generated answer, you gain visibility and credibility even when users do not click through. And when they do click, the traffic tends to be higher quality because the user has already seen your brand referenced as an authoritative source. GEO helps you optimize for this new reality by ensuring your content is easy for AI to discover, understand, and attribute.

# Technical Setup

What is Vary: Accept for Markdown delivery?

Vary: Accept is an HTTP header used when the same URL can return different content formats depending on what the client asks for in its Accept header. In Legible proxy mode, this technique can support Markdown-aware delivery while preserving the original HTML page for human visitors — an AI crawler requesting text/markdown gets the clean version, while a browser gets the standard webpage. In proxy-free mode, Legible uses separate hosted URLs for the AI-readable version instead, which avoids the complexity of content negotiation and works with any hosting setup. Both approaches achieve the same goal: giving AI systems clean content while keeping the human experience unchanged. Most teams start with proxy-free delivery because it requires zero changes to their existing infrastructure.
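The negotiation logic can be sketched in a few lines. This is a simplified illustration, not Legible's implementation: it ignores Accept q-values and assumes the client lists text/markdown explicitly.

```python
def choose_format(accept_header: str) -> str:
    """Pick a response format from the client's Accept header.

    Returns "markdown" only when the client explicitly lists
    text/markdown; a real server should also weigh q-values.
    """
    accepted = [part.split(";")[0].strip().lower()
                for part in accept_header.split(",")]
    return "markdown" if "text/markdown" in accepted else "html"


def response_headers(accept_header: str) -> dict:
    """Minimal headers for a negotiated response.

    Vary: Accept tells shared caches that this URL's body depends on
    the request's Accept header, so the HTML and Markdown variants
    are cached separately.
    """
    fmt = choose_format(accept_header)
    return {
        "Content-Type": "text/markdown" if fmt == "markdown" else "text/html",
        "Vary": "Accept",
    }
```

The Vary: Accept header is the critical piece: without it, a shared cache could serve the Markdown variant to a browser, or the HTML variant to an AI crawler.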

Why does clean Markdown matter for AI?

AI systems work on tokens, and every token spent on navigation menus, scripts, footer links, and HTML formatting code is a token that could have been spent on your actual message. A typical webpage wastes over 15,000 tokens on this structural noise before the AI system ever reaches the real content. Clean Markdown removes all of that clutter and delivers just the content that matters, which typically reduces token usage by around 80%. This means the AI system can read more of your content within its context window, produce more accurate summaries, and cite your brand more reliably. For teams publishing hundreds or thousands of pages, the compound effect of cleaner delivery can significantly improve how well AI systems understand and represent your content across all interactions.
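To get a feel for the overhead, here is a rough sketch using Python's standard-library HTML parser. Character counts stand in for tokens — a real tokenizer gives different absolute numbers — but the ratio makes the point:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def markup_overhead(html: str) -> float:
    """Fraction of the raw HTML spent on markup rather than visible text."""
    parser = TextExtractor()
    parser.feed(html)
    return 1 - len(" ".join(parser.chunks)) / len(html)


page = ("<html><head><script>var x=1;</script></head><body>"
        "<nav><a href='/'>Home</a></nav>"
        "<p>GEO makes content easy for AI to cite.</p></body></html>")
print(f"{markup_overhead(page):.0%} of this page is markup, not message")
```

Even in this tiny example most of the bytes are structure; on production pages with full navigation, scripts, and styling, the ratio is far more lopsided.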

Do I need structured data for GEO?

Structured data remains an important part of a comprehensive GEO strategy because it helps machines understand what a page is about, who published it, what entities it refers to, and how the content relates to other resources. Using Schema.org vocabularies like Article, FAQ, Organization, and Product gives AI systems stronger signals about the type and trustworthiness of your content. However, structured data alone is not a complete GEO strategy — it works best when combined with clean content delivery, discovery files, and strong source content. Think of structured data as a trust and comprehension signal that helps AI systems interpret your content correctly, while Markdown delivery and discovery files handle the distribution and accessibility layers.
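For example, Schema.org's FAQPage type marks question-and-answer pairs explicitly so machines do not have to infer them from layout. A minimal JSON-LD snippet — the question and answer text here are illustrative:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is GEO?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "GEO (Generative Engine Optimization) is the practice of making content easier for AI systems to discover, understand, and cite."
    }
  }]
}
</script>
```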

Can I do GEO without Cloudflare?

Yes, absolutely. Some setups use Cloudflare or other reverse proxies for content delivery, but Legible also supports proxy-free delivery with hosted AI-readable endpoints and discovery tags that work with any hosting provider. That gives teams a path to llms.txt, ai-sitemap.json, and clean Markdown delivery without reworking their edge stack or adding new infrastructure dependencies. Proxy-free mode works by adding a small set of meta tags to your pages that point AI systems to the hosted Markdown versions, which Legible generates and serves automatically. This approach works with WordPress, Webflow, Drupal, Squarespace, and any custom-built website. Most teams can be up and running in under five minutes regardless of their hosting setup.

Does GEO require developers?

It can, depending on how you implement it. Building a complete GEO infrastructure manually — content conversion, discovery file generation, crawler analytics, permission management, and content negotiation — often turns into a real technical project that requires developer time and ongoing maintenance. Legible is designed to reduce that work so marketing teams and founders can move faster, while still giving technical teams a clean, reviewable implementation path. The goal is to let non-technical team members manage their AI visibility without needing to write code, while developers can inspect everything Legible generates and integrate it into their existing workflows if they want deeper control.

How do I measure AI traffic to my website?

Measuring AI traffic requires identifying AI crawler requests and, where possible, tracking AI-driven referral behavior. Standard analytics tools like Google Analytics are not designed to surface this data because AI crawlers typically do not execute JavaScript and their visits do not show up as pageviews. You need server-side or edge-layer detection that can identify AI user agents, track request patterns, and correlate them with content consumption metrics. Legible surfaces AI-specific crawler activity so teams can see which systems — ChatGPT, Claude, Perplexity, Google AI, and others — are reading their content, how often they visit, which pages they access most, and how that activity changes over time. This gives teams a clear feedback loop for their GEO efforts.
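A server-side detector can start with matching known user-agent tokens. The tokens below are published by their vendors, but the list is partial and changes over time, and user agents can be spoofed, so production setups should also verify requests against the vendors' published IP ranges:

```python
# Published AI crawler user-agent tokens (a partial list; vendors add
# and rename bots, so keep it in sync with their documentation).
AI_CRAWLERS = {
    "GPTBot": "OpenAI (training)",
    "OAI-SearchBot": "OpenAI (search)",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
    "Google-Extended": "Google (AI)",
    "CCBot": "Common Crawl",
}


def classify_user_agent(user_agent: str):
    """Name the AI system behind a request, or None for other clients."""
    ua = user_agent.lower()
    for token, system in AI_CRAWLERS.items():
        if token.lower() in ua:
            return system
    return None
```

Run against server or edge logs, a classifier like this turns raw request lines into per-system crawl counts that JavaScript-based analytics never see.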

Do I need a separate AI version of every page?

Not necessarily, but you do need a cleaner machine-readable path to the content on each important page. In some setups that happens on the same URL through content negotiation, where the server returns Markdown to AI clients and HTML to browsers based on the Accept header. In others, like proxy-free delivery, it happens through hosted Markdown endpoints that Legible generates and maintains automatically, with discovery tags on your original pages pointing AI systems to the clean versions. Legible supports both patterns, and the proxy-free approach works without any changes to your existing website. The key insight is that AI systems and human visitors have fundamentally different needs from the same content, and serving both well does not require duplicating your content — it requires a smarter delivery layer.

How does Legible handle page-level meta robots tags?

Legible detects both generic meta robots tags and bot-specific variants such as `<meta name="GPTBot">` or `<meta name="Google-Extended">` in its GEO Readiness audit. If a page has a noindex directive, the audit flags it as a critical visibility issue and tells you exactly which crawlers are affected. Beyond detection, Legible can also generate and manage page-level indexing directives through its crawler policy. You configure which bots should be allowed or restricted, and Legible emits the corresponding X-Robots-Tag headers on your HTML responses — or optionally injects meta tags directly into the page head for compatibility with on-page SEO audit tools. This means you can allow Google Search to index a page while restricting GPTBot, all from a single policy configuration.
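Google documents a user-agent-scoped form of X-Robots-Tag, which is the kind of header such a policy emits. Support for scoped directives varies by crawler, so these lines illustrate the pattern rather than guarantee behavior:

```text
X-Robots-Tag: GPTBot: noindex
X-Robots-Tag: Google-Extended: noindex
```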

What are freshness signals and why do they matter for AI?

Freshness signals are HTTP response headers — specifically Last-Modified and ETag — that tell crawlers whether content has changed since their last visit. Without these headers, every crawl is a full re-download that wastes bandwidth and makes your content look undated. Legible automatically generates both headers on every AI-readable response. Last-Modified is set to the content's actual publication or modification date from your CMS, and ETag is a content-based hash. When a crawler returns with an If-None-Match or If-Modified-Since header, Legible can respond with a lightweight 304 Not Modified instead of re-transmitting the full page. This reduces bandwidth for high-frequency AI crawlers and earns up to 3 points on your GEO Readiness score.
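The validator round-trip can be sketched as follows. This illustrates the mechanism rather than Legible's implementation: it derives the ETag from a content hash and answers a matching If-None-Match with a body-less 304:

```python
import hashlib
from email.utils import formatdate


def make_validators(body: bytes, modified_ts: float) -> dict:
    """Freshness headers: a content-hash ETag plus Last-Modified."""
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return {"ETag": etag,
            "Last-Modified": formatdate(modified_ts, usegmt=True)}


def respond(body: bytes, modified_ts: float, if_none_match):
    """Return (status, headers, body); a matching ETag earns a 304."""
    headers = make_validators(body, modified_ts)
    if if_none_match == headers["ETag"]:
        return 304, headers, b""  # client's cached copy is still fresh
    return 200, headers, body
```

A crawler's first request gets a 200 with both validators; on its next visit it replays the ETag in If-None-Match and, if the content hash is unchanged, receives a near-empty 304 instead of the full page.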

Can I set different rules for different AI crawlers?

Yes. Legible's crawler policy lets you define a default posture for all bots and then override it for specific crawlers. For example, you might allow GPTBot and ClaudeBot to discover and cite your content while blocking CCBot from crawling at all and restricting Google-Extended from indexing. These rules are enforced at every layer: robots.txt, bot denial at the edge, X-Robots-Tag headers, Content-Signal headers, and content delivery decisions. Legible recognizes over a dozen AI crawlers including GPTBot, OAI-SearchBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot, Amazonbot, Applebot-Extended, Bytespider, Cohere-ai, FacebookBot, and meta-externalagent. You manage it all from one policy configuration rather than editing robots.txt, CMS templates, and CDN rules separately.

# Chatbots & RAG

What is RAG and how does it relate to website content?

RAG stands for retrieval-augmented generation. It means an AI system looks up relevant content from a knowledge base before generating an answer, rather than relying solely on what it learned during training. Your website pages, FAQ entries, and uploaded documents can all become part of that retrieval layer, which makes chatbot answers more accurate, more current, and better grounded in your actual business information. Without RAG, chatbots often hallucinate or give generic responses. With RAG powered by your real content, they can provide specific, trustworthy answers that reflect your brand voice and current offerings. Legible turns your website into a retrieval-ready knowledge layer that can power chatbot experiences across platforms including Intercom, Zendesk, and custom implementations.
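The retrieval step can be illustrated with a deliberately simple sketch. Real RAG systems rank by vector-embedding similarity; plain word overlap stands in for that here, and the knowledge snippets are invented for the example:

```python
def tokenize(text: str) -> set:
    """Lowercased word set; real systems compare embeddings instead."""
    return set(text.lower().split())


def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Return the k documents with the most word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(documents,
                    key=lambda doc: len(q & tokenize(doc)),
                    reverse=True)
    return ranked[:k]


knowledge = [
    "Legible generates llms.txt automatically from indexed content.",
    "Pricing starts with a free tier for small sites.",
    "Clean Markdown delivery reduces token usage for AI crawlers.",
]

# Retrieve grounding context, then hand it to the generator step.
context = retrieve("how is llms.txt generated", knowledge, k=1)
prompt = "Answer using only this context:\n" + "\n".join(context)
```

The key property is that the generator only sees retrieved context, so its answer is grounded in your actual content instead of whatever the model happened to memorize during training.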

Can I chat with my website content?

Yes. Legible turns your website pages, FAQ entries, and uploaded documents into a retrieval-ready knowledge layer that powers conversational search experiences. You can use Content Chat for internal testing and content exploration — ask your website anything and get answers grounded in your actual published content. For customer-facing use cases, you can connect this same knowledge layer to chatbot platforms like Intercom and Zendesk, or build custom chat experiences using the Legible API. The key advantage over training a chatbot from scratch is that your answers are always grounded in your real, current content, which reduces hallucination and keeps responses consistent with what your website actually says.

Can Legible power Intercom or Zendesk bots?

Yes. Legible's content layer can feed Intercom, Zendesk, and custom chat implementations with retrieval-ready content from your website. The goal is to let your existing content power support answers more accurately without forcing you to rebuild your whole help stack or manually maintain a separate knowledge base. When a customer asks a question in Intercom or Zendesk, the bot retrieves relevant content from your Legible knowledge layer and generates an answer grounded in your actual website content. This means support answers stay consistent with your published information and update automatically as your content changes. For most teams, this dramatically reduces the time spent maintaining chatbot training data while improving answer quality.

Can content that is not on my website still be used by AI systems?

Yes. Legible can turn FAQ entries and uploaded documents into retrieval-ready content for chat experiences, even if that content is not published as a webpage. FAQ entries can also become part of your AI-readable layer through llms.txt and ai-sitemap.json, making them discoverable by external AI systems. This helps teams expose important answers and product details even before they are fully published as website pages. For example, you might upload internal product documentation, sales enablement materials, or detailed technical specifications that your support chatbot needs but that you do not want to publish on your public website. Legible handles all of these content types and makes them available for retrieval without requiring them to live as standalone web pages.

Should I use website pages, FAQs, or uploaded documents for chatbot knowledge?

Each content type serves a different purpose in building a strong knowledge layer. Use website pages for canonical public content that represents your brand's authoritative position on topics — these are your product pages, blog posts, and documentation articles. Use FAQ entries for short, high-priority answers to specific questions that customers frequently ask, formatted for quick retrieval and direct citation. Use uploaded documents for deeper support context, internal knowledge, technical specifications, or content that you want your chatbot to reference but may not want to publish as standalone web pages. Legible supports all three content types so you can build a comprehensive knowledge layer that gives your chatbot the best possible foundation for accurate, trustworthy answers.

# Legible

What does Legible actually do?

Legible helps teams make their content easier for AI systems to find, cite, and reuse in answers and conversations. It works by connecting to your existing website or CMS, converting your content into clean Markdown that AI systems prefer, generating and maintaining discovery files like llms.txt and ai-sitemap.json, managing content policy signals and permissions, and providing analytics that show which AI systems are reading your content and how often. Beyond AI discoverability, Legible also supports retrieval-ready outputs for chat and RAG use cases, so your website content can power chatbot experiences across platforms like Intercom and Zendesk. The goal is to handle the entire AI content infrastructure stack so teams can focus on creating great content rather than managing the technical delivery layer.

Who is Legible for?

Legible is designed for marketing teams, founders, content teams, and support leaders who want better AI visibility and better AI-powered answers without a heavy engineering project. It is particularly valuable for organizations that publish a significant amount of content — product pages, blog posts, documentation, FAQs — and want that content to work harder in AI-mediated discovery channels. Legible also stays credible with developers and technical teams by using standards-based delivery formats, clear metadata patterns, and reviewable implementation paths. Whether you are a solo founder trying to get your startup's content cited by ChatGPT, or a marketing director at a mid-market company looking to measure and improve AI traffic, Legible gives you the tools and visibility to make informed decisions about your AI content strategy.

When should I use Legible instead of doing this manually?

Manual setup can work for highly technical teams with dedicated developer time to build and maintain discovery files, content delivery logic, permission systems, and analytics dashboards themselves. However, that approach often results in implementation drift — files go stale, analytics gaps appear, and the system gradually falls out of sync with your actual content. Legible makes more sense when you want a faster launch measured in minutes rather than weeks, less ongoing maintenance burden, clearer visibility into AI activity with real-time analytics, and one integrated system that also supports chat and retrieval use cases. Most teams find that the cost of maintaining a manual GEO infrastructure exceeds the cost of Legible within the first month, especially when you factor in the developer time needed for ongoing updates and troubleshooting.

# Want the docs and setup guides too?

Explore detailed guides for proxy-free setup, Cloudflare, llms.txt, chatbot integrations, FAQ content, document uploads, and the full Legible knowledge architecture.