What is Vary: Accept for Markdown delivery?
Vary: Accept is an HTTP header used when the same URL can return different content formats depending on what the client asks for in its Accept header. In Legible proxy mode, this technique can support Markdown-aware delivery while preserving the original HTML page for human visitors — an AI crawler requesting text/markdown gets the clean version, while a browser gets the standard webpage. In proxy-free mode, Legible uses separate hosted URLs for the AI-readable version instead, which avoids the complexity of content negotiation and works with any hosting setup. Both approaches achieve the same goal: giving AI systems clean content while keeping the human experience unchanged. Most teams start with proxy-free delivery because it requires zero changes to their existing infrastructure.
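The negotiation described above can be sketched in a few lines. This is a minimal illustration of Accept-based negotiation with a Vary header, not Legible's actual implementation; the handler shape and content stores are hypothetical.

```python
# Minimal sketch of Accept-based content negotiation with Vary: Accept.
# Not Legible's implementation — just the general pattern.

def negotiate(accept_header: str, html_body: str, markdown_body: str):
    """Return (headers, body) based on the client's Accept header."""
    headers = {"Vary": "Accept"}  # tell caches this response depends on Accept
    if "text/markdown" in accept_header:
        headers["Content-Type"] = "text/markdown"
        return headers, markdown_body
    headers["Content-Type"] = "text/html"
    return headers, html_body

# An AI client sending "Accept: text/markdown" gets the clean version;
# a browser sending "Accept: text/html,..." gets the original page.
ai_headers, ai_body = negotiate("text/markdown", "<html>...</html>", "# Pricing")
browser_headers, browser_body = negotiate("text/html,application/xhtml+xml", "<html>...</html>", "# Pricing")
```

The `Vary: Accept` header is the critical piece: without it, a shared cache could serve the Markdown response to a browser (or vice versa) because it would treat both responses as interchangeable for the same URL.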
Why does clean Markdown matter for AI?
AI systems work on tokens, and every token spent on navigation menus, scripts, footer links, and HTML formatting code is a token that could have been spent on your actual message. A typical webpage wastes over 15,000 tokens on this structural noise before the AI system ever reaches the real content. Clean Markdown removes all of that clutter and delivers just the content that matters, which typically reduces token usage by around 80%. This means the AI system can read more of your content within its context window, produce more accurate summaries, and cite your brand more reliably. For teams publishing hundreds or thousands of pages, the compound effect of cleaner delivery can significantly improve how well AI systems understand and represent your content across all interactions.
Do I need structured data for GEO?
Structured data remains an important part of a comprehensive GEO strategy because it helps machines understand what a page is about, who published it, what entities it refers to, and how the content relates to other resources. Using Schema.org vocabularies like Article, FAQ, Organization, and Product gives AI systems stronger signals about the type and trustworthiness of your content. However, structured data alone is not a complete GEO strategy — it works best when combined with clean content delivery, discovery files, and strong source content. Think of structured data as a trust and comprehension signal that helps AI systems interpret your content correctly, while Markdown delivery and discovery files handle the distribution and accessibility layers.
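A minimal Schema.org Article block shows the kind of signal involved. The publisher name, headline, and date below are placeholders, not real data; the JSON-LD embedding pattern itself is the standard Schema.org convention.

```python
import json

# A minimal Article JSON-LD object using Schema.org vocabulary.
# All field values here are placeholders.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "What is GEO?",
    "datePublished": "2025-01-15",
    "author": {"@type": "Organization", "name": "Example Publisher"},
}

# Embedded in the page head so both search and AI crawlers can read it:
script_tag = (
    '<script type="application/ld+json">'
    + json.dumps(article)
    + "</script>"
)
```

The same pattern applies to FAQ, Organization, and Product types; the `@type` field is what tells a machine which vocabulary to interpret the rest of the object against.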
Can I do GEO without Cloudflare?
Yes, absolutely. Some setups use Cloudflare or other reverse proxies for content delivery, but Legible also supports proxy-free delivery with hosted AI-readable endpoints and discovery tags that work with any hosting provider. That gives teams a path to llms.txt, ai-sitemap.json, and clean Markdown delivery without reworking their edge stack or adding new infrastructure dependencies. Proxy-free mode works by adding a small set of meta tags to your pages that point AI systems to the hosted Markdown versions, which Legible generates and serves automatically. This approach works with WordPress, Webflow, Drupal, Squarespace, and any custom-built website. Most teams can be up and running in under five minutes regardless of their hosting setup.
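One plausible shape for such a discovery tag uses the standard rel="alternate" link pattern. The exact tags Legible emits may differ, and both URLs below are placeholders; this sketch only shows the general mechanism of pointing machine clients at a hosted Markdown version.

```python
def discovery_tag(markdown_url: str) -> str:
    """Build a head tag pointing AI clients at a hosted Markdown version.

    Uses the standard rel="alternate" link pattern; the exact tags a given
    tool emits may differ, and the URL here is a placeholder."""
    return f'<link rel="alternate" type="text/markdown" href="{markdown_url}">'

tag = discovery_tag("https://md.example.com/pricing.md")
```

Because this is a single static line in the page head, it can be added through any CMS's header-injection setting, which is why the approach works regardless of hosting provider.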
Does GEO require developers?
It can, depending on how you implement it. Building a complete GEO infrastructure manually — content conversion, discovery file generation, crawler analytics, permission management, and content negotiation — often turns into a real technical project that requires developer time and ongoing maintenance. Legible is designed to reduce that work so marketing teams and founders can move faster, while still giving technical teams a clean, reviewable implementation path. The goal is to let non-technical team members manage their AI visibility without needing to write code, while developers can inspect everything Legible generates and integrate it into their existing workflows if they want deeper control.
How do I measure AI traffic to my website?
Measuring AI traffic requires identifying AI crawler requests and, where possible, tracking AI-driven referral behavior. Standard analytics tools like Google Analytics are not designed to surface this data because AI crawlers typically do not execute JavaScript and their visits do not show up as pageviews. You need server-side or edge-layer detection that can identify AI user agents, track request patterns, and correlate them with content consumption metrics. Legible surfaces AI-specific crawler activity so teams can see which systems — ChatGPT, Claude, Perplexity, Google AI, and others — are reading their content, how often they visit, which pages they access most, and how that activity changes over time. This gives teams a clear feedback loop for their GEO efforts.
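The server-side detection described above can be approximated by scanning access logs for known AI user-agent tokens. This is a simplified sketch, not Legible's detection logic: the agent list is partial, the sample log lines are fabricated placeholders, and production systems should also verify crawler identity (for example via published IP ranges) rather than trusting the user-agent string alone.

```python
import re
from collections import Counter

# Partial list of AI crawler user-agent substrings (illustrative, not exhaustive).
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"]

# Matches the request and user-agent fields of a combined-format access log line.
LOG_LINE = re.compile(
    r'"(?P<method>\w+) (?P<path>\S+) [^"]*" \d+ \d+ "[^"]*" "(?P<ua>[^"]*)"'
)

def count_ai_hits(log_lines):
    """Tally requests per AI crawler from combined-format access log lines."""
    hits = Counter()
    for line in log_lines:
        m = LOG_LINE.search(line)
        if not m:
            continue
        for agent in AI_AGENTS:
            if agent in m.group("ua"):
                hits[agent] += 1
    return hits

sample = [
    '1.2.3.4 - - [01/Jan/2025:00:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '5.6.7.8 - - [01/Jan/2025:00:00:01 +0000] "GET /pricing HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"',
]
hits = count_ai_hits(sample)
```

Note that this only captures crawler fetches; AI-driven referral traffic (a human clicking a citation in a chat answer) shows up separately, as an ordinary pageview with a referrer.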
Do I need a separate AI version of every page?
Not necessarily, but you do need a cleaner machine-readable path to the content on each important page. In some setups that happens on the same URL through content negotiation, where the server returns Markdown to AI clients and HTML to browsers based on the Accept header. In others, like proxy-free delivery, it happens through hosted Markdown endpoints that Legible generates and maintains automatically, with discovery tags on your original pages pointing AI systems to the clean versions. Legible supports both patterns, and the proxy-free approach works without any changes to your existing website. The key insight is that AI systems and human visitors have fundamentally different needs from the same content, and serving both well does not require duplicating your content — it requires a smarter delivery layer.
How does Legible handle page-level meta robots tags?
Legible detects both generic meta robots tags and bot-specific variants such as meta name="GPTBot" or meta name="Google-Extended" in its GEO Readiness audit. If a page has a noindex directive, the audit flags it as a critical visibility issue and tells you exactly which crawlers are affected. Beyond detection, Legible can also generate and manage page-level indexing directives through its crawler policy. You configure which bots should be allowed or restricted, and Legible emits the corresponding X-Robots-Tag headers on your HTML responses — or optionally injects meta tags directly into the page head for compatibility with on-page SEO audit tools. This means you can allow Google Search to index a page while restricting GPTBot, all from a single policy configuration.
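The header emission side of this can be sketched directly. The bot-prefixed X-Robots-Tag syntax ("crawler-name: directive") is the documented convention for targeting a specific user agent; the policy dictionary below is a hypothetical stand-in, not Legible's configuration format.

```python
# Sketch of emitting bot-specific X-Robots-Tag headers from a single policy.
# The policy dict is a hypothetical format; the "bot: directive" header syntax
# follows the documented X-Robots-Tag convention.

def robots_headers(policy):
    """policy maps a crawler token to a directive, e.g. {"GPTBot": "noindex"}."""
    return [("X-Robots-Tag", f"{bot}: {directive}")
            for bot, directive in policy.items()]

headers = robots_headers({"GPTBot": "noindex", "googlebot": "all"})
```

Each crawler only honors the header line addressed to it (or an unprefixed one), which is what makes per-bot indexing control from one response possible.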
What are freshness signals and why do they matter for AI?
Freshness signals are HTTP response headers — specifically Last-Modified and ETag — that tell crawlers whether content has changed since their last visit. Without these headers, every crawl is a full re-download that wastes bandwidth and makes your content look undated. Legible automatically generates both headers on every AI-readable response. Last-Modified is set to the content's actual publication or modification date from your CMS, and ETag is a content-based hash. When a crawler returns with an If-None-Match or If-Modified-Since header, Legible can respond with a lightweight 304 Not Modified instead of re-transmitting the full page. This reduces bandwidth for high-frequency AI crawlers and earns up to 3 points on your GEO Readiness score.
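The conditional-request flow above can be sketched as follows. This is an illustration of the general ETag/304 mechanism, not Legible's implementation; the hash truncation and function shapes are assumptions.

```python
import hashlib
from email.utils import formatdate

def freshness_headers(body, modified_ts):
    """Compute Last-Modified and a content-hash ETag for a response body."""
    return {
        # Content-based hash: changes if and only if the body changes.
        "ETag": '"' + hashlib.sha256(body).hexdigest()[:16] + '"',
        # RFC-formatted modification date, e.g. from the CMS.
        "Last-Modified": formatdate(modified_ts, usegmt=True),
    }

def respond(body, modified_ts, if_none_match=None):
    """Return 304 with an empty body when the client holds the current ETag."""
    headers = freshness_headers(body, modified_ts)
    if if_none_match == headers["ETag"]:
        return 304, headers, b""
    return 200, headers, body

body = b"# Pricing\n\nClean markdown content."
status, headers, _ = respond(body, 1735689600.0)        # first crawl: full 200
status2, _, payload = respond(body, 1735689600.0,        # revisit with If-None-Match
                              if_none_match=headers["ETag"])
```

The second call transmits only headers, which is exactly the bandwidth saving that matters for crawlers revisiting thousands of pages on a schedule.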
Can I set different rules for different AI crawlers?
Yes. Legible's crawler policy lets you define a default posture for all bots and then override it for specific crawlers. For example, you might allow GPTBot and ClaudeBot to discover and cite your content while blocking CCBot from crawling at all and restricting Google-Extended from indexing. These rules are enforced at every layer: robots.txt, bot denial at the edge, X-Robots-Tag headers, Content-Signal headers, and content delivery decisions. Legible recognizes over a dozen AI crawlers including GPTBot, OAI-SearchBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot, Amazonbot, Applebot-Extended, Bytespider, Cohere-ai, FacebookBot, and meta-externalagent. You manage it all from one policy configuration rather than editing robots.txt, CMS templates, and CDN rules separately.
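At the robots.txt layer, the single-policy idea looks roughly like this. The crawler tokens are real AI user-agent names; the policy dictionary and function are hypothetical, and a real policy would also drive the header and edge layers described above.

```python
# Sketch of rendering per-crawler robots.txt groups from one policy mapping.
# The policy format is a hypothetical stand-in, not Legible's configuration.

def render_robots(policy, default_allow=True):
    """policy maps a crawler token to True (allow) or False (block)."""
    lines = []
    for bot, allowed in policy.items():
        lines.append(f"User-agent: {bot}")
        lines.append("Disallow:" if allowed else "Disallow: /")
        lines.append("")
    lines.append("User-agent: *")
    lines.append("Disallow:" if default_allow else "Disallow: /")
    return "\n".join(lines)

robots_txt = render_robots({"GPTBot": True, "CCBot": False})
```

An empty `Disallow:` means the crawler may fetch everything, while `Disallow: /` blocks it entirely; generating the file from one mapping keeps it consistent with the X-Robots-Tag and edge rules enforcing the same policy.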