A page can have perfect discovery files and still be invisible to AI crawlers because of a single `noindex` meta tag. Worse, bot-specific meta tags can selectively block AI systems while allowing traditional search — and most audit tools won't tell you.
Legible now detects, reports, and manages page-level robots directives across both generic and bot-specific meta tags, giving you full control over AI visibility at the page level.
The problem: invisible pages
Most websites use `<meta name="robots" content="index, follow">` to tell crawlers that a page should be indexed. But a growing number of sites are adding bot-specific variants to control AI crawlers separately.
These bot-specific meta tags are invisible to most SEO audit tools, which only check the generic `robots` meta tag. A page can appear fully indexed in a traditional audit while being completely blocked for AI systems:
```html
<!-- This page looks indexable to traditional SEO tools... -->
<meta name="robots" content="index, follow">
<!-- ...but is completely blocked for these AI crawlers -->
<meta name="GPTBot" content="noindex">
<meta name="Google-Extended" content="noindex">
<meta name="ClaudeBot" content="noindex">
```

What Legible detects
Legible's GEO Readiness audit inspects both generic and bot-specific meta robots tags. The audit scans for these meta name attributes: `robots` (generic), `googlebot`, `google-extended`, `bingbot`, `GPTBot`, `ClaudeBot`, `PerplexityBot`, and `CCBot`.
For each detected tag, Legible parses the directive content and reports whether it blocks indexing, following, or both. The audit distinguishes between pages that block all crawlers versus pages that selectively block specific AI bots.
- Generic `noindex` is flagged as a critical issue affecting all crawler visibility.
- Bot-specific blocks are reported individually so you know exactly which AI systems are affected.
- The `none` directive (equivalent to `noindex, nofollow`) is detected and explained.
- Emerging directives like `noai` and `noimageai` are recognized and surfaced.
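The detection pass described above can be sketched in a few lines. This is a minimal illustration using Python's standard-library `html.parser`, not Legible's actual implementation; the class and function names are hypothetical, and the bot list mirrors the meta names listed earlier.

```python
from html.parser import HTMLParser

# Meta names the audit inspects (generic first, then bot-specific).
AUDITED_NAMES = {
    "robots", "googlebot", "google-extended", "bingbot",
    "gptbot", "claudebot", "perplexitybot", "ccbot",
}

class RobotsMetaScanner(HTMLParser):
    """Collects robots directives from <meta> tags in a page."""
    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name -> list of directive tokens

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in AUDITED_NAMES:
            content = (attrs.get("content") or "").lower()
            tokens = [t.strip() for t in content.split(",") if t.strip()]
            # "none" is shorthand for "noindex, nofollow".
            if "none" in tokens:
                tokens = ["noindex", "nofollow"]
            self.directives[name] = tokens

def blocked_bots(html: str) -> dict:
    """Return {meta name: bool} -- True when indexing is blocked."""
    scanner = RobotsMetaScanner()
    scanner.feed(html)
    return {name: "noindex" in toks for name, toks in scanner.directives.items()}
```

Run against the HTML snippet from the previous section, `blocked_bots` reports the generic `robots` tag as indexable while flagging each bot-specific `noindex` individually, which is exactly the distinction the audit surfaces.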
How Legible manages meta robots
Beyond detection, Legible can generate and enforce page-level indexing controls through the crawler policy system. You configure your intent once — Legible propagates it into the correct technical signals.
- X-Robots-Tag headers: Legible emits per-bot `X-Robots-Tag` headers on HTML responses (e.g., `X-Robots-Tag: GPTBot: noindex`). This is functionally equivalent to meta tags for all compliant crawlers.
- Meta tag injection: For maximum compatibility with SEO audit tools, Legible can optionally inject `<meta>` tags directly into the HTML `<head>` using Cloudflare's streaming HTML parser — no buffering, no performance penalty.
- Policy-driven: Both mechanisms are controlled through the same crawler policy that drives `robots.txt` generation and bot denial rules. One policy, consistent enforcement.
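To make the header mechanism concrete, here is a small sketch of how per-bot `X-Robots-Tag` headers could be built from a directive map. The function is illustrative only (it is not Legible's API); the header syntax itself, where a line may be scoped to one user agent as in `X-Robots-Tag: GPTBot: noindex`, follows the convention shown above.

```python
def x_robots_headers(directives: dict) -> list:
    """Build X-Robots-Tag header lines from a directive map.

    `directives` maps a bot token (or "default") to a directive string,
    e.g. {"default": "index, follow", "GPTBot": "noindex"}.
    An unscoped line applies to all crawlers; a line prefixed with a
    user-agent token applies only to that bot.
    """
    headers = []
    for bot, directive in directives.items():
        if bot == "default":
            headers.append(("X-Robots-Tag", directive))
        else:
            headers.append(("X-Robots-Tag", f"{bot}: {directive}"))
    return headers
```

A server would attach each returned pair to the HTML response, so compliant crawlers see the same policy whether they read headers or meta tags.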
Configuring per-bot directives
In the Legible dashboard, per-bot robots directives are part of the crawler policy settings. You can set a default directive that applies to all crawlers, plus individual overrides for specific bots.
For example, you might set a default of `index, follow` but override GPTBot with `noindex` to prevent OpenAI from indexing while keeping other AI systems and search engines unaffected.
```json
{
  "robotsTagDirectives": {
    "default": "index, follow",
    "perBot": {
      "GPTBot": "noindex",
      "Google-Extended": "noindex"
    }
  }
}
```

How this differs from robots.txt
`robots.txt` controls whether a crawler can fetch a URL at all. Meta robots (and `X-Robots-Tag`) control whether a crawler should index or follow the content it has already fetched.
The distinction matters because some AI systems need to read your content to generate answers (fetch = yes) but you may not want them to store it in their index (index = no). Meta robots gives you that granularity.
Legible manages both layers — `robots.txt` for crawl access and meta robots for indexing policy — from one consistent crawler policy.
