Legible extracts content from your website and converts it into clean, AI-readable Markdown. Depending on how your site is built, different extraction modes produce better results.
This guide explains how to choose between Reader mode and Full page extraction, how to configure them site-wide or per page, and how to re-crawl individual pages to test your changes.
Two Extraction Modes
When Legible crawls your website, it uses one of two extraction modes to turn your HTML into clean Markdown. The right choice depends on how your site is built.
- Reader mode uses Mozilla's Readability algorithm to isolate the main article content. It strips navigation, sidebars, footers, and other layout elements. This is the default and works best for blogs, articles, documentation, and standard CMS pages.
- Full page mode extracts content from the entire page DOM, removing only scripts, styles, and navigation. This is better for sites built with visual page builders like SiteOrigin, Elementor, Divi, WPBakery, or Beaver Builder, where the main content is not always inside a single article container.
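To make the difference concrete, here is a minimal sketch of a full-page style pass: walk the whole document and drop only scripts, styles, and navigation. It is not Legible's implementation. The names SKIPPED_TAGS, FullPageTextExtractor, and extract_full_page are hypothetical, and the sketch emits plain text rather than Markdown to stay short.

```python
from html.parser import HTMLParser

# Assumed list of tags to drop, taken from the description above.
SKIPPED_TAGS = {"script", "style", "nav"}


class FullPageTextExtractor(HTMLParser):
    """Collects visible text from the whole document, skipping SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text from every container, not just a single article element.
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_full_page(html: str) -> str:
    """Return the page's visible text, one chunk per line."""
    parser = FullPageTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Because nothing is scored or ranked, text inside page-builder widgets and sections survives. A Readability-style pass would instead pick the single most article-like container and discard the rest.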
When To Use Reader Mode
Reader mode is the default because it works well for the majority of websites. It is the best choice when your content follows standard HTML patterns with a clear article or main content area.
- Blog posts and articles with a clear title and body.
- Documentation sites with structured content.
- Standard WordPress, Webflow, or Squarespace pages.
- Any page that shows its full content in your browser's built-in Reader View.
When To Use Full Page Mode
Full page mode is designed for sites where Readability strips too much content. This commonly happens with page builder plugins that distribute content across multiple sections, widgets, or layout containers.
- Pages built with SiteOrigin Page Builder, Elementor, Divi, WPBakery, or Beaver Builder.
- Landing pages with multiple content sections outside a single article wrapper.
- Pages where Reader mode returns very short or incomplete content.
- Custom-coded pages with non-standard HTML structure.
Configuring Extraction Mode For Your Entire Site
You can set a site-wide extraction mode that applies to all pages by default. This is useful when your entire site uses a page builder or when you know all your pages work well with one mode.
- Go to Dashboard > your site > Settings > Discovery tab.
- Find the Content Extraction Mode section.
- Choose Reader mode (default) or Full page from the dropdown.
- Click Re-crawl in the Content Library to re-extract all pages with the new setting.
Overriding Extraction Mode For Individual Pages
Sometimes your site uses a mix of standard pages and page-builder pages. In that case, you can override the extraction mode on specific pages without changing the site-wide setting.
- Go to Dashboard > your site > Content Library.
- Click on the page you want to adjust.
- Switch to the Classify tab in the detail panel.
- Under Extraction mode, choose Use site default, Reader mode, or Full page.
- Click the Re-crawl button in the top right of the detail panel to re-extract the page with the new setting.
Re-Crawling Individual Pages
After changing the extraction mode on a page, you need to re-crawl it to apply the new setting. Legible lets you re-crawl individual pages without re-crawling your entire site.
- Open the page in the Content Library detail panel.
- Click the Re-crawl button in the panel header.
- The page will be re-extracted using its current extraction mode setting: the per-page override if one is set, otherwise the site-wide default.
- The updated content will appear in the Preview and Raw Markdown tabs within a few seconds.
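The fallback between the per-page override and the site-wide default can be sketched as a small function. This is an illustration only: the mode identifiers "reader" and "full_page" and the function name resolve_extraction_mode are assumptions, and None stands in for the Use site default option.

```python
from typing import Optional

# Hypothetical identifiers for the two modes described in this guide.
VALID_MODES = {"reader", "full_page"}


def resolve_extraction_mode(page_override: Optional[str],
                            site_default: str = "reader") -> str:
    """Return the mode a re-crawl would use for one page.

    page_override is None when the page is set to "Use site default".
    """
    mode = page_override if page_override is not None else site_default
    if mode not in VALID_MODES:
        raise ValueError(f"unknown extraction mode: {mode!r}")
    return mode
```

For example, a page with no override on a site set to Full page resolves to "full_page", while a page overridden to Reader mode resolves to "reader" regardless of the site setting.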
How To Tell Which Mode Works Better
The easiest way to tell if your extraction mode is working is to check the content in the Content Library. If pages are missing sections, showing very short content, or including unwanted CSS or layout artifacts, try switching modes.
- Open a page in the Content Library and check the Preview tab. Does it show the full page content?
- Switch to the Raw Markdown tab. Is the markdown clean and complete?
- If content is missing, try switching from Reader mode to Full page (or vice versa) and re-crawling the page.
- Compare the token count before and after. A significantly higher token count after switching to Full page mode usually means Reader mode was stripping real content.
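If you copy the Raw Markdown from both modes, the token comparison can be approximated offline. The sketch below counts whitespace-separated words as a rough stand-in for tokens, which will not match the count Legible displays. The 1.5x threshold and the function names are assumptions chosen for illustration.

```python
def approx_token_count(markdown: str) -> int:
    """Rough token estimate: number of whitespace-separated words."""
    return len(markdown.split())


def reader_likely_stripped_content(reader_md: str,
                                   full_page_md: str,
                                   ratio_threshold: float = 1.5) -> bool:
    """Return True when Full page output is much longer than Reader output.

    ratio_threshold is an arbitrary illustrative cutoff, not a Legible setting.
    """
    reader_tokens = approx_token_count(reader_md)
    full_tokens = approx_token_count(full_page_md)
    if reader_tokens == 0:
        # Reader mode returned nothing, so any Full page content is a gain.
        return full_tokens > 0
    return full_tokens / reader_tokens >= ratio_threshold
```

A True result is only a hint. Check the Preview tab to confirm that the extra tokens are real content rather than menus or repeated layout text.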
Common Scenarios
Standard WordPress blog → Reader mode (default)
WordPress + SiteOrigin builder → Full page mode
WordPress + Elementor → Full page mode
Webflow → Reader mode (default)
Squarespace → Reader mode (default)
Custom HTML landing page → Try both, compare results
Documentation site → Reader mode (default)
Product pages with widgets → Full page mode
What Legible Does After Extraction
Regardless of which mode you choose, Legible applies a cleanup step after extraction. This removes any residual CSS, inline styles, empty elements, and other noise. The result is clean Markdown ready for AI systems, llms.txt, and chatbot retrieval.
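As a rough illustration of what such a cleanup pass might involve, the sketch below removes leftover style blocks, inline style attributes, and empty elements from extracted Markdown. It is an assumption-laden approximation, not Legible's actual cleanup step, and the function name cleanup_markdown is hypothetical.

```python
import re


def cleanup_markdown(md: str) -> str:
    """Strip common HTML and CSS residue from extracted Markdown (sketch only)."""
    # Drop leftover <style>...</style> blocks, including their CSS.
    md = re.sub(r"<style\b[^>]*>.*?</style>", "", md,
                flags=re.DOTALL | re.IGNORECASE)
    # Drop inline style attributes such as style="color:red".
    md = re.sub(r"\s+style\s*=\s*(\"[^\"]*\"|'[^']*')", "", md,
                flags=re.IGNORECASE)
    # Drop elements with no content, such as <div></div> or <span> </span>.
    md = re.sub(r"<(\w+)[^>]*>\s*</\1>", "", md)
    # Collapse runs of three or more newlines into one blank line.
    md = re.sub(r"\n{3,}", "\n\n", md)
    return md.strip()
```

Regular expressions keep the sketch short but are fragile against nested or malformed HTML. A production cleanup step would more likely work on a parsed document tree.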
