Using the Robots Meta Tag, data-nosnippet Attribute, and X-Robots-Tag HTTP Header

Controlling Content in Google Search Results: A Technical Guide

This document explains how to manage how Google displays your content in search results using page-level and text-level settings.

Page-level settings are defined using either:

  • robots meta tag: Placed within the <head> section of individual HTML pages.

  • X-Robots-Tag HTTP header: Implemented in the server response for a URL.

Text-level settings use the data-nosnippet attribute within HTML elements to control the display of specific content within a page.

Important: These settings only work if Google's crawlers can access your pages. Blocking crawlers prevents them from discovering these instructions.

Blocking Non-Search Crawlers

While this document focuses on Google Search, you can block other crawlers, such as AdsBot-Google, with targeted rules. Note that AdsBot crawlers ignore the generic User-agent: * rule, so they must be named explicitly. For instance, in your robots.txt file:

User-agent: AdsBot-Google
Disallow: /private-directory/

This blocks AdsBot-Google from crawling anything under /private-directory/ on your website.

1. Using the Robots Meta Tag

The robots meta tag provides granular, page-specific control over indexing and serving in Google Search.

Implementation:

<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noindex" />
  <title>My Webpage</title>
</head>
<body>
  <!-- Page content -->
</body>
</html>

In this example, the noindex value instructs all search engines to exclude the page from search results.

Key Points:

  • Place the robots meta tag within the <head> section of your HTML.

  • The values of the name and content attributes are case-insensitive, as the example below shows.
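
Because values are case-insensitive, the following two tags are treated identically:

<meta name="robots" content="noindex">
<meta name="ROBOTS" content="NOINDEX">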

CMS Users:

Content Management Systems (CMS) like Wix, WordPress, or Blogger often offer built-in settings for managing meta tags. Look for options related to "Search Engine Optimization" or "SEO Settings."

Targeting Specific Crawlers:

Google supports specific user agent tokens within the robots meta tag:

  • googlebot: Controls indexing and serving for Google's general web search results.

  • googlebot-news: Controls inclusion in Google News.

Examples:

  • Exclude a page from Google Search:

    <meta name="robots" content="noindex">
  • Exclude a page from Google News:

    <meta name="robots" content="googlebot-news: noindex">
  • Exclude a page from both Google Search and Google News:

    <meta name="robots" content="noindex, googlebot-news: noindex">
  • Multiple Crawlers, Different Rules:

    <meta name="robots" content="googlebot: noindex"> 
    <meta name="robots" content="googlebot-news: nosnippet">

    This instructs Googlebot to not index the page while only preventing snippets from appearing in Google News.

Important: The robots meta tag can only be used in HTML pages. To control indexing of non-HTML resources (PDFs, images, videos), use the X-Robots-Tag HTTP header instead.

2. Using the X-Robots-Tag HTTP Header

The X-Robots-Tag provides similar control to the robots meta tag but is delivered as an HTTP response header for a URL.

Example:

HTTP/1.1 200 OK
X-Robots-Tag: noindex
Content-Type: text/html; charset=UTF-8

This instructs all crawlers to not index the page.

Key Points:

  • Any rule applicable in the robots meta tag can be used with the X-Robots-Tag.

  • Multiple X-Robots-Tag headers can be used within a single response.

  • Rules are not case-sensitive.

Multiple Rules:

HTTP/1.1 200 OK
X-Robots-Tag: noarchive, unavailable_after: Mon, 01 Jan 2024 00:00:00 GMT
Content-Type: text/html; charset=UTF-8

This example combines noarchive (prevents Google from showing a cached link for the page) and unavailable_after (removes the page from search results after the specified date and time).

Targeting Specific Crawlers:

HTTP/1.1 200 OK
X-Robots-Tag: googlebot: noindex
X-Robots-Tag: bingbot: noarchive
Content-Type: text/html; charset=UTF-8

Here, googlebot is instructed not to index the page, while bingbot is told not to serve a cached link for it.

Conflicting Rules:

When conflicting rules are present, the more restrictive rule always takes precedence. For example, nosnippet will override max-snippet:50.
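
For example, if a page declares both of the following, the more restrictive nosnippet wins and no snippet is shown at all:

<meta name="robots" content="max-snippet:50">
<meta name="robots" content="nosnippet">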

3. Indexing and Serving Rules

Both the robots meta tag and the X-Robots-Tag support the same set of rules for controlling indexing and snippet generation. Commonly used rules include:

  • noindex: Do not show the page in search results.

  • nofollow: Do not follow the links on the page.

  • noarchive: Do not show a cached link for the page in search results.

  • nosnippet: Do not show a text snippet or video preview for the page.

  • noimageindex: Do not index images on the page.

  • max-snippet:[number]: Show at most [number] characters in the text snippet.

  • max-image-preview:[setting]: Set the maximum image preview size (none, standard, or large).

  • max-video-preview:[number]: Show at most [number] seconds of a video preview.

  • unavailable_after:[date/time]: Do not show the page in search results after the specified date and time.

Combining Rules:

  • Multiple rules within a single tag/header:

    <meta name="robots" content="noindex, nofollow" /> 
    X-Robots-Tag: noarchive, unavailable_after: Fri, 01 Dec 2023 00:00:00 GMT
  • Multiple meta tags:

    <meta name="robots" content="max-snippet:150" /> 
    <meta name="robots" content="max-image-preview:standard" /> 

4. Using the data-nosnippet HTML Attribute

For finer control over snippets, use the data-nosnippet attribute within HTML elements like <span>, <div>, and <section>.

Example:

<p>This is some text that can be included in a snippet.</p>
<p data-nosnippet>This text will be excluded from snippets.</p>

Key Points:

  • data-nosnippet is a boolean attribute; its presence alone is enough to take effect.

  • Ensure your HTML is valid and all tags are properly closed.

  • Avoid dynamically adding or removing data-nosnippet using JavaScript after page load, as it might not be reliably recognized.
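
For example, applying the attribute to a container element excludes everything inside it:

<div data-nosnippet>
  <p>None of this content, including <span>nested elements</span>,
  will appear in search result snippets.</p>
</div>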

5. Using Structured Data

While robots meta tags control general content extraction, structured data using schema.org vocabulary provides specific information to Google for enhanced search features.

Key Points:

  • Robots meta tag rules, except for max-snippet, do not affect structured data.

  • Use structured data to enhance search results with rich snippets and features.

Example:

Even with nosnippet applied, a page with properly implemented recipe structured data can still appear in recipe carousels.
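
A minimal sketch of such markup using schema.org's Recipe type (the values here are purely illustrative):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Recipe",
  "name": "Simple Pancakes",
  "author": { "@type": "Person", "name": "Example Author" },
  "recipeIngredient": ["2 cups flour", "2 eggs", "1 cup milk"],
  "recipeInstructions": "Combine the ingredients and fry on a hot griddle."
}
</script>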

6. Practical Implementation of X-Robots-Tag

The X-Robots-Tag is implemented within your web server's configuration files.

Apache Examples:

  • Blocking PDF indexing:

    <FilesMatch "\.pdf$">
      Header set X-Robots-Tag "noindex, nofollow"
    </FilesMatch>
  • Blocking image indexing:

    <FilesMatch "\.(png|jpe?g|gif)$">
      Header set X-Robots-Tag "noindex"
    </FilesMatch>
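
Note: The Header directive requires Apache's mod_headers module to be enabled; these rules can be placed in httpd.conf or an .htaccess file.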

NGINX Example:

  • Blocking PDF indexing:

    location ~* \.pdf$ {
      add_header X-Robots-Tag "noindex, nofollow";
    }
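
To confirm the header is being served, request an affected file (for example with curl -I https://example.com/document.pdf, an illustrative URL) and inspect the response, which should include:

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow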

7. Interaction with robots.txt

Remember: Robots meta tags and X-Robots-Tag headers are discovered during crawling. If a URL is disallowed in your robots.txt file, Google never fetches it, so these instructions go unseen; the URL can even remain indexed if other pages link to it. Ensure that pages carrying these rules stay accessible to crawlers.
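
For example, a rule like the following (the path is illustrative) would stop Googlebot from ever fetching the page, so a noindex rule on it would go unseen:

User-agent: *
Disallow: /private-page.html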
