Using the Robots Meta Tag, data-nosnippet Attribute, and X-Robots-Tag HTTP Header

Controlling Content in Google Search Results: A Technical Guide

This document explains how to manage how Google displays your content in search results using page-level and text-level settings.

Page-level settings are defined using either:

  • robots meta tag: Placed within the <head> section of individual HTML pages.

  • X-Robots-Tag HTTP header: Implemented in the server response for a URL.

Text-level settings use the data-nosnippet attribute within HTML elements to control the display of specific content within a page.

Important: These settings only work if Google's crawlers can access your pages. Blocking crawlers prevents them from discovering these instructions.

Blocking Non-Search Crawlers

While this document focuses on Google Search, you can block other crawlers, such as AdsBot-Google, with targeted rules. Note that AdsBot crawlers ignore the generic User-agent: * rule, so they must be named explicitly. For instance, in your robots.txt file:

User-agent: AdsBot-Google
Disallow: /private-directory/

This blocks AdsBot-Google from crawling anything under /private-directory/ on your website.

1. Using the Robots Meta Tag

The robots meta tag provides granular, page-specific control over indexing and serving in Google Search.

Implementation:

<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noindex" />
  <title>My Webpage</title>
</head>
<body>
  <!-- Page content -->
</body>
</html>

In this example, the noindex value instructs all search engines to exclude the page from search results.

Key Points:

  • Place the robots meta tag within the <head> section of your HTML.

  • The values of the name and content attributes are case-insensitive, as the example below shows.
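
Because values are case-insensitive, the following two tags are treated identically:

<meta name="robots" content="noindex">
<meta name="ROBOTS" content="NOINDEX">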

CMS Users:

Content Management Systems (CMS) like Wix, WordPress, or Blogger often offer built-in settings for managing meta tags. Look for options related to "Search Engine Optimization" or "SEO Settings."

Targeting Specific Crawlers:

Google supports specific user agent tokens within the robots meta tag:

  • googlebot: Controls indexing and serving for Google's general web search results.

  • googlebot-news: Controls inclusion in Google News.

Examples:

  • Exclude a page from Google Search:

    <meta name="robots" content="noindex">
  • Exclude a page from Google News:

    <meta name="robots" content="googlebot-news: noindex">
  • Exclude a page from both Google Search and Google News:

    <meta name="robots" content="noindex, googlebot-news: noindex">
  • Multiple Crawlers, Different Rules:

    <meta name="robots" content="googlebot: noindex"> 
    <meta name="robots" content="googlebot-news: nosnippet">

    This instructs Googlebot to not index the page while only preventing snippets from appearing in Google News.

Important: The robots meta tag can only be used in HTML pages. To control indexing of non-HTML resources (PDFs, images, videos), use the X-Robots-Tag HTTP header instead.

2. Using the X-Robots-Tag HTTP Header

The X-Robots-Tag provides similar control to the robots meta tag but is delivered as an HTTP response header for a URL.

Example:

HTTP/1.1 200 OK
X-Robots-Tag: noindex
Content-Type: text/html; charset=UTF-8

This instructs all crawlers to not index the page.

Key Points:

  • Any rule applicable in the robots meta tag can be used with the X-Robots-Tag.

  • Multiple X-Robots-Tag headers can be used within a single response.

  • Rules are not case-sensitive.

Multiple Rules:

HTTP/1.1 200 OK
X-Robots-Tag: noarchive, unavailable_after: Mon, 01 Jan 2024 00:00:00 GMT
Content-Type: text/html; charset=UTF-8

This example combines noarchive (prevents Google from showing a cached link for the page) and unavailable_after (removes the page from search results after the specified date and time).

Targeting Specific Crawlers:

HTTP/1.1 200 OK
X-Robots-Tag: googlebot: noindex
X-Robots-Tag: bingbot: noarchive
Content-Type: text/html; charset=UTF-8

Here, googlebot is instructed not to index the page, while bingbot is told not to serve a cached link for it.

Conflicting Rules:

When conflicting rules are present, the more restrictive rule always takes precedence. For example, nosnippet will override max-snippet:50.
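
For example, if a page declares both of the following, the more restrictive nosnippet wins and no snippet is shown at all:

<meta name="robots" content="max-snippet:50">
<meta name="robots" content="nosnippet">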

3. Indexing and Serving Rules

Both the robots meta tag and the X-Robots-Tag support the same set of rules for controlling indexing and snippet generation. Commonly used rules include:

  • noindex: Do not show the page in search results.

  • nofollow: Do not follow the links on the page.

  • noarchive: Do not show a cached link for the page in search results.

  • nosnippet: Do not show a text snippet or video preview for the page.

  • noimageindex: Do not index images on the page.

  • max-snippet:[number]: Show at most [number] characters in the text snippet.

  • max-image-preview:[setting]: Set the maximum image preview size (none, standard, or large).

  • max-video-preview:[number]: Show at most [number] seconds of a video preview.

  • unavailable_after:[date/time]: Do not show the page in search results after the specified date and time.

Combining Rules:

  • Multiple rules within a single tag/header:

    <meta name="robots" content="noindex, nofollow" /> 
    X-Robots-Tag: noarchive, unavailable_after: Fri, 01 Dec 2023 00:00:00 GMT
  • Multiple meta tags:

    <meta name="robots" content="max-snippet:150" /> 
    <meta name="robots" content="max-image-preview:standard" /> 

4. Using the data-nosnippet HTML Attribute

For finer control over snippets, use the data-nosnippet attribute within HTML elements like <span>, <div>, and <section>.

Example:

<p>This is some text that can be included in a snippet.</p>
<p data-nosnippet>This text will be excluded from snippets.</p>

Key Points:

  • data-nosnippet is a boolean attribute; its presence alone is enough to take effect.

  • Ensure your HTML is valid and all tags are properly closed.

  • Avoid dynamically adding or removing data-nosnippet using JavaScript after page load, as it might not be reliably recognized.
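
For example, applying the attribute to a container element excludes everything inside it:

<div data-nosnippet>
  <p>None of this content, including <span>nested elements</span>,
  will appear in search result snippets.</p>
</div>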

5. Using Structured Data

While robots meta tags control general content extraction, structured data using schema.org vocabulary provides specific information to Google for enhanced search features.

Key Points:

  • Robots meta tag rules, except for max-snippet, do not affect structured data.

  • Use structured data to enhance search results with rich snippets and features.

Example:

Even with nosnippet applied, a page with properly implemented recipe structured data can still appear in recipe carousels.
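
A minimal sketch of such markup using schema.org's Recipe type (the values here are purely illustrative):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Recipe",
  "name": "Simple Pancakes",
  "author": { "@type": "Person", "name": "Example Author" },
  "recipeIngredient": ["2 cups flour", "2 eggs", "1 cup milk"],
  "recipeInstructions": "Combine the ingredients and fry on a hot griddle."
}
</script>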

6. Practical Implementation of X-Robots-Tag

The X-Robots-Tag is implemented within your web server's configuration files.

Apache Examples:

  • Blocking PDF indexing:

    <FilesMatch "\.pdf$">
      Header set X-Robots-Tag "noindex, nofollow"
    </FilesMatch>
  • Blocking image indexing:

    <FilesMatch "\.(png|jpe?g|gif)$">
      Header set X-Robots-Tag "noindex"
    </FilesMatch>
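
Note: The Header directive requires Apache's mod_headers module to be enabled; these rules can be placed in httpd.conf or an .htaccess file.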

NGINX Example:

  • Blocking PDF indexing:

    location ~* \.pdf$ {
      add_header X-Robots-Tag "noindex, nofollow";
    }
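
To confirm the header is being served, request an affected file (for example with curl -I https://example.com/document.pdf, an illustrative URL) and inspect the response, which should include:

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow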

7. Interaction with robots.txt

Remember: Robots meta tags and X-Robots-Tag headers are discovered during crawling. If a URL is disallowed in your robots.txt file, Google never fetches it, so these instructions go unseen; the URL can even remain indexed if other pages link to it. Ensure that pages carrying these rules stay accessible to crawlers.
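
For example, a rule like the following (the path is illustrative) would stop Googlebot from ever fetching the page, so a noindex rule on it would go unseen:

User-agent: *
Disallow: /private-page.html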
