Verifying Googlebot and Other Google Crawlers

It's important to ensure that requests claiming to be from Googlebot are legitimate. This verification process protects your site from malicious actors pretending to be Google crawlers. Here's a comprehensive guide to verifying Google crawlers:

Understanding Google Crawler Types

Google employs various crawlers, each serving different purposes. They are categorized as follows:

Googlebot

  • Description: The primary crawler for Google Search; it strictly adheres to robots.txt rules.

  • Reverse DNS mask: crawl-***-***-***-***.googlebot.com or geo-crawl-***-***-***-***.geo.googlebot.com

  • IP ranges: googlebot.json

Special-case crawlers

  • Description: Crawlers that perform specific tasks (for example, AdsBot); they may or may not adhere to robots.txt rules.

  • Reverse DNS mask: rate-limited-proxy-***-***-***-***.google.com

  • IP ranges: special-crawlers.json

User-triggered fetchers

  • Description: Fetchers activated by user actions (for example, Google Site Verifier); they ignore robots.txt rules.

  • Reverse DNS mask: ***-***-***-***.gae.googleusercontent.com or google-proxy-***-***-***-***.google.com

  • IP ranges: user-triggered-fetchers.json and user-triggered-fetchers-google.json

Verification Methods

There are two primary methods for verifying that a crawler is legitimate:

1. Manual Verification (Using Command Line Tools)

This method is ideal for occasional checks and can be performed using the host command:

Step 1: Reverse DNS Lookup

Use the IP address from your server logs and run a reverse DNS lookup to obtain the hostname:

host 66.249.66.1

Example Output:

1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

Step 2: Verify Domain Name

Confirm that the returned domain name belongs to Google:

  • googlebot.com

  • google.com

  • googleusercontent.com

Step 3: Forward DNS Lookup

Perform a forward DNS lookup using the domain name obtained in Step 1:

host crawl-66-249-66-1.googlebot.com

Example Output:

crawl-66-249-66-1.googlebot.com has address 66.249.66.1

Step 4: Cross-Verification

Ensure the IP address returned in Step 3 matches the original IP address from your server logs.
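
The four steps above can also be scripted. Below is a minimal Python sketch of the same reverse-then-forward DNS check using only the standard library's socket module; the helper name verify_crawler_dns is ours for illustration, not an official API:

import socket

def verify_crawler_dns(ip_address):
  # Step 1: reverse DNS lookup (the scripted equivalent of `host <ip>`).
  try:
    hostname, _, _ = socket.gethostbyaddr(ip_address)
  except socket.herror:
    return False

  # Step 2: the hostname must end in a Google-owned domain.
  if not hostname.endswith((".googlebot.com", ".google.com", ".googleusercontent.com")):
    return False

  # Step 3: forward DNS lookup on the hostname obtained in step 1.
  try:
    _, _, resolved_ips = socket.gethostbyname_ex(hostname)
  except socket.gaierror:
    return False

  # Step 4: the forward lookup must include the original IP address.
  return ip_address in resolved_ips

print(verify_crawler_dns("66.249.66.1"))  # True for a genuine Googlebot address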

2. Automatic Verification (IP Address Matching)

For large-scale verification, automatically compare the crawler's IP address against Google's published IP ranges.

Step 1: Access IP Range Lists

Download the appropriate JSON file for the crawler type you want to verify (published under https://developers.google.com/static/search/apis/ipranges/):

  • Googlebot: googlebot.json

  • Special-case crawlers: special-crawlers.json

  • User-triggered fetchers: user-triggered-fetchers.json and user-triggered-fetchers-google.json
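
If you prefer to fetch a list from within a script, something like the following works; the URL below is where googlebot.json was published at the time of writing, so confirm it is still current:

import json
import urllib.request

# Published location of the Googlebot IP ranges (verify this URL is still current).
GOOGLEBOT_RANGES_URL = "https://developers.google.com/static/search/apis/ipranges/googlebot.json"

with urllib.request.urlopen(GOOGLEBOT_RANGES_URL) as response:
  data = json.load(response)

print(len(data["prefixes"]), "prefixes listed")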

Step 2: IP Address Matching

Implement a script or use a library (depending on your programming language) to:

  • Parse the downloaded JSON file.

  • Check if the crawler's IP address falls within any of the listed IP ranges (represented in CIDR notation).

Example Python Code (Using the ipaddress module):

import ipaddress
import json

def is_google_ip(ip_address, json_file):
  # Load the downloaded IP range list.
  with open(json_file, "r") as f:
    data = json.load(f)

  ip = ipaddress.ip_address(ip_address)
  for entry in data["prefixes"]:
    # Each entry carries either an "ipv4Prefix" or an "ipv6Prefix" key.
    prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
    # The membership test returns False automatically on an IPv4/IPv6 version mismatch.
    if prefix and ip in ipaddress.ip_network(prefix):
      return True
  return False

# Example usage
crawler_ip = "66.249.66.1"
if is_google_ip(crawler_ip, "googlebot.json"):
  print(f"{crawler_ip} belongs to Googlebot.")
else:
  print(f"{crawler_ip} is not a verified Googlebot IP.")

Verifying Other Google Services

To verify whether an IP address belongs to another Google service (such as Google Cloud Functions), you can use the general list of Google IP addresses that Google publishes, as sketched below.
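
The same is_google_ip helper from the example above works against the general list; the gstatic URL below is where Google published its overall ranges at the time of writing, so confirm it is still current:

import urllib.request

# General list of Google IP ranges (verify this URL is still current).
GOOG_RANGES_URL = "https://www.gstatic.com/ipranges/goog.json"

# Download the list, then reuse is_google_ip() from the example above.
urllib.request.urlretrieve(GOOG_RANGES_URL, "goog.json")
print(is_google_ip("8.8.8.8", "goog.json"))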

Note: The IP addresses in the JSON files are in CIDR format. You can use online tools or programming libraries to efficiently check if an IP address belongs to a specific CIDR block.
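
For example, with Python's ipaddress module (the /19 block below is illustrative; check the current JSON files for the real ranges):

import ipaddress

# Is this address inside the example CIDR block 66.249.64.0/19?
print(ipaddress.ip_address("66.249.66.1") in ipaddress.ip_network("66.249.64.0/19"))  # True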
