Verifying Googlebot and other Google crawlers
Last updated
It's important to ensure that requests claiming to be from Googlebot are legitimate. This verification process protects your site from malicious actors pretending to be Google crawlers. Here's a comprehensive guide to verifying Google crawlers:
Google employs various crawlers, each serving different purposes. They are categorized as follows:
| Type | Description | Reverse DNS Mask | IP Ranges |
|---|---|---|---|
| Googlebot | The primary crawler for Google Search; strictly adheres to robots.txt rules. | `crawl-***-***-***-***.googlebot.com` or `geo-crawl-***-***-***-***.geo.googlebot.com` | googlebot.json |
| Special-case crawlers | Perform specific tasks (e.g., AdsBot); may or may not adhere to robots.txt rules. | `rate-limited-proxy-***-***-***-***.google.com` | special-crawlers.json |
| User-triggered fetchers | Activated by user actions (e.g., Google Site Verifier); ignore robots.txt rules. | `***-***-***-***.gae.googleusercontent.com` or `google-proxy-***-***-***-***.google.com` | user-triggered-fetchers.json and user-triggered-fetchers-google.json |
There are two primary methods to verify if a crawler is legitimate:
1. Manual Verification (Using Command Line Tools)
This method is ideal for occasional checks and can be performed using the `host` command:
Step 1: Reverse DNS Lookup
Use the IP address from your server logs and run a reverse DNS lookup to obtain the hostname:
Step 2: Verify Domain Name
Confirm that the returned domain name belongs to Google:
- googlebot.com
- google.com
- googleusercontent.com
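In a script, this domain check reduces to a suffix comparison. A minimal sketch (the helper name is my own):

```python
# Domains that legitimate Google crawler hostnames resolve under
GOOGLE_DOMAINS = (".googlebot.com", ".google.com", ".googleusercontent.com")

def is_google_hostname(hostname: str) -> bool:
    # DNS answers may carry a trailing dot; strip it before comparing suffixes
    return hostname.rstrip(".").endswith(GOOGLE_DOMAINS)

print(is_google_hostname("crawl-66-249-66-1.googlebot.com."))  # True
print(is_google_hostname("fake-googlebot.example.com"))        # False
```

Note that the check uses a suffix match on the full hostname, so a deceptive name like fake-googlebot.example.com does not pass.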
Step 3: Forward DNS Lookup
Perform a forward DNS lookup using the domain name obtained in Step 1:
Step 4: Cross-Verification
Ensure the IP address returned in Step 3 matches the original IP address from your server logs.
2. Automatic Verification (IP Address Matching)
For large-scale verification, automatically compare the crawler's IP address against Google's published IP ranges.
Step 1: Access IP Range Lists
Download the appropriate JSON file based on the crawler type:
- Googlebot: googlebot.json
- Special Crawlers: special-crawlers.json
- User-Triggered Fetchers: user-triggered-fetchers.json and user-triggered-fetchers-google.json
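These files can be fetched with the standard library. The URL below is my assumption of where Google currently publishes googlebot.json; confirm it against Google's documentation:

```python
import json
import urllib.request

# Assumed publication URL for the Googlebot IP-range list
URL = "https://developers.google.com/static/search/apis/ipranges/googlebot.json"

with urllib.request.urlopen(URL) as response:
    ranges = json.load(response)

# Each entry holds either an "ipv4Prefix" or an "ipv6Prefix" in CIDR notation
print(len(ranges["prefixes"]))
```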
Step 2: IP Address Matching
Implement a script or utilize a library (depending on your programming language) to:
- Parse the downloaded JSON file.
- Check if the crawler's IP address falls within any of the listed IP ranges (represented in CIDR notation).
Example Python code (using the `ipaddress` module):
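A minimal sketch of both steps, using an inline excerpt that imitates the JSON file's structure (the prefixes shown are illustrative, not the full published list, and the helper names are my own):

```python
import ipaddress
import json

# Inline excerpt imitating the structure of googlebot.json
# (illustrative prefixes only, not the full published list)
SAMPLE_JSON = """
{
  "prefixes": [
    {"ipv4Prefix": "66.249.64.0/27"},
    {"ipv6Prefix": "2001:4860:4801:10::/64"}
  ]
}
"""

def load_networks(raw: str):
    """Parse an IP-range JSON document into ip_network objects."""
    nets = []
    for entry in json.loads(raw)["prefixes"]:
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        nets.append(ipaddress.ip_network(prefix))
    return nets

def is_google_crawler(ip: str, networks) -> bool:
    """Return True if `ip` falls inside any listed CIDR block."""
    addr = ipaddress.ip_address(ip)
    # Only compare against networks of the same IP version
    return any(addr in net for net in networks if net.version == addr.version)

networks = load_networks(SAMPLE_JSON)
print(is_google_crawler("66.249.64.5", networks))  # True
print(is_google_crawler("192.0.2.1", networks))    # False
```

In production, the same functions would be fed the downloaded JSON file rather than the inline sample, and the parsed network list should be cached rather than rebuilt per request.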
Verifying Other Google Services
To verify if an IP address belongs to other Google services (like Google Cloud functions), you can use the general list of Google IP addresses available publicly.
Note: The IP addresses in the JSON files are in CIDR format. You can use online tools or programming libraries to efficiently check if an IP address belongs to a specific CIDR block.