Understanding Google Feedfetcher: A Technical Deep Dive

Feedfetcher is Google's dedicated tool for fetching and updating RSS or Atom feeds, primarily used for services like Google Podcasts, Google News, and PubSubHubbub. While only podcast feeds are directly indexed for Google Search results, understanding how Feedfetcher interacts with your site is crucial for managing your content's visibility.

This document provides in-depth answers to frequently asked questions about Feedfetcher, complete with illustrative examples and code snippets to help you better grasp the concepts.

How Feedfetcher Works

Unlike traditional web crawlers that follow links to discover content, Feedfetcher operates based on user requests. When a user interacts with a service using Feedfetcher (like subscribing to a podcast), Google receives a request to fetch and update that specific feed.

Think of it like this:

  • User: Subscribes to "Tech Daily" podcast on Google Podcasts.

  • Google Podcasts: Sends a request to Feedfetcher to fetch the "Tech Daily" podcast feed.

  • Feedfetcher: Downloads the feed content and periodically checks for updates.
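
To make this cycle concrete, here is a minimal sketch of a Feedfetcher-style polling loop in Python. The feed URL is a placeholder, and the conditional GET (ETag / If-None-Match) handshake shown is a standard HTTP technique for skipping unchanged downloads, not a documented detail of Feedfetcher's internals.

# A minimal sketch of a fetch-and-recheck cycle using conditional GET.
import time
import urllib.error
import urllib.request

FEED_URL = "https://www.example.com/feed.xml"  # placeholder feed URL

def fetch_feed(url, etag=None):
    """Fetch the feed, returning (body, etag); body is None if unchanged."""
    request = urllib.request.Request(url)
    if etag:
        # Ask the server to send the body only if the feed has changed
        request.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # 304 Not Modified: nothing new to process
            return None, etag
        raise

etag = None
while True:
    body, etag = fetch_feed(FEED_URL, etag)
    if body is not None:
        print(f"Feed changed: {len(body)} bytes fetched")
    time.sleep(3600)  # re-check roughly once per hour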

Controlling Feedfetcher Access

1. Can I block Feedfetcher from accessing my feeds?

While you can't prevent users from accessing publicly available feeds, you can control how your server responds to Feedfetcher's requests.

Example: Let's say you want to block Feedfetcher from accessing the feed located at "https://www.example.com/feed.xml". You can configure your server to respond with a 404 Not Found error specifically when the request originates from Feedfetcher.

# Sample .htaccess configuration for Apache servers
RewriteEngine On
# Match requests whose User-Agent header begins with "Feedfetcher-Google"
# ([NC] makes the match case-insensitive)
RewriteCond %{HTTP_USER_AGENT} ^Feedfetcher-Google [NC]
# In .htaccess context the leading slash is stripped before matching,
# so the pattern is "^feed\.xml$" rather than "^/feed.xml$"
RewriteRule ^feed\.xml$ - [R=404,L]

This snippet checks the User-Agent string of the incoming request. If it begins with "Feedfetcher-Google", the server returns a 404 Not Found response for requests to /feed.xml.

Note: The method for blocking Feedfetcher may vary depending on your server setup (e.g., Nginx, IIS). Refer to your server documentation for specific instructions.
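
If you can't edit the server configuration (for example, on managed hosting), the same user-agent check can be applied at the application layer. Below is a minimal Python sketch using the standard library's WSGI server; the /feed.xml path mirrors the Apache example above, and the response bodies are placeholders.

# A minimal sketch of the same user-agent check at the application layer.
from wsgiref.simple_server import make_server

def app(environ, start_response):
    user_agent = environ.get("HTTP_USER_AGENT", "")
    # Return 404 for /feed.xml only when the client identifies as Feedfetcher
    if environ["PATH_INFO"] == "/feed.xml" and user_agent.startswith("Feedfetcher-Google"):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    start_response("200 OK", [("Content-Type", "application/rss+xml")])
    return [b"<rss version='2.0'></rss>"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()  # placeholder port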

2. What if my feed is hosted by a third-party service?

If you're using a blogging platform or a feed hosting service, you'll need to manage access through their provided tools. Most platforms offer settings to control feed visibility or restrict access. Consult your service provider's documentation for detailed instructions.

Feedfetcher Retrieval Frequency

How often does Feedfetcher update my feeds?

Feedfetcher aims to be resource-efficient. For most websites, it refreshes feeds approximately once per hour. However, popular and frequently updated sites might experience more frequent checks.

Keep in mind:

  • Network Delays: Temporary network fluctuations can create the appearance of more frequent retrievals. This is usually temporary and doesn't necessarily reflect the actual update schedule.
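
One way to see the cadence Feedfetcher actually follows for your site is to measure the gaps between its requests in your access log. The sketch below assumes a combined log format and a hypothetical access.log file name, and identifies Feedfetcher by its user agent string:

# A minimal sketch measuring intervals between Feedfetcher retrievals
# in a combined-format access log (file name and format are assumptions).
from datetime import datetime

TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 10/Oct/2024:13:55:36 +0000

timestamps = []
with open("access.log") as log:
    for line in log:
        if "Feedfetcher-Google" not in line:
            continue
        # In combined format the timestamp sits between the first "[" and "]"
        stamp = line.split("[", 1)[1].split("]", 1)[0]
        timestamps.append(datetime.strptime(stamp, TIME_FORMAT))

for earlier, later in zip(timestamps, timestamps[1:]):
    print(f"{later - earlier} between retrievals")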

Troubleshooting Feedfetcher Issues

1. Feedfetcher is trying to access incorrect links. Why?

Remember, Feedfetcher acts on user requests. If a user mistakenly provides an incorrect feed URL (typo, outdated link), Feedfetcher will attempt to access it. This can happen even if the URL points to a non-existent domain.

2. Feedfetcher is accessing a private section of my website. Why?

If a user is aware of a non-public section of your website and manually enters the feed URL, Feedfetcher will attempt to retrieve it.

Example: Imagine you have a password-protected area on your website with a feed at "https://www.example.com/private/feed.xml". If a user who knows about this section inputs the URL, Feedfetcher will attempt access. Consider adding authentication mechanisms to protect sensitive feeds.
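
One simple option is HTTP Basic authentication. The sketch below applies it at the application layer; the /private/ path prefix and the credentials are placeholders, and in practice you would configure this in your web server or framework and serve the feed over HTTPS.

# A minimal sketch gating a private feed behind HTTP Basic auth (WSGI).
import base64
from wsgiref.simple_server import make_server

CREDENTIALS = base64.b64encode(b"user:secret").decode()  # placeholder only

def app(environ, start_response):
    if environ["PATH_INFO"].startswith("/private/"):
        if environ.get("HTTP_AUTHORIZATION") != "Basic " + CREDENTIALS:
            # Unauthenticated clients, Feedfetcher included, get a 401
            start_response("401 Unauthorized",
                           [("WWW-Authenticate", 'Basic realm="private feeds"')])
            return [b"Authentication required"]
    start_response("200 OK", [("Content-Type", "application/rss+xml")])
    return [b"<rss version='2.0'></rss>"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()  # placeholder port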

3. Why doesn't Feedfetcher respect my robots.txt file?

Feedfetcher prioritizes user intent. Since a user explicitly initiated the request for your feed (by subscribing to your podcast or using a service that relies on it), Feedfetcher views itself as acting on behalf of that user, not as a regular bot.

Additional Information

1. Multiple Feedfetcher User Agents: You might notice multiple instances of "Feedfetcher-Google" in your server logs originating from different IP addresses. This is normal! Google distributes Feedfetcher across various machines for performance and scalability.

2. Identifying Feedfetcher Requests: Focus on the user agent string "Feedfetcher-Google" in your server logs to reliably identify these requests, as IP addresses can change.
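
For example, this short Python pass over an access log (combined format and file name assumed, as in the earlier sketch) tallies the distinct IP addresses behind the Feedfetcher user agent:

# A minimal sketch counting distinct source IPs for Feedfetcher requests
# (in combined log format the first whitespace-separated field is the IP).
from collections import Counter

ips = Counter()
with open("access.log") as log:
    for line in log:
        if "Feedfetcher-Google" in line:
            ips[line.split()[0]] += 1

for ip, hits in ips.most_common():
    print(f"{ip}: {hits} request(s)")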
