
Understanding Google Feedfetcher: A Technical Deep Dive

Feedfetcher is the Google fetcher that retrieves and refreshes RSS and Atom feeds for services such as Google Podcasts, Google News, and PubSubHubbub. Only podcast feeds are indexed for Google Search results, but knowing how Feedfetcher interacts with your site still helps you manage where your content appears.

This document provides in-depth answers to frequently asked questions about Feedfetcher, complete with illustrative examples and code snippets to help you better grasp the concepts.

How Feedfetcher Works

Unlike a traditional web crawler, which follows links to discover content, Feedfetcher fetches only feeds that users have requested. When a user takes an action in a service that relies on Feedfetcher (such as subscribing to a podcast), that service asks Feedfetcher to fetch the feed and keep it up to date.

Think of it like this:

  • User: Subscribes to "Tech Daily" podcast on Google Podcasts.

  • Google Podcasts: Sends a request to Feedfetcher to fetch the "Tech Daily" podcast feed.

  • Feedfetcher: Downloads the feed content and periodically checks for updates.
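The flow above can be modeled as a small registry: feeds enter only through a user action, and each one is refetched once its refresh interval has passed. This Python sketch is a hypothetical illustration, not Google's implementation; the class and method names, the example URL, and the fixed one-hour interval (borrowed from the retrieval-frequency figure later in this document) are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Assumed refresh interval, mirroring the "about once per hour" figure in this document.
REFRESH_INTERVAL = timedelta(hours=1)

@dataclass
class FeedRegistry:
    """Hypothetical model of a user-driven fetcher: only feeds users asked for are tracked."""
    last_fetched: Dict[str, Optional[datetime]] = field(default_factory=dict)

    def subscribe(self, url: str) -> None:
        # A user action (e.g. subscribing to a podcast) registers the feed URL.
        self.last_fetched.setdefault(url, None)

    def due(self, now: datetime) -> List[str]:
        # Feeds never fetched yet, or last fetched at least REFRESH_INTERVAL ago.
        return sorted(
            url for url, ts in self.last_fetched.items()
            if ts is None or now - ts >= REFRESH_INTERVAL
        )

    def mark_fetched(self, url: str, now: datetime) -> None:
        # Record a completed download so the feed is not refetched too soon.
        self.last_fetched[url] = now

# Usage: one subscription, then checks at different times.
FEED = "https://www.example.com/tech-daily.xml"  # hypothetical feed URL
registry = FeedRegistry()
registry.subscribe(FEED)
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(registry.due(start))                          # ['https://www.example.com/tech-daily.xml']
registry.mark_fetched(FEED, start)
print(registry.due(start + timedelta(minutes=30)))  # []
print(registry.due(start + timedelta(hours=1)))     # ['https://www.example.com/tech-daily.xml']
```

Because nothing enters the registry by following links, a URL that no user has subscribed to is never fetched in this model, which is the key difference from a crawler.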

Controlling Feedfetcher Access

1. Can I block Feedfetcher from accessing my feeds?

Yes, in practice. You can't prevent users from subscribing to a publicly available feed, but you can control how your server responds when Feedfetcher requests it.

Example: Suppose you want to block Feedfetcher from retrieving the feed at "https://www.example.com/feed.xml". You can configure your server to return a 404 Not Found (or 410 Gone) status only when the request carries Feedfetcher's user agent.

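# Sample .htaccess configuration for Apache servers (requires mod_rewrite)
RewriteEngine On
# Match requests whose User-Agent starts with Feedfetcher-Google; [NC] ignores case
RewriteCond %{HTTP_USER_AGENT} ^Feedfetcher-Google [NC]
# In .htaccess the matched path has no leading slash; answer 404 and stop rewriting
RewriteRule ^feed\.xml$ - [R=404,L]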

This snippet checks the User-Agent header of each incoming request. If it begins with "Feedfetcher-Google" (compared case-insensitively), requests for feed.xml in that directory receive a 404 Not Found response.

Note: The method for blocking Feedfetcher may vary depending on your server setup (e.g., Nginx, IIS). Refer to your server documentation for specific instructions.
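If your feed is served by a Python application rather than directly by Apache, the same rule can be applied in application code. The following is a minimal WSGI sketch using only the standard library; the feed path, the placeholder feed body, and the names feed_app and is_feedfetcher are assumptions for illustration.

```python
from typing import Callable, Iterable

FEED_PATH = "/feed.xml"                   # hypothetical feed location from the example above
FEEDFETCHER_TOKEN = "feedfetcher-google"  # compared case-insensitively
PLACEHOLDER_FEED = b'<?xml version="1.0"?><rss version="2.0"><channel></channel></rss>'

def is_feedfetcher(user_agent: str) -> bool:
    """True if the User-Agent header starts with the Feedfetcher token (any case)."""
    return user_agent.lower().startswith(FEEDFETCHER_TOKEN)

def feed_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Serve the feed to ordinary clients, but answer 404 when Feedfetcher asks for it."""
    path = environ.get("PATH_INFO", "")
    user_agent = environ.get("HTTP_USER_AGENT", "")
    if path == FEED_PATH and not is_feedfetcher(user_agent):
        start_response("200 OK", [("Content-Type", "application/rss+xml")])
        return [PLACEHOLDER_FEED]
    # Unknown paths and Feedfetcher requests for the feed both get 404 Not Found.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]
```

To try it locally, pass feed_app to wsgiref.simple_server.make_server and call serve_forever() on the returned server. Reusing the ordinary 404 response means Feedfetcher sees the same thing as for a missing file.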

2. What if my feed is hosted by a third-party service?

If you're using a blogging platform or a feed hosting service, you'll need to manage access through the tools it provides. Many platforms offer settings to control feed visibility or restrict access. Consult your service provider's documentation for detailed instructions.

Feedfetcher Retrieval Frequency

How often does Feedfetcher update my feeds?

Feedfetcher is designed to limit the load it places on your server. For most sites, it retrieves a given feed no more than about once per hour on average. Some frequently updated sites may be refreshed more often.

Keep in mind:

  • Network delays: Network delays can make retrievals appear more frequent in your logs than they really are. Such spikes are usually short-lived and don't reflect the actual refresh schedule.
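One way to check the retrieval rate for your own feed is to measure the gaps between Feedfetcher requests in your access log. This Python sketch assumes Apache's "combined" log format and a feed at /feed.xml; the function names and the sample log lines are made up for illustration. A single short gap can simply reflect the network delays mentioned above, so look at the average over a longer period.

```python
import re
from datetime import datetime
from statistics import mean
from typing import Iterable, List, Optional

# Captures timestamp, request path, and User-Agent from an Apache "combined" log line.
LOG_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def feedfetcher_gaps(lines: Iterable[str], feed_path: str = "/feed.xml") -> List[float]:
    """Minutes between successive Feedfetcher requests for feed_path."""
    times = []
    for line in lines:
        m = LOG_RE.search(line)
        if not m or m["path"] != feed_path:
            continue
        if not m["ua"].lower().startswith("feedfetcher-google"):
            continue
        times.append(datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z"))
    times.sort()
    return [(later - earlier).total_seconds() / 60 for earlier, later in zip(times, times[1:])]

def average_gap(lines: Iterable[str], feed_path: str = "/feed.xml") -> Optional[float]:
    """Average interval in minutes, or None if fewer than two fetches were logged."""
    gaps = feedfetcher_gaps(lines, feed_path)
    return mean(gaps) if gaps else None

# Made-up sample lines; the Mozilla request is ignored because it is not Feedfetcher.
UA = "FeedFetcher-Google; (+http://www.google.com/feedfetcher.html)"
SAMPLE_LOG = [
    f'192.0.2.1 - - [01/Jan/2024:10:00:00 +0000] "GET /feed.xml HTTP/1.1" 200 512 "-" "{UA}"',
    f'192.0.2.2 - - [01/Jan/2024:11:00:00 +0000] "GET /feed.xml HTTP/1.1" 200 512 "-" "{UA}"',
    '198.51.100.7 - - [01/Jan/2024:11:05:00 +0000] "GET /feed.xml HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    f'192.0.2.3 - - [01/Jan/2024:12:30:00 +0000] "GET /feed.xml HTTP/1.1" 200 512 "-" "{UA}"',
]
print(feedfetcher_gaps(SAMPLE_LOG))  # [60.0, 90.0]
print(average_gap(SAMPLE_LOG))       # 75.0
```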

Troubleshooting Feedfetcher Issues

1. Feedfetcher is trying to access incorrect links. Why?

Remember, Feedfetcher acts on user requests. If a user mistakenly provides an incorrect feed URL (typo, outdated link), Feedfetcher will attempt to access it. This can happen even if the URL points to a non-existent domain.
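To see which mistyped or outdated URLs users have submitted, you can list the paths for which Feedfetcher received 404 Not Found. This Python sketch assumes Apache's "combined" log format; the function name and the sample lines (including the misspelled /feeed.xml) are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Captures request path, status code, and User-Agent from an Apache "combined" log line.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def feedfetcher_404s(lines: Iterable[str]) -> Counter:
    """Count Feedfetcher requests answered with 404, grouped by requested path."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and m["status"] == "404" and m["ua"].lower().startswith("feedfetcher-google"):
            counts[m["path"]] += 1
    return counts

UA = "FeedFetcher-Google; (+http://www.google.com/feedfetcher.html)"
SAMPLE_LOG = [
    f'192.0.2.1 - - [01/Jan/2024:10:00:00 +0000] "GET /feeed.xml HTTP/1.1" 404 196 "-" "{UA}"',
    f'192.0.2.2 - - [01/Jan/2024:11:00:00 +0000] "GET /feeed.xml HTTP/1.1" 404 196 "-" "{UA}"',
    f'192.0.2.3 - - [01/Jan/2024:11:10:00 +0000] "GET /feed.xml HTTP/1.1" 200 512 "-" "{UA}"',
    '198.51.100.7 - - [01/Jan/2024:11:20:00 +0000] "GET /old-feed.xml HTTP/1.1" 404 196 "-" "Mozilla/5.0"',
]
print(feedfetcher_404s(SAMPLE_LOG))  # Counter({'/feeed.xml': 2})
```

A path that keeps appearing here is a good candidate for a redirect to the correct feed, or for a deliberate 404 or 410 if the feed no longer exists.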

2. Feedfetcher is accessing a private section of my website. Why?

If a user is aware of a non-public section of your website and manually enters the feed URL, Feedfetcher will attempt to retrieve it.

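Example: Imagine you have a feed at "https://www.example.com/private/feed.xml" in a section of your site that isn't linked from anywhere. If a user who knows the URL submits it, Feedfetcher will try to fetch it, because an unlisted URL is not the same as a protected one. To keep sensitive feeds private, put them behind authentication (such as HTTP authentication); requests without valid credentials, including Feedfetcher's, are then refused.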
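One way to protect such a feed in a Python application is to wrap it in HTTP Basic authentication. This sketch uses only the standard library; the credentials, the realm name, and the names require_basic_auth, private_feed, and is_authorized are hypothetical. Basic authentication only base64-encodes the credentials, so serve it over HTTPS.

```python
import base64
import hmac
from typing import Callable, Iterable

# Hypothetical credentials; in practice load them from configuration, not source code.
USERNAME = "feed-reader"
PASSWORD = "change-me"

def is_authorized(header: str) -> bool:
    """Check an HTTP Basic Authorization header against the expected credentials."""
    if not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids content-based short-circuiting, which limits timing attacks
    user_ok = hmac.compare_digest(user.encode(), USERNAME.encode())
    password_ok = hmac.compare_digest(password.encode(), PASSWORD.encode())
    return user_ok and password_ok

def require_basic_auth(app: Callable) -> Callable:
    """WSGI middleware: pass authorized requests through, answer 401 otherwise."""
    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if is_authorized(environ.get("HTTP_AUTHORIZATION", "")):
            return app(environ, start_response)
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain"),
            ("WWW-Authenticate", 'Basic realm="private feeds"'),
        ])
        return [b"Authentication required"]
    return wrapper

def private_feed(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Placeholder for the real private feed."""
    start_response("200 OK", [("Content-Type", "application/rss+xml")])
    return [b'<?xml version="1.0"?><rss version="2.0"><channel></channel></rss>']

app = require_basic_auth(private_feed)
```

Any request without valid credentials, whatever its user agent, receives 401 Unauthorized, so the protection does not depend on recognizing Feedfetcher at all.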

3. Why doesn't Feedfetcher respect my robots.txt file?

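Because a user explicitly requested your feed (for example, by subscribing to your podcast in a service that relies on Feedfetcher), Feedfetcher acts on behalf of that user rather than as an automated crawler, so it does not follow robots.txt rules. To keep Feedfetcher away from a feed, return an error status to its user agent instead, as described under "Can I block Feedfetcher from accessing my feeds?"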

Additional Information

1. Multiple IP addresses: You may see requests with the "Feedfetcher-Google" user agent arriving from several different IP addresses. This is expected: Google distributes Feedfetcher across many machines for performance and scale.

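2. Identifying Feedfetcher requests: Identify these requests by the "Feedfetcher-Google" user agent string rather than by IP address, since the addresses can change. Keep in mind that any client can send this user agent string, so it identifies Feedfetcher traffic but does not prove that a request came from Google.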
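To see this distribution in your own logs, you can select requests by user agent and then group them by client IP. This Python sketch assumes Apache's "combined" log format; the function name and the sample lines (which use documentation-range IP addresses) are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Client IP (first field) and User-Agent (last quoted field) of a "combined" log line.
LOG_RE = re.compile(r'^(?P<ip>\S+) .* "(?P<ua>[^"]*)"$')

def feedfetcher_requests_by_ip(lines: Iterable[str]) -> Counter:
    """Count requests per client IP, keeping only lines whose User-Agent is Feedfetcher's."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line.rstrip("\n"))
        if m and m["ua"].lower().startswith("feedfetcher-google"):
            counts[m["ip"]] += 1
    return counts

UA = "FeedFetcher-Google; (+http://www.google.com/feedfetcher.html)"
SAMPLE_LOG = [
    f'192.0.2.1 - - [01/Jan/2024:10:00:00 +0000] "GET /feed.xml HTTP/1.1" 200 512 "-" "{UA}"',
    f'192.0.2.2 - - [01/Jan/2024:11:00:00 +0000] "GET /feed.xml HTTP/1.1" 200 512 "-" "{UA}"',
    f'192.0.2.1 - - [01/Jan/2024:12:00:00 +0000] "GET /feed.xml HTTP/1.1" 200 512 "-" "{UA}"',
    '198.51.100.7 - - [01/Jan/2024:12:05:00 +0000] "GET /feed.xml HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(feedfetcher_requests_by_ip(SAMPLE_LOG))  # Counter({'192.0.2.1': 2, '192.0.2.2': 1})
```

Filtering on the user agent first and only then looking at addresses keeps the analysis working when the set of IP addresses changes.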
