HTTP Status Codes, Network, and DNS Errors

This document explains how various HTTP status codes, network hiccups, and DNS problems can impact your website's visibility on Google Search. We'll delve into the top 20 status codes Googlebot frequently encounters on the web and the most common network and DNS errors.

Please note: This guide doesn't cover uncommon status codes like 418 (I'm a teapot). All issues discussed here trigger corresponding warnings or errors within the Page Indexing report of Google Search Console. Experimental features of protocols like HTTP and FTP are not supported unless explicitly mentioned.

HTTP Status Codes

Web servers generate HTTP status codes when responding to requests from clients such as browsers or web crawlers. Each code conveys a specific meaning, although different codes can lead to the same outcome for a request. For instance, several status codes signal a redirect, but their ultimate effect is the same.

Search Console flags error messages for status codes within the 4xx-5xx range and for unsuccessful redirections (3xx). Content received in response to a 2xx status code might be considered for indexing. However, receiving a 2xx status code does not guarantee indexing.
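
If you want to see which family a given URL on your site falls into, you can probe it directly. The sketch below uses only the Python standard library; the URL and User-Agent string are placeholders, not anything Googlebot actually sends.

    import urllib.error
    import urllib.request

    def probe(url: str) -> int:
        """Return the HTTP status code the server sends for a URL."""
        request = urllib.request.Request(url, headers={"User-Agent": "status-probe/1.0"})
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                return response.getcode()   # final code after any redirects
        except urllib.error.HTTPError as err:
            return err.code                 # 4xx and 5xx responses raise HTTPError

    if __name__ == "__main__":
        code = probe("https://www.example.com/")
        print(f"HTTP {code} ({code // 100}xx)")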

Let's break down the most common HTTP status codes Googlebot encounters and how Google handles them:

2xx (Success)

  • General: Google evaluates the content for potential indexing. If the content hints at an error (e.g., empty page, error message), Search Console will flag a "soft 404" error.

  • 200 (OK): Google forwards the content to its indexing pipeline. While the indexing systems might index this content, it's not guaranteed.

    Example: You request a blog post, and the server successfully delivers the full content with a 200 status code.

  • 201 (Created):

    Example: You submit a new blog comment. The server creates the comment resource and returns a 201 status code, confirming the creation.

  • 202 (Accepted): Googlebot waits for the content, up to a predefined limit, and then passes whatever it received to the indexing pipeline. This timeout varies depending on the user agent. For example, Googlebot Smartphone might have a different timeout compared to Googlebot Image.

    Example: You upload a large video file. The server accepts the upload request (202) and processes it in the background.

  • 204 (No Content): Googlebot informs the indexing pipeline that it didn't receive any content. Search Console might display a "soft 404" error in your site's Page Indexing report.

    Example: You click a "Clear Notifications" button, and the server responds with a 204, indicating successful removal without sending new content.
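
The difference between 200 and 204 is easy to see with a toy server. Here's a minimal sketch using Python's built-in http.server; the paths and markup are made up for the demo, and a real site would serve far richer content.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class DemoHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/post":
                body = b"<html><body><h1>A full blog post</h1></body></html>"
                self.send_response(200)                       # content is passed on for indexing
                self.send_header("Content-Type", "text/html")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_response(204)                       # success, but nothing to index
                self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), DemoHandler).serve_forever()

If Googlebot crawls a URL that answers like the 204 branch, expect a "soft 404" entry in the Page Indexing report rather than an indexed page.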

3xx (Redirection)

  • General: Googlebot follows up to ten redirect hops. If it doesn't receive content within ten hops, Search Console flags a "redirect error" in the site's Page Indexing report. The hop limit can vary depending on the user agent; Googlebot Smartphone might differ from Googlebot Image. For robots.txt, Googlebot follows at least five redirect hops (as defined by RFC 1945) before stopping and treating the result as a 404 for that robots.txt file.

    Content from redirecting URLs is disregarded; only the final target URL's content is considered for indexing. (A sketch of a hop-capped redirect follower appears after this list.)

  • 301 (Moved Permanently): Googlebot follows the redirect, and the indexing pipeline interprets this as a strong signal that the redirect target should be the canonical URL.

    Example: You've permanently moved your website from http://www.example.com to https://www.example.com. A 301 redirect ensures users and Googlebot are directed to the correct, secure location.

  • 302 (Found): Googlebot follows the redirect, but the indexing pipeline treats this as a weak signal that the redirect target should be the canonical URL.

    Example: A product page is temporarily out of stock and redirects to a similar product page.

  • 303 (See Other):

    Example: You submit a form, and the server processes it and uses a 303 to redirect you to a separate confirmation page.

  • 304 (Not Modified): Googlebot signals the indexing pipeline that the content hasn't changed since the last crawl. The indexing pipeline might recalculate signals for the URL, but this status code doesn't directly influence indexing.

    Example: You request a page you've visited recently. If the content hasn't changed, the server might respond with a 304 to avoid re-sending the same data.

  • 307 (Temporary Redirect): Functionally equivalent to a 302.

    Example: A website under maintenance might use a 307 to temporarily redirect visitors to an informational page until maintenance is complete.

  • 308 (Permanent Redirect): Functionally equivalent to a 301.

    Important: While Google Search handles these codes similarly, remember they have semantic differences. Always use the status code appropriate for the specific redirect. This practice benefits other clients, like e-readers or other search engines.
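
To make the hop limit concrete, here's a sketch of a client that follows redirects roughly as described above: it gives up after ten hops and keeps only the final target's body. It's illustrative only; the ten-hop cap, timeout, and starting URL are assumptions for the demo, not Googlebot's actual implementation.

    import http.client
    from urllib.parse import urljoin, urlsplit

    MAX_HOPS = 10   # assumed cap for this sketch

    def fetch_following_redirects(url: str):
        """Follow 3xx responses up to MAX_HOPS, keeping only the final body."""
        for _ in range(MAX_HOPS + 1):
            parts = urlsplit(url)
            conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
            conn = conn_cls(parts.netloc, timeout=10)
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            conn.request("GET", path)
            resp = conn.getresponse()
            if resp.status in (301, 302, 303, 307, 308):
                location = resp.getheader("Location")
                conn.close()
                if not location:
                    raise RuntimeError("redirect without a Location header")
                url = urljoin(url, location)   # the redirecting URL's body is ignored
                continue
            body = resp.read()                 # only the final target's content is kept
            conn.close()
            return url, resp.status, body
        raise RuntimeError(f"redirect error: more than {MAX_HOPS} hops")

    if __name__ == "__main__":
        final_url, status, body = fetch_following_redirects("http://www.example.com/")
        print(final_url, status, len(body))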

4xx (Client Errors)

  • General: Google's indexing pipeline won't index URLs returning a 4xx status code. URLs already indexed that begin returning a 4xx are removed from the index. Any content received by Googlebot from URLs returning a 4xx code is ignored.

  • 400 (Bad Request):

    Example: You send a request with incorrect parameters, like an invalid date format in a search query.

  • 401 (Unauthorized):

    Example: You attempt to access a restricted admin panel without proper login credentials.

  • 403 (Forbidden):

    Example: You try to access a file you don't have permission to view.

  • 404 (Not Found):

    Example: You try to access a page that doesn't exist on the server, perhaps due to a broken link or an incorrect URL.

  • 410 (Gone):

    Example: You're looking for a blog post that has been permanently deleted. The server returns a 410 to indicate it's gone for good.

  • 411 (Length Required):

    Example: You attempt to upload a file without specifying the content length in the header, and the server requires this information.

  • 429 (Too Many Requests): Googlebot interprets this as a signal of server overload, classifying it as a server error.

    Example: You're using an API and exceed your allowed request quota, triggering a 429 error.

Important: All 4xx errors, except 429, are treated similarly: Googlebot tells the indexing pipeline the content is unavailable. The indexing pipeline removes previously indexed URLs. Newly encountered 404 pages aren't processed, and the crawl frequency for those URLs gradually decreases.

**Don't use 401 or 403 status codes to control crawl rate.** The 4xx status codes (except 429) don't affect crawl rate. Refer to Google's documentation on how to manage your crawl rate effectively.
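
Because 429 is the one client error Google reads as an overload signal, it's worth sending it deliberately when you throttle, ideally with a Retry-After header. Below is a minimal sketch with Python's http.server; the request budget and retry delay are arbitrary demo values, and a real limiter would track a time window per client rather than a single process-wide counter.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    REQUEST_BUDGET = 100   # demo only: total requests served before throttling kicks in

    class ThrottlingHandler(BaseHTTPRequestHandler):
        hits = 0           # naive process-wide counter, for illustration only

        def do_GET(self):
            ThrottlingHandler.hits += 1
            if ThrottlingHandler.hits > REQUEST_BUDGET:
                self.send_response(429)                  # overload signal; treated like a server error
                self.send_header("Retry-After", "120")   # ask clients to come back in two minutes
                self.end_headers()
                return
            body = b"<html><body>OK</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), ThrottlingHandler).serve_forever()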

5xx (Server Errors)

  • General: Both 5xx and 429 errors cause Google's crawlers to temporarily slow down crawling. Indexed URLs remain in the index initially but are eventually removed if the issue persists.

    If a robots.txt file returns a server error status code for over 30 days, Google resorts to using its last cached copy. If unavailable, Google assumes no crawl restrictions are in place.

    Content from URLs returning a 5xx status code is disregarded by Googlebot.

  • 500 (Internal Server Error): Googlebot reduces the crawl rate for the affected site. The crawl rate reduction is proportional to the number of URLs returning the error. Google's indexing pipeline removes persistently failing URLs from its index.

    Example: A bug in your server-side code prevents a page from loading.

  • 502 (Bad Gateway):

    Example: Your server, acting as a gateway or proxy, receives an invalid response from an upstream server.

  • 503 (Service Unavailable):

    Example: Your server is temporarily overloaded or down for maintenance.
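
From the client side, the "slow down on 429 and 5xx" behavior looks roughly like a retry loop with backoff. The sketch below is a simplified stand-in, not Googlebot's logic: the retry counts, delays, and status list are assumptions, and it honors a numeric Retry-After header when the server sends one.

    import time
    import urllib.error
    import urllib.request

    TRANSIENT = (429, 500, 502, 503, 504)

    def fetch_with_backoff(url: str, attempts: int = 5) -> bytes:
        delay = 2.0
        for _ in range(attempts):
            try:
                with urllib.request.urlopen(url, timeout=10) as response:
                    return response.read()
            except urllib.error.HTTPError as err:
                if err.code not in TRANSIENT:
                    raise                            # other errors are not worth retrying
                retry_after = err.headers.get("Retry-After")
                wait = float(retry_after) if retry_after and retry_after.isdigit() else delay
                time.sleep(wait)                     # slow down before trying again
                delay *= 2
        raise RuntimeError("server kept returning errors; giving up")

    if __name__ == "__main__":
        print(len(fetch_with_backoff("https://www.example.com/")))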

Soft 404 Errors

A soft 404 happens when a URL returns a page suggesting the content doesn't exist but still delivers a 200 (success) status code. This might manifest as a page devoid of main content or a completely blank page.

Several factors can lead to your web server, content management system, or user's browser generating such pages. Examples include:

  • A missing server-side include file

  • A disrupted database connection

  • An empty internal search result page

  • An unloaded or missing JavaScript file

Returning a 200 (success) code while simultaneously displaying an error message creates a negative user experience. Users might perceive the page as functional only to encounter an error. Google excludes such pages from search results.

When Google's algorithms detect an error page based on content analysis, Search Console flags it as a "soft 404" error in the site's Page Indexing report.
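
You can run a rough version of this check yourself before Search Console does. The sketch below flags URLs that answer 200 but look like error pages; the phrase list and length threshold are crude assumptions for the demo, not Google's actual detection rules, and the URLs are placeholders.

    import urllib.error
    import urllib.request

    ERROR_PHRASES = ("not found", "no longer available", "nothing here")

    def looks_like_soft_404(url: str) -> bool:
        """Flag 200 responses whose body is empty-ish or reads like an error page."""
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                if response.getcode() != 200:
                    return False                      # a real error code is not a soft 404
                text = response.read().decode("utf-8", errors="replace").lower()
        except urllib.error.HTTPError:
            return False                              # 4xx/5xx are reported as such, not soft 404s
        return len(text.strip()) < 512 or any(phrase in text for phrase in ERROR_PHRASES)

    if __name__ == "__main__":
        for url in ("https://www.example.com/", "https://www.example.com/missing-page"):
            print(url, "->", "soft 404 candidate" if looks_like_soft_404(url) else "looks fine")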

Fixing Soft 404 Errors

You can address soft 404 errors in multiple ways, depending on the page's status and your intended outcome:

  1. The Page and Content Are No Longer Available

    • If you've removed the page without a replacement containing similar content, return a 404 (Not Found) or 410 (Gone) status code. This signals to search engines that the page is gone and its content shouldn't be indexed. (A code sketch covering this fix and fix 2 appears after this list.)

    • Leverage Custom 404 Pages:

      • If you have server configuration access, make your error pages user-friendly. A well-designed custom 404 page helps users find information and encourages further site exploration.

      • Tips for Custom 404 Pages:

        • Clearly state that the requested page isn't found. Use inviting and friendly language.

        • Maintain consistent styling (including navigation) with the rest of your site.

        • Include links to popular articles, posts, and your homepage.

        • Consider adding a broken link reporting mechanism.

      • Important: Custom 404 pages are for users. Ensure your server returns a 404 HTTP status code to prevent indexing.

  2. The Page or Content Has Moved

    • If your page has moved or has a clear replacement, use a 301 (Permanent Redirect) to guide users seamlessly. This also informs search engines about the new location.

    • Verify correct code implementation using the URL Inspection tool.

  3. The Page and Content Still Exist

    • If a valid page is flagged with a soft 404, it likely didn't load correctly for Googlebot, is missing resources, or displayed a prominent error during rendering.

    • Use the URL Inspection tool to investigate the rendered content and the returned HTTP code.

    • If the rendered page is blank, near-empty, or shows an error message, it might indicate resource loading problems (images, scripts, etc.). This can trigger a soft 404.

    • Reasons for Resource Loading Issues:

      • Blocked resources (robots.txt)

      • Excessive resources on a single page

      • Various server errors

      • Slow-loading or excessively large resources
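
Fixes 1 and 2 above boil down to sending the right status code for each case. Here's a compact sketch using Python's standard-library wsgiref server; the paths, destinations, and markup are invented for the demo, and a real site would drive the lookups from its CMS or routing layer.

    from wsgiref.simple_server import make_server

    MOVED = {"/old-post": "/new-post"}   # fix 2: moved content gets a 301 to its new URL
    GONE = {"/retired-post"}             # fix 1: deleted content gets a real 410
    PAGES = {"/new-post": b"<h1>The new post</h1>"}

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        if path in MOVED:
            start_response("301 Moved Permanently", [("Location", MOVED[path])])
            return [b""]
        if path in GONE:
            start_response("410 Gone", [("Content-Type", "text/html")])
            return [b"<h1>This article was removed</h1><p>Try the <a href='/'>homepage</a>.</p>"]
        if path in PAGES:
            start_response("200 OK", [("Content-Type", "text/html")])
            return [PAGES[path]]
        # Friendly custom error page, but with a genuine 404 status so it isn't indexed.
        start_response("404 Not Found", [("Content-Type", "text/html")])
        return [b"<h1>Page not found</h1><p>Try the <a href='/'>homepage</a>.</p>"]

    if __name__ == "__main__":
        make_server("localhost", 8000, app).serve_forever()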

Network and DNS Errors

Network and DNS issues can swiftly and negatively impact a URL's ranking on Google Search. Googlebot treats network timeouts, connection resets, and DNS errors similarly to 5xx server errors.

Network Errors

Network errors cause an immediate slowdown in crawling. Googlebot interprets them as potential signs of server overload. Because Googlebot can't reach the server, it receives no content, so the affected URLs can't be indexed. URLs that are already indexed but unreachable are typically removed from Google's index within days. Search Console may generate specific error messages for these situations.

If you don't manage your web hosting directly, consult your hosting or CDN provider for assistance.

Debugging Network Errors

These errors occur before or during Google's crawl attempts. Diagnosing them can be trickier since they might arise before the server can respond, leaving you without helpful status codes.

To debug timeout and connection reset errors:

  1. Firewall Scrutiny: Review your firewall settings and logs. Check for overly restrictive blocking rules and ensure Googlebot IP addresses aren't blocked.

  2. Network Traffic Analysis: Utilize tools like tcpdump and Wireshark to capture and analyze TCP packets. Look for anomalies pointing to specific network components or server modules.

    Example using tcpdump to monitor traffic on port 80 (HTTP):

    tcpdump -i eth0 port 80

    (Replace eth0 with your network interface if needed)

  3. Contact Your Hosting Provider: If your investigation yields no clear answers, reach out to your hosting company for further assistance.

Remember: The problem might lie within any server component that handles network traffic. For example, an overloaded network interface may drop packets, leading to timeouts (failure to establish a connection) or connection resets (an RST packet sent because a port was mistakenly closed).
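
A quick reachability probe can help separate timeouts from resets before you dig into packet captures. The sketch below uses Python's socket module; the host, port, and timeout are placeholders to adjust for your setup.

    import socket

    def probe(host: str, port: int = 443, timeout: float = 10.0) -> str:
        """Attempt a TCP connection and report how it fails, if it fails."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return "connected"
        except socket.timeout:
            return "timed out (possible overload or a firewall silently dropping packets)"
        except ConnectionResetError:
            return "connection reset (RST received)"
        except OSError as err:
            return f"failed: {err}"

    if __name__ == "__main__":
        print(probe("www.example.com"))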

DNS Errors

Misconfiguration is the most common cause of DNS errors, though firewall rules blocking Googlebot's DNS queries can also be culprits.

Debugging DNS Errors:

  1. Firewall Inspection: Scrutinize your firewall rules. Ensure no Google IPs are blocked and that both UDP and TCP requests are permitted.

  2. DNS Record Verification: Double-check your A and CNAME records to ensure they point to the correct IP addresses and hostnames, respectively.

    Example:

    • A Record: example.com. IN A 192.0.2.1 (maps a hostname to an IPv4 address)

    • CNAME Record: www.example.com. IN CNAME example.com. (maps an alias hostname to its canonical name)

  3. Name Server Check: Verify that all your name servers point to the correct IP addresses for your site.

    Example: nslookup -type=A example.com (should return the correct IP address)

  4. DNS Propagation: If you've adjusted your DNS within the last 72 hours, allow time for changes to propagate globally. To potentially expedite this, you can flush Google's Public DNS cache.

  5. DNS Server Health: If you operate your own DNS server, ensure it's running smoothly and not overloaded.
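
To sanity-check steps 2 and 3, you can also resolve your hostnames programmatically and compare the answers with your zone file. The sketch below uses Python's standard socket module and your system resolver; the hostnames are placeholders. Querying each authoritative name server individually requires a tool like nslookup or dig (or a third-party DNS library) pointed at that server.

    import socket

    def resolve_a_records(hostname):
        """Return the IPv4 addresses the system resolver reports for a hostname."""
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
        return sorted({info[4][0] for info in infos})

    if __name__ == "__main__":
        for name in ("example.com", "www.example.com"):
            print(name, "->", resolve_a_records(name))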

By understanding these common HTTP status codes, network errors, and DNS issues, you'll be better equipped to diagnose problems and ensure Googlebot can efficiently crawl and index your website, leading to improved visibility in Google Search results.
