
Canonicalization

What is Canonicalization?

In simple terms, canonicalization is like choosing the "official" version of a webpage when you have multiple copies. Imagine you have a recipe for apple pie on your website, but it's accessible through different URLs:

  • https://www.example.com/recipes/apple-pie

  • https://www.example.com/recipes/apple-pie/index.html

  • https://www.example.com/recipes?id=123

While these URLs look different, they all lead to the same delicious apple pie recipe. This is where canonicalization comes in. You tell Google (and other search engines) which URL you prefer to be the main one, the canonical URL. This helps search engines:

  • Avoid duplicate content problems: Google does not penalize ordinary duplicate content, but multiple copies of the same page can split signals across URLs and waste crawling on pages you don't care about.

  • Consolidate ranking signals: Links from other websites that point to any duplicate of your recipe are credited to the canonical URL, which strengthens its authority and ranking potential.

  • Provide a consistent user experience: Users will always land on the same URL for your recipe, avoiding confusion and potential frustration.

Why Do Duplicate Pages Happen?

Duplicate content isn't always intentional. Here are some common reasons why you might have duplicate content on your website:

  • Website Structure:

    • Trailing slashes: /recipes/apple-pie vs /recipes/apple-pie/

    • URL parameters: /products/shoes?color=red&size=10 vs /products/shoes?size=10&color=red (same parameters, different order)

  • Content Management Systems (CMS): Some CMS platforms automatically generate multiple versions of the same page.

  • Mobile Versions: Separate desktop and mobile versions of a page can be seen as duplicates if not handled correctly.

  • Protocol Variations: http:// and https:// versions of a page.

  • Session IDs: Dynamically generated URLs that include session IDs for tracking.
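
The variations listed above can often be collapsed by normalizing URLs before they are linked or compared. The following is a minimal sketch, not a definitive implementation: the function name normalize_url, the SESSION_PARAMS set, and the choices to force HTTPS and prefer a trailing slash are all assumptions made for illustration and should be adapted to your own site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical names of parameters that only carry session tracking; adjust per site.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def normalize_url(url: str, trailing_slash: bool = True) -> str:
    """Collapse common duplicate-URL variations into a single form."""
    parts = urlsplit(url)
    scheme = "https"  # assumption: every page is served over HTTPS
    host = parts.netloc.lower()

    # Treat /dir/index.html as the same page as /dir/.
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]

    # Apply one consistent trailing-slash policy.
    if trailing_slash:
        if not path.endswith("/"):
            path += "/"
    else:
        path = path.rstrip("/") or "/"

    # Drop session parameters and sort the rest so parameter order no longer matters.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    query = urlencode(sorted(params))

    return urlunsplit((scheme, host, path, query, ""))
```

Running every internal link through one function like this keeps a site from generating the duplicates in the first place, which leaves fewer cases for search engines to resolve.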

How Google Handles Canonicalization

  1. Identification: Google crawls your website and analyzes the content to identify potential duplicate pages.

  2. Signals: Google considers various signals to determine the preferred version, including:

    • rel="canonical" tag: This HTML tag explicitly tells Google which URL you prefer. Example:

    <link rel="canonical" href="https://www.example.com/recipes/apple-pie/" />

    • HTTP 301 redirects: Redirect all duplicate URLs to the canonical URL. Example (PHP):

    <?php
    // Permanently (301) redirect this duplicate URL to the canonical URL.
    // header() must run before any output is sent to the browser.
    header("Location: https://www.example.com/recipes/apple-pie/", true, 301);
    exit();
    ?>
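
    • Sitemap: List only your canonical URLs in the XML sitemap you submit. This is a weaker signal than a redirect or rel="canonical", but it still helps guide Google. Example:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/recipes/apple-pie/</loc>
        <!-- Google ignores <changefreq> and <priority>; <loc> is what matters here. -->
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
      </url>
    </urlset>
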
    • HTTPS: Google prefers HTTPS versions of a page over HTTP.

    • Content Quality: If other signals are equal, Google might choose the page with better content quality.

  3. Selection: Google selects the page with the strongest signals as the canonical version.

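  4. Indexing & Ranking: Google primarily indexes and ranks the canonical version. Duplicate pages are crawled less frequently and are generally left out of search results unless one of them suits the searcher better, such as a mobile page for a mobile user.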
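
Because the rel="canonical" link is one of the signals Google reads, it can be useful to audit which canonical URL a page actually declares. The sketch below uses only Python's standard html.parser module; the names CanonicalFinder and find_canonical are hypothetical helpers invented for this example, and treating conflicting declarations as "no canonical" is a design assumption, not a description of Google's behavior.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; compare case-insensitively.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonical(html: str):
    """Return the single declared canonical URL, or None if missing or conflicting."""
    finder = CanonicalFinder()
    finder.feed(html)
    unique = set(finder.canonicals)
    return unique.pop() if len(unique) == 1 else None
```

A page that declares two different canonical URLs sends a mixed signal, so this helper returns None in that case to flag the page for a manual check.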

Important Notes

  • Google's Decision: While you can suggest your preferred canonical URL, Google might choose a different page if it believes it's more beneficial for users.

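  • Language Variations: Different language versions of a page are not considered duplicates, provided the main content itself is translated. If only the header, footer, or other boilerplate is translated, Google can still treat the pages as duplicates.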

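  • Mobile-First Indexing: If you serve separate mobile and desktop URLs, annotate them so Google understands the pairing: the desktop page carries a rel="alternate" link to the mobile URL, and the mobile page carries a rel="canonical" link to the desktop URL.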
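
For sites with separate mobile and desktop URLs, the commonly documented pairing is a rel="alternate" link on the desktop page and a rel="canonical" link on the mobile page. The sketch below generates both tags; the function name mobile_annotations, the m.example.com host used in the usage line, and the 640px media query are illustrative assumptions, so substitute the values your site actually uses.

```python
from html import escape


def mobile_annotations(
    desktop_url: str,
    mobile_url: str,
    media: str = "only screen and (max-width: 640px)",  # assumed breakpoint
):
    """Return (tag for the desktop page, tag for the mobile page)."""
    # The desktop page points to its mobile alternate.
    desktop_tag = (
        f'<link rel="alternate" media="{escape(media)}" '
        f'href="{escape(mobile_url)}" />'
    )
    # The mobile page names the desktop URL as its canonical.
    mobile_tag = f'<link rel="canonical" href="{escape(desktop_url)}" />'
    return desktop_tag, mobile_tag


# Hypothetical URLs, for illustration only.
desktop_tag, mobile_tag = mobile_annotations(
    "https://www.example.com/recipes/apple-pie/",
    "https://m.example.com/recipes/apple-pie/",
)
```

Generating both tags from one function keeps the pair in sync, so the mobile page never points at a desktop URL that does not point back.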

By understanding and implementing canonicalization best practices, you can help Google understand your content structure, keep duplicate URLs from diluting your ranking signals, and deliver the best possible search experience for your users.


Last updated 11 months ago