URL Structure Best Practices for Google

This document outlines best practices for structuring URLs to ensure optimal crawling and indexing by Google.

URL Encoding and Character Usage

Google adheres to the URL standards defined in RFC 3986.

  • Reserved Characters: Characters designated as "reserved" by RFC 3986 must be percent-encoded when they appear outside their reserved role (for example, a literal "?" inside a path segment).

  • Unreserved ASCII Characters: Unreserved ASCII characters can remain in their non-encoded form.

  • Non-ASCII Characters: Characters outside the ASCII range should be UTF-8 encoded.

Example:

https://www.example.com/search?q=caf%C3%A9+au+lait  // Encoded "é" in "café" 
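The encoding rules above can be sketched with Python's standard library: urllib.parse.quote_plus applies UTF-8 encoding and percent-encoding in one step and, like the example URL, turns spaces into "+" signs.

```python
from urllib.parse import quote_plus

# "é" is first UTF-8 encoded to the bytes 0xC3 0xA9, then percent-encoded;
# quote_plus also replaces spaces with "+" as in the example query string.
value = quote_plus("café au lait")
print(value)  # caf%C3%A9+au+lait
```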

URL Clarity and Readability

Prioritize clear and concise URLs that use human-readable words instead of lengthy ID numbers.

Recommended:

  • /products/coffee-maker

  • /blog/best-brewing-methods

Not Recommended:

  • /products/1234567890

  • /p?id=87654321

UTF-8 Encoding for Internationalization

Use UTF-8 encoding to represent characters from various languages, keeping URLs readable for users across regions.

Examples:

  • Arabic: /القهوة/أنواع

  • Chinese: /产品/咖啡机

  • German: /produkte/kaffeemaschine

  • Emoji: /products/☕/french-press
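A minimal sketch of how these internationalized paths round-trip, using Python's urllib.parse: the readable form is what browsers display, while the percent-encoded UTF-8 form is what travels on the wire.

```python
from urllib.parse import quote, unquote

# The human-readable path as a browser displays it.
path = "/产品/咖啡机"

# On the wire, each non-ASCII character is UTF-8 encoded and then
# percent-encoded; "/" is in quote()'s default safe set and stays literal.
wire = quote(path)
print(wire)

# Decoding recovers the original readable form.
assert unquote(wire) == path
```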

Geotargeting URL Structure

For websites targeting multiple regions, implement a URL structure that facilitates efficient geotargeting.

Recommended:

  • Country-specific domain: https://www.example.de (Germany)

  • Country-specific subdirectory: https://www.example.com/de/ (Germany)

Hyphens for Word Separation

Employ hyphens as word separators within URLs. Hyphens improve readability for both users and search engines.

Recommended:

  • /red-tea-kettle

Not Recommended:

  • /red_tea_kettle

  • /redteakettle
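The hyphen convention is easy to enforce at URL-generation time. A minimal sketch, assuming a hypothetical slugify helper (not part of any Google tooling):

```python
import re

def slugify(title: str) -> str:
    # Lowercase, collapse any run of non-alphanumeric characters
    # (spaces, underscores, punctuation) into a single hyphen,
    # then trim leading and trailing hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

print(slugify("Red Tea Kettle"))   # red-tea-kettle
print(slugify("Red_Tea_Kettle"))  # red-tea-kettle
```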

Common URL Structure Issues and Solutions

Excessively complex or poorly structured URLs can hinder Google's ability to crawl and index a website effectively. This section highlights common URL pitfalls and offers solutions.

Issues Leading to URL Bloat:

  • Additive Filtering: When filters on category pages can be combined without limits (e.g., "hotels + beach + pool + pet-friendly"), it creates an exponential growth of URLs, many pointing to very similar content.

  • Dynamic Document Generation: Using timestamps, counters, or dynamically injected content can generate a high volume of URLs for essentially the same page.

  • Problematic URL Parameters: Session IDs, unnecessary tracking parameters, and similar elements contribute to URL bloat and duplication.

  • Sorting Parameters: Offering numerous sorting options for the same product set (e.g., price, popularity) unnecessarily inflates URL count.

  • Irrelevant Parameters: Including parameters like referral tags that don't change page content creates URL variations without adding value.

  • Calendar Issues: Dynamic calendars often generate URLs for an endless range of future and past dates, leading to a vast and often irrelevant URL space.

  • Broken Relative Links: Malformed relative links, especially with repeating path elements, can create infinite URL loops.

Solutions:

  • Simplify URL Structure: Organize content to allow for logical, human-understandable URLs.

  • Utilize robots.txt: Strategically block Googlebot's access to problematic URL patterns using robots.txt, particularly for dynamic URLs or those prone to infinite spaces.

  • Avoid Session IDs: Replace session IDs in URLs with cookies for session management.

  • Standardize Case: If the web server treats upper and lowercase characters the same in URLs, opt for consistent casing (all lowercase is generally recommended) to prevent duplicate content issues.

  • Shorten URLs: Eliminate unnecessary parameters and keep URLs concise.

  • Nofollow Future Calendar Links: For sites with dynamic calendars, add the rel="nofollow" attribute to links pointing to future calendar pages that haven't been generated yet.

  • Regularly Check for Broken Links: Implement a process for identifying and fixing broken relative links to avoid crawl inefficiencies.
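Several of the solutions above come together in robots.txt. A hedged sketch, assuming hypothetical parameter names ("sessionid", "sort") and a hypothetical /calendar/ path; adapt the patterns to the site's actual URLs:

```txt
# Block crawling of session-ID and sorting-parameter URL variants,
# and of an infinite dynamic calendar space. Googlebot supports the
# "*" wildcard in robots.txt paths.
User-agent: Googlebot
Disallow: /*?*sessionid=
Disallow: /*?*sort=
Disallow: /calendar/
```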
