What is Duplicate Content and How Does it Impact SEO?

What is Duplicate Content?

Duplicate content is the same or similar content appearing at more than one URL. It comes in two forms: exact 1:1 duplicates, which can be created in many ways (which we’ll get into below), and similar content that targets overlapping search intent.

Both types of duplicate content can negatively impact search rankings and performance.

How Does Duplicate Content Impact SEO Performance?

Duplicate content can negatively impact search visibility for a number of reasons, as search engines have a difficult time deciding which version of the content to show users. As a result, the duplicate and/or similar content can create:

Internal Competition - Search engines won’t know which page to rank if the pages are the same or similar. This may also create a confusing experience for users, who won’t know which version of the page to click on from the SERPs.
Wasted Crawl Budget - If numerous pages contain duplicate content and you only want 1 indexed, crawlers will still crawl every duplicate variant, which takes time away from crawling important, non-duplicate pages. Learn more about what crawl budget is and how to optimize it.
Diluted Link Equity - External and internal links may point to different variants of the page since there may be confusion about which one to link to. This splits the link equity across multiple pages, rather than concentrating it on the 1 page you want indexed and ranking.

How is Duplicate Content Created?

There are numerous ways duplicate content can be created. Oftentimes it’s accidental, but it should still be addressed.

These example URLs seem the same to people but they are technically different to search engines.

https://seerinteractive.com
https://www.seerinteractive.com
https://seerinteractive.com/index.html
https://www.seerinteractive.com/index.html
http://www.seerinteractive.com/index.html
https://www.seerinteractive.com/INDEX.html
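
To make the difference concrete, here is a minimal Python sketch that splits each example URL into its scheme, host, and path. The helper name `url_parts` is hypothetical, and the comparison is only an approximation of how a crawler tells addresses apart; it simply shows that no two of the six URLs share all three components.

```python
from urllib.parse import urlsplit

urls = [
    "https://seerinteractive.com",
    "https://www.seerinteractive.com",
    "https://seerinteractive.com/index.html",
    "https://www.seerinteractive.com/index.html",
    "http://www.seerinteractive.com/index.html",
    "https://www.seerinteractive.com/INDEX.html",
]

def url_parts(url):
    """Return the (scheme, host, path) triple that distinguishes a URL."""
    parts = urlsplit(url)
    return (parts.scheme, parts.netloc, parts.path)

# Collect the distinct triples; a set drops any exact repeats.
distinct = {url_parts(u) for u in urls}

for u in urls:
    print(url_parts(u))
print(len(distinct), "distinct URLs out of", len(urls))
```

Each URL differs from the others in at least one component (protocol, www prefix, file ending, or letter case), which is why they are treated as separate pages until they are consolidated.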

Below are some common ways duplicate content is created:

Http vs. Https

This occurs when a site is accessible over both http and https. It commonly happens when a site moves to https without properly migrating (301 redirecting) the http version to the https version, which leaves both versions live. Explore Seer’s Migration Checklist for more information.

https://www.seerinteractive.com
http://www.seerinteractive.com

Www vs non-www

This occurs when a site is accessible in both www and non-www. It’s important that a site only resolves to one version, either www or non-www. Having both variations live creates duplicate content.

https://seerinteractive.com
https://www.seerinteractive.com

Mixed Case URLs

Mixed casing duplication can occur when the same URL is accessible with both uppercase and lowercase characters. Ideally, URLs should always resolve to lowercase characters to avoid duplicate content issues.

https://seerinteractive.com/Blog
https://www.seerinteractive.com/blog

Trailing vs Non-Trailing Slash URLs and/or Multiple URL Endings

Duplication from trailing and non-trailing slashes can occur when URLs have multiple, inconsistent endings. This can also occur when URLs end in /index, .html, .aspx, etc. along with other variations. Each page should only be accessible with one URL ending.

https://seerinteractive.com/blog
https://www.seerinteractive.com/blog/

Parameters

Parameters can be used for multiple reasons. Common uses are changing on-page content through facets and filters, and tracking. Oftentimes parameterized URLs create thin content that offers little value to search engines.

Duplicate/thin content generated from facets can be carefully handled through canonical tags, noindex tags, robots.txt blocking, or a combination of these elements.

In the example below, canonicalizing the parameterized URL to the clean URL would likely make the most sense.

Parameterized URL: https://www.example.com/rugs/floor-rugs?brand_name=21688&nav_color=19342&nav_price=11334&nav_style=11609
Clean URL: https://www.example.com/rugs/floor-rugs
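
As a rough illustration of that canonicalization, the Python sketch below strips the query string from the parameterized URL and builds the canonical link element that would go in the page's head. The function names `clean_url` and `canonical_tag` are hypothetical, and dropping every parameter is an assumption that fits this facet example; parameters that change the content in a meaningful way may need different handling.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def clean_url(url):
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def canonical_tag(url):
    """Build the <link rel="canonical"> element pointing at the clean URL."""
    href = escape(clean_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'

parameterized = (
    "https://www.example.com/rugs/floor-rugs"
    "?brand_name=21688&nav_color=19342&nav_price=11334&nav_style=11609"
)
print(canonical_tag(parameterized))
```

The same tag would be output on every faceted variant of the page, so all of them point search engines to the clean URL.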

Near Duplicates

Near duplicate content occurs when there are similar pages on a site that share the same search intent.

For example, if we have a landing page about “Why is SEO Important” and a separate landing page about “Benefits of SEO”, these two pages are likely going to compete against each other for the same search terms. Rather than having two pages with similar content themes, we should combine them into 1 landing page.

How to Resolve Duplicate Content

For HTTP vs HTTPS duplicate content issues, you should implement 301 redirects from each HTTP URL to its HTTPS equivalent. It’s important that the HTTP pages are carefully redirected to their HTTPS counterparts to avoid losing link equity and creating a poor user experience.
For www and non-www, mixed casing, and trailing vs non-trailing slash URLs, you can implement a server-side 301 redirect to force the URLs to one URL variation.
- If redirects are not an option or the duplicate pages are needed under multiple site sections, you can canonicalize the lower-performing page to the higher-performing page. This will signal to Google that only one of the pages should be indexed and ranked.
  - Canonical tags should only be used for 1:1 duplicates. Learn more about canonical tags and their usage here.
  - If there are numerous pages of duplicate content, implementing a canonical strategy will not conserve crawl budget, because crawlers still have to crawl each duplicate to find its canonical tag.
How to Resolve Similar Content: If the pages are not 1:1 duplicates but share similar search intent, consider merging any unique, relevant content from the page being redirected into the destination landing page to create one strong landing page. 301 redirects should then be used to consolidate the similar pages into 1.
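
The redirect rules above can be sketched in Python as a single function that decides whether a requested URL needs a 301 and where it should go. This is only an illustration of the logic, not a server configuration: `preferred_url` and `redirect_for` are hypothetical names, and the choices of https, the www host, lowercase paths, a trailing slash, and the listed index file names are all assumptions standing in for whichever single variation a site picks.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.seerinteractive.com"   # assumption: www is the chosen version
INDEX_FILES = ("index.html", "index.aspx")   # assumption: endings to collapse

def preferred_url(url):
    """Map any variant to one form: https, www, lowercase path, trailing slash."""
    parts = urlsplit(url)
    path = parts.path.lower()
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]        # "/blog/index.html" becomes "/blog/"
    if not path.endswith("/"):
        path += "/"                          # assumes every URL is a page, not a file
    return urlunsplit(("https", PREFERRED_HOST, path, parts.query, ""))

def redirect_for(url):
    """Return (301, target) if the URL is a duplicate variant, else None."""
    target = preferred_url(url)
    parts = urlsplit(url)
    # An empty path and "/" are the same request, so compare them as equal.
    requested = urlunsplit(
        (parts.scheme, parts.netloc, parts.path or "/", parts.query, "")
    )
    return None if requested == target else (301, target)

print(redirect_for("http://seerinteractive.com/Blog"))
print(redirect_for("https://www.seerinteractive.com/blog/"))
```

Because every variant maps straight to the final URL, each duplicate reaches its destination in one redirect rather than through a chain of hops.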

*For all duplicate content solutions, remember to update internal links and the XML sitemap to only include the canonical or redirect destination URLs.
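
As one possible way to keep the XML sitemap clean, the Python sketch below replaces each known duplicate with its destination and lists every destination once. The function name `build_sitemap` and the `redirects` mapping are hypothetical, and the sketch assumes the mapping already points directly at final URLs (it does not follow chains).

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls, redirects):
    """Return sitemap XML listing only final destination URLs, each once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        final = redirects.get(url, url)      # swap a duplicate for its destination
        if final not in seen:
            seen.add(final)
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = final
    return ET.tostring(urlset, encoding="unicode")

pages = [
    "https://www.seerinteractive.com/blog/",
    "https://seerinteractive.com/Blog",
]
redirects = {
    "https://seerinteractive.com/Blog": "https://www.seerinteractive.com/blog/",
}
print(build_sitemap(pages, redirects))
```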

How to Decide What URLs to Consolidate Duplicate Content To?

As mentioned above, consolidating content should be done by 301 redirecting or canonicalizing the lower-performing page to the higher-performing page.

Below are some metrics to consider when looking for the higher-performing page to consolidate content to: