You have probably heard about duplicate content and that having too much of it is bad for your site. The big question is: what is duplicate content, and how easy (or difficult) is it to find and fix?
We have summarized the answers for you below.
What is duplicate content?
Duplicate content is created when the same content appears on more than one URL. Google does not treat duplicate content kindly: even though it tries to identify which version is the original, the page will likely not rank as highly as it would without duplicates. Common sources of duplicate content include:
- www vs. non-www
- Trailing slashes (/)
- Staging server
- http vs. https
- Uppercase vs. lowercase
- Home page duplicate
- Session ID
- Affiliate tracking
- Duplicate paths
- Functional parameters
- International duplicates
- Search sorts / filters
- Product variations
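Several of the URL-level variants in the list above (www vs. non-www, trailing slashes, http vs. https, uppercase vs. lowercase, session IDs and affiliate tracking) can be collapsed into one preferred form. The following is a minimal Python sketch of that idea, not a definitive implementation. The parameter names in IGNORED_PARAMS are hypothetical, and lowercasing the path is only safe if your server treats paths as case-insensitive.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session / tracking parameter names; real sites use their own.
IGNORED_PARAMS = {"sessionid", "sid", "affid", "ref"}

def normalize_url(url, prefer_www=False):
    """Collapse common duplicate-URL variants into one preferred form."""
    parts = urlsplit(url)
    # Host names are case-insensitive, so lowercasing is always safe here.
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if prefer_www else bare
    # Assumption: paths are case-insensitive on this site.
    # Drop the trailing slash, but keep "/" for the home page.
    path = parts.path.lower().rstrip("/") or "/"
    # Remove session/tracking parameters and sort the rest for a stable order.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in IGNORED_PARAMS
    )
    # Always use https and drop any #fragment.
    return urlunsplit(("https", host, path, urlencode(kept), ""))
```

For example, normalize_url("http://WWW.Example.com/Shoes/?sessionid=123&color=red") and normalize_url("https://example.com/shoes?color=red") both return the same URL, which shows that the two addresses are duplicates of one page.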
Which is better, www or non-www?
From an SEO perspective, there is no difference between the two as long as your site has a consistent URL structure. The default WordPress setting is non-www, and it is recommended to keep it unless you are technical enough to change some of the settings. WordPress also comes with a redirect from www to non-www, so you don’t need to worry about creating that specific type of duplicate content.
In some cases, though, depending on your hosting environment, you might need to tweak something at the server level to make this work.
How to find duplicate content?
What we need is a list of all of the posts on your site. Unless you have a well-documented list handy, the easiest way to obtain one is to use a crawler such as Screaming Frog.
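Once you have the list of URLs and the text of each page, exact duplicates can be spotted by comparing a fingerprint of the content. Here is a rough Python sketch, assuming you have already exported the pages into a dictionary that maps each URL to its body text. The function names are illustrative, and this only catches identical text (ignoring whitespace and letter case), not near-duplicates.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(text):
    """Hash the text after collapsing whitespace and ignoring letter case."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_duplicates(pages):
    """Return groups of URLs whose body text is identical.

    `pages` maps URL -> body text (for example, exported from a crawl).
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[fingerprint(body)].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

Each group returned is a set of URLs that serve the same text, so one of them should be picked as the original and the rest handled with one of the methods described later.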
What WordPress plugin can I use to find duplicate content?
Alternatively, we can use the “Fix duplicates” plugin. It will find any potential duplicate content for you. It also lets you delete one of the duplicates, but DO NOT do this unless you know that the page you delete has no backlink equity.
Ideally, you should implement a 301 redirect instead. The premium version of this plugin comes with a redirect function, but if you don’t mind doing a bit of work (really a tiny bit), you can apply 301 redirects very easily by installing the “Redirection” plugin.
How to handle duplicate content?
It really depends on the type of content we are talking about. Generally speaking, applying a 301 redirect is the way to go but if you have an ecommerce site offering products with different colors, applying canonical tags would be a better solution.
On the other hand, the “noindex, follow” meta robots tag can come in handy for duplicate pages that you don’t want search engines to index, such as search result pages.
So essentially there are three ways to handle duplicate content.
1) Apply 301 redirects
This method is mainly used for www vs. non-www and http vs. https type of duplicate content.
2) Apply canonical tags
Mainly used on eCommerce sites where products are sold in various sizes and colors. In many cases, a variable URL structure is used for each product.
As you can imagine, 301 redirects cannot be used in this situation because they would prevent users from reaching the product size or product color page they are looking for.
3) Apply “noindex, follow” meta robots tag
Mainly used for pages that you do not want indexed by search engines, such as search result pages.
How to apply 301 Redirects?
Let’s talk about the first option you have at your disposal for getting rid of duplicate content: the 301 redirect, which many SEO people consider the single method that can make all of your worries go away.
In reality, this type of duplicate content handling is used to solve “www vs. non-www” or “http vs. https” issues, and is usually enough for sites without a huge variety of content.
The whole point of the 301 redirect is to let visitors see only one of your duplicate pages. This should be your most authoritative page, the one with the most links pointing to it.
You are probably wondering why canonical URLs are a better solution than 301 redirects for eCommerce sites, but we’ll talk more about this later on.
Although this is not the perfect solution for every situation you may find yourself in, remember that nothing can be considered “an ultimate practice” when we talk about SEO. Almost all of the time, you’ll use more than one method when trying to improve any element of your on-site or off-site SEO.
The good thing about the 301 redirect is that it lets you combine multiple pages into a single one, which improves your chances of ranking higher for a specific keyword.
Another positive side of this method is the ease of implementation. Unlike other practices, you need almost no coding knowledge to use a 301 redirect. Heck, I placed it on numerous pages of the first site I created, and I had next to no coding experience back then, all thanks to this informative yet simple-to-understand guide.
What WordPress plugin can I use to apply 301 redirects?
The “Redirection” plugin is the easiest solution when you want to apply page-to-page 301 redirects.
All you need to do is define which page is redirected (Old URL) and which destination page it should point to (New URL).
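Conceptually, a page-to-page redirect is just a lookup from the old URL to the new one. The Python sketch below illustrates that idea only. It is not how the “Redirection” plugin is implemented, and the paths in REDIRECTS are made up for the example.

```python
# Hypothetical Old URL -> New URL pairs.
REDIRECTS = {
    "/old-sweater-page/": "/sweaters/",
    "/blog-archive/": "/blog/",
}

def respond(path, redirects=REDIRECTS):
    """Return (status, headers) for a requested path.

    A 301 status tells browsers and search engines that the page has
    moved permanently to the URL given in the Location header.
    """
    target = redirects.get(path)
    if target is not None:
        return 301, {"Location": target}
    # No redirect defined: serve the page normally.
    return 200, {}
```

A request for "/old-sweater-page/" gets a 301 pointing at "/sweaters/", while any path that is not in the table is served as usual.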
How to apply canonical tags?
Canonical tagging provides a completely different user experience. It keeps both the duplicate and the original content available, yet the tag tells search engines which one is the original.
This is why the canonical tag is a smart choice for people operating eCommerce sites, and the reasoning behind this can be explained with a simple example.
Let’s say you operate a site that sells or resells sweaters. All of your content is about sweaters, and each product type can be found in different colors, sizes, etc. Using a 301 redirect here would be a bad idea simply because it would hurt your user experience.
For example, a visitor has found the type of sweater he wants to purchase, yet he wants to choose a different color. If you use a 301 redirect to resolve your own issues with duplicate content, the user will be redirected to your “main sweater page”, which is not what he was looking for.
That’s why canonical tags were invented, I guess. When you use canonical tagging, your visitor won’t notice any change to his experience, and on top of that he’ll get the information he needed in the first place.
You still need to decide which page will be the main page in your canonical strategy, and you should be really smart about this, depending on the type of your site and the niche you operate in.
Adding a canonical tag to your page is not difficult. Here’s the single line of code you need to add to the head section of the page.
<link rel="canonical" href="http://www.example.com/canonical-version-of-page/" />
The tag should be placed on the duplicate page, although it is a best practice to self-canonicalize the original as well. Self-canonicalization prevents potential duplicate content from the variable URLs that many eCommerce software packages create.
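To show how self-canonicalization deals with variable URLs, here is a small Python sketch that builds the tag for any requested URL by dropping its query string. It assumes that the parameters (such as color or size) never change which page should be treated as the original, which will not hold for every site. The function name is illustrative.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def canonical_tag(url):
    """Build a canonical link tag that points at the URL without parameters."""
    parts = urlsplit(url)
    # Assumption: the query string and fragment only select a variation.
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return '<link rel="canonical" href="%s" />' % escape(clean, quote=True)
```

With this approach, a URL such as http://www.example.com/sweater/?color=red&size=m and the plain http://www.example.com/sweater/ both produce the same tag, so search engines see one original page while visitors can still reach every variation.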
What WordPress plugin can I use to apply canonical tag?
The SEO Ultimate plugin has a module that allows you to add self-canonicalization to the entire site with the click of a button!
How to apply meta robots tag “noindex, follow”?
The meta robots tag is often used by blogs and sites that have duplicate content in the form of multiple category pages, author bios and blog posts published on more than one page.
What this tag does is tell search engine bots not to index a particular page, yet crawl all the links on it.
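In the page source, the tag is a meta element named "robots" whose content lists the directives. If you want to confirm that a page really carries it, a short Python sketch like the one below can read the directives from the HTML. The class and function names are illustrative, and the sketch only looks at meta tags, not at HTTP headers.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma separated, e.g. "noindex, follow".
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())

def robots_directives(html):
    """Return the set of robots directives found in the page's meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

For a page whose head contains a robots meta tag with the content "noindex, follow", the function returns a set holding "noindex" and "follow". An empty set means the page has no robots meta tag at all.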
What WordPress plugin can I use to apply meta robots tag?
The easiest way to add a “noindex, follow” tag to a certain page or group of pages is by using Yoast’s SEO plugin for WordPress (only if you use WP as your publishing platform).
Just open the editing box for the page where you want to use the tag and scroll down to the Yoast SEO for WordPress plugin. Go to the “advanced features” section and select “noindex” from the “meta robots index” drop down menu. Make sure you’ve checked the “follow” box under the “meta robots follow” section.
What are the best practices for duplicate content?
Applying the three methods we described above will resolve most of your duplicate content headaches.
You also need to keep in mind that your site is destined to expand and scale in content, so it’s smart to adopt some day-to-day practices that will minimize the need for manual duplicate content handling. Here are a couple of simple practices that will help you achieve this:
- Internal linking consistency. Stick with the preferred, “www” or “no www”, version of your site.
- Minimize similar content. Your homepage is your blog yet you still have a separate page titled “blog” where you publish all of your recent posts. Why?
- Use Webmaster tools. If you find some duplicate content from your site while browsing through search engines, it’s a good practice to remove these pages from the search engine’s index by using Webmaster tools.
What’s Matt Cutts’s take on duplicate content? – “Google will just choose one for ranking as long as they are not spammy”
Matt Cutts, the ex-head of the web spam team at Google, used to answer questions asked by the community. A couple of years ago he addressed the issue of duplicate content and explained why handling it is important, yet having some of it on your site is not so bad.
Don’t get this wrong: you shouldn’t just stop caring about duplicate content, but there will be situations when you won’t be able to avoid it. In fact, a lot of web content is actually duplicate content. Google has evolved to the point that it will just present one version high in the rankings and ignore all the rest.
This is why using a 301 redirect or a canonical tag is important: you want to be the one choosing which page ranks higher, not Google.
Another good example is repetitive elements such as “terms and conditions” text shown for multiple products on eCommerce sites, which search engine bots usually ignore because it is legal text that those pages must have.
According to Matt, you should be worried about your duplicate content only when it’s spammy or stuffed with keywords.
Another number to take into consideration is that between 25% and 30% of the content found on the Internet is duplicate, and that’s OK with Google because it will choose the highest-quality piece of content and show it on the first page of its search results.
Is duplicate company / legal information bad?
According to this Matt Cutts video (below), Google knows this kind of duplicate content is inevitable, so it should not hurt your site’s ranking.
What if somebody is scraping your content?
There are people out there trying to make easy money through implementation of black hat SEO. One of the most heavily used black hat practices is content scraping.
Imagine you have published hundreds of unique product reviews on your site, and one day a page appears from nowhere, featuring the exact same reviews and ranking even better than you on Google. In most cases scraping should not affect your ranking efforts, but it could happen.
If you notice somebody using your own content and ranking better than yourself, you should file an official content removal request to Google. For more resources on this matter visit this link.
How to prevent content scraping?
Much of the software used for scraping looks for an Atom or RSS feed. You can simply restrict how much content you show in the feed to prevent this kind of scraping.
We recommend using Feedburner to get better control of your feed. You can use the “Summary burner” function to publish a summary instead of the entire post.
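The idea behind a summary feed is simple: publish only the first part of each post. The Python sketch below illustrates that idea and is not how Feedburner works internally. The function name and the 55-word default are assumptions made for the example.

```python
def summarize(text, max_words=55):
    """Return at most `max_words` words of a post for use in a feed."""
    words = text.split()
    if len(words) <= max_words:
        # Short posts are returned whole (with whitespace tidied up).
        return " ".join(words)
    # Longer posts are cut off, with a marker that more text exists.
    return " ".join(words[:max_words]) + " ..."
```

A scraper that copies the feed then gets only the opening words of each post rather than the full article.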