2 Simple Ways To Avoid Duplicate Content With WordPress
I am a big fan of WordPress, mainly because of its functionality and extensibility, but it has one major drawback: duplicate content. If your WordPress blog is not set up correctly, duplicate content can become a big challenge, so it pays to plan and research properly.
From an SEO standpoint, one of the worst things you can do is allow duplicate content. If Google in particular recognizes duplicate content on your blog, much of your other SEO effort goes to waste.
Duplicate content on another domain is something you definitely want to avoid. There is some debate about the algorithms that Google uses to check for duplicate content. It would appear, however, that Google and most other search engines now determine who published the content originally and give rank to the original copy.
Duplicate content on your own site, though, is a completely different situation and will have heavy consequences for your overall SEO efforts. So you need to plan in advance and take appropriate measures to ensure that search engines only index one instance of every page on your site.
2 Simple Methods To Avoid Duplicate Content
There are 2 very simple methods you can and should use right now to avoid the duplicate content issue on your blog: setting a custom permalink structure, and using a robots.txt file.
What is a permalink?
A permalink is simply the name given to the URL structure used by your blog. By default, when you first install WordPress, the permalink structure is pretty awful. From an SEO perspective it’s in fact pretty useless and counterproductive.
The default permalink structure will dynamically assign a unique ID to each of your posts. For instance: http://incomesinternational.com/?p=123
As you can see, this is not very helpful to the search engines or to people searching for your content. There are some basic default settings that you can use, but these don’t help much in the way of eliminating duplicate content.
The main problem is that when a spider crawls your site, it can reach the same page from various entry points: the archive, category, tag, or even author pages. Counting the original article, that means the spiders can in fact index 5 copies of the exact same page.
If you want to make sure that only one instance of each page is indexed, simply set your permalinks to only use your domain name and the post name.
To do this, go to the Settings menu in your WordPress admin area and select the Permalink option. Select the ‘Custom’ setting and insert /%postname%/ in the text box. The resulting URL now looks like: http://incomesinternational.com/get-this-fantastic-book-for-free/ . This simple change has big implications for how your site is indexed.
There are several advantages to this method over other custom permalink structures. The main advantage is that no matter how the spider crawls your site, each page will only ever resolve to one URL. It doesn’t matter if the spider comes in via your archives, tags, categories or authors, because every time the page is indexed it will be at the same location from the spider’s perspective.
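To make the /%postname%/ idea concrete, here is a rough Python sketch of how a post title might turn into that kind of URL. This is only an approximation for illustration, not WordPress’s actual slug logic. The function names and the sample title are my own assumptions; the title is simply guessed from the example URL above.

```python
import re


def approximate_postname(title):
    """Roughly mimic a post-name slug: lowercase words joined by hyphens.

    Assumption: this is a simplified stand-in, not WordPress's real routine.
    """
    slug = title.lower()
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    # Drop any hyphens left over at the start or end.
    return slug.strip("-")


def permalink(domain, title):
    """Build a /%postname%/ style URL from a domain and a post title."""
    return "http://{}/{}/".format(domain, approximate_postname(title))


# Hypothetical title, inferred from the example URL in the article.
print(permalink("incomesinternational.com", "Get This Fantastic Book For Free!"))
```

The point is that the URL depends only on the domain and the post name, so there is one address per post no matter which listing page links to it.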
The robots.txt File
If you’re not familiar with what this is or how it works, let me explain quickly. A robots.txt file is a simple text file that tells the search engine spiders which URLs they can and can’t follow and index. An example of a robots.txt for a WordPress blog is:
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
What this does is simply tell the robot not to crawl any of these file or directory paths. You should always be certain to include the cache. In fact, the example above should be the very minimum you use for your WordPress blog. This will not only tell the spiders what not to crawl but will also help give more page rank equity to your important pages, but that’s another topic so I won’t go too deep into it here.
You might decide that you want to use a permalink structure of /%category%/%postname%/. In that case you would adjust your robots.txt file to accommodate this, to ensure that the spiders always crawl the same path to any given page.
It’s always a great idea to review your site regularly and see what the spiders see, to ensure that you don’t get penalized for duplicate content. Doing this can mean the difference between showing up on the first page of a Google search or on page 47. I know where I’d rather be.
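If you want to check what rules like the ones above actually block, one option is Python’s standard-library urllib.robotparser. The sketch below is a minimal example under a few assumptions: the site address example.com and the helper name is_crawlable are placeholders I made up, and the rules are copied from the example file above.

```python
from urllib.robotparser import RobotFileParser

# The example rules from the article, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
"""


def is_crawlable(path, robots_txt=ROBOTS_TXT, site="http://example.com"):
    """Return True if a generic spider ("*") may fetch the given path.

    Assumption: "site" is a placeholder domain; only the path matters here.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", site + path)


for path in ["/wp-admin/", "/wp-content/cache/page.html",
             "/get-this-fantastic-book-for-free/"]:
    print(path, "->", "allowed" if is_crawlable(path) else "blocked")
```

Running a handful of your own paths through a check like this is a quick way to confirm that posts stay crawlable while the admin, plugin, cache and theme directories do not.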
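As a rough illustration of that kind of review, the Python sketch below groups URLs whose page text is identical. It assumes you have already collected the text of each page yourself; the function name find_duplicates and the sample URLs and text are made up for the example, and it only catches exact copies (ignoring case and extra whitespace), not near-duplicates.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages):
    """Group URLs that serve the same text.

    pages: dict mapping URL -> page text (assumed to be collected beforehand).
    Returns a list of URL groups, each containing two or more URLs.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        # Normalize whitespace and case so trivial differences are ignored.
        normalized = " ".join(body.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical sample data: the same post reachable via two different paths.
sample_pages = {
    "http://example.com/?p=123": "Get this fantastic book for free",
    "http://example.com/category/books/?p=123": "Get this  fantastic book for FREE",
    "http://example.com/about/": "About this blog",
}
print(find_duplicates(sample_pages))
```

Any group this reports is a page that spiders can reach at more than one address, which is exactly what the permalink and robots.txt settings are meant to prevent.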
Also take a look at the Canonical URL plugin. This is a very simple way to help in the fight against duplicate content issues.
I recommend watching this clear and informative explanation by Matt Cutts of Google.
[pro-player]http://www.youtube.com/watch?v=Cm9onOGTgeM[/pro-player]













