How Exactly Does Google Handle Duplicate Content?

The internet today is filled with content. More is added every day than anyone could possibly read. But to be honest, not all of the content that is published is original - far from it. Duplicate content affects the search rankings of a web page - but how much effect does it actually have? A lot of bloggers and website owners ask this question every day, concerned about content they've copied (necessarily or otherwise), or content that has been copied from them. Today, we'll try to answer this question in light of advice from Google itself.

Duplicate content happens a lot!

Alas, it's inevitable. Content continues to be duplicated today. In fact, about 30% of the content on the web today is duplicate. And that only counts exact duplication. If you also consider article spinning and other such practices, the number will be considerably higher. Maybe even up to 50% (or more, nobody can be sure).
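To see the difference between exact duplication and spun content, here is a minimal sketch of one common textbook technique: comparing pages by their overlapping word sequences ("shingles") with Jaccard similarity. This is only an illustration. Google has not published how it detects duplicates, and the function names and sample sentences below are made up for the example.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection / size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical sample texts: an original, an exact copy, a "spun" copy, and an unrelated page.
original = "Google groups duplicate pages together and shows the best one in search results."
exact_copy = "Google groups duplicate pages together and shows the best one in search results."
spun_copy = "Google groups duplicate pages together and displays the best one in search results."
unrelated = "Our bakery sells fresh bread and pastries every single morning."

sim_exact = jaccard(shingles(original), shingles(exact_copy))
sim_spun = jaccard(shingles(original), shingles(spun_copy))
sim_unrelated = jaccard(shingles(original), shingles(unrelated))

print(sim_exact, round(sim_spun, 2), sim_unrelated)
```

An exact copy scores 1.0, an unrelated page scores 0.0, and the spun copy lands somewhere in between. That in-between zone is why spun content is harder to count than exact duplication.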

What you can be sure about is that your content always runs the risk of being duplicated. The more popular you are, the likelier you are to find your content duplicated.

But...

...not all duplicate content is plagiarised or a copyright violation! Sometimes, people will quote a paragraph or a few lines from another blog or website, say from a news story or press release. Often, websites have near-identical versions of their pages for different regions on different domains, such as .com and .co.uk. You can also find duplicate versions of standard pages, such as Terms of Service (ToS) pages.

So how exactly does Google treat duplicate content?

This means that Google does not - nay, cannot - penalize every website with duplicate content, because if it did, that would adversely affect its search quality. So the solution? It lies in grouping all duplicate content together, and then showing the best contender in search results. For example, if the BBC published a breaking news story, and 10 other news sites copied it, Google would detect the duplicate content and group all 11 versions (the 10 copies plus the BBC's original) together. The best candidate is then chosen from the group and gets the top spot on Google. The other pages are pushed quite far down in the search results.
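The grouping step can be pictured with a small sketch. It assumes the simplest possible case, exact duplicates that differ only in letter case and spacing, and groups pages by a hash of their normalized text. The URLs and function names are hypothetical, and this is not how Google actually implements it.

```python
import hashlib
from collections import defaultdict

def fingerprint(text):
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def group_duplicates(pages):
    """pages maps URL -> page text. Returns lists of URLs that share a fingerprint."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return list(groups.values())

# Hypothetical pages: the second is a copy of the first with different case and spacing.
pages = {
    "https://news-original.example/story": "Breaking news: markets rally.",
    "https://copycat.example/story": "breaking   news: Markets Rally.",
    "https://other.example/post": "A completely different article.",
}

groups = group_duplicates(pages)
print(groups)
```

The first two URLs end up in one group and the third stays on its own. Only one page from each group would then need to be shown prominently.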

But how does Google know which contender was the original/best source? I'm glad you asked - it is an extremely important question. This problem is usually solved by keeping many factors in mind. These include the authority of each source (PageRank etc.), the date and time the content was published on each page, and the structure of each copy, i.e. the nature of the keywords and links in the content and whether they are relevant to the host page. Elements such as internal links, for example, will be more relevant on the original source than on the copies.
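To make the idea of weighing several factors concrete, here is a toy scoring sketch. The three signals mirror the factors mentioned above (authority, publish time, internal link relevance), but the weights, the field names and the formula are all assumptions made up for illustration. Google's real ranking signals and weights are not public.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Copy:
    url: str
    authority: float            # 0..1, a stand-in for signals like PageRank
    published: datetime         # when this copy went live
    internal_link_ratio: float  # 0..1, share of in-content links pointing to the host's own site

def pick_representative(copies):
    """Return the copy with the highest (entirely hypothetical) weighted score."""
    earliest = min(c.published for c in copies)

    def score(c):
        hours_late = (c.published - earliest).total_seconds() / 3600
        earliness = 1.0 / (1.0 + hours_late)  # 1.0 for the first publisher, lower for later ones
        return 0.4 * c.authority + 0.3 * earliness + 0.3 * c.internal_link_ratio

    return max(copies, key=score)

# Hypothetical group: a small original blog and a higher-authority site that copied it later.
copies = [
    Copy("https://original.example/post", 0.3, datetime(2013, 1, 1, 9, 0), 1.0),
    Copy("https://scraper.example/post", 0.6, datetime(2013, 1, 1, 12, 0), 0.0),
]

best = pick_representative(copies)
print(best.url)
```

With these made-up numbers, the original wins even though the copier has twice the authority. It was published first, and its internal links actually point to its own site.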

So if you are the original source, you need not worry. If you've copied from somewhere, it won't result in a penalty, but you won't really benefit from it.

But what about people who simply copy RSS feeds, or posts from other sites, to publish on their own? Well, in that case, Google has to step in. If you have a blog hosted on Blogger, blatant plagiarism will get your blog banned. Someone might file a copyright infringement complaint against you, which can only mean trouble. And even if you manage to avoid such cases, you'll eventually see a drop in your search rankings. The effect won't be apparent at first, but given time, it'll become profound.

Did that clear the questions in your mind? If you still have more, you know where to ask them :) Cheers.


7 comments

  1. Thanks Qasim Zaib Brother. I am a regular reader of MBT and I read each and every article with full concentration. I have learnt a lot from this blog and I am sharing it with all my friends. The information you provided above is very useful for new bloggers as well as older ones. My question is: if Google groups the websites that posted the same news, which website would be number 1 in search results? The website that originally published the post, or another one? Also, if I have a new website but I post unique content earlier than anyone else, what happens to my blog's position in search engines, given that you said the search engine picks a website from the group based on PageRank etc.? I am confused now, can you please clear this up?

    Replies
    1. Thank you for your support dear, really appreciate it.

      As far as your first question goes, Google groups all similar content together, and then displays the original in search results. The originality of content is determined by many factors, such as internal link structure, publish date and time, relevancy, quality etc.

      I can see why you'd be worried about other higher PR websites copying your content and passing it off as their own. Well, as I just said, Google doesn't just consider PR. There are many other factors at play too. There is the date and time of publishing. I also mentioned link structure. What that means is, if you've written a nice article, and put some relevant internal links inside your article, then those links will only be relevant to your website. If someone auto-blogs your content (e.g. via an RSS Feed), then they will get all those links too, which won't be relevant to THEIR website. Hence, that's one good reason to put internal links inside your blog posts - others will have a hard time proving that the content is theirs. And even if they remove those links, your article would still be more relevant to your site (based on keywords etc), not to mention that you'll have an earlier publish date and time.

      Hope that helps :)

  2. Hi!
    I have created my business blog on Blogger, and I have created a static home page for it. I want to know how I can create and submit a sitemap in Google Webmaster Tools. Please help me, I am very worried about my blog. Only the home page is indexed in Google. How can I get all the pages of my business blog indexed?

    Replies
    1. Check out this tool of ours buddy
      http://tools.mybloggertricks.com/generator/sitemap.html
      Hope it helps :)

  3. I will keep your suggestions in mind to improve my content, and thanks for sharing such informative material with us. Next time I write content for any business, I will surely keep these things in mind.
