If your blog has more than 100 subscribers, it is probably being scraped; scrapers seem to be everywhere these days. Hack WordPress currently has 3 blogs scraping its content every time we publish a post, so we’ve become pretty familiar with them. But that is not actually what I was referring to when I came up with the title of this post.

What I’m talking about are people who manually reprint an entire post on their “legitimate” blogs. One blog that has been doing this to me (and, I assume, to others) is called WordPress Collection dot com. I have intentionally left out the link.

The two most recent posts (as of publishing this post) are identical copies of two posts in our archives.  Here are our posts:

Now, to be fair, I fully understand that code doesn’t really change from one post to the next, so I have no problem with someone taking one of our code hacks, or someone else’s, and publishing it on their blog. We have done this on occasion, but we always write a fresh post where only the code is the same, and we link to the source. Unfortunately, this site uses the exact same title and the exact same content as our posts, word for word. The only change I could find is the permalink in the code, which was switched to the permalink of their blog.

So, when does content theft go too far? I’ve never liked scrapers, but I have learned to live with them. They usually reprint only an excerpt of the post and link to the source. This, however, is going too far in my opinion. At least change the wording around the code and link to the source. As a result, this website has been added to my blacklist of sites I don’t visit or link to.

I’ve seen people mention contacting the offending site’s web host as a good way to deal with content theft when something like this gets out of hand. What methods have you used to deal with content theft?

Kyle Eslick

Kyle Eslick is the founder and primary author of WordPress Hacks. You can learn more about him at KyleEslick.com or you can follow his personal tweets here.

There Are 11 Responses So Far »

  1. As long as you include a good number of deep-links, that should make content scraping/stealing at least somewhat beneficial. (Or are “those sites” stripping out all the links?)

  2. Bizhack says:

    This sort of thing infuriates me. It is a breach of copyright and you should write to them (as well as the ISP) and warn them to back off. You should also have a clear terms and conditions policy which makes clear what people can do with the info they read on this site.

    Perhaps everyone should start writing their thoughts in their comments section – although they don’t deserve the hits!

    I presume they won’t be copying this article…

  3. Zhu says:

    I deal with scrapers every day, which is quite funny sometimes because my blog is about my life in Canada as a French person, and therefore it’s pretty personal… yet some people steal my articles! :lol:

    Used to bother me but now I just leave it. I guess I think I’m “established” enough to be my own authority.

  4. Michael says:

    Content scrapers are the worst. I personally believe that as long as they are not copying the whole post and are linking to you, it is fine. I know I’ve quoted sections of other people’s posts, but I do link, and I’m adding to the story or commenting on their opinion. Probably the best way to get them to stop is to complain to AdSense about them and email the scrapers to let them know that you did; hitting them in the pocketbook is what gets their attention.

  5. Leland says:

    I don’t mind the short excerpt posts with a linkback to the source, but when they completely rip a post with no credit whatsoever – that gets annoying. I’ve even seen some people rewriting some of my posts, removing any “deeplinks” to my site – even using images I’ve created and stripping the watermark.

    So it’s not just automatic “splogs” doing this; it’s also done manually. I don’t think you can stop this with any sort of .htaccess hack (unless you find out their IP), so it’s best to read up on the DMCA and on reporting to AdSense, web hosts, etc. Does anyone know any effective ways of doing this?

    Luckily Google’s duplicate content filters don’t even index these pages, so hopefully it’s not hurting much in the search engines.

  6. Kyle Eslick says:

    @ John – These are older posts from late 2007. Back then I didn’t have many internal links within my post, but I imagine they would have been stripped if that was the case.

    @ Bizhack – Though I believe copyright protection is an international rule, I also have a copyright notice in the footer. A copyright page could be beneficial, but I’m not sure it is necessary as everything sort of fits under the common sense umbrella. With that said, I will certainly consider it.

    @ Zhu – Yeah, I have always felt that getting scraped meant I was getting noticed by people. :D

    Sadly, I think the manual reposting of content is taking things way too far.

    @ Michael and Leland – Great points, and something I will consider. I don’t even remember seeing AdSense, but my browser may have been blocking that script on their site.

    I think I will probably monitor the site to see if the pattern continues before going to that extreme. Hopefully the author will remove the posts in question and discontinue this practice going forward.

  7. Laurence says:

    Yes this is incredibly frustrating. Maybe you could write an article on here describing ways to avoid this happening? That would be most useful.

    And yes, manual scrapers are the worst!

  8. Kyle Eslick says:

    @ Laurence – I’m not sure that I’m an authority on avoiding this as it happened to me, but I can certainly think of a few things that could help which might make for a good post.

    I will credit you with the idea! :mrgreen:

  9. PW says:

    Hi,
    how about sending some other content to the scrapers?
    No search engine loves spam (Viagra, for example).
    Just store the IP addresses of the spammers and send them a different feed from the original one.
    Could be done with some .htaccess tricks and believe me, they will stop stealing your content and stop ranking ;-)

    best regards,
    Pete

  10. Well, as a side note, I think the “100 readers” estimate is a bit optimistic. I’ve done testing and found that it can start from the first post, depending on what your topic is. Scrapers will grab anything with the keywords they like and target your feed specifically if it grows large enough.

    Regarding how I resolve such instances, if it is a human I try to resolve things with a cease and desist letter or some other face to face interaction. If they don’t respond or I can’t get in contact with them, then I go to the host with a DMCA notice.

    The latter is becoming more common though. It is a tradition these days to keep contact information off of such sites, especially when using sites such as WordPress.com or Blogger.

    If you’re going to copy and paste content, make sure you can be reached easily…

  11. Travis says:

    I have found 4 sites that have in one way or another scraped my content. It wouldn’t be so bad if it was just articles they were scraping, but 2 sites, which are from the same owner, completely copied and pasted all of my original content onto their sites and are trying to use it to sell their products, which happen to be the same type of products as mine. I am going through the legal process to take care of them. Another site registered a domain name one letter off of mine, with a similar product, to try to scrape my traffic through misspellings. One call to him, and he is going to sell me the domain name, so that is cool. Threatening legal action can work.
    The 4th site took all of my content and spun the text to be different but it is all laid out and flowing the same way. I came across this site and thought WOW! deja vu. They tried to tell me they got their content from a corporate website related to their franchise and there is no corporate site to get it from. So again I am going the legal route.
    I don’t have any sympathy for someone who steals content for their own personal gain. Especially when it’s a competitor doing it.
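
PW’s suggestion in comment 9 (store the scrapers’ IP addresses and serve them a different feed) could be sketched roughly as follows. This is only a minimal illustration of the decision logic, written in Python rather than as the .htaccess rules PW mentions. The addresses, the feed file names, and the function names are all made up for the example, and it assumes you have already worked out which IP addresses the scrapers use.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation addresses
# (RFC 5737), not real scrapers; replace them with addresses from your logs.
SCRAPER_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # a whole range
    ipaddress.ip_network("198.51.100.7/32"),  # a single address
]

# Hypothetical file names for the real feed and the decoy feed.
REAL_FEED = "feed.xml"
DECOY_FEED = "decoy-feed.xml"


def is_scraper(remote_addr: str) -> bool:
    """Return True if the requesting address is on the blocklist."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparseable address: treat it as a normal reader.
        return False
    return any(addr in network for network in SCRAPER_NETWORKS)


def feed_for(remote_addr: str) -> str:
    """Pick which feed file to serve for a given requesting address."""
    return DECOY_FEED if is_scraper(remote_addr) else REAL_FEED


if __name__ == "__main__":
    for ip in ("203.0.113.9", "192.0.2.1"):
        print(ip, "->", feed_for(ip))
```

As Leland points out in comment 5, this only helps once you know the scraper’s IP address, and it does nothing against someone copying posts by hand.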
