How to Combat Content Scrapers

by Kevin on July 28, 2009

It isn’t that hard for people to steal your content. Whether they use an automated method or simply copy and paste your posts into another site, your entire blog can be replicated within minutes of publishing. It’s a problem that all bloggers face.

The only way to truly combat the problem is to address it as soon as it arises. Sure, there may be times when it works in your favor, but only when the copies link back to your site. Most of the time, it is disappointing to see others reading your content on blogs that have aggregated, or more accurately, stolen it.

Organizations have been set up to help combat the problem, but there are so many cases that it may never be fully addressed.

Depending on how often your content is stolen and to what extent your intellectual property has been infringed, you may have to pursue legal action beyond the steps below.

1. Determine the Sites that Have Stolen Content

The easiest part of the entire process is determining who has been stealing your content and in what manner. A Google search using text contained within a post is the best starting place. Enclose a group of sentences within quotes, then follow through to determine how many of these sites are actually replicating the content or taking a brief excerpt of it.
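The quoted-search step above can be partly automated. The following is a minimal sketch, not a definitive tool: the function names are hypothetical, the sentence splitting is deliberately naive, and the search URL form is assumed for illustration. It picks a few longer sentences from a post and wraps each in quotes so it can be pasted into a search box.

```python
import re
from urllib.parse import quote_plus


def exact_phrase_queries(post_text, max_queries=3, min_words=8):
    """Pick a few longer sentences and wrap each in quotes for an exact-match search."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", post_text.strip())
    # Longer sentences are less likely to match unrelated pages by accident.
    candidates = [s.strip() for s in sentences if len(s.split()) >= min_words]
    candidates.sort(key=lambda s: len(s.split()), reverse=True)
    # Remove embedded double quotes so they do not break the quoted phrase.
    return ['"{}"'.format(s.replace('"', "")) for s in candidates[:max_queries]]


def search_url(query):
    """Build a search URL for a query (URL form assumed for illustration)."""
    return "https://www.google.com/search?q=" + quote_plus(query)
```

Running each returned query by hand, rather than fetching results in bulk, keeps this within normal use of a search engine.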

Obviously, running a search for each of your posts, especially over the course of years, can add up to a lot of lost time. I also wouldn’t confront every individual offender, because new cases spring up every day. It’s much like how you eliminate spam: go after as many of the biggest offenders as possible, then work your way down to the smaller ones.
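One rough way to decide who the biggest offenders are is to measure how much of a post each suspect page reproduces. The sketch below assumes you have already saved the text of each suspect page yourself; the names copied_fraction and rank_offenders are hypothetical, and verbatim sentence matching is only one possible measure (it will miss reworded copies).

```python
import re


def _normalize(text):
    # Lowercase and collapse whitespace so trivial formatting changes do not hide a match.
    return re.sub(r"\s+", " ", text).strip().lower()


def copied_fraction(original, suspect):
    """Return the share of the original's sentences found verbatim in the suspect text."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", _normalize(original)) if s]
    if not sentences:
        return 0.0
    haystack = _normalize(suspect)
    hits = sum(1 for s in sentences if s in haystack)
    return hits / len(sentences)


def rank_offenders(original, pages):
    """pages maps a URL to that page's text; returns (url, score) pairs, highest overlap first."""
    scores = {url: copied_fraction(original, text) for url, text in pages.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A score near 1.0 suggests a full copy, while a low score is more consistent with a brief excerpt.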

Additionally, don’t stop with just your text-based content. If you have original images on your site that haven’t been licensed for redistribution or re-use, then you can also send notices out about these images.

2. Contact the Website Owner(s)

The next step is to contact the owner(s) of the website that may be harvesting your content. This can be a difficult process, but you usually won’t have to look far to get in touch with someone who can handle the matter.

First, look for a contact page, often found directly at theirsite.com/contact/. Otherwise, you’ll need to search the site for an address where they can be reached.

Send the owner of the website a polite email that clearly states you do not want your content appearing on their site. I would suggest starting with this resource page with stock letters (PlagiarismToday).

Whois information is also publicly accessible, but you may again have trouble if the content scrapers know what they are doing. Often, it doesn’t cost much more to hide this information completely, and in some cases, fake addresses and contact information are supplied.

3. Go to the Hosting Provider

Hosting providers generally have to comply with the requests of those who feel their copyrights have been infringed upon. Again, some of the content scrapers operate from their own servers, so it can be a difficult process to finally get through to someone who can disable access to the particular pages or entire websites.

A list of business addresses, both online and offline, can be found on the U.S. Copyright Office Service Provider page. Find the name of the website hosting company on the list and directly contact them this way if your initial email request did not work.

Don’t expect the hosting company to take your request seriously if you don’t act professionally. Include all appropriate information: the pages or websites that are infringing your copyright, proof that you in fact own the rights to the content, and your contact information. Be aware that any untrue statement in your request could land you in court.
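Before sending a request, it can help to check that nothing listed above is missing. The following is a small sketch of such a checklist; the class and field names are hypothetical, and it covers only the three items named in this article, not every element a formal notice may legally require.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TakedownRequest:
    # URLs of the pages or sites that copied the content.
    infringing_urls: List[str] = field(default_factory=list)
    # Evidence of ownership, e.g. URLs of the original posts (an assumed form of proof).
    ownership_proof: List[str] = field(default_factory=list)
    contact_name: str = ""
    contact_email: str = ""

    def missing_items(self) -> List[str]:
        """List which of the three required pieces of information are still empty."""
        missing = []
        if not self.infringing_urls:
            missing.append("infringing pages or websites")
        if not self.ownership_proof:
            missing.append("proof of ownership")
        if not (self.contact_name.strip() and self.contact_email.strip()):
            missing.append("contact information")
        return missing
```

An empty list from missing_items() only means the basics are filled in; the accuracy of what you state is still your responsibility.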

Most professional hosting companies accept DMCA and abuse notifications at addresses formed by placing abuse, dmca, legal, copyright, or support in front of the hosting company’s domain, for example abuse@ followed by the host’s domain.
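As an illustration of that address pattern, the sketch below builds the candidate addresses for a given hosting domain. The function name is hypothetical, and there is no guarantee that any particular host actually monitors all of these mailboxes, so treat the output as a list of guesses to verify.

```python
# Mailbox names taken from the list above.
PREFIXES = ("abuse", "dmca", "legal", "copyright", "support")


def candidate_addresses(host_domain):
    """Return likely abuse/DMCA contact addresses for a hosting company's domain."""
    domain = host_domain.strip().lower()
    # Mail is normally addressed to the bare domain, so drop a leading "www.".
    if domain.startswith("www."):
        domain = domain[4:]
    return ["{}@{}".format(prefix, domain) for prefix in PREFIXES]
```

For example, passing the reserved placeholder domain example.com yields abuse@example.com, dmca@example.com, and so on.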

4. Contact the Registrar

The registrar, in some respects, has more control over the websites than the hosting company. Unless site owners run their own registration service, they must go through one of the main companies that register domains. These companies are able to suspend any website, provided they have reason to do so.

You can find copyright information for some of the leading registrars at the following links:

Much of the contact information for domain registration companies can be found within their footer or through a legal/contact page.

Take the time to read through their copyright policy, if they have one. See what steps they take if they are notified of copyright infringements.

Not all registration companies will comply with your requests, as they often go through the hosting companies and aren’t the people you should be contacting directly.

You want to replicate steps two and three (sending a DMCA take-down request) with each company that has control over the domain and website.

5. Contact the Search Engines

Often, you’ll find your content on sites with meaningless content, little original material, and a poor domain name. However, the links within such a site help it rank higher, and any other sites in the same network boost its rankings as well. It is hard to imagine, but it happens day in and day out.

When you take the search engines out of the picture, most of these sites will fall pretty quickly. That doesn’t mean they’ll stop scraping your content, but fewer people will find the sites or link back to them thinking the content is original, and you may get some justice.

You can find the contact information for the top search engines here: Google, Yahoo, Bing/Microsoft, and Ask/IAC. Because these search engines are all located in the United States, they must comply with the Digital Millennium Copyright Act. They will usually contact the website owner to remove the content, but may automatically remove the entire site if the case is more severe.

Your Fight Against Content “Aggregators”

Although it can take some time and effort before you finally see action taken over your stolen content, this process is much easier than going through an attorney to get justice. In most cases, some of the steps won’t get you anywhere, but the entire process will.

Licensing your content and posting a copyright notice on your site can also help. For example, set a clear limit on how much of your content can be “excerpted” on other sites, and state how it must be attributed and linked back. Scraper sites often have clever ways of stealing your content, breaking it apart and not linking to it.

Finally, don’t let your blog be bait for scrapers. Include links back to your other articles, along with text at the bottom of individual posts and within your feed that tells anyone reading the article on a scraped site whose content it really is.
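The attribution text described above could be appended automatically wherever your posts or feed items are generated. This is a minimal sketch under stated assumptions: the function name, the footer wording, and the idea of passing in ready-made post HTML are all illustrative, and most blogging platforms have their own plugins or settings for this.

```python
from html import escape


def with_attribution(post_html, title, permalink, site_name):
    """Append a footer naming the original source to a post's HTML."""
    footer = (
        '<p>This article, <a href="{url}">{title}</a>, '
        "was originally published on {site}.</p>"
    ).format(
        url=escape(permalink, quote=True),  # safe inside the href attribute
        title=escape(title),
        site=escape(site_name),
    )
    return post_html + "\n" + footer
```

Because the footer travels with the post body, a scraper that copies the feed verbatim also copies the link back to the original.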
