Content scraping – the final solution and reality check

    There are many distinctions between screen scraping and web scraping. I would like to focus on the web scraper. A scraper in this context is one that pulls information from another site in the hopes of ranking high. Sometimes they will do it through web crawlers/web spiders/web robots/ants/automatic indexers/worms, etc…

    RSS feeds are a scraper’s delight. Does echo, echo, echo ring any bells? How about fix, fix, fix?
    Is this not a malicious hack for PHP? A server sends this out; shouldn’t the server be held responsible?

    By default I believe domains are protected within the framework of the www.

    I am touching the tip of the problem right now. This is a huge topic and it will take a long time to get some real answers. Was it not a screen scrape/malicious hack back in the 80’s that brought forth 128-bit encryption instead of 40-bit? greek39


    I believe Dominique, Chatmaster and Janet all have some very valid points. Let’s try and keep the thread going. In the coming weeks, hopefully we will all gain some interesting knowledge on the subject. greek39



    Arguing about what to label black hat, what to label white hat, and what to label grey hat is an old debate, and there will never be an answer. Besides, as technology changes, the definitions necessarily change.

    But it really doesn’t matter what we call it, like they say, a rose by any other name is still a rose.

    All this theory is well and good, and we can go on discussing internet philosophy till the cows come home.

    Bottom line remains: If I catch a burglar in my back yard, I will tell him to drop the loot. If he doesn’t do it, I bite. He can claim all day long that the tomatoes he stole off my vine are god’s gift to human kind and I have no right to claim ownership. They are mine and I intend to keep them.

    That said, yes, I can see white hat uses for black hat tools. Apparently they are able to identify the majority of my SERPs, which for me is too time-consuming a task to perform.

    I am about as white hat as they come. I design things with the visitor in mind only, and it has served me well. That includes linking.

    Just had a quick check on Google, using the names of sites (without spaces, plus their TLD extension) that have rogued 888, and it appears many have been hit by these sites with .pl and .info domain names.

    Sites affected and which have subsequently accrued these thousands of backlinks are:

    This is definitely a crude attempt at Google bowling our sites, and there are many more being hit in addition to those four sites listed above.

    Yep, that is what it is. And it is malicious, and I expect 888 to reel this person in.

    It makes it look like 888 would like to start a blackhat/whitehat war.

    That is not the way to solve things.

    And you can’t Google bowl white hat authority sites. You can only piss them off.

    There is a difference between this and normal blackhat, which I consider to be a business model, albeit not one I condone or want to be a part of.

    What is happening here is intentionally malicious, and not profit oriented.


    Interesting thread…

    I have never and will never scrape content, but I want to expand on a few points. Just want to say that I don’t agree with them, but this is the reality of the situation.

    The first one is scraping a snippet from another site. I have asked a copyright lawyer friend of mine, and apparently it is only copyright theft if a “substantial” part of the page is scraped. I am not sure about the definition of substantial, and it will probably be argued in court, but he did say that 1 or 2 sentences do not amount to substantial.

    Another point is that if somebody scrapes a SE, they don’t scrape your site. The SE shows your meta tags, which is really public property as this is what you want displayed on other websites. It will be VERY difficult to get a legal judgment against somebody scraping a SE.

    I have been scraped as well and it is infuriating. We have to find a way to stop it, I just don’t know how (as the law is not really helping)!!


    Oh this is a great topic!

    Scrapers are the scum of the earth.. They’ve caused me more hassle in the past than I’m prepared to put up with. Two of my best performing pages went the way of the dodo in Google. Totally gone, no reason, bar scumbag scrapers copying my pages. Got one of them removed after hassling their ISP/webhost.

    Posted two threads on it on WMW: (waiting for it to be approved)


    Well Janet; you expected it so here it comes. You are full of shit! :flamer:


    Let’s all try and get something useful out of this thread. I don’t want this thread to disappear, because there is a lot more to come. I am taking the weekend off; we are finally getting some decent weather. But I will be back on Monday. I should have recovered by then, I hope.



    Greek39, this subject has been discussed in this exact same context dozens upon dozens of times over the years; there is nothing new in this thread that wasn’t mentioned in past threads, in past years.

    If someone claims that since they are only assembling SERP content and not taking it from my site it’s not stealing, and that since they are only using a few lines of my content it’s perfectly OK, then it’s my opinion that they are full of shit.

    The search engines use my site’s name, title, description, metas, and content. In return they provide me with a link and the luxury of being included in the SERPs when one of the millions of users does a search for something related to my site/content etc.

    The blackhatter assembling those SERPs takes my site’s name, title, description, metas, and content, and then alters the link to my site to direct it to his/her benefit. I get nothing in return, but this piece of shit gets to benefit off of my content. Nope, that’s not OK with me.

    This has zero, nada, nothing to do with SEO or that some are better at it than others. It has everything to do with using someone else’s content without their permission or without credit being given (in fair-use applications). To claim that this is all about SEO and that what the scrapers are doing is not against the law is disgraceful.


    This is a fascinating topic.

    We have lumped together scraping in its entirety, even though the nature of the scrapings is quite varied.

    I don’t believe Janet is saying it’s ethical, only that in the eyes of the law, taking a couple of sentences from a site is not likely to hold up as copyright theft in court or in a DMCA complaint.

    I have noticed that a lot of ‘scraper’ sites are scraping search engine results and are technically not taking content directly from other sites, but from the search engines. My concern with this is that the search engines don’t get confused between their own caches of site content and the original site content when handing out SE brownie points and penalties. However, when another site takes the same content and publishes it, the search engines do get confused and may hand out a duplicate content penalty to one of the sites, sometimes the original site.


    Axl, this time will be a little different, I hope. I am seeking solutions not problems. I don’t make any promises but we will see who is up to what. greek39



    If someone has done something to hurt you or your property, and you can show with hard evidence that he is the one who did it, and that specific damage was caused to you directly by his activities, and you can quantify it in $$$$, then go after him: take a lawyer and get it back.

    What the scrapers are doing, in my mind, can be compared right now in the legal system to something like tax planning.

    Yep, it is not socially nice to do tax planning in order to pay less taxes, since you are hurting the community by contributing less cash to be spent on community needs. But as the court ruled, tax planning is legit. To some people it may look like a “trick” and not be socially accepted, but again, what is accepted and what is not is very subjective; this is why we have laws!
    Now, if the law says that it is OK to do tax planning, the fact that other people do not like people who do tax planning does not make it legally wrong or a reason to get this guy in court. As long as the tax planning is done according to the law, it is OK.

    Now, you are debating whether taking a few sentences from a site is legit or not. This is the 1st discussion here. From what I know (and again, I do not encourage people to do it or support it) it is OK, and by saying OK I mean the law allows it.

    Now, the 2nd thing that you are raising is that it hurts your site and income. Now this is new. If you can PROVE that it is hurt due to this, and actually show how and what the losses caused by this are, then take a lawyer.
    IMHO, no one can prove it, since no one really knows how Google or the other SEs work; it is all speculation. I would love it if you would take it to court, get Google on the stand and have them tell you exactly how that affects the ranking. You would do a HUGE service for the entire SEO community, but realistically it will not happen. More than that, the scraper can say that by providing you links from his pages he actually contributed to your rank, and this has more support within Google’s original published paper.

    If you think nothing new was said in this thread that was not said before, you are WRONG. I have read almost every post here for the last few years, and I think you will find in this thread information that was not discussed before, starting from the 1st post in the thread.
    If you want to kill this thread because you do not want people to contradict you, and are doing that by flaming and running people off, that is also OK and a legit strategy, but I think that I at least will surely continue this discussion with many members here on a PM basis like we did in the past.
    It is up to the moderators here what type of discussion is conducted here.


    This is a hot topic and people will get heated.

    It is also an important topic and I want this thread to stay. If it gets out of hand I will just move it to free for all.

    Everyone’s opinion is welcome and counts.

    Let’s avoid personal insults though.

    Now I need to follow my own rules.


    Scraping = Bottom Feeding

    I love it when the scumbags steal my content… It’s like stealing crumbs from under my table… It’s just too bad I can’t pat them on the head or scratch them behind the ear while they are licking up the crumbs… good boy:)

    I have blacklisted 888, and my sites have been TARGETED by an 888 moron who thinks stealing some words off my pages is going to hurt my site in the Google SERPs… lol

    Spotted and blocked your bot, moron, and spotted your so-called ‘cloaked’ pages (who wrote that script anyway, some retard?). Try to do a better job of hiding it next time; it was no fun finding it as it was far too easy. Try to make it a challenge next time… if you can. :) Maybe if you spend some more money on new and better scripts… lol

    I just spent several hours tracking down a large network of domains… Now I wonder what I am going to do with this very large list… I am in a mood to play around a little and I feel like causing some damage… I wonder what their security is like…. :devil:

    Mark my words… The day 888 casino (and ALL its little scraping buddies) vanishes from Google’s SERPs is getting very close! Either 888 can dump these SOBs that are scraping or they can get dumped…


    I would like everybody to view this thread as an opportunity to learn something from somebody who is very knowledgeable about SEO – Janet.

    His thoughts on the scraping issue, in my opinion, can give us some valuable insight into something that affects everybody in one way or the other.

    Arguing and reiterating how terrible scraping is isn’t adding to the discussion at all. Nobody disagrees that scraping is unethical and a huge wart on any profitable online industry.

    Janet wrote:
    IMHO, I think they are the most honest online casino affiliate program we have worked with, and the sequence of events that led to their banning by members here is simply a miscommunication fueled by some competitors and interested parties. But I do not want to get into this discussion or try to get 888 out of the hole they are in; in time they will clear it up, and this is not the subject of this thread, so let’s not go there.

    Technically, this is exactly the topic of your post, Janet. This sentence should have started your post; it would have directed readers much more effectively towards where you are deriving your discussion points from.

    Good read still, although I disagree.

