Preventative site scraping method being tested

Viewing 15 posts - 1 through 15 (of 17 total)
  • Author
  • #594967

    Thought this may be the best forum to start this thread.

    Like most, I’ve invested many hours into my content. Unfortunately I’ve fallen victim to rogue webmasters too.

    Over the last few weeks, I’ve been looking at preventative measures to combat this ever-rising problem.

    At present I’m testing a rudimentary means of stopping site scrapers in their tracks.

    I’ll post my findings over the next week or so.

    This is not a click-and-go program. It does require a level of programming skill, though it’s something that I feel most webmasters could install in under 30 minutes.




    Good going!

    We certainly need protection.

    Anyone who develops a program that helps foil scraping attempts could do quite well for themselves selling it. It’s not just our community that’s fed up with that.


    Thanks Dom for the vote of confidence.

    Basically at this stage I’m just working on its rudimentary form. If testing goes OK I’ll make it available to others here for a small fee, around the $30 mark for those wanting a definite price. :woo-hoo:

    I’ll keep posting updates to this thread.



    I do appreciate your efforts, but unless you have a firm understanding of the internet from A-Z there is very little that can be done.

    This stuff has been going on since the early ’80s and yet there is no solution. Webmasters can rogue sites, take income away, block IP ranges, etc. These are solutions residing outside of the internet.

    I will regurgitate what I have been saying all along: get acquainted with the internet. Ask yourself, what is the internet? How does it really function? Why was it invented? What is the World Wide Web? What is the IGN? What is the relationship?
    Find out why the internet has exploitable holes. What is Unix? Find out what happened in the early eighties with the spread of LANs, PCs and workstations.

    How are IPs sent through the internet? Is it by binary? Where do malicious spiders come from?

    Afterwards sit back and think about what is really going on. The solution to the problem is malicious, and so are the techniques used.

    Google is not part of the internet but a server plugged into it. In addition, Google is well aware of this problem, which cannot be fixed 100%.

    This is the world I know very well. But neophytes never believe it, and that is part of the reason no concrete solutions will ever be found. greek39



    I’m not going to enter into a debate based on opinions. Besides, my intention is to use it as a preventative, not a cure, in the same way one uses anti-virus software. Or don’t you believe in that either?

    If your post was trying to discredit me and write my dev work off as snake oil, I don’t appreciate it. Of course you’re entitled to your opinions as I am to mine. However, posting assumptions, especially as you don’t know me or my background, is just plain rude imo.


    As stated, I appreciate your efforts. No personal attacks intended, quite the opposite. I apologize if you see my post as rude; that was not my intent.

    Sorry, greek39



    Didn’t mean to chew your head off, I just don’t like being pigeonholed.

    Apology accepted, no hard feelings :)


    Thank you, let’s make this a productive thread!! greek39


    Agreed, my sentiments exactly.


    I think the concept is a great idea. Especially with everyone that’s falling victim to scraping, the thought of being able to minimize your vulnerability to content thieves definitely makes it worth a shot.


    As someone who is JUST getting started in this, I’m very interested in what Wager2winUK is talking about.

    Besides posting a copy of Robert’s Rules, everything I’ve done is mine, and mine alone (and extremely modest at this stage too).

    By the way, my name’s Luke, and I’m a newbie here, so

    Hi Everybody!


    Hi Luke welcome to CAP.

    If you’re using a Unix box you’d know that, amongst other things, you can use a .htaccess file to block out IPs. It’s also relatively easy to block known user agents too.
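For readers who want to see the idea behind blocking by IP and user agent, here is a minimal sketch in Python. It is not the tool discussed in this thread, and it is not .htaccess syntax. The function name `is_blocked`, the example network (a reserved documentation range) and the user-agent substrings are all assumptions made for illustration; a real list would come from your own server logs.

```python
import ipaddress

# Hypothetical blocklists for illustration only.
# 203.0.113.0/24 is a reserved documentation range, not a real scraper.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_AGENT_SUBSTRINGS = ["httrack", "wget"]


def is_blocked(ip: str, user_agent: str) -> bool:
    """Return True if the request matches a blocked IP range or user agent."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    # User-agent matching is case-insensitive substring matching.
    agent = user_agent.lower()
    return any(fragment in agent for fragment in BLOCKED_AGENT_SUBSTRINGS)
```

A request handler would call `is_blocked(client_ip, user_agent_header)` before serving content. Keep in mind that user-agent strings are trivially spoofed, so a check like this only deters the lazier scrapers.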

    The problem is that not everyone is on a Unix box. There is also the issue that some people don’t do their own coding and/or don’t know how to add dynamic code, or they simply don’t want the hassle of having to update signatures all the time.

    That’s where this proggy comes in. I’m thinking the best way around update issues and so forth is to just host it remotely. For one thing, I can prevent people buying one copy and giving it away to 50 others…lol

    Though in all seriousness the more important factors are db updates and people not having to screw around with code they know nothing about.

    If you’re wanting to lend a hand on this, I’d appreciate someone setting up a poll to find out the % of Unix to Windows box users.

    Right now I can tell you that the final product will be written in php.

    That’s about all I can give away right now.



    It certainly sounds interesting at a glance. I know anything written in PHP is highly exploitable. I like your avatar by the way. I will look in my notes and see if this is a viable option.

    Most of all I do appreciate someone who will be trying to solve this problem.



    To my knowledge there aren’t too many things on the internet that are not exploitable. If someone is gung-ho to find a way in, that is…

    As far as PHP being highly exploitable, that depends on the coder. I do however think using “highly” as a descriptor is being a tad overdramatic.

    Thanks for your input.

    Wager2winUK wrote:
    Hi Luke welcome to CAP.

    Though in all seriousness the more important factors are db updates and people not having to screw around with code they know nothing about.

    Good reasoning…

    I’m learning code quickly… but apparently not quickly enough, given some of the hassles I’m reading about here…
