Site Scraping

Viewing 9 posts - 1 through 9 (of 9 total)
  • #589909
    Anonymous
    Inactive

    Hi, what can I do about this site: http://black-jack-card-counting.xyzz.com.ru/

    It has ripped off all the text from http://www.onlinegambling.com and put it into one massive text dump.

    #671907
    Anonymous
    Inactive

    This has happened to me.

    From my understanding these guys use BOTS and there is a certain comment you can place in your header tag to prevent BOTS crawling your site and scraping content.

    I will ask around so this can be prevented in the future. BOTS will continue to exist and crawl sites and the only way we can prevent these affiliates stealing from us is taking action by changing the way we design our website(s).

    #671917
    Anonymous
    Inactive

    Look for posts by DivaG in the SEO section.

    She has posted a link to all kinds of info about some of this: http://www.amateurcoalition.com/masters/guides/programming/htaccess.html

    #672044
    Anonymous
    Inactive

    Thanks, will look into this.

    #672146
    Anonymous
    Inactive
    Simoneaton wrote:
    This has happened to me.

    From my understanding these guys use BOTS and there is a certain comment you can place in your header tag to prevent BOTS crawling your site and scraping content.

    That’s only useful if the bots respect the robots.txt exclusion protocol. Typically, a malicious bot won’t.
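To illustrate why the exclusion protocol only stops polite crawlers, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `BadScraper` name, and the example.com URLs are made up for illustration. The check runs inside the crawler, so a bot that never asks, or that asks under a different name, is not stopped.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: BadScraper
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """The question a well-behaved crawler asks itself before fetching."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
page = "http://www.example.com/page.html"

# A crawler that identifies itself honestly and obeys the rules stays out.
honest = polite_fetch_allowed(parser, "BadScraper", page)

# The same crawler under a browser-like name falls under the "*" rules,
# and a malicious bot can simply skip this check altogether.
disguised = polite_fetch_allowed(parser, "Mozilla/5.0", page)

print(honest, disguised)
```

Nothing on the server enforces these rules; they are a request, not a lock.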

    Simoneaton wrote:
    I will ask around so this can be prevented in the future. BOTS will continue to exist and crawl sites and the only way we can prevent these affiliates stealing from us is taking action by changing the way we design our website(s).

    Really… designing the website differently won’t help. You can ban bots server-side by user agent, but that doesn’t help much since anyone capable of building a bot can surf with any user agent they wish.

    You can also ban IP addresses server-side, but there are easy ways around that too.

    The link posted is a really good one, and will stop a certain amount of silly stuff. There’s a large amount of silly stuff that will continue.
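As a rough sketch of the server-side banning described in this post, the Python below checks a request's user agent and IP address against block lists. The agent names and the address ranges are hypothetical (the ranges are reserved documentation ranges), and a real site would more likely do this in its web server configuration, as in the linked .htaccess guide.

```python
import ipaddress

# Hypothetical block lists, for illustration only.
BLOCKED_AGENT_SUBSTRINGS = ("badscraper", "sitecopier")
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(user_agent: str, remote_ip: str) -> bool:
    """Return True if the request matches a banned user agent or IP range."""
    agent = user_agent.lower()
    if any(name in agent for name in BLOCKED_AGENT_SUBSTRINGS):
        return True
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


# A bot that announces itself is caught by the user agent check.
announced = is_blocked("BadScraper/1.0", "198.51.100.7")

# A known bad address is caught even with a browser-like user agent.
known_address = is_blocked("Mozilla/5.0", "203.0.113.9")

# The same bot with a spoofed user agent, coming from a new address
# (for example through a proxy), passes both checks.
spoofed = is_blocked("Mozilla/5.0", "198.51.100.7")

print(announced, known_address, spoofed)
```

The last case is the weakness mentioned above: both values are under the visitor's control, so the lists only catch bots that don't bother to hide.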

    #673471
    Anonymous
    Inactive

    RANDOMIZE your site!!!

    Use PHP

    #673473
    Anonymous
    Inactive

    What does ‘randomize your site’ mean? I use PHP, but even if you ‘randomize’ by putting different content up every time the page loads, you wouldn’t be able to have it rewrite your content. And if a scraper comes along and scrapes your content… it is still a duplicate, correct? It might be just one version of your site copy, but it is still out there.

    I am confused. I think.

    kw

    #673488
    Anonymous
    Inactive

    I think Yorktown means one or both of the following:

    1. Include some random items on your pages along with other content. That way you may avoid duplicate filters because the chances of your pages being EXACTLY the same are reduced. Of course you need to have a decent amount of random text to make it worthwhile. A quote or headline here or there probably won’t make a big enough difference.

    2. Use something like a thesaurus to change some of your words in the content. Sites run from datafeeds make use of this to differentiate them. Of course you need to ensure the content makes sense with each combination.
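The two ideas above could look something like the sketch below. It is written in Python rather than PHP, and the snippet list, synonym table, and function names are invented for illustration. It only shows the mechanics; whether the varied pages get past a duplicate filter, and whether every combination still reads well, is not something the code can check.

```python
import random
import re

# Hypothetical rotating text and synonym table, for illustration only.
RANDOM_SNIPPETS = [
    "Tip of the day: set a budget before you play.",
    "Did you know? Card counting needs a lot of practice.",
    "Reader question: which games have the lowest house edge?",
]
SYNONYMS = {"big": ["large", "huge"], "fast": ["quick", "rapid"]}


def add_random_snippets(body, snippets, count, rng):
    """Idea 1: append a few randomly chosen extra items to the main content."""
    chosen = rng.sample(snippets, k=min(count, len(snippets)))
    return body + "\n\n" + "\n".join(chosen)


def vary_wording(text, synonyms, rng):
    """Idea 2: swap some words for synonyms, keeping the original as an option."""
    def swap(match):
        word = match.group(0)
        options = synonyms.get(word.lower())
        if not options:
            return word
        replacement = rng.choice([word] + options)
        # Preserve a leading capital letter.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)


rng = random.Random()  # pass a seed, e.g. random.Random(0), for repeatable output
page = add_random_snippets("A big win needs a fast decision.", RANDOM_SNIPPETS, 2, rng)
print(vary_wording(page, SYNONYMS, rng))
```

Each page load produces a slightly different page, but as noted earlier in the thread, whichever version a scraper grabs is still a copy of your content.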

    #673494
    Anonymous
    Inactive

    Very interesting. I’ve never thought of this.
