Is this a scraper or just an annoying spider?

    #591330
    Anonymous
    Inactive

    As I type this, one of my new sites is being hit once every minute by the following IP address:

    64.5.245.23

    It hits a different page at exactly the same time, at 48 seconds past the minute. Here is a snippet of the hits to my site in the last 5 minutes:

    “page 1” Tue 11/29 – 11:11:48 am 64.5.245.23
    “page 2” Tue 11/29 – 11:12:48 am 64.5.245.23
    “page 3” Tue 11/29 – 11:14:48 am 64.5.245.23
    “page 4” Tue 11/29 – 11:15:48 am 64.5.245.23
    “page 5” Tue 11/29 – 11:16:48 am 64.5.245.23
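
A minimal sketch of how the "same second every minute" pattern in the snippet above could be checked programmatically. The timestamps are copied from the log lines; the `looks_scheduled` heuristic is an assumption for illustration, not an established rule for telling crawlers from people.

```python
from datetime import datetime

# Times copied from the five log lines above (all "am"; the snippet gives no year).
hits = ["11:11:48", "11:12:48", "11:14:48", "11:15:48", "11:16:48"]
times = [datetime.strptime(h, "%H:%M:%S") for h in hits]

# Gap in seconds between each hit and the previous one.
gaps = [int((later - earlier).total_seconds())
        for earlier, later in zip(times, times[1:])]

# The second-of-the-minute each hit landed on.
seconds = {t.second for t in times}

# Assumed heuristic: every hit lands on the same second and every gap is a
# whole number of minutes, which suggests a timer rather than a human visitor.
looks_scheduled = len(seconds) == 1 and all(g % 60 == 0 for g in gaps)

print(gaps)             # gaps between consecutive hits, in seconds
print(seconds)          # distinct second-of-minute values
print(looks_scheduled)
```

A pattern like this only shows that the client is automated. It does not say whether it is a search engine spider or a scraper.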

    Whois shows this, but I don’t know what to do with the information:

    64.5.245.23

    Blacklist Status: Clear
    Cached Whois: Cached today
    Whois History: 5 records stored
    Oldest: 2005-11-24
    Newest: 2005-11-29
    Record Type: IP Address
    IP Location: Canada – Nova Scotia – Halifax – It Interactive
    Reverse IP: No websites hosted using this IP address
    Reverse DNS: h64-5-245-23.gtcust.grouptelecom.net


    GT Group Telecom Services Corp. GROUPTELECOM-BLK-5 (NET-64-5-192-0-1)
    64.5.192.0 – 64.5.255.255
    IT Interactive GT-64-5-245-0-CX (NET-64-5-245-0-1)
    64.5.245.0 – 64.5.245.63
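
One thing that can be done with the two address ranges above is to turn them into CIDR blocks and confirm which ones contain the crawler's address. This is a rough sketch using Python's standard `ipaddress` module; the dictionary keys are simply the network names from the Whois output.

```python
import ipaddress

# Start and end addresses copied from the Whois output above.
ranges = {
    "GROUPTELECOM-BLK-5": ("64.5.192.0", "64.5.255.255"),
    "GT-64-5-245-0-CX": ("64.5.245.0", "64.5.245.63"),
}
crawler = ipaddress.ip_address("64.5.245.23")

cidrs = {}
contains = {}
for name, (first, last) in ranges.items():
    # summarize_address_range yields the CIDR networks covering first..last.
    nets = list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)))
    cidrs[name] = [str(n) for n in nets]
    contains[name] = any(crawler in n for n in nets)
    print(name, cidrs[name], "contains crawler:", contains[name])
```

The smaller block is the one assigned to the customer (IT Interactive), and the larger one belongs to the upstream provider. Blocking either whole block would shut out every address in it, not just the one crawler, so a single-address block is the narrower option.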

    Does anyone know what is going on here? I don’t know why this thing would be crawling through every single page like this… It’s been chewing up my bandwidth for almost 18 hours now. :angry:

    Any thoughts?

    #677478
    Anonymous
    Inactive

    If it were me… I’d block the IP.

    #677479
    Anonymous
    Inactive

    Hmmm… Do you do that by adding a rule to the .htaccess file? Or to the robots.txt file?

    #677481
    Anonymous
    Inactive

    Okay, I blocked it with .htaccess, and it worked. The hits have stopped. :xmas:

    Here’s the code I added to my .htaccess file, if anyone is interested:

    order allow,deny
    deny from 64.5.245.23
    allow from all

    #677482
    Anonymous
    Inactive

    Perfect. I wasn’t sure which, but figured .htaccess would block a bot as well (if it only used one IP address). Anyone know for sure?

    #677523
    Anonymous
    Guest

    I was gonna say: are you sure it’s a bad thing before you block it?

    I don’t know crapola about all this but seems to me it’d be a shame to find out later it was a SE spider.

    … just a thought.

    #677534
    Anonymous
    Inactive

    Steve, you were right! I did a more detailed search, and it turns out that this was a spider from the GenieKnows search engine. GAAAHHH!!!

    I wonder why the spider didn’t identify itself properly in my stats, and in the Whois information? If they would just name the thing “GenieKnows Bot” or something, this whole thing wouldn’t have happened. :sarcasm:

    #677541
    Anonymous
    Inactive

    This IP spidered my site too, good to know if it is a SE bot :D

    Thanks guys!

    #677695
    Anonymous
    Guest

    Steve, you were right!

    you know what they say: even a blind squirrel finds an acorn once in a while. :)

    I’m surprised … pleasantly, to hear GenieKnows is spidering. I thought they were strictly a PPC (one which I have avoided after initially trying out).

    #679224
    Anonymous
    Inactive

    I went ahead and banned this IP. It was using 1 GB a month in bandwidth and never left my site. It seems like this spider is not programmed correctly. I hope I did the right thing; either way, it was a waste of bandwidth.

    #679237
    Anonymous
    Inactive
    Engineer wrote:
    If they would just name the thing “GenieKnows Bot” or something, this whole thing wouldn’t have happened. :sarcasm:

    I could name myself “GenieKnows Bot” if I liked. User agent spoofing is easy. That’s why user agent based cloaking is a bad idea.

    But, yeah, I see your point. I’d also think they’d be smart to identify their IP better. IP spoofing is a different game.
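
Since a user agent string can be spoofed, one common way to check a crawler's identity is a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check that the hostname is under the domain you expect, then resolve that hostname back and confirm it returns the same IP. The sketch below is hedged accordingly: `forward_confirmed_rdns` is a hypothetical helper, the stub resolvers stand in for live DNS so the example runs offline, the forward-lookup result is assumed (the thread only shows the reverse lookup), and `search-engine.example` is a placeholder domain.

```python
import socket


def forward_confirmed_rdns(ip, expected_suffix,
                           reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return True if ip's reverse-DNS name is under expected_suffix
    and that name resolves back to ip. (Hypothetical helper.)"""
    try:
        hostname = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False  # no reverse record, or lookup failed
    suffix = expected_suffix.lower().strip(".")
    if hostname != suffix and not hostname.endswith("." + suffix):
        return False  # hostname is not under the expected domain
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False  # hostname does not resolve forward
    return ip in addresses


# Stub resolvers so the example runs without network access.
def fake_reverse(ip):
    # Reverse DNS name as reported in the first post of this thread.
    return ("h64-5-245-23.gtcust.grouptelecom.net", [], [ip])


def fake_forward(hostname):
    # ASSUMPTION: the forward lookup returns the same address.
    return (hostname, [], ["64.5.245.23"])


isp_match = forward_confirmed_rdns(
    "64.5.245.23", "grouptelecom.net", fake_reverse, fake_forward)
engine_match = forward_confirmed_rdns(
    "64.5.245.23", "search-engine.example", fake_reverse, fake_forward)
print(isp_match, engine_match)
```

With the reverse DNS name from the first post, the address confirms only as a customer of the ISP and not as belonging to any search engine's own domain. That is consistent with why the spider was hard to identify from the Whois data alone.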
