Microsoft Versus The Web Spammers

Viewing 2 posts - 1 through 2 (of 2 total)
  • #595606
    Anonymous
    Inactive

    Microsoft Research Paper – Strider Search Defender: Automatic and Systematic Discovery of Search Spammers through Non-Content Analysis

    The general idea is to use the links found on indexed webpages, rather than the content of the pages themselves, to identify search engine spamming, content spamming and the like.

    We call our approach the Search Defender approach. It consists of two steps:

    1. Starting with a seed list of confirmed spam URLs, the Spam Hunter supplies them as search terms (or “link:” query terms) to search engines to locate the forums and guest books at which they were spammed, gathers additional URLs from each of these pages to grow the list, and does this iteratively until the list “converges”, i.e., the list no longer grows significantly after a query iteration.

    The list automatically generated from the above step is only a list of “potential” spam URLs because there can be false positives. For example, some spammed forum pages may contain earlier comments from actual users that include non-spam URLs; spammers may intentionally intersperse non-spam URLs with spam ones.

    2. To filter out false positives, we feed the list of potential spam URLs to the Strider URL Tracer (which we have previously released to help trademark owners find typo-squatting domains of their websites [5]). The tracer provides a key functionality called the Top Domain view: given a list of (primary) URLs, the tracer launches an actual browser to visit each URL and records all secondary URLs visited as a result. At the end of the batched scan, the Top Domain view provides the list of third-party domains that received secondary-URL traffic and ranks them by the number of primary URLs that generated traffic to them. If the input is a list of potential spam URLs, the Top Domain view essentially highlights those target-page domains that are associated with a large number of doorway-page URLs. To further reduce false positives, we use the whitelist of legitimate ads syndicators and web-analytics servers that were heavy redirection-traffic receivers in our Strider HoneyMonkey scan of the top one million click-through URLs [6,7]. The ranked Top Domain list is then used to prioritize manual investigation. Once a third-party domain is determined to be a spammer’s domain, all doorway-page URLs associated with that domain are labeled as high-potential spam URLs.
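
    The iterative list-growing loop in step 1 can be sketched roughly as follows. This is only an illustration, not the paper's code: `grow_spam_list`, `find_pages`, `extract_urls` and the `min_growth` threshold are hypothetical names, the search-engine "link:" query is replaced by a lookup in a small in-memory table, and "converges" is assumed to mean that an iteration adds less than a chosen fraction of new URLs.

```python
from typing import Callable, Iterable, Set


def grow_spam_list(
    seeds: Iterable[str],
    find_pages: Callable[[str], Iterable[str]],
    extract_urls: Callable[[str], Iterable[str]],
    min_growth: float = 0.01,
    max_iterations: int = 20,
) -> Set[str]:
    """Grow a seed list of spam URLs until it stops growing significantly.

    find_pages(url)    -> pages (forums, guest books) on which url appears;
                          a stand-in for a search-engine "link:" query.
    extract_urls(page) -> every URL found on that page.
    """
    potential = set(seeds)
    frontier = set(potential)  # URLs not yet used as query terms
    for _ in range(max_iterations):
        discovered: Set[str] = set()
        for url in frontier:
            for page in find_pages(url):
                discovered.update(extract_urls(page))
        new_urls = discovered - potential
        potential |= new_urls
        frontier = new_urls
        # Assumed convergence test: the list grew by at most min_growth.
        if len(new_urls) <= min_growth * len(potential):
            break
    return potential


# Toy data standing in for search results and page contents (all made up).
PAGES_LINKING_TO = {
    "spam-a.example": ["forum1"],
    "spam-b.example": ["forum1", "guestbook2"],
    "spam-c.example": ["guestbook2"],
}
URLS_ON_PAGE = {
    "forum1": ["spam-a.example", "spam-b.example", "real-user.example"],
    "guestbook2": ["spam-b.example", "spam-c.example"],
}

potential_spam = grow_spam_list(
    ["spam-a.example"],
    lambda url: PAGES_LINKING_TO.get(url, []),
    lambda page: URLS_ON_PAGE.get(page, []),
)
print(sorted(potential_spam))
```

    Note that `real-user.example` ends up in the result. That is the false-positive problem described above, and the reason the output is only a list of "potential" spam URLs that still needs the filtering of step 2.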

    Our Search Defender approach has two desirable properties that naturally turn the spammers’ spamming activities against themselves:

    1. The more widely spammed a URL is, the easier it is for the spam hunter to find it. Once a spammed forum is identified, it becomes a “HoneyForum” that can be used to capture new spam URLs in new comment postings. Ideally, since there is a delay between spamming and its effect on search engine results, our spam hunter should be able to identify new spam URLs and notify the search engine before the URLs enter top search results.

    2. The more doorway pages a spammer creates, the higher priority its target-page domain is placed on the Top Domain list for investigation.
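
    The Top Domain ranking that the second property relies on could look something like the sketch below. It assumes the browser tracing has already been done and that its output is a mapping from each primary URL to the secondary URLs it triggered. The function names, the sample URLs and the whitelist entry are all made up for illustration, and the hostname is used as a simple stand-in for the "domain"; this is not the Strider URL Tracer itself.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the lowercased hostname of a URL (simplified notion of domain)."""
    return (urlsplit(url).hostname or "").lower()


def top_domains(
    traces: Dict[str, Iterable[str]],
    whitelist: FrozenSet[str] = frozenset(),
) -> List[Tuple[str, List[str]]]:
    """Rank third-party domains by how many primary URLs sent traffic to them."""
    primaries_by_domain = defaultdict(set)
    for primary, secondaries in traces.items():
        own_domain = domain_of(primary)
        for secondary in secondaries:
            domain = domain_of(secondary)
            # Skip first-party traffic and whitelisted ad/analytics servers.
            if domain and domain != own_domain and domain not in whitelist:
                primaries_by_domain[domain].add(primary)
    # Most primary URLs first; ties broken alphabetically for stable output.
    ranked = sorted(
        primaries_by_domain.items(), key=lambda item: (-len(item[1]), item[0])
    )
    return [(domain, sorted(primaries)) for domain, primaries in ranked]


# Made-up trace output: primary URL -> secondary URLs visited as a result.
traces = {
    "http://doorway1.example/a": [
        "http://target.example/land",
        "http://analytics.example/t.js",
    ],
    "http://doorway2.example/b": ["http://target.example/land"],
    "http://doorway3.example/c": ["http://other.example/x"],
    "http://blog.example/post": [
        "http://analytics.example/t.js",
        "http://blog.example/style.css",
    ],
}

ranking = top_domains(traces, whitelist=frozenset({"analytics.example"}))
for domain, primaries in ranking:
    print(domain, len(primaries))
```

    Here `target.example` comes out on top because two doorway pages redirect to it, so it would be the first domain handed over for manual investigation.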

    Maybe Microsoft MSN will have more luck with this than with Link Spam Recognition Based On Mass Estimation (pdf file), because the new Strider Search Defender method entails quickly eyeballing fast-growing websites page by page rather than weighing up the quality of their links and content. In effect, it ranks them on a league table of scumminess.

    #699204
    Anonymous
    Inactive

    Good post, joeyl! greek39
