All About Content

Spam Techniques Defined

Posted by Melanie Phung on Thursday, November 24, 2005 at 10:38 pm

The nine-page paper Web Spam Taxonomy (PDF), authored by Zoltán Gyöngyi and Hector Garcia-Molina, was presented earlier this year. Gyöngyi, a grad student at Stanford, also co-authored the paper on link spam detection I blogged about earlier this month. Gyöngyi and Garcia-Molina propose a taxonomy of current spamming techniques and define web spam as “all types of actions intended to boost ranking (either relevance, or importance, or both), without improving the true value of a page.”

That seems awfully broad to me. One perfectly legitimate SEO technique, for example, is to clean up sloppy or deprecated code. This is primarily intended to make the page easier for a search engine spider to read and has nothing to do with changing the information being presented to the user. It may also constitute a usability improvement, but does that add to the “value” of the page? Being easier to find as a result of SEO doesn’t make a page more valuable; that would be circular, since search engine position is ideally determined by the page’s value. So, strictly speaking, under this definition altering code to make it W3C compliant is spamming.

The authors also claim on page 2 that “most SEOs engage in practices that we call spamming” [emphasis added]. But the techniques they go on to define are certainly not practices that I engage in, and are generally dismissed by nearly all SEOs I know of.

Those techniques include term spamming, which the paper breaks down by where the spam terms are placed and by how they are chosen:

  • Body spam. In this case, the spam terms are included in the document body. This spamming technique is among the simplest and most popular ones, and it is almost as old as search engines themselves.
  • Title spam. Today’s search engines usually give a higher weight to terms that appear in the title of a document. Hence, it makes sense to include the spam terms in the document title.
  • Meta tag spam. The HTML meta tags that appear in the document header have always been the target of spamming. Because of the heavy spamming, search engines currently give low priority to these tags, or even ignore them completely.
  • Anchor text spam. Just as with the document title, search engines assign higher weight to anchor text terms, as they are supposed to offer a summary of the pointed document. Therefore, spam terms are sometimes included in the anchor text of the HTML hyperlinks to a page.
  • URL spam. Some search engines also break down the URL of a page into a set of terms that are used to determine the relevance of the page. To exploit this, spammers sometimes create long URLs that include sequences of spam terms.
  • Repetition of one or a few specific terms. This way, spammers achieve an increased relevance for a document with respect to a small number of query terms.
  • Dumping of a large number of unrelated terms, often even entire dictionaries. This way, spammers make a certain page relevant to many different queries. Dumping is effective against queries that include relatively rare, obscure terms: for such queries, it is probable that only a couple of pages are relevant, so even a spam page with a low relevance/importance would appear among the top results.
  • Weaving of spam terms into copied contents. Sometimes spammers duplicate text available on the Web and insert spam terms into them at random positions.
  • Weaving is also used for dilution, i.e., to conceal some repeated spam terms within the text, so that search engine algorithms that filter out plain repetition would be misled.
  • Phrase stitching is also used by spammers to create content quickly. The idea is to glue together sentences or phrases, possibly from different sources.
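
To make the repetition technique a bit more concrete, here is a rough Python sketch of why it works against a naive ranking function. Everything in it is my own illustration, not something taken from the paper or from any real search engine: `term_frequency_score` is a hypothetical toy relevance measure (the share of a page's words that match the query), `top_term_share` is an equally hypothetical repetition signal, and the sample texts are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequency_score(document, query):
    """Toy relevance (an assumption, not a real engine's formula):
    for each query term, add occurrences divided by document length."""
    terms = tokenize(document)
    if not terms:
        return 0.0
    counts = Counter(terms)
    return sum(counts[q] / len(terms) for q in tokenize(query))


def top_term_share(document):
    """Fraction of the document taken up by its single most frequent term.
    A very high share is one crude hint of plain repetition."""
    terms = tokenize(document)
    if not terms:
        return 0.0
    most_common_count = Counter(terms).most_common(1)[0][1]
    return most_common_count / len(terms)


# Made-up sample pages: an ordinary one, and the same page with
# the spam phrase repeated twenty times at the end.
honest = "We sell digital cameras and offer honest reviews of every camera model."
stuffed = honest + " cheap cameras" * 20

query = "cheap cameras"
print("honest score: ", round(term_frequency_score(honest, query), 3))
print("stuffed score:", round(term_frequency_score(stuffed, query), 3))
print("stuffed top-term share:", round(top_term_share(stuffed), 3))
```

The stuffed page scores far higher for the query even though it tells the reader nothing new, which is the whole point of repetition. It also shows why weaving and dilution exist: padding the page with copied text pulls the top-term share back down, so a filter this simple would presumably miss it.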

Web Spam Taxonomy also discusses other spam categories and techniques, including cloaking, using CSS for keyword stuffing, link farms, exploiting the PageRank of expired domains, JavaScript redirects and others. It’s a good overview of current spam techniques (most of which, however, have become much less effective, at least in Google, since the Bourbon and Jagger updates this year). Check it out.

Category: Spam
