All About Content

Typo-Squatting Comes with Money-Back Guarantee

Posted by Melanie Phung on Sunday, April 30, 2006 at 11:59 am

An article in today’s Washington Post business section, titled The Web’s Million-Dollar Typos, highlights the practice of domain parking by people counting on Internet users misspelling their destination URL. This tactic is known as typo squatting and can be a very lucrative business, because according to industry analysts approximately 15% of all web traffic comes from people typing a domain into the address bar — as opposed to search or bookmarks.

The Business of Typo-Squatting
Simply buy an unregistered domain that is similar in spelling to a major brand, put some Google AdSense or Yahoo ads on it, and share in all that “contextual” advertising revenue. This works because most web users will click on the ads to get to their intended destination, rather than retype the domain or do a search to locate the correct URL.
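
Seen from the brand owner's side, the same idea can be turned into a watch list: enumerate the likeliest misspellings of your own domain and check who has registered them. Here is a minimal sketch in Python, assuming only three kinds of slip (a dropped letter, a doubled letter, two swapped neighbors). The function name typo_variants and the sample name are my own placeholders, and real typo patterns are certainly more varied than this.

```python
def typo_variants(name: str) -> set[str]:
    """Return simple one-slip misspellings of a domain label (without the TLD)."""
    variants = set()
    for i in range(len(name)):
        # Omission: drop one character, e.g. "gogle".
        variants.add(name[:i] + name[i + 1:])
        # Duplication: double one character, e.g. "ggoogle".
        variants.add(name[:i] + name[i] + name[i:])
    for i in range(len(name) - 1):
        # Transposition: swap two neighboring characters, e.g. "googel".
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Swapping two identical letters reproduces the original; drop it.
    variants.discard(name)
    variants.discard("")
    return variants


# Hypothetical usage: list candidate typo domains for a brand to monitor.
for variant in sorted(typo_variants("example")):
    print(variant + ".com")
```

Each candidate could then be checked against a WHOIS lookup, which I have left out here.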

And there is very low risk for the ambitious typo-squatting entrepreneur, as the Washington Post explains:

Because purchasers can change their minds within five days and avoid paying the $6 registration fee for the name, many investors enter the names in Google’s ad program for a quick test and quickly drop those that don’t yield enough clicks to cover the domain registration fee.
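
The arithmetic behind that quick test is simple enough to sketch. The $6 fee comes from the Post article; the per-click payouts below are made-up numbers for illustration, since the article does not say what a parked page actually earns per click.

```python
REGISTRATION_FEE_CENTS = 600  # the $6 registration fee quoted in the article


def clicks_to_break_even(revenue_per_click_cents: int) -> int:
    """Ad clicks a parked domain needs before it covers its registration fee."""
    if revenue_per_click_cents <= 0:
        raise ValueError("revenue_per_click_cents must be positive")
    # Ceiling division on whole cents avoids floating-point rounding surprises.
    return -(-REGISTRATION_FEE_CENTS // revenue_per_click_cents)


# Hypothetical payouts per click, in cents.
for payout in (5, 7, 25):
    print(payout, "cents/click ->", clicks_to_break_even(payout), "clicks")
```

If a name does not look like it will reach that many clicks, it gets dropped inside the five-day window and costs the squatter nothing.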

But others, including those who speculate on potential traffic of a specific domain name, argue that the pages are helping people find information related to what they’re looking for. Typo-squatters and those who provide services to support them claim these pages benefit the customer by making unused pages “function as alternatives to search engines.”

The Losers and Winners
This is a huge problem for those of us who participate in paid search. Every one of those clicks costs the advertiser money. For small business owners who pay a couple of cents per click, as well as for larger businesses bidding one or two dollars for competitive brand terms, that adds up quickly. Basically you are forced to pay money for clicks you are almost guaranteed to get for free in organic search (a search on your brand name).

Result: higher ad budgets (and lower ROI) and brand dilution.

As search professionals, I suppose, we could try to educate consumers about this type of web spam, but there’s no incentive for them to refrain from clicking on those ads. After all, one click gets them where they wanted to go.

It works for the consumer, and it works for the search engines, since they (as the owners of the ad networks) get a cut of that ad spend. Because Google and Yahoo benefit from the advertising revenue, there is a disincentive to pull the plug on this practice across the board. Even when they don their “what’s in the best interest of the user” hats, removing these pages is still not a pressing issue. Sure, the pages clutter up the Internet with low-quality content, but, as the argument goes, if someone mistypes a URL and is then presented with a link to the site they actually wanted to get to… well, isn’t that better than just landing on a 404 error page?

But as the recent click fraud lawsuits against Yahoo and Google show, advertisers are starting to get fed up with those companies turning a blind eye to practices that benefit the search engines at the expense of advertisers. It would be wise for owners of ad publishing networks, especially ones focused just on parked domains, to try to reestablish the goodwill of legitimate content publishers and help unclutter the World Wide Web by cracking down on typo-squatting. Not because it’s in their immediate financial interest, but because it’s the right thing to do.


Category: Contextual Ads,Monetizing,Navel-Gazing,Spam

Spam Techniques Defined

Posted by Melanie Phung on Thursday, November 24, 2005 at 10:38 pm

The nine-page paper Web Spam Taxonomy (PDF), authored by Zoltán Gyöngyi and Hector Garcia-Molina, was presented earlier this year. Zoltán Gyöngyi, a grad student at Stanford, also co-authored the paper on link spam detection I blogged about earlier this month. Gyöngyi and Garcia-Molina propose a taxonomy of current spamming techniques and define web spam as “all types of actions intended to boost ranking (either relevance, or importance, or both), without improving the true value of a page.”

That seems awfully broad to me. One perfectly ethical and legitimate SEO technique, for example, is to clean up sloppy or deprecated code. This is primarily intended to make the page easier for a search engine spider to read and has nothing to do with changing the information being presented to the user. It may also constitute a usability improvement, but does that add to the “value” of the page? Being easier to find, as a result of SEO, doesn’t make a page more valuable; that would be circular, since search engine position is ideally determined by the page’s value. So, strictly speaking, altering code to make it W3C compliant would count as spamming under this definition.

The authors also claim on page 2 that “most SEOs engage in practices that we call spamming” [emphasis added]. But the techniques they go on to define are certainly not practices that I engage in, and are generally dismissed by nearly all SEOs I know of.

Those techniques include term spamming, which consists of:

  • Body spam. In this case, the spam terms are included in the document body. This spamming technique is among the simplest and most popular ones, and it is almost as old as search engines themselves.
  • Title spam. Today’s search engines usually give a higher weight to terms that appear in the title of a document. Hence, it makes sense to include the spam terms in the document title.
  • Meta tag spam. The HTML meta tags that appear in the document header have always been the target of spamming. Because of the heavy spamming, search engines currently give low priority to these tags, or even ignore them completely.
  • Anchor text spam. Just as with the document title, search engines assign higher weight to anchor text terms, as they are supposed to offer a summary of the pointed document. Therefore, spam terms are sometimes included in the anchor text of the HTML hyperlinks to a page.
  • URL spam. Some search engines also break down the URL of a page into a set of terms that are used to determine the relevance of the page. To exploit this, spammers sometimes create long URLs that include sequences of spam terms.
  • Repetition of one or a few specific terms. This way, spammers achieve an increased relevance for a document with respect to a small number of query terms.
  • Dumping of a large number of unrelated terms, often even entire dictionaries. This way, spammers make a certain page relevant to many different queries. Dumping is effective against queries that include relatively rare, obscure terms: for such queries, it is probable that only a couple of pages are relevant, so even a spam page with a low relevance/importance would appear among the top results.
  • Weaving of spam terms into copied contents. Sometimes spammers duplicate text available on the Web and insert spam terms into them at random positions.
  • Weaving is also used for dilution, i.e., to conceal some repeated spam terms within the text, so that search engine algorithms that filter out plain repetition would be misled.
  • Phrase stitching is also used by spammers to create content quickly. The idea is to glue together sentences or phrases, possibly from different sources.
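
To make the "repetition" item concrete, here is a rough sketch of how plain repetition could be flagged: count how large a share of a page's words each term accounts for. This is my own illustration, not the paper's method or any search engine's filter; the function names and the 20% threshold are arbitrary assumptions. It is also exactly the kind of naive check that weaving and dilution are meant to defeat.

```python
import re
from collections import Counter


def term_shares(text: str) -> dict[str, float]:
    """Map each word to the fraction of all words in the text it accounts for."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}


def repeated_terms(text: str, threshold: float = 0.2) -> list[str]:
    """Return terms whose share of the text meets an (arbitrary) threshold."""
    return sorted(w for w, share in term_shares(text).items() if share >= threshold)


# Hypothetical spammy snippet: two terms dominate the text.
sample = "cheap pills cheap pills buy cheap pills now cheap"
print(repeated_terms(sample))
```

On ordinary prose a threshold like this would mostly catch common words such as "the", so a real filter would at least need a stop-word list.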

Web Spam Taxonomy also discusses other spam categories and techniques, including cloaking, using CSS for keyword stuffing, link farms, exploiting PageRank of expired domains, JavaScript redirects and others. It’s a good overview of current spam techniques (most of which, however, have become much less effective, at least in Google, since the Bourbon and Jagger updates this year). Check it out.


Category: Spam

The Splog Police

Posted by Melanie Phung on Saturday, November 12, 2005 at 11:14 pm

Here’s a guy with way too much time on his hands:


Category: Spam

Adding Value to the Web (Not!)

Posted by Melanie Phung on Saturday, November 12, 2005 at 11:02 pm

This is the kind of thing that gives people the idea that Internet marketers are slimy: an auto blog generator.

The blog generator site pitches to owners of the kinds of spam blogs (AdSense splogs) I explained in my October 20 entry, Say It 5 Times Fast: Blogspot Splog Bomb.

There’s a second type of splog I didn’t mention. This splog is a subset of sites that don’t target searchers directly, but rather are created as a way to artificially inflate the number of hyperlinks pointing to another site, the target site.

That’s because search engine algorithms favor sites that have a lot of links pointing to them. Inbound links are considered a sort of “vote” for the target site. However, links are not all worth the same and the less authoritative the page the link is coming from, the less value that link has. So this type of web spam, like the ad-based splogs, also requires the creation of a lot of sites (a brute force strategy) to accomplish the spammers’ goals.
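
The "vote" idea can be sketched with a small calculation in the spirit of PageRank, where each page passes a share of its own score to the pages it links to. To be clear, this is a simplified illustration and not any search engine's actual algorithm; the function name link_scores, the damping value of 0.85, and the tiny made-up graph are all assumptions.

```python
def link_scores(links: dict[str, list[str]],
                damping: float = 0.85,
                iterations: int = 50) -> dict[str, float]:
    """Iteratively score pages so that links from higher-scored pages count more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page with no outbound links spreads its vote over all pages.
                for other in pages:
                    new_score[other] += damping * score[page] / n
        score = new_score
    return score


# Hypothetical graph: three throwaway splogs all link to one target site.
graph = {"splog1": ["target"], "splog2": ["target"], "splog3": ["target"]}
for page, value in sorted(link_scores(graph).items()):
    print(page, round(value, 3))
```

Because each splog has almost no score of its own to pass along, a single link from it is worth very little, which is why the spammer needs so many of them.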

Creating fake sites solely for the purpose of improving the popularity of other sites (which themselves can be legitimate, content-rich sites) is obviously not something the search engines approve of, and it is a strategy that has been hit hard by recent algorithm updates.

Read more about the importance of links in Link Popularity Explained.


Category: Spam

State of the Blogosphere

Posted by Melanie Phung on Monday, October 24, 2005 at 12:26 pm

Superb overview of the current blog landscape by Technorati’s Dave Sifry:

Among the data he shares:

  • The total number of weblogs tracked continues to double about every 5 months
  • The blogosphere is now over 30 times as big as it was 3 years ago, with no signs of letup in growth
  • About 70,000 new weblogs are created every day
  • About one new weblog is created each second
  • 2% – 8% of new weblogs per day are fake or spam weblogs
  • Between 700,000 and 1.3 million posts are made each day
  • About 33,000 posts are created per hour, or 9.2 posts per second
  • An additional 5.8% of posts (or about 50,000 posts/day) seen each day are from spam or fake blogs, on average
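
The per-second figures follow from the daily and hourly ones, and a few lines of Python are enough to check the conversions. Only the input numbers come from the list above; the variable names are mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60

new_blogs_per_day = 70_000
posts_per_hour = 33_000

# About 70,000 new weblogs a day works out to a bit under one per second.
new_blogs_per_second = new_blogs_per_day / SECONDS_PER_DAY

# About 33,000 posts an hour, expressed per second and per day.
posts_per_second = posts_per_hour / SECONDS_PER_HOUR
posts_per_day = posts_per_hour * 24

print(round(new_blogs_per_second, 2))
print(round(posts_per_second, 1))
print(posts_per_day)
```

The daily total implied by the hourly rate lands inside the 700,000 to 1.3 million range quoted above.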

Category: Blogging,Spam