Google Bomb Algorithm Separate From Ranking Algorithm
One of the interesting tidbits from this morning's discussion with Google's Matt Cutts has to do with how Google defuses a "Google bomb" (such as the ones that Stephen Colbert recently pulled off with "greatest living American" and "giant brass balls").
The reason they crop up and then disappear suddenly (as opposed to never succeeding in the first place) has nothing to do with editorial intervention -- however suspicious it may look. That's consistent with what they've always said, even when the "miserable failure" results had President Bush's bio page at the top forever.
Apparently the algorithm that sniffs out Google bombs is not built into the regular ranking algorithm; it's run separately and only once every couple of months.
Another (valuable?) insight from this conference: Vanessa Fox is a big Buffy the Vampire Slayer fan. (Updated June 15: Vanessa Fox is leaving Google!)
Labels: link-building, spam
Posted by Melanie Phung
Folksonomy Spam, a.k.a. Tag Spam
For every evolution of the Internet, new types of spam are born. Wikipedia even has a whole article series about different kinds of spam. If you aren't familiar with tagging and the ways it can be abused, learn a little about tag spam here and see a somewhat amusing example here.
Labels: social media, spam
Posted by Melanie Phung
Typo-Squatting Comes with Money-Back Guarantee
An article in today's Washington Post business section, titled The Web's Million-Dollar Typos, highlights the practice of domain parking by people counting on Internet users misspelling their destination URL. This tactic is known as typo squatting and can be a very lucrative business, because according to industry analysts approximately 15% of all web traffic comes from people typing a domain into the address bar -- as opposed to search or bookmarks.
The Business of Typo-Squatting
Simply buy an unregistered domain that is similar in spelling to a major brand, put some Google AdSense or Yahoo ads on it, and share in all that "contextual" advertising revenue. This works because most web users will click on the ads to get to their intended destination, rather than retype the domain or do a search to locate the correct URL.
And there is very low risk for the ambitious typo-squatting entrepreneur, as the Washington Post explains:
Because purchasers can change their minds within five days and avoid paying the $6 registration fee for the name, many investors enter the names in Google's ad program for a quick test and quickly drop those that don't yield enough clicks to cover the domain registration fee.
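To put rough numbers on that quick test, here is a minimal sketch of the break-even arithmetic. The $6 fee and the five-day window come from the Post quote; the per-click payout and the function names are my own assumptions for illustration, and I'm assuming the squatter simply compares trial earnings against the fee. Amounts are in whole cents to avoid floating-point rounding.

```python
REGISTRATION_FEE_CENTS = 600  # the $6 fee from the Washington Post quote
GRACE_PERIOD_DAYS = 5         # refund window from the same quote


def breakeven_clicks(fee_cents, cents_per_click):
    """Smallest whole number of ad clicks whose revenue covers the fee."""
    if cents_per_click <= 0:
        raise ValueError("cents_per_click must be positive")
    return -(-fee_cents // cents_per_click)  # integer ceiling division


def keep_domain(trial_clicks, cents_per_click, fee_cents=REGISTRATION_FEE_CENTS):
    """Assumed decision rule: keep the name only if the trial clicks pay the fee."""
    return trial_clicks * cents_per_click >= fee_cents


# Hypothetical payout of 5 cents per click (not a figure from the article).
clicks_needed = breakeven_clicks(REGISTRATION_FEE_CENTS, 5)
print(f"Clicks needed within {GRACE_PERIOD_DAYS} days: {clicks_needed}")
```

At that made-up payout, a name has to earn its keep in 120 clicks. Anything that falls short gets dropped at no cost, which is why the risk is so low.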
But others, including those who speculate on potential traffic of a specific domain name, argue that the pages are helping people find information related to what they're looking for. Typo-squatters and those who provide services to support them claim these pages benefit the customer by making unused pages "function as alternatives to search engines."
The Losers and Winners
This is a huge problem for those of us who participate in paid search. Every one of those clicks costs the advertiser money. For small business owners who pay a couple of cents per click, as well as for larger businesses bidding one or two dollars for competitive brand terms, that adds up quickly. Basically you are forced to pay money for clicks you are almost guaranteed to get for free in organic search (a search on your brand name).
Result: higher ad budgets (and lower ROI) and brand dilution.
As search professionals, I suppose, we could try to educate consumers about this type of web spam, but there's no incentive for them to refrain from clicking on those ads. After all, one click gets them where they wanted to go.
It works for the consumer, and it works for the search engines since they (as the owners of the ad networks) get a cut of that ad spend. Because Google and Yahoo benefit from the advertising revenue, there is a disincentive to pull the plug on this practice across the board. Even when they don their "what's in the best interest of the user" hats, removing these pages is still not a pressing issue. Sure, the pages clutter up the Internet with low-quality content, but the argument would go like this: if someone mistypes a URL and then is presented with a link to the site they actually wanted to get to... well, isn't that better than just landing on a 404 error page?
But as the recent click fraud lawsuits against Yahoo and Google show, advertisers are starting to get fed up with those companies turning a blind eye to practices that benefit the search engines at the expense of advertisers. It would be wise for owners of ad publishing networks, especially ones focused just on parked domains, to try to reestablish the goodwill of legitimate content publishers and help unclutter the World Wide Web by cracking down on typo-squatting. Not because it's in their immediate financial interest, but because it's the right thing to do.
Labels: contextual ads, monetizing, navel-gazing, spam
Posted by Melanie Phung
Spam Techniques Defined
The nine-page paper Web Spam Taxonomy (PDF), authored by Zoltán Gyöngyi and Hector Garcia-Molina, was presented earlier this year. Zoltán Gyöngyi -- a grad student at Stanford -- also co-authored the paper on link spam detection I blogged about earlier this month. Gyöngyi and Garcia-Molina propose a taxonomy of current spamming techniques and define web spam as "all types of actions intended to boost ranking (either relevance, or importance, or both), without improving the true value of a page."
That seems awfully broad to me. One perfectly ethical or legitimate SEO technique, for example, is to clean up sloppy or deprecated code. This is primarily intended to make the page easier for a search engine spider to read and has nothing to do with changing the information being presented to the user. It may also constitute a usability improvement, but does that add to the "value" of the page? Being easier to find, as a result of SEO, doesn't make a page more valuable -- that would be circular since search engine position is ideally determined by the page's value. So, strictly speaking, under this definition altering code to make it W3C compliant is spamming.
The authors also claim on page 2 that "most SEOs engage in practices that we call spamming" [emphasis added]. But the techniques they go on to define are certainly not practices that I engage in, and are generally dismissed by nearly all SEOs I know of.
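Those techniques include term spamming, which the authors break down both by where the spam terms are placed (body, title, meta tags, anchor text, URL) and by how the terms are chosen (repetition, dumping, weaving, phrase stitching):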
- Body spam. In this case, the spam terms are included in the document body. This spamming technique is among the simplest and most popular ones, and it is almost as old as search engines themselves.
- Title spam. Today’s search engines usually give a higher weight to terms that appear in the title of a document. Hence, it makes sense to include the spam terms in the document title.
- Meta tag spam. The HTML meta tags that appear in the document header have always been the target of spamming. Because of the heavy spamming, search engines currently give low priority to these tags, or even ignore them completely.
- Anchor text spam. Just as with the document title, search engines assign higher weight to anchor text terms, as they are supposed to offer a summary of the pointed document. Therefore, spam terms are sometimes included in the anchor text of the HTML hyperlinks to a page.
- URL spam. Some search engines also break down the URL of a page into a set of terms that are used to determine the relevance of the page. To exploit this, spammers sometimes create long URLs that include sequences of spam terms.
- Repetition of one or a few specific terms. This way, spammers achieve an increased relevance for a document with respect to a small number of query terms.
- Dumping of a large number of unrelated terms, often even entire dictionaries. This way, spammers make a certain page relevant to many different queries. Dumping is effective against queries that include relatively rare, obscure terms: for such queries, it is probable that only a couple of pages are relevant, so even a spam page with a low relevance/importance would appear among the top results.
- Weaving of spam terms into copied contents. Sometimes spammers duplicate text available on the Web and insert spam terms into them at random positions. Weaving is also used for dilution, i.e., to conceal some repeated spam terms within the text, so that search engine algorithms that filter out plain repetition would be misled.
- Phrase stitching is also used by spammers to create content quickly. The idea is to glue together sentences or phrases, possibly from different sources.
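To make the "repetition" item in the list above a bit more concrete, here is a toy sketch of the kind of plain-repetition filter the authors allude to. It is purely my own illustration, not anything taken from the paper or from a real search engine: the function names and the 30% threshold are assumptions.

```python
import re
from collections import Counter


def top_term_share(text):
    """Return the most frequent word and the fraction of all words it makes up."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return None, 0.0
    term, count = Counter(words).most_common(1)[0]
    return term, count / len(words)


def looks_like_repetition(text, threshold=0.3):
    """Flag text in which one term dominates. The threshold is a made-up value."""
    _, share = top_term_share(text)
    return share >= threshold


spammy = "cheap pills cheap pills buy cheap pills cheap"
normal = "The quick brown fox jumps over the lazy dog"
print(top_term_share(spammy), looks_like_repetition(spammy))
print(top_term_share(normal), looks_like_repetition(normal))
```

A check this naive is exactly what weaving for dilution is meant to defeat: pad the repeated terms with enough copied text and the top term's share drops under any fixed threshold.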
Labels: spam
Posted by Melanie Phung
The Splog Police
Here's a guy with way too much time on his hands:
http://fightsplog.blogspot.com/
Labels: spam
Posted by Melanie Phung
Adding Value to the Web (Not!)
This is the kind of thing that gives people the idea that Internet marketers are slimy: an auto blog generator.
The blog generator site pitches to owners of the kinds of spam blogs -- AdSense splogs -- I explained in my October 20 entry, Say It 5 Times Fast: Blogspot Splog Bomb.
There's a second type of splog I didn't mention. This splog is a subset of sites that don't target searchers directly, but rather are created as a way to artificially inflate the number of hyperlinks pointing to another site, the target site.
That's because search engine algorithms favor sites that have a lot of links pointing to them. Inbound links are considered a sort of "vote" for the target site. However, links are not all worth the same and the less authoritative the page the link is coming from, the less value that link has. So this type of web spam, like the ad-based splogs, also requires the creation of a lot of sites (a brute force strategy) to accomplish the spammers' goals.
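For the curious, here is a small sketch of how "votes" weighted by the authority of the linking page can be computed. It is a simplified PageRank-style iteration that I'm using as a stand-in; it is not the search engines' actual ranking code, and the page names, damping factor and iteration count are all assumptions for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages so that a link counts for more when the linking page scores well.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own score evenly among the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page with no outbound links spreads its score over every page.
                for other in pages:
                    new_score[other] += damping * score[page] / n
        score = new_score
    return score


# Hypothetical mini-web: one well-linked page vouches for target_a,
# while a lone splog with no inbound links vouches for target_b.
web = {
    "hub1": ["authority"],
    "hub2": ["authority"],
    "hub3": ["authority"],
    "authority": ["target_a"],
    "splog": ["target_b"],
}
scores = link_scores(web)
print(scores["target_a"] > scores["target_b"])
```

In this toy graph the single link from the well-linked page is worth more than the single link from the splog, which is why a link spammer has to churn out so many sites.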
Creating fake sites solely for the purpose of improving the popularity of other sites -- which themselves can be legitimate, content-rich sites -- is obviously not something the search engines approve of, and it is a strategy that has been hit hard by recent algorithm updates.
Read more about the importance of links in Link Popularity Explained.
Labels: spam
Posted by Melanie Phung
State of the Blogosphere
Superb overview of the current blog landscape by Technorati's Dave Sifry:
http://www.technorati.com/weblog/2005/10/53.html.
Among the data he shares:
- The total number of weblogs tracked continues to double about every 5 months
- The blogosphere is now over 30 times as big as it was 3 years ago, with no signs of letup in growth
- About 70,000 new weblogs are created every day
- About one new weblog is created each second
- 2% - 8% of new weblogs per day are fake or spam weblogs
- Between 700,000 and 1.3 million posts are made each day
- About 33,000 posts are created per hour, or 9.2 posts per second
- An additional 5.8% of posts (or about 50,000 posts/day) seen each day are from spam or fake blogs, on average
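If you want to sanity-check the per-second figures in the list above, the unit conversions are quick to sketch. The input numbers are the ones Sifry reports; the variable and function names are mine, and the growth function assumes a perfectly constant doubling period, which real growth won't follow.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

new_blogs_per_day = 70_000
posts_per_hour = 33_000

blogs_per_second = new_blogs_per_day / SECONDS_PER_DAY  # about 0.81
posts_per_second = posts_per_hour / SECONDS_PER_HOUR    # about 9.2
posts_per_day = posts_per_hour * 24                     # 792,000


def growth_factor(months, doubling_months=5):
    """Size multiple after `months`, assuming a constant doubling period."""
    return 2 ** (months / doubling_months)


print(round(blogs_per_second, 2), round(posts_per_second, 1), posts_per_day)
print(round(growth_factor(36)))
```

The hourly post count works out to 792,000 a day, inside the reported 700,000 to 1.3 million range. A constant five-month doubling would mean roughly 147-fold growth over three years rather than 30-fold, so the doubling figure presumably describes the recent pace rather than the whole period.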
Posted by Melanie Phung
Say It 5 Times Fast: Blogspot Splog Bomb
Big to-do in Blogland this week. Some are saying Google has allowed the inmates to take over the asylum. Seems someone was crafty enough to find a way to automate the creation of thousands of blogs on Blogspot, a.k.a. Blogger (a free blog creation and hosting site owned by Google). Those blogs then steal content from elsewhere on the web to lure users to the site and then bombard them with Google AdSense ads.
Since the blogs have no real content and are designed just to make money via commission on the ads, they're called spam blogs (= splogs). The result? Lotsa (more) crap on the Internet.
Not familiar with splogs? I dare you to click the "next" button on the upper right of this page to get sent to a different blog. Odds are that if you do this a couple of times, you'll find some pretty obvious examples. Unless Google has cleaned up the mess already.
Recourse
Matt Cutts - a V.V.I.P. over at Google - gives this tip on spotting and reporting splogs:
You see a low-quality site that is running AdSense. If you run across a site that you consider spammy and it has AdSense on it, click on the "Ads by Goooooogle" link and click "Send Google your thoughts on the ads you just saw". Enter the words spamreport and jagger1 in the comments field.
Updated Oct. 21
Seems like a lot of the spam has been cleared out so it's not as easy to find. I'm linking to an example so you can see what blog spam might look like.
Posted by Melanie Phung


Quote of the Week: Hannibal Lecter's Guide to Link Building
Quote of the Week from Eric Ward's article on so-called "best practices" in link building:
Labels: link-building, quote-of-the-week, spam
Posted by Melanie Phung