All About Content

Typo-Squatting Comes with Money-Back Guarantee

Posted by Melanie Phung on Sunday, April 30, 2006 at 11:59 am

An article in today’s Washington Post business section, titled The Web’s Million-Dollar Typos, highlights the practice of domain parking by people counting on Internet users misspelling their destination URL. This tactic is known as typo-squatting and can be a very lucrative business because, according to industry analysts, approximately 15% of all web traffic comes from people typing a domain into the address bar, as opposed to using search or bookmarks.

The Business of Typo-Squatting
Simply buy an unregistered domain that is similar in spelling to a major brand, put some Google AdSense or Yahoo ads on it, and share in all that “contextual” advertising revenue. This works because most web users will click on the ads to get to their intended destination, rather than retype the domain or do a search to locate the correct URL.

And there is very low risk for the ambitious typo-squatting entrepreneur, as the Washington Post explains:

Because purchasers can change their minds within five days and avoid paying the $6 registration fee for the name, many investors enter the names in Google’s ad program for a quick test and quickly drop those that don’t yield enough clicks to cover the domain registration fee.
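The keep-or-drop test described in that quote comes down to simple arithmetic. Here is a rough sketch in Python. The $6 fee and the five-day window come from the quote; the revenue-per-click figure and the idea that the fee buys a one-year registration are illustrative assumptions of mine, not numbers from the article.

```python
REGISTRATION_FEE = 6.00  # dollars, from the Washington Post quote
TEST_DAYS = 5            # length of the free trial window, from the quote


def keep_domain(test_clicks, revenue_per_click, fee_period_days=365):
    """Return True if projected ad revenue would cover the registration fee.

    revenue_per_click and fee_period_days are hypothetical inputs; the
    article does not say what a squatter earns per click or how long the
    $6 fee covers.
    """
    daily_revenue = test_clicks * revenue_per_click / TEST_DAYS
    projected_revenue = daily_revenue * fee_period_days
    return projected_revenue >= REGISTRATION_FEE


# One click in five days at an assumed 10 cents per click projects to
# more than $6 over a year, so the domain would be kept.
print(keep_domain(test_clicks=1, revenue_per_click=0.10))
```

If those assumptions are anywhere near right, even a trickle of mistyped traffic is enough to justify keeping a name, which is why the no-risk trial period matters so much.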

But others, including those who speculate on potential traffic of a specific domain name, argue that the pages are helping people find information related to what they’re looking for. Typo-squatters and those who provide services to support them claim these pages benefit the customer by making unused pages “function as alternatives to search engines.”

The Losers and Winners
This is a huge problem for those of us who participate in paid search. Every one of those clicks costs the advertiser money. For small business owners who pay a couple of cents per click, as well as for larger businesses bidding one or two dollars for competitive brand terms, that adds up quickly. Basically you are forced to pay money for clicks you are almost guaranteed to get for free in organic search (a search on your brand name).

Result: higher ad budgets (and lower ROI) and brand dilution.

As search professionals, I suppose, we could try to educate consumers about this type of web spam, but there’s no incentive for them to refrain from clicking on those ads. After all, one click gets them where they wanted to go.

It works for the consumer, and it works for the search engines since they (as the owners of the ad networks) get a cut of that ad spend. Because Google and Yahoo benefit from the advertising revenue, there is a disincentive to pull the plug on this practice across the board. Even when they don their “what’s in the best interest of the user” hats, removing these pages is still not a pressing issue. Sure, the pages clutter up the Internet with low-quality content but, the argument would go, if someone mistypes a URL and then is presented with a link to the site they actually wanted to get to… well, isn’t that better than just landing on a 404 error page?

But as the recent click fraud lawsuits against Yahoo and Google show, advertisers are starting to get fed up with those companies turning a blind eye to practices that benefit the search engines at advertisers’ expense. It would be wise for owners of ad publishing networks, especially ones focused just on parked domains, to try to reestablish the goodwill of legitimate content publishers and help unclutter the World Wide Web by cracking down on typo-squatting. Not because it’s in their immediate financial interest, but because it’s the right thing to do.

You Can Hide from Googlebot…

Posted by Melanie Phung on Sunday, April 30, 2006 at 12:24 am

… but you can’t hide from Google’s bots. Google has confirmed that it’s using multiple spiders to feed crawl results into Bigdaddy. Specifically mentioned is that the AdSense mediapartners bot (a.k.a. mediabot) is caching pages for the natural search index.

Jenstar points out:

It could definitely be used as a tool to detect when content is being cloaked for either the Google or AdSense bot, particularly since the mediapartners bot has been indexing pages since at least the beginning of February.

Who knows how many other-named spiders Google has doing recon like this. Definitely would make the old IP cloaking black hat trick a little trickier.

More info on the crawl caching proxy on Matt Cutts’s site.

State of the Blogosphere: Q1 2006

Posted by Melanie Phung on Saturday, April 29, 2006 at 3:14 pm

Dave Sifry’s overview of the world of blogs includes the astounding estimate that the number of blogs has doubled nearly every 6 months for the last 3 years, meaning the Blogosphere is 60 times bigger than it was in the spring of 2003. The current growth rate is one new blog per second, with 55% (19.4 million) of those blogs having a lifespan of more than 3 months.
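The doubling claim is easy to sanity-check. The sketch below is my own arithmetic, not Sifry’s: it computes the growth factor for a given doubling period, and then works backward from the 60x figure to the doubling period it implies.

```python
import math


def growth_factor(months, doubling_period_months):
    """How many times bigger something gets if it doubles every N months."""
    return 2 ** (months / doubling_period_months)


def implied_doubling_period(months, factor):
    """Doubling period (in months) that produces `factor` growth over `months`."""
    return months / math.log2(factor)


# Doubling exactly every 6 months for 3 years is six doublings.
print(growth_factor(36, 6))

# Working backward from the 60x estimate gives a period slightly over 6 months.
print(implied_doubling_period(36, 60))
```

Exact six-month doubling would give 64x, so “nearly every 6 months” and “60 times bigger” are consistent with each other.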

Sifry’s next “State of the Blogosphere” post promises to talk about the tagging phenomenon. Given how large the blog world has grown, one would really expect it to somewhat spontaneously organize itself somehow. Tagging, of course, is one way to organize content - by topic. And I suppose awards like The Webbies pull together the highest quality sites from a range of categories, at least at the elite level — you’ll be able to peruse the latest crop of Webby Award winners when they are announced May 9 — but that’s not really self-organization.

Since I don’t believe I blogged about some of the new features Technorati released a while back, this would be a good time to point out that you can sort your Technorati searches by levels of “authority.” Your blog’s authority is measured by… you guessed it: how many other posts link to yours.

Think of it like the New York Times Best Sellers List. The more attention you get, the more attention people think you deserve.

(One day soon there will also be the equivalent of an Oprah’s Book Club for blogs … a way to make blogs even more accessible to those who like their hands held through their media consumption.)

An Unhealthy Fascination With del.icio.us

Posted by Melanie Phung on Friday, April 28, 2006 at 8:32 pm

Haven’t had a lot of time to update the blog in a while, but I am still keeping my SprayOnSalt page up to date with search engine news I find worth reading. This list of links also appears in the navigation column of the blog’s homepage.

Other things that might be worth an update: I’m still fascinated by that issue I was having with del.icio.us pages getting first page rankings for what seemed like not entirely un-competitive words. Basically these pages were getting ranked based on off-page factors (i.e., inbound links) and possibly the URL (file name). But file names are not believed to be hugely important relative to the most focused-on factors like title tags, on-page copy, links, etc.

That leaves links. While there is a lot of cross-linking inside del.icio.us, all the pages use the “nofollow” robots meta tag, as well as “noindex” — so the only links that should be counting are external links, with the pages not getting any boost in link popularity from internal links. Given that it seems unlikely that many people would link to a tag page inside del.icio.us, those few links must be pretty valuable. (A quick search using the “link:” operator confirms there aren’t many links coming in.)

That’s my hypothesis. So I’m playing around to see if I can significantly manipulate the ranking of these “URL only” listings by just pointing a couple of links to them (for example, to a page like http://del.icio.us/tag/technorati) but using entirely irrelevant anchor text. So far I’ve only tried it with del.icio.us pages that are already ranking, but the early indication is that those listings move up in the results based on only a couple of inbound links (IBLs).

I’m also creating new tags that don’t exist yet to see if I can get a new del.icio.us tag page ranking out of nothing. Shouldn’t be hard, because if people aren’t already tagging a word or phrase, it probably isn’t a very competitive one… like, say, oh… http://del.icio.us/tag/melanie+phung.

So that’s one of the things I’m playing around with. Will post an update if I turn anything up.

comScore Releases March Market Share Stats

Posted by Melanie Phung on Wednesday, April 19, 2006 at 7:08 am

comScore’s latest release shows that the number of search queries rose and, once again, Google not only leads the pack but takes market share from its competitors.

Other highlights:

  • Americans conducted 6.4 billion searches online in March, up 10% from last month and 15% from last year. The increase in search queries from the previous month marked the largest gain over the past 12 months.
  • Google Sites led the pack with 2.7 billion search queries performed, followed by Yahoo Sites (1.8 billion), MSN-Microsoft (849 million), Time-Warner Network (486 million), and Ask Jeeves/Ask Network (376 million).
  • The toolbar search market continues to be dominated by Google and Yahoo, which combined for more than 95% of toolbar searches in March. Google led the way with 48.9%, while Yahoo captured 46.5%.
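For a rough sense of overall market share, the query counts above can be divided by the 6.4 billion total. This is a back-of-the-envelope sketch using the rounded figures quoted in this post, so the percentages are approximations and not comScore’s published share numbers.

```python
TOTAL_QUERIES = 6.4e9  # all U.S. searches in March, as quoted above

# Query counts per network, as quoted above
QUERIES = {
    "Google Sites": 2.7e9,
    "Yahoo Sites": 1.8e9,
    "MSN-Microsoft": 849e6,
    "Time-Warner Network": 486e6,
    "Ask Jeeves/Ask Network": 376e6,
}


def share_percent(queries, total=TOTAL_QUERIES):
    """Share of total queries, as a percentage."""
    return 100 * queries / total


shares = {name: share_percent(count) for name, count in QUERIES.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

On these rounded numbers Google comes out at a little over 42% of all queries, and the five networks listed do not quite add up to 100%, with the remainder going to everyone else.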

Look! I’m Meta-Meta-Blogging

Posted by Melanie Phung on Saturday, April 15, 2006 at 9:24 pm

Just wanted to point out that although the subtitle of this blog describes All About Content as “a site about interesting things going on in the world of Internet search,” it’s often just a site about interesting things going on in the world of Internet search as it pertains to this blog.

In other words, I’m blogging about how well my blog, which is largely about blogging, is doing in terms of organic search (or not). Is this then an instance of meta-meta-meta-blogging? Have I just made history? Or am I just a pretentious, self-obsessed indulgent blogger who hasn’t gotten the memo that pomo is long dead? Is this very post itself nothing more than pomobabble?

Isn’t it del.icio.us?

Ad Spending Up for ‘Alternative Media’

Posted by Melanie Phung on Saturday, April 15, 2006 at 7:51 pm

WebProNews reports on a study from the Center for Media Research about the growth of advertising in user-generated media:

Blog ads, podcast ads, RSS ads, oh my, it could be a $50 million dollar market by the time 2006 comes to an end, well over the $20.4 million spent for advertising on those syndication methods in 2005.

Other findings highlighted by the Center For Media Research:

  • Blog advertising comprised 81.4%, or $16.6 million, of total spending on user-generated online media in 2005; blog ads will reach $300.4 million, but only account for 39.7%, of overall spending in 2010
  • Advertising networks ($8.0 million) and click-throughs ($7.8 million) are the largest ad insertion methods
  • Total spending on user-generated online media is forecast to grow at a compound annual rate of 106.1%, to reach $757.0 million, by 2010
  • The media industry spent $3.2 million on advertising in user-generated media in 2005
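The growth figures in that list hang together, as a quick check shows. The sketch below is my own arithmetic and assumes the $20.4 million quoted for 2005 is the base for the 2010 forecast; the report may define its base period differently.

```python
def cagr(start, end, years):
    """Compound annual growth rate between a start and an end value."""
    return (end / start) ** (1 / years) - 1


SPEND_2005 = 20.4      # total spending in $ millions (assumed base year)
FORECAST_2010 = 757.0  # forecast total in $ millions

rate = cagr(SPEND_2005, FORECAST_2010, years=2010 - 2005)
print(f"Implied annual growth: {rate:.1%}")

# Blog advertising as a share of the total in each year
blog_share_2005 = 16.6 / SPEND_2005
blog_share_2010 = 300.4 / FORECAST_2010
print(f"Blog share 2005: {blog_share_2005:.1%}")
print(f"Blog share 2010: {blog_share_2010:.1%}")
```

The implied rate lands at roughly 106% a year, in line with the report’s 106.1% once rounding of the inputs is taken into account, and the blog shares match the 81.4% and 39.7% quoted above.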

The report points out that while user-generated media is (by which I mean “are”) a great opportunity to reach a younger and more engaged audience, advertisers still face many challenges that need to be resolved before these media are fully deployable and measurable.

Google’s Interpretation of ‘noindex’

Posted by Melanie Phung on Tuesday, April 11, 2006 at 11:16 am

I asked around regarding my observation that Google is displaying pages in results even if they use the robots noindex meta-tag, and someone pointed me toward Matt Cutts’ March 17 blog post titled Googlebot Keep Out:

You might wonder why Google will sometimes return an uncrawled url reference, even if Googlebot was forbidden from crawling that url by a robots.txt file. There’s a pretty good reason for that: back when I started at Google in 2000, several useful websites (eBay, the New York Times, the California DMV) had robots.txt files that forbade any page fetches whatsoever. Now I ask you, what are we supposed to return as a search result when someone does the query [california dmv]? We’d look pretty sad if we didn’t return www.dmv.ca.gov as the first result. But remember: we weren’t allowed to fetch pages from www.dmv.ca.gov at that point. The solution was to show the uncrawled link when we had a high level of confidence that it was the correct link. Sometimes we could even pull a description from the Open Directory Project, so that we could give a lot of info to users even without fetching the page. I’ve fielded questions about Nissan, Metallica, and the Library of Congress where someone believed that Google had crawled a page when in fact it hadn’t; a robots.txt forbade us from crawling, but Google was able to show enough information that someone assumed the page had been crawled. Happily, most major websites (including all the ones I’ve mentioned so far) let Google into more of their pages these days.
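The distinction in that quote is between fetching a page and knowing that its URL exists. A robots.txt rule only governs the former. Here is a minimal sketch using Python’s standard `urllib.robotparser`. The blanket `Disallow: /` rule is my stand-in for the kind of file Cutts describes, not the actual contents of any of those sites’ robots.txt files, and `may_crawl` is a helper name I made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that forbids all page fetches for every crawler
BLOCK_EVERYTHING = [
    "User-agent: *",
    "Disallow: /",
]


def may_crawl(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines allow user_agent to fetch url."""
    rules = RobotFileParser()
    rules.parse(robots_lines)
    return rules.can_fetch(user_agent, url)


# The crawler may not fetch the page...
print(may_crawl(BLOCK_EVERYTHING, "Googlebot", "http://www.dmv.ca.gov/"))

# ...but nothing in robots.txt stops a search engine from learning the URL
# through links on other sites and listing it without a fetched description.
```

Note that this covers robots.txt only. A robots meta tag lives inside the page itself, so a crawler has to fetch the page before it can see one.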

That makes great sense in theory, but what Google is telling users is that it thinks, it’s guessing, that this page which it hasn’t even looked at is very relevant… and not just very relevant but more relevant than all the other pages it has actually indexed. It’s one thing if they dig down deep on searches that don’t yield very many results, but to list these types of pages on the first page of results on searches that have tens of thousands (or more) results is just odd.

Never mind that one would think a “noindex” robots meta tag means the search engine wouldn’t index the URL (not just that it wouldn’t index the page’s content). Okay, so the page will still show up in the index. And while Googlebot didn’t technically crawl the page, it will go ahead and return it in results based on… on…? keywords in the URL? What?

I’m not sure what the take-away is here (because this makes no sense), except that if you have a webpage that you don’t want users to find, don’t rely on robots exclusions to keep it from showing up in the results (well, actually, don’t post anything on the Internet that you wouldn’t want people to find).

Updated June 19: Looks like Google is indeed indexing pages that are tagged “no follow.” See this recent Webmaster World discussion.

Google Not Honoring ‘noindex’?

Posted by Melanie Phung on Monday, April 10, 2006 at 8:25 pm

Can anyone tell me what’s wrong with this picture?

It’s what appears on the first page of Google results today if you do a search on the term “technorati.” It’s a link to the del.icio.us page of items tagged “technorati.” But del.icio.us pages all use <meta name="robots" content="noarchive,nofollow,noindex">. In other words, the robots instructions on the page tell the search engines not to index the page! Noindex means it shouldn’t show up in search results.
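If you want to check what a page’s robots instructions actually say, the tag is easy to read programmatically. Below is a minimal sketch using Python’s standard `html.parser`. It assumes the instructions sit in a standard `<meta name="robots">` element like the one described above; the class and function names are my own, and the sample HTML is a stripped-down stand-in, not the real del.icio.us markup.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives declared in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


sample = '<html><head><meta name="robots" content="noarchive,nofollow,noindex"></head></html>'
found = robots_directives(sample)
print("noindex" in found)
```

This only tells you what the page asks for. Whether a search engine honors the request is exactly the open question here.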

What gives? Has Google started ignoring noindex?

Two thoughts: 1) Get ready for some aggressive del.icio.us tag spamming, and 2) how do we avoid getting in trouble for duplicate content if we can’t keep Google from indexing dupe pages using the standard robots exclusion?

Update: This question was answered in my subsequent post, Google’s Interpretation of ‘noindex’.

Updated June 19: Looks like Google is indeed indexing pages that are tagged “no follow.” See this recent Webmaster World discussion.

Google Base Results Integrated

Posted by Melanie Phung on Sunday, April 9, 2006 at 9:25 am

Search for apartments for rent in Google and you might notice an option to refine your search - with a link to Google Base. Google’s integration of real estate content from Google Base is being called a threat to the classifieds space.

Just for giggles, I added a listing for my blog to Google Base on Tuesday. The Internet being the wonderful censor-less medium it is, I editorialized a bit. Lo and behold, just three days later, this shows up on the first page of results when I did an ego search in Google:


Unexpectedly, the same result also made its way into Alta Vista results.


(Except for this Google Base listing, Alta Vista’s results look remarkably similar to Yahoo’s.)
