Antezeta Web Marketing

Reflections on search engine optimization, web analytics and web marketing

Web Text Search is Hard. Image indexing is even harder. Just ask Cuil.

by Sean Carlos · 2 Comments

A new search engine, Cuil, has launched in an attempt to become the next Google. Cuil was founded by people with experience at Google, AltaVista and IBM, which was enough to get the mass media’s attention in the dog days of summer.

Cuil searches 121,617,892,992 web pages

Rather unfortunately, Cuil decided to tout its index size as a primary feature. As seasoned search engine professionals know, many other issues also affect the quality of search results. Are the indexed web documents fresh and up to date? Google is indexing some sites in just minutes:

google indexes some pages in minutes

On its home page, Cuil says “Search 121,617,892,992 web pages”. This number hasn’t changed in days; it seems that their index is more static than Google’s.

Have duplicate pages, such as print versions and syndicated content, been filtered out? Is there a sophisticated ranking algorithm to show the most pertinent documents to the web searcher, based on their intention?

Several years ago, after much back and forth over who had the biggest index, Google and its rivals agreed to move on to more important issues.

Yet while a new search engine challenger to the effective Google monopoly is warmly welcomed, especially since Ask.com is joining Jeeves in retirement, Cuil got off to a bad start.

But it gets worse. Cuil also spoke about a feature which “goes beyond today’s search techniques of link analysis and traffic ranking”, namely the grouping of related search results in clusters. Um, nothing new here. Teoma, now part of Ask.com, provided “Web Pages by Topic: Top result pages are grouped based on their topics”

teoma search engine, now defunct

and Clusty still does.

Perform a basic search on Google and within seconds results appear, almost like magic. Yet the work behind the scenes needed to arrive at uncannily targeted results for a web search is much harder than it might seem.

A search engine has to go through a long sequence of steps which, in rather simplified form, involves three basic parts:

  1. Discover and capture content on the web. This is what we mean by crawling the web.
  2. Process and index retrieved content.
  3. Interpret a web searcher’s intention and produce relevant results, quickly.
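To make the three parts concrete, here is a toy sketch in Python. It is only an illustration under heavy assumptions, not how Cuil, Google or any real engine works. The `PAGES` dictionary, its example.com URLs and the function names are all hypothetical stand-ins. The "crawl" just reads from memory instead of fetching pages, and the "ranking" is a bare count of matching words.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the web: URL -> page text.
# A real crawler would fetch pages over HTTP and follow links.
PAGES = {
    "http://example.com/a": "Cuil is a new search engine",
    "http://example.com/b": "Google indexes some pages in minutes",
    "http://example.com/c": "Image search is harder than text search",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crawl(pages):
    """Part 1: discover and capture content (here, just read the dict)."""
    yield from pages.items()


def build_index(documents):
    """Part 2: build an inverted index mapping word -> {url: occurrences}."""
    index = defaultdict(dict)
    for url, text in documents:
        for term in tokenize(text):
            index[term][url] = index[term].get(url, 0) + 1
    return index


def search(index, query):
    """Part 3: return URLs containing every query word, best match first.

    The score is simply the total number of query-word occurrences,
    a deliberately naive substitute for a real ranking algorithm.
    """
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    matches = set(postings[0]).intersection(*postings[1:])
    scores = {url: sum(p[url] for p in postings) for url in matches}
    return sorted(scores, key=lambda url: (-scores[url], url))


index = build_index(crawl(PAGES))
print(search(index, "search engine"))
```

Even this toy shows why index size alone says little: freshness, duplicate filtering and ranking all live in parts 2 and 3, and none of them get better simply by adding more pages to `PAGES`.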

Text-based document formats, like HTML files, generally provide lots of rich information for search engines to work with. Their work becomes much more onerous when they try to decipher information in image and video files. Over the years, a few great examples of Google getting it wrong have been documented.

The Register, a UK-based online service covering the IT industry (irreverent tagline: Biting the hand that feeds IT), captured one example of where Cuil’s large index doesn’t help it provide relevant results. In the example, site thumbnail previews do not appear to be related to the search results. Indeed, we see lads, as mamma made them, up to a bit of fun. Unfortunately, the lads are unrelated to the search result, “Jonathan Grattage Teaching”.

It seems Cuil was out to prove either that search is hard or that “mine is bigger than yours” (index, that is); I’m just not sure which. (The link is probably not “work-safe”; you’ve been warned.)

Kudos to David Naylor (a.k.a. DaveN) and Mikkel deMib Svendsen for citing this example on their Strikepoint show. In April 2006 DaveN also pointed out a problem with a Google image search for the UK flag: one of the pictures was very un-flag-like, trust me.

Although Cuil is initially focusing on English language search, I do see Italian language results, such as in this search for Blogbabel:

Search for Blogbabel in Cuil

My personal verdict for now: too Cuil for its own good. An intellectually dishonest press release does not make a valid search engine. But it might be just the glitter that attracts an ill-informed private equity investment company buyout specialist. See Danny Sullivan’s review for a more extensive take on Cuil.

Originally published August 19th, 2008

  • Sean Carlos is a web marketing consultant & teacher, assisting companies with their Search (SEO + PPC = SEM), Social Media & Digital Media Measurement strategies. Sean first worked with text indexing in 1990 in a project for the Los Angeles County Museum of Art. Since then he has worked for Hewlett-Packard Consulting and later as IT Manager of a real estate website before founding Antezeta in 2006. Sean is an official instructor of the Digital Analytics Association and collaborates with the Bocconi University. He is a co-author of the Treccani encyclopedic dictionary of computer science, ICT & digital media. Born in Providence, RI, USA, Sean received Honors in Physics from Bates College, Maine. He speaks English, Italian and German.


2 Comments so far ↓

  • Alex

    Hi Sean – I found Cuil the other day, but was not overly impressed, and I see you are not either.

    Not sure how to pronounce the name either, and it would not read too well in Italian either!

    I sometimes use Clusty, which I still think is useful.

    Hope you are having a kewl summer,

    Alex

  • sean

    Alex, the Cuil folks say the pronunciation should be “Cool”; this seems a bit hokey to me.

    They say the name derives from the Irish word “Cuil” but it seems that Cuil actually means fly, at least in current usage.

    Perhaps it should be Cuill or Coll?

    I agree with you, Clusty can be very useful at times.
