There's been a lot of news this week about real-time search. Google has made a number of announcements related to a new real-time search product, and these announcements really have people talking.
Unfortunately, a lot of what's being said about Google's real-time search is confusing and misleading.
Here's our opinion, at Wowd, of recent developments -- in particular, the Google announcements.
Google has said that it will offer a section called "latest results" on a regular Google search results page. This section will automatically refresh content from sources like Twitter.
Ah, well that's a key phrase: "... sources like Twitter". What this phrase really means is Twitter, Facebook, and a few hand-picked blogs and news sites. That's a great list of information resources, but it does not mean that Google is now indexing the entire web in real-time.
While I haven't seen anyone from Google directly claim that Google now indexes the entire web in real time, others have explicitly said this on its behalf.
For example, the Technology Review article "Google Takes Search Real-Time" (Mon Dec 7 2009, by David Talbot) says this: "Gradually, over the past decade, Google has compressed the gap between fresh indexing of the Web from months to mere minutes."
Sorry, but that's technically just not possible.
Google clearly has a crawler-based architecture that's able to make a sweep of the web in some amount of time. How much time is required for Google to index the entire web? It's a period of time that's best measured in weeks or months, not seconds or minutes.
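To make the arithmetic behind this concrete, here is a minimal back-of-envelope sketch. Neither the page count nor the fetch rate comes from Google; both are inputs you supply, and the function names are our own illustration. The point is only the shape of the calculation: sweep time is pages divided by sustained fetch rate, and shrinking the window to minutes multiplies the required rate accordingly.

```python
SECONDS_PER_DAY = 86_400

def sweep_days(total_pages: float, fetches_per_second: float) -> float:
    """Days needed to visit every page once at a sustained fetch rate."""
    if fetches_per_second <= 0:
        raise ValueError("fetch rate must be positive")
    return total_pages / fetches_per_second / SECONDS_PER_DAY

def required_fetch_rate(total_pages: float, window_seconds: float) -> float:
    """Sustained fetches per second needed to revisit every page within a window."""
    if window_seconds <= 0:
        raise ValueError("window must be positive")
    return total_pages / window_seconds

# Hypothetical inputs, chosen only to show how the two quantities relate.
pages = 10_000_000_000          # assumed size of the crawlable web
rate = 5_000                    # assumed sustained fetches per second
print(f"full sweep: {sweep_days(pages, rate):.1f} days")
print(f"rate needed for a 5-minute refresh: {required_fetch_rate(pages, 300):,.0f} fetches/s")
```

Plug in whatever estimates you find credible; the gap between a weeks-long sweep and a minutes-long refresh is several orders of magnitude in fetch rate.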
Clearly, Google deals with some sites specially. Google gets page content updates from its AdSense program, from its Urchin-based analytics, from its toolbar, and also from special-purpose information feeds like the one that it now receives from Twitter.
It's clear that Twitter is one source of "real-time" information. And, depending on what sorts of pages are made public through Facebook, it too can provide good real-time information. But Twitter and Facebook and a handful of specially selected blogs and news sites do not make up the entire real-time web.
Google is clearly able to refresh some parts of its index in minutes, and other parts in a much longer time frame.
Another article that's quite misleading is "Google launches real-time search" (Mon Dec 7 2009, by Tom Krazit). In this article, we get the following claim:
"Real-time search at Google involves more than just social-networking and microblogging services. While Google will get information pushed to it through deals with those companies, it also has improved its crawlers to index and display virtually any Web page as it is generated."
And this is supposed to work... how? The physics of computation preclude an instantaneous crawl of all pages as they're updated or first created.
A more reasonable account of the real-time web appears in "Startups Mine the Real-Time Web" (Wed Dec 9 2009, by Erica Naone). In fact, the sub-title of this article is "There's more to it than microblog posts and social network updates."
We have two predictions involving Google's incorporation of Twitter results in traditional search results pages:
- Many people will realize that they don't want Twitter results in their normal search experience. There's a lot of chatter in tweets, and in general, if people want to search inside the content of tweets, then they know that they can do that, in real-time, at Twitter.com itself.
- Many people will realize that there's more to real-time search than just having access to Twitter and Facebook and a few news sites. A lot more.
On this second point, it's clear now that the entire web is moving to a real-time foundation. The revolution that started with Web 2.0, where those people formerly known as the audience became active participants in an open, public, high-velocity conversation, has become a global torrent of new information. That information is not localized to just Twitter and Facebook. Those two sites represent just a tiny fraction of what's new and interesting and changing in real time on the entire web.
An analysis in this vein is provided in the article "Facebook Will Be Google-able (If Your Profile is Set to Public)" (Mon Dec 7 2009, by Marshall Kirkpatrick). Marshall makes an excellent point:
"Ultimately, the real-time web remains larger than both of these sources and the newly-included MySpace partnership (also announced today). There is so much implicit real-time data online that few real-time search startups use only explicit data, like shared links, from social networks."
At Wowd, we believe that the challenges of the real-time web are two-fold:
- A credible real-time service must help people discover what's out there on the web, in real-time, before those people decide to search for something specific.
- A credible real-time service must implement ranking, in real-time, across all possible web pages, not just Tweets and Facebook pages.
To this end, Wowd offers a user experience in the real-time web that starts with real-time discovery. Using the Wowd Hot List, you can see what's popular, from pages across the entire web, including the so-called "deep web".
The Wowd Hot List changes in real-time, to reflect what real humans think about different web pages. Once you see something that's interesting to you, you can use Wowd's real-time search features to dive more deeply into this real-time, web-wide information, finding pages (again, from the entire web) that are just seconds old.
Wowd also incorporates Twitter information, but Wowd uses Twitter information as "meta-data", to help figure out what real people think are good web pages. Wowd takes tweets, determines which ones contain URLs, and then filters that set according to some re-tweet and follower criteria. (This helps to ensure that only quality tweets are considered.)
Wowd then de-references the URLs in the quality tweets and indexes the pages that the tweets mention, essentially treating each mention as a "vote" for the mentioned page. This small stream of human attention data is folded into Wowd's own proprietary, anonymous attention stream, and the combined data is used to rank pages from all over the web.
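The tweet-filtering steps just described can be sketched roughly as follows. This is an illustration, not Wowd's actual code: the `Tweet` fields, the threshold values, and the function name `quality_url_votes` are all assumptions made for the example, and the step of resolving shortened links to their final destination is left out to keep the sketch free of network calls.

```python
import re
from collections import Counter
from dataclasses import dataclass

URL_PATTERN = re.compile(r"https?://\S+")
TRAILING_PUNCTUATION = ".,;:!?)"

@dataclass
class Tweet:
    text: str
    retweets: int    # how many times the tweet was re-tweeted
    followers: int   # follower count of the author

def quality_url_votes(tweets, min_retweets=2, min_followers=100):
    """Count one vote per URL for each tweet that passes the quality filter.

    The thresholds are hypothetical placeholders, not Wowd's real criteria.
    """
    votes = Counter()
    for tweet in tweets:
        # Step 1: keep only tweets that contain at least one URL.
        urls = {u.rstrip(TRAILING_PUNCTUATION) for u in URL_PATTERN.findall(tweet.text)}
        if not urls:
            continue
        # Step 2: filter by re-tweet and follower criteria.
        if tweet.retweets < min_retweets or tweet.followers < min_followers:
            continue
        # Step 3: each surviving tweet votes once for each page it mentions.
        for url in urls:
            votes[url] += 1
    return votes

tweets = [
    Tweet("great read http://example.com/a", retweets=5, followers=500),
    Tweet("lunch was good", retweets=10, followers=1000),
    Tweet("click here http://example.com/b", retweets=0, followers=3),
]
print(quality_url_votes(tweets).most_common())
```

In a fuller pipeline these vote counts would be merged with other attention signals before ranking, as the paragraph above describes.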
Wowd uses an incremental link-based distributed algorithm to calculate ranking. The real-time performance of this algorithm is only possible because of the distributed nature of the Wowd cloud computation engine.
The key point as far as the current discussion is concerned is that Wowd is not a search engine for looking inside the contents of tweets... if you want to do that, we suggest using Twitter's own rather good search system.
Wowd is instead a tool for discovering and then searching the entire web, in real-time, where search results are ranked according to what real human beings think is good. Twitter is just one (small) input to that overall calculation.
However one ranks search results, whether by popularity or by freshness, one must take relevance into account. It's not enough to simply match a search term against a page or a tweet or some other information source, and to return the page or tweet if that term is found. That approach breaks down quickly.
What's needed is a method of doing ranking in a way that takes into account what a user wants (the most popular information for a given search term, or the freshest), but that also pays attention to how relevant any given source is for the given term. If you'd like to read more about the Wowd approach to this problem, please take a look at Boris Agapiev's recent blog posting.
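As a rough illustration of that idea (and not a description of Wowd's actual ranking, which is covered in the blog posting mentioned above), the sketch below combines a relevance score with either a popularity or a freshness signal. The `Page` fields, the relevance cutoff, the logarithmic damping of votes, and the one-hour freshness half-life are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    relevance: float     # 0..1, how well the page matches the search term
    votes: int           # popularity signal, e.g. human attention votes
    age_seconds: float   # time since the page was created or last changed

def rank(pages, mode="popularity", min_relevance=0.2, half_life=3600.0):
    """Order pages by the user's chosen signal, weighted by relevance."""
    if mode not in ("popularity", "freshness"):
        raise ValueError(f"unknown mode: {mode}")

    # Pages that barely match the term are dropped, however popular or fresh.
    candidates = [p for p in pages if p.relevance >= min_relevance]

    def score(page):
        if mode == "popularity":
            signal = math.log1p(page.votes)                   # damp huge vote counts
        else:
            signal = 0.5 ** (page.age_seconds / half_life)    # exponential decay with age
        return page.relevance * signal

    return sorted(candidates, key=score, reverse=True)

pages = [
    Page("fresh-and-relevant", relevance=0.9, votes=10, age_seconds=60),
    Page("popular-but-older", relevance=0.5, votes=1000, age_seconds=7200),
    Page("viral-but-off-topic", relevance=0.05, votes=100000, age_seconds=1),
]
print([p.url for p in rank(pages, "popularity")])
print([p.url for p in rank(pages, "freshness")])
```

Multiplying the two factors means a page needs both relevance and the chosen signal to rank well; a simple term match alone, or raw popularity alone, is not enough.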
If you'd like to see what a real-time web discovery and search engine looks like, please visit us at www.wowd.com. See what's gaining in popularity on the Wowd Hot List, then try a search and sort your results by Freshness.
Not to pick on Tiger Woods in his hour (or many months) of personal difficulty, but you might try a real-time, web-wide freshness search for web pages that mention him.
Clearly, there are many approaches to real-time search. At Wowd, we plan to offer you a service that starts with real-time discovery, to allow you to see what's new, and popular, with your fellow human beings. Once you find things that are interesting to you personally, you can use Wowd's real-time search feature to dig more deeply into that material.
We've got a lot of new features coming soon, to help you do real-time discovery even more effectively. So, watch this space, as they say.
We'd love to know what you think. Please feel free to get in touch.