Thursday, February 17, 2011

What was the main factor for Watson's success? Hardware, software, or data?

I can think of three things that may have allowed Watson to win Jeopardy:
  • Hardware: From a comment at Shtetl-Optimized, "The hardware Watson was running on is said to be capable of 80 teraflops. According to the TOP500 list for November 2000, the fastest supercomputer (ASCI White) was capable of 4.9 teraflops." So, by these numbers, computers became roughly 16x faster over the last 10 years. Is this the winning factor?
  • Software: A couple of months back Noam Nisan reported: "while improvements in hardware accounted for an approximate 1,000 fold increase in calculation speed over a 15-year time-span, improvements in algorithms accounted for an over 43,000 fold increase." So, maybe it is just the better NLP and machine learning algorithms that played the crucial role in this success.
  • Data: 10 years back we did not have Wikipedia or its derivatives, such as Wiktionary, WikiQuote, Wikispecies, DBPedia, etc. Such resources add tremendous value for finding connections between concepts.
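To put the quoted hardware and software numbers on a common footing, here is a quick back-of-the-envelope sketch in Python. It only uses the figures quoted above (80 vs. 4.9 teraflops over 10 years, and 1,000-fold vs. 43,000-fold over 15 years); the function names are my own, and the assumption of a constant yearly growth rate is a simplification made purely for comparison.

```python
# Back-of-the-envelope comparison of the speedups quoted above.
# Assumption: growth is smooth, so a total fold increase over N years
# can be summarized as a constant per-year multiplier.

def fold_increase(new: float, old: float) -> float:
    """How many times larger `new` is than `old`."""
    return new / old


def annual_growth(fold: float, years: float) -> float:
    """Constant per-year multiplier that compounds to `fold` over `years`."""
    return fold ** (1.0 / years)


# Hardware, from the Shtetl-Optimized comment: 4.9 -> 80 teraflops in ~10 years.
hardware_fold = fold_increase(80.0, 4.9)
print(f"Watson vs. ASCI White: {hardware_fold:.1f}x")
print(f"  per year: {annual_growth(hardware_fold, 10):.2f}x")

# Hardware vs. algorithms, from the quoted 15-year figures.
print(f"Hardware (1,000x / 15 years):    {annual_growth(1_000, 15):.2f}x per year")
print(f"Algorithms (43,000x / 15 years): {annual_growth(43_000, 15):.2f}x per year")
```

Note that there is no comparable single number for the data factor, which is part of why its contribution is easy to underestimate.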
My gut feeling says that the crucial factor is the development of the data resources that allowed Watson to answer such trivia questions. This is not to discount the advances in hardware and software, but without such tremendously organized and rich data sources, Watson could not have answered any of these questions. The IBM WebFountain was around for a while, but structuring unstructured web data and extracting meaning from it is much harder than taking and analyzing the nicely organized data in DBPedia.

To paraphrase a loosely related quote: Better data usually beats better algorithms.