IBM Watson Strategy

IBM's Watson Strategy and Possible TOMI Synergy

Watson and Jeopardy!

Watson is a computer system built specifically to compete in, and win, a special series of the game show Jeopardy!. IBM's ingenious objective was to introduce its next-generation Big Data tool (and its "Watson" brand) to the mass public through a highly rated television show. The three Jeopardy! episodes aired February 14-16, 2011, and drew the highest audience numbers in six years.

Technically, Watson is a Big Data analytics system running on 2,880 Power7 cores with 16 Tbyte of RAM. The hardware alone reportedly cost about $3M. Many of the inner workings have not been publicly released, but it is known that the software is built using a tool IBM calls DeepQA, implemented using Hadoop (an open-source Java implementation of Google's MapReduce), and running on SUSE Linux Enterprise Server.

Jeopardy! is primarily a show that tests a competitor's trivia knowledge. As such, a human with a fast Google connection could probably do well or even win. However, Watson is more than a search engine in at least two ways:

  1. It does "semantic analysis" of the question text. That means it attempts to understand the context of the question instead of just searching for words and phrases. Furthermore, Jeopardy! questions are worded as answers and often include puns and other wordplay that present significant challenges to machine analysis. Watson appeared to handle the language tricks quite well.
  2. It uses a variety of heuristic algorithms to accelerate the search process. (The winner of a Jeopardy! question must punch the buzzer ahead of the other competitors.)

Watson's creators have stated that the algorithms are embarrassingly parallel, meaning the work divides into tasks that need little or no communication with one another. This probably means that the underlying engine is more MapReduce than graph analytics. The lack of specificity about the interprocessor network suggests it might be something along the lines of a simple switch instead of a 3D Torus as used in BlueGene. A simple switched network would be quite adequate for an embarrassingly parallel MapReduce engine such as Hadoop.
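To make "embarrassingly parallel" concrete, here is a minimal MapReduce-style sketch in Python. It is not Watson's or Hadoop's code; the shard contents and the function names (map_shard, merge_counts, word_counts) are hypothetical, and a thread pool stands in for a cluster. The point is only that each map call touches its own shard and nothing else, so the tasks never need to talk to each other until the final merge.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical document shards; on a real cluster each would sit on its own node.
SHARDS = [
    "watson answers questions",
    "watson searches documents",
    "questions become answers",
]


def map_shard(shard: str) -> Counter:
    """Map step: count the words in one shard without looking at any other shard."""
    return Counter(shard.split())


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: combine two partial word counts into one."""
    return left + right


def word_counts(shards) -> Counter:
    # The map calls are independent of one another, so they can run in any
    # order and on any worker. Only the final merge brings results together.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_shard, shards))
    return reduce(merge_counts, partials, Counter())


counts = word_counts(SHARDS)
print(counts.most_common(3))
```

Because the map tasks exchange no data, the interconnect only has to carry the inputs out and the partial results back, which is why a plain switch would be enough for this style of workload.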

However, Watson was a marketing effort, not just a science project. The metrics for Watson marketing were the Nielsen ratings.

Jeopardy! drew an 8.7 rating and a 16 share on Monday, February 14, followed by 9.5/17 on Tuesday and 9.1/17 on Wednesday. This means that for thirty minutes a night, for three consecutive nights, more than 30 million prime time viewers watched a disembodied automaton named Watson best some of the smartest humans on earth.

As a public relations, branding, identity-building, and positioning move, the Watson Jeopardy! effort must be seen as a master stroke on the order of Steve Jobs's "1984" Super Bowl ad.

"Ginni from the Block" (Apologies to Jennifer Lopez)

As an outsider, it's hard to know the complete story of how Watson came to be, but from the publicly available documents and the internal IBM blogs, it appears that Watson was Ginni Rometty's baby (no pun intended). During the Watson effort, she was VP Marketing and Strategy. She seems to have taken the "strategy" part very seriously.

In the 1970s IBM famously missed the shift of data management from ISAM files to relational databases even though the inventor of RDBMS, Edgar Codd, was one of its researchers. It fell to a brash college dropout named Larry Ellison to redefine the industry and vastly enrich himself with a company named Oracle.

IBM never caught up.

"Ginni", the appellation by which she is lovingly addressed by apparently one and all within IBM, seems to have a feel for the future as well as an understanding of the past. At an interview following a meeting of the Council on Foreign Relations she was asked, "What keeps you up at night?" Her answer, "The biggest thing to fear in this business is you miss a shift."

...straight out of Andy Grove: "Only the paranoid survive."

Car companies can be run by managers. Technology companies should probably be led by visionaries. Only visionaries know the technology future will always be radically different from the present. Their careers are motivated by that perpetual fear of missing the shift. Andy, Steve, and now maybe Ginni.

Andy had a Ph.D. in chemical engineering and literally wrote the book on the semiconductor technology that made the microprocessor possible. Steve was a once-in-a-century, self-educated genius of human nature and behavior with an intuitive feel for commerce. Ginni appears to be somewhere in between. She's the only IBM chairman other than T. Vincent Learson and Louis Gerstner who could be considered a technologist. Louis saved the company from certain death, and T. Vincent was the driving force behind an obscure project called the System/360.

On paper at least, she appears closer to T. Vincent as a strategic thinker. Some have said IBM bet the company with the $5B spent developing System/360. IBM is not exactly betting the company on Big Data, but it is placing a bet that will determine whether the future of IBM is as a mature services provider with a fading stable of big iron or once again as the technical (and financial) industry leader.

Monetizing the Game Show Victory

"Big Data: The Next Frontier", a McKinsey report from June 2011, projected Big Data's benefit to health care as $300B per year. Six months later IBM announced Watson would help Sloan-Kettering find a cure for cancer. At IBM's most recent investor briefing, the Watson demo used its artificial intelligence to whip up unusual recipes when given health and other constraints.

Game show and kitchen mastery are parlor tricks that successfully excite and entertain a crowd. However, IBM is neither Disney nor Starbucks. It must find a way to monetize technology and return significant value to shareholders. For Watson to succeed as a medical tool, IBM must convince doctors that it offers a valuable second opinion and not an annoying second guess.

Ginni has 5 years to make big money in Big Data. In order to make money in a market that does not currently exist, she must also solve two significant Watson problems: power and cost.

TOMI Technology, Positioning, and Synergy

Ginni has a vision of Watson everywhere, from supercomputer to smartphone. In February 2013 IBM announced "some hardware and software elements derived from Watson" available on Power servers starting at $5,947. Another source reports it takes a minimum of 12 of these servers to run a basic Watson. Whatever the case, this could be the beginning of a compatibility strategy that looks suspiciously like System/360, IBM's most profitable product line ever.

Now the bad news. Watson is a power company's dream. The Jeopardy! Watson I purportedly consumed 200,000 watts. Put another way, not only does it consume about the same energy as 100 homes, the operator has to install about 57 tons of air conditioning to remove the roughly 680,000 BTU/hr of heat it generates. IBM's Bernie Meyerson has been telling reporters that Watson's power consumption is "dropping like a stone"...which brings to mind a very large hot rock.
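The heat and cooling figures follow from standard unit conversions. The sketch below works them through, assuming that essentially all of the electrical power drawn ends up as heat in the room. The function names are illustrative; the conversion factors (1 W = 3.412 BTU/hr, 1 ton of refrigeration = 12,000 BTU/hr) are standard.

```python
# Standard conversion factors.
BTU_PER_HR_PER_WATT = 3.412142      # 1 watt of dissipated power = 3.412142 BTU/hr
BTU_PER_HR_PER_TON = 12_000.0       # 1 ton of refrigeration removes 12,000 BTU/hr


def heat_load_btu_per_hr(watts: float) -> float:
    """Heat to be removed, assuming all electrical power becomes heat."""
    return watts * BTU_PER_HR_PER_WATT


def cooling_tons(watts: float) -> float:
    """Air-conditioning capacity needed to remove that heat."""
    return heat_load_btu_per_hr(watts) / BTU_PER_HR_PER_TON


WATSON_I_WATTS = 200_000.0  # purported draw of the Jeopardy! system

heat = heat_load_btu_per_hr(WATSON_I_WATTS)
tons = cooling_tons(WATSON_I_WATTS)
print(f"Heat load: {heat:,.0f} BTU/hr, cooling: {tons:.1f} tons")
```

The same two functions can be pointed at any other power figure to see how the cooling plant scales with it.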

The second piece of bad news is that Watson is expensive. Reportedly the Watson I hardware alone cost $3M. From an end-user customer's perspective, the wonderful, future-predicting, problem-solving, analytic capabilities must be weighed against the cost. From IBM's perspective, not only must it price Watson low enough to entice customers, but it must price it high enough to generate good margins. For Watson these two elements are in significant conflict.

Given these two Watson challenges, there might be some synergy with TOMI Technology.

The following are areas of potential discussion:

  1. Factoring Watson into Intellect and Grunt Work

    From an outsider's perspective Watson appears to be a semantic analyzer that parses and pre-processes the requests then sends them to a Big Data engine that does the grunt work of retrieval. IBM has acknowledged that Watson uses Hadoop as its Big Data engine.

    Most of the Watson intelligence is therefore probably in the analyzer and pre-processor. These algorithms are key to Watson's performance and its user interface, but they probably consume as little as 1-2% of Watson's computing resources and power.

  2. Potential Power Consumption Savings

    If our assumptions on factoring Watson into Intellect and Grunt Work are correct, a TOMI Celeste array could possibly be created to perform the MapReduce function at lower power than the current configuration. We know from our work with TOMI Borealis on Sandia's MapReduce MPI that, core for core, the TOMI architecture consumes about 1% of the energy of a high-end Intel Xeon processor when both are running at full speed. IBM has designed Power7 to clock even faster than Intel, and it has a similar power profile.

    Furthermore, due to the low leakage of its DRAM process transistors, TOMI Celeste can be placed in idle between requests to reduce consumption by another 100X.

  3. Potential Cost of Goods Reduction

    TOMI Celeste is built in an unmodified DRAM fab using DRAM tooling and the existing 3-layer-metal DRAM technology. The die of a 4G DRAM with eight 64-bit cores is only 10% larger than that of the parent 4G DRAM alone. The parent DRAM costs about $2 to manufacture, so a 4G DRAM with eight TOMI Celeste cores can be inferred to cost 10-20% more than the DRAM alone, or roughly $2.20 to $2.40.

    The IBM Power7 used in Watson is built on an 11-metal-layer 45nm logic process with eDRAM and SOI. The most recently reported die size is nearly an inch on a side. The Power7 cost is therefore probably greater than that of TOMI Celeste.
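The cost inference in item 3 is simple arithmetic, sketched below. The $2 parent DRAM cost and the 10-20% premium come from the text; the per-core figure is derived here by dividing the premium across the eight cores, which is an illustrative assumption and not a published number.

```python
PARENT_DRAM_COST = 2.00          # approximate manufacturing cost of the 4G DRAM, in dollars
CORES_PER_DIE = 8                # eight 64-bit cores per DRAM die
PREMIUM_RANGE = (0.10, 0.20)     # inferred cost increase over the DRAM alone


def die_cost(base_cost: float, premium: float) -> float:
    """Cost of the DRAM die once the cores are added."""
    return base_cost * (1.0 + premium)


def incremental_cost_per_core(base_cost: float, premium: float, cores: int) -> float:
    """Added cost attributed to each core (assumes the premium splits evenly)."""
    return (die_cost(base_cost, premium) - base_cost) / cores


for premium in PREMIUM_RANGE:
    total = die_cost(PARENT_DRAM_COST, premium)
    per_core = incremental_cost_per_core(PARENT_DRAM_COST, premium, CORES_PER_DIE)
    print(f"premium {premium:.0%}: die ${total:.2f}, about ${per_core:.3f} added per core")
```

On these assumptions each added core costs a few cents, which is the basis for the cost-of-goods comparison with a large logic-process die.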

Watson with a TOMI Celeste Back End

Watson I fit 2,880 cores and 16 Tbyte of RAM into 10 racks.

If our factoring assumptions are correct, it could be possible to configure a system similar to Watson I, as seen here: http://www.venraytechnology.com/Implementations.htm

The total power consumption would be around 15 kW compared to Watson I's 200 kW. TOMI power consumption is strongly application and workload dependent, since cores can be switched to idle in a few nanoseconds to reduce power by 100X. In a Jeopardy! or even a hospital setting responding to human requests, power could conceivably be much lower.
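How much lower depends on how often the cores are busy. The sketch below is a rough duty-cycle model, not a measurement: it assumes the whole 15 kW figure scales down by 100X when idle (ignoring any components that do not idle) and that switching between states is effectively free. The function name and the sample duty cycles are illustrative.

```python
ACTIVE_KW = 15.0          # estimated draw with all cores busy
IDLE_REDUCTION = 100.0    # idle cores draw 1/100 of active power
WATSON_I_KW = 200.0       # purported draw of the Jeopardy! system


def average_power_kw(duty_cycle: float,
                     active_kw: float = ACTIVE_KW,
                     idle_reduction: float = IDLE_REDUCTION) -> float:
    """Time-weighted average power for a given fraction of time spent busy."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    idle_kw = active_kw / idle_reduction
    return duty_cycle * active_kw + (1.0 - duty_cycle) * idle_kw


for duty in (1.0, 0.5, 0.1):
    avg = average_power_kw(duty)
    print(f"busy {duty:.0%} of the time: {avg:.3f} kW "
          f"({WATSON_I_KW / avg:.0f}x below Watson I)")
```

Under these assumptions a system that is busy a tenth of the time would average well under 2 kW, which illustrates why interactive, human-paced workloads are the favorable case.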

Other Strategic Factors

  1. Using TOMI Celeste to do the grunt work for Watson, IBM could easily create a scalable product line from a single TOMI Celeste motherboard (32,768 cores + 1 Tbyte) up to several million cores.
  2. IBM already has much of the world's massively parallel experience through its Blue Gene project. TOMI Celeste is inherently modular by design with a 3D Torus as its native interprocessor communications, the same as Blue Gene.
  3. TOMI Technology is a very defensible technology. Since merging DRAM and CPU was first patented in 1989, several dozen CPU projects have demonstrated the performance advantages of embedding DRAM on existing logic processes. The Power7 uses a similar technique to fabricate its caches.

However, only TOMI Technology has shown how to design multiple general purpose CPUs that can be routed using only 3 layers of metal of a commodity DRAM process. Only by fabricating on an unmodified DRAM process are the significant power and cost reductions achievable.

TOMI Technology might allow Ginni to fulfill her vision of Watson everywhere a little quicker.

"I actually have to go on record as saying that, at some time, this (TOMI) would be the way to go." EDN

- Dr. David Patterson (RISC visionary, SPARC inventor, IRAM inventor)

"...delighted, even envious" WIRED

- Dr. Thomas Sterling (Creator of Beowulf supercomputer, DARPA Exascale project, Gilgamesh inventor)

"The entity that controls [TOMI] probably controls computer architecture to the end of silicon." WIRED

- Russell Fish