There are two ways to make a frosted cake.
The first is to construct a self-supporting structure out of frosting with a hole in the middle. Then bake a cake. Then carefully insert the cake into the frosting structure without collapsing the frosting or destroying the cake.
The second is to bake a cake and smear on frosting.
Similarly, there are two ways to combine memory and CPU. Embedding memory into an existing CPU seems the most obvious. Embedding CPU into an existing DRAM seems a little more difficult, and it is. However, the embedded-memory approach turns out to be stupendously expensive, while the less obvious approach is cheaper than dirt.
Combining memory and CPU was first proposed by Fish and presented in the Moore/Fish patent filed in August of 1989. The immediately obvious benefit of such a combination was reducing the time for the CPU to access memory.
After the patent issued, DARPA (the Defense Advanced Research Projects Agency) and JPL (NASA's Jet Propulsion Laboratory) funded about a dozen of the best-known computer architects in the country to implement the technology. The architects included David Patterson of Berkeley, an inspiration for the SPARC microprocessor and co-author of the most widely used textbook in the field, "Computer Architecture: A Quantitative Approach"; Thomas Sterling of Caltech, creator of the massively parallel Beowulf cluster; and Peter Kogge of Notre Dame, inventor of the Kogge-Stone adder. Over a period of 10 years, DARPA spent as much as $500M on projects related to solving this problem.
The DARPA architects succeeded in combining CPU and memory in several of their efforts. They confirmed that bringing CPU and memory together on a single chip significantly improved performance. However, none of the projects left so much as a ripple in the computer world. DARPA succeeded in creating stupendously expensive, difficult-to-make designs with undersized memories. The last of the projects was defunded in December of 2011.
DARPA and JPL failed because they chose to make frosted cakes backwards. They called what they did PIM (Processor in Memory), but they really were doing MIP (Memory in Processor). They embedded memory into existing CPUs. CPUs are normally built on complex processes that emphasize performance and are unconcerned about leakage. Dynamic RAM (DRAM), the most common main-memory technology, remembers data by storing charge on small capacitors. DRAM processes must be very low leakage or the memories will forget.
The DARPA and JPL architects solved the problem of combining the two technologies by using a new DRAM technology that could be "embedded" into existing CPU logic. The solution had serious disadvantages: the resulting designs were stupendously expensive, difficult to make, and saddled with undersized memories.
TOMI Technology instead makes CPUs from transistors fabricated on existing DRAMs, using unmodified DRAM processes. The TOMI approach has the following advantages over the DARPA approach:
TOMI CPUs are immediately adjacent to the main-memory DRAM. Therefore access to main memory is nearly as fast as a cache access on a legacy CPU. A cache is a high-speed memory placed on a CPU to hold data fetched from main memory. Data is often reused in some computer programs. The cache allows once-read data to be re-read from the high-speed cache instead of the slow main memory. On a legacy CPU, a main-memory read penalty can be 200-500X the time of a cache read.
Cache effectiveness assumes that data will be reused by a program. This is true of legacy applications such as Microsoft Office. However, the most important high-performance applications today are those that handle what is called Big Data. Big Data refers to amounts of information too large to handle with traditional tools such as single processors and SQL databases. Big Data analysis attempts to predict the future by analyzing patterns in past behavior. Examples include analyzing e-commerce transactions in real time in order to adjust advertising or pricing on the fly.
Two of the most popular techniques for managing Big Data are MapReduce and Graph Analysis. MapReduce sorts huge amounts of data, and Graph Analysis locates patterns. The patterns in many Big Data applications are nearly random, which means the relationships to be analyzed are not likely to be adjacent in memory. Furthermore, once a relationship is identified, it will seldom be referred to again.
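The MapReduce pattern named above can be sketched in a few lines of Python. This is a toy word-count, the canonical MapReduce illustration, not the API of any particular framework:

```python
from collections import defaultdict

def map_phase(documents):
    # The "map" step: emit a (word, 1) pair for every word seen.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # The "reduce" step: group the pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big", "data tools"]
print(reduce_phase(map_phase(docs)))  # {'big': 2, 'data': 2, 'tools': 1}
```

In a real system the map and reduce steps run in parallel across many machines, and the grouping between them is a large distributed sort, which is why MapReduce is described above as sorting huge amounts of data.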
A cache is only effective if the data it holds is accessed repeatedly. Big Data Graph Analysis destroys cache effectiveness by constantly requiring access to main memory and incurring the 200-500X speed penalty.
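The cache-defeating effect of random access can be illustrated with a toy direct-mapped cache model. The cache geometry and access patterns below are illustrative assumptions, not measurements of any real CPU:

```python
import random

def hit_rate(addresses, num_lines=256, line_size=64):
    # Toy direct-mapped cache: one tag slot per line index.
    cache = [None] * num_lines
    hits = 0
    for addr in addresses:
        block = addr // line_size        # which cache line this byte lives in
        idx = block % num_lines          # where that line maps in the cache
        if cache[idx] == block:
            hits += 1
        else:
            cache[idx] = block           # miss: fetch from "main memory"
    return hits / len(addresses)

MEM = 1 << 22                            # 4 MB working set, far larger than the cache
seq = range(0, MEM, 8)                   # sequential 8-byte reads
rng = random.Random(0)
rand = [rng.randrange(MEM) for _ in range(len(seq))]  # random reads, same count

print(f"sequential: {hit_rate(seq):.3f}")  # 0.875 (7 of every 8 reads hit the cached line)
print(f"random: {hit_rate(rand):.3f}")     # near zero
```

Sequential reads hit the same 64-byte line eight times in a row, so almost every access is a cache hit; random reads over a working set far larger than the cache almost always miss, and each miss pays the main-memory penalty described above.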
TOMI Technology positions the CPUs within a few microns of main memory, thereby eliminating the speed penalty.
Finally, DRAM transistors are about the cheapest transistors that can be made. For example, 1 billion Intel Xeon transistors made on a CPU process cost more than $200. Four billion transistors on a 4G DRAM made on a memory process cost less than $1.50. A TOMI 8-core 64-bit CPU adds about 4 million transistors to a 4-billion-transistor 4G DRAM.
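The arithmetic behind that claim is worth making explicit. Using the article's own round figures (back-of-envelope estimates, not audited prices):

```python
# Figures taken from the article: ~$200 for ~1 billion CPU-process transistors,
# ~$1.50 for ~4 billion DRAM-process transistors.
cpu_cost, cpu_transistors = 200.0, 1e9
dram_cost, dram_transistors = 1.50, 4e9

cpu_per_t = cpu_cost / cpu_transistors    # dollars per CPU-process transistor
dram_per_t = dram_cost / dram_transistors  # dollars per DRAM-process transistor

print(f"CPU/DRAM cost ratio per transistor: {cpu_per_t / dram_per_t:.0f}x")  # 533x

# Cost of adding an 8-core TOMI CPU (~4 million transistors) at DRAM prices:
tomi_added_cost = 4e6 * dram_per_t
print(f"added cost per die: ${tomi_added_cost:.4f}")  # about $0.0015
```

By these numbers a CPU-process transistor costs roughly 500 times more than a DRAM-process transistor, and the entire 8-core TOMI CPU adds a fraction of a cent to the cost of the DRAM die it rides on.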
Designs made with TOMI Technology are therefore very inexpensive.
"I actually have to go on record as saying that, at some time, this (TOMI) would be the way to go." EDN
- Dr. David Patterson (RISC visionary, SPARC inventor, IRAM inventor)
"...delighted, even envious" WIRED
- Dr. Thomas Sterling (Creator of the Beowulf supercomputer, DARPA Exascale project, Gilgamesh inventor)
"The entity that controls [TOMI] probably controls computer architecture to the end of silicon." WIRED
- Russell Fish