Computational RAM



In a typical computer with 32MB of memory built from 16Mb DRAM chips and a 100MHz processor, 3000 times more bandwidth is available inside the memory chips than at the CPU. If you can't bring the memory bandwidth to the processor, then bring the processors to the memory.

Computational RAM Page Highlights

Computational RAM: A Memory-SIMD Hybrid and its Application to DSP

Duncan G. Elliott, W. Martin Snelgrove, and Michael Stumm. Computational RAM: A Memory-SIMD Hybrid and its Application to DSP. In Custom Integrated Circuits Conference, pages 30.6.1--30.6.4, Boston, MA, May 1992.

PostScript paper, without photos (181KB)

chip micrograph (584KB)

PE detail micrograph (136KB)



These files are also available via anonymous FTP from ftp.eecg.toronto.edu in /pub/tech_reports/dunc/*

Boiler Plate:

A PetaOp/s is Currently Feasible by Computing in RAM

From the proceedings of the PetaFLOPS Frontier Workshop held February 1995 in Washington DC during The Fifth Symposium On The Frontiers Of Massively Parallel Computation:

PostScript paper (51KB)

HTML paper

Slides (65KB)

Other information on PetaFLOPS Enabling Technologies and Applications

Compiler

Performance

Simulated performance by application

program               C-RAM (ms)   host (ms)   speedup   source code
3x3 Convolution 16M      17.6067    112760.0      6404   Parallel, Sequential
FIR 128K 40b              0.0991       311.7      3144   Parallel, Sequential
FIR 4M 16b                1.0437      5144.4      4929   Parallel, Sequential
Vector Quantization       25.746       33780      1312   Parallel, Sequential
Masked Blt                0.0182       442.8     24310   Parallel, Sequential
LMS Matching              0.2003       250.9      1253   Parallel, Sequential
Data Mining                70.66      192450      2724   Parallel, Sequential
Fault Simulation          0.0894      2380.0     26626   Parallel, Sequential
Satisfiability            0.0232       959.0     41391   Parallel, Sequential
Memory Clear              0.0016         8.8      5493   Parallel, Sequential
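The speedup figures in the performance data above are the host time divided by the C-RAM time. The short Python sketch below recomputes that ratio from the listed times as a sanity check. The names TABLE, speedup and SPEEDUPS are illustrative only and are not part of the C-RAM tools. Because the listed times are rounded, the recomputed ratios agree with the listed speedups only approximately (to within about one percent for these rows).

```python
# (C-RAM ms, host ms, listed speedup), copied from the simulated-performance data.
TABLE = {
    "3x3 Convolution 16M": (17.6067, 112760.0, 6404),
    "FIR 128K 40b":        (0.0991, 311.7, 3144),
    "FIR 4M 16b":          (1.0437, 5144.4, 4929),
    "Vector Quantization": (25.746, 33780.0, 1312),
    "Masked Blt":          (0.0182, 442.8, 24310),
    "LMS Matching":        (0.2003, 250.9, 1253),
    "Data Mining":         (70.66, 192450.0, 2724),
    "Fault Simulation":    (0.0894, 2380.0, 26626),
    "Satisfiability":      (0.0232, 959.0, 41391),
    "Memory Clear":        (0.0016, 8.8, 5493),
}


def speedup(cram_ms, host_ms):
    """Speedup of C-RAM over the host: host time divided by C-RAM time."""
    return host_ms / cram_ms


# Recomputed speedup for every program in the table.
SPEEDUPS = {name: speedup(cram, host) for name, (cram, host, _) in TABLE.items()}

if __name__ == "__main__":
    for name, (cram, host, listed) in TABLE.items():
        print(f"{name:<22} recomputed {SPEEDUPS[name]:>9.1f}   listed {listed:>6}")
```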

Computational RAM: A Memory-SIMD Hybrid

Computational RAM (C-RAM) is semiconductor random access memory with processors incorporated into the design to build an inexpensive massively parallel computer. If an application contains sufficient parallelism, it will typically run orders of magnitude faster in C-RAM than on the central processing unit. This work includes the architecture, prototype chips, a compiler and applications.

C-RAM integrates SIMD (Single Instruction stream, Multiple Data stream) processors into random access memory at the sense amplifiers (along one edge of a two-dimensional array of memory cells). The novel combination of processors with memory, in which the memory retains its memory interface, allows C-RAM to be used as computer main memory, as a video frame buffer or for stand-alone signal processing. The use of high-density commodity dynamic memory makes C-RAM economical. The bit-serial, externally programmed processing elements (PEs) add only slightly to the cost of the chip (9-20%), yet a workstation with 32Mbytes of C-RAM would have an aggregate performance of 13 billion 32-bit operations per second. A working C-RAM chip with 64 processing elements has been fabricated, and the PE for a 2048-PE, 4Mbit chip has been designed.
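To give a feel for what "bit-serial SIMD" means, here is a minimal Python sketch of an addition performed one bit position at a time across many PEs at once. It is an illustration under stated assumptions, not the actual C-RAM PE design or instruction set: the data layout (one operand per memory column, one bit position per memory row), the single carry bit per PE, and the function names to_bit_planes, from_bit_planes and bit_serial_add are all hypothetical. Each memory row is modelled as a Python integer whose bit j belongs to PE j, so one bitwise operation on a row stands in for all PEs executing the same broadcast instruction.

```python
def to_bit_planes(values, width):
    """Lay out one value per PE column; plane i holds bit i of every value."""
    planes = []
    for i in range(width):
        row = 0
        for pe, value in enumerate(values):
            row |= ((value >> i) & 1) << pe
        planes.append(row)
    return planes


def from_bit_planes(planes, num_pes):
    """Read the per-PE values back out of the bit planes."""
    return [
        sum(((row >> pe) & 1) << i for i, row in enumerate(planes))
        for pe in range(num_pes)
    ]


def bit_serial_add(a_planes, b_planes):
    """Add two operands in every PE at once, least significant bit first.

    Assumed model: each PE keeps a one-bit carry; on every step the same
    operation is applied to one row of A and one row of B in all PEs.
    The result wraps modulo 2**width, where width is the number of planes.
    """
    carry = 0  # one carry bit per PE, packed into an integer
    result = []
    for a_row, b_row in zip(a_planes, b_planes):
        result.append(a_row ^ b_row ^ carry)                     # sum bit
        carry = (a_row & b_row) | (carry & (a_row ^ b_row))      # carry out
    return result


if __name__ == "__main__":
    a = [1, 2, 3, 250]
    b = [5, 6, 7, 10]
    width = 8
    total = bit_serial_add(to_bit_planes(a, width), to_bit_planes(b, width))
    print(from_bit_planes(total, len(a)))
```

In this model a W-bit addition takes W steps regardless of how many PEs there are, so a single operation is slow but the aggregate throughput grows with the number of memory columns that have a PE.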

The performance of C-RAM on kernels and real applications was obtained by simulating their execution. For this purpose, a prototype compiler was written. Applications are drawn from the fields of signal and image processing, computer graphics, synthetic neural networks, CAD, databases and scientific computing.

Computing in Memory Bibliography

Future

Professors accepting graduate students

Keywords: smart memory, smart DRAM, intelligent memory, intelligent DRAM, processors in memory, processing in memory, computing in memory, pitch-matched logic in memory, application specific memory, application specific DRAM, massively parallel computer, massively parallel computing, massively parallel SIMD, MPP, IRAM, DSP, VLSI, logic enhanced memory, logic enhanced DRAM, merged DRAM-logic, MPP applications, graphics, digital signal processing, image processing, image compression, scientific computing, database

Duncan's home page