# A Rapid Prototype Wafer Scale System Design for Signal and Data Processing

D. Landis: University of South Florida

H. Brown: University of Central Florida

T. Sanders, M. Shahsavari, J. Hadjilogiou: Florida Institute of Technology

R. Shankar: Florida Atlantic University

### Abstract

This paper describes a WSI rapid prototype system design project supported under the Florida Advanced Microelectronics and Materials (DARPA\*) "Alpha-Site" project. Applications that can be implemented as WSI systems are being developed at three Florida universities (alpha sites): University of Central Florida (UCF), Florida Institute of Technology (FIT), and Florida Atlantic University (FAU). A common wafer level design has been defined which can be rapidly restructured to meet the system requirements of each university. A system design verification approach, which involves defining a common CAD tool base for high level modeling, has been agreed upon by the multi-university team. Each alpha site is developing high level models for its architecture using VHDL (VHSIC Hardware Description Language), and the physical wafer structure will be mapped to the wafer automatically from the port assignment definitions in the VHDL models. Architecture, modeling, cell design, wafer layout, and WSI system development support for this project are being provided by the Center for Microelectronics Research (CMR) at the University of South Florida (USF).

## Paper Summary

In order to meet the signal and data processing requirements for a wide range of applications, a common WSI Processing Element (PE) cell architecture has been jointly developed by CMR and the three alpha sites. Table 1 summarizes the applications which are being addressed by each of the alpha-site designs.
A PE cell that meets the requirements of these applications needs an 8X8 array multiplier, a 24-bit accumulator, a 32X8-bit RAM (dual port read, single port write), and a multi-function 8-bit ALU (capable of performing add, subtract, and algorithmically supported inversion and division) [1]. As illustrated in figure 1, the cell will essentially be a small programmable special purpose CPU tailored toward parallel signal or data processing and neural network applications. The cell has 40 signal inputs, 17 signal outputs, and 2 bi-directional I/O signals, as indicated in Table 2. Each primary input and output will incorporate boundary scan in order to facilitate wafer probe and system level interconnect testing as well as functional testing. In addition, a complete at-speed self-test of the embedded static RAM is being designed to identify RAM delay faults and support laser restructuring of the redundant (defect tolerant) features. Both redundant rows and columns are included in the RAM for yield enhancement. Wafer level restructuring to interconnect the good PEs will be performed using the RVLSI laser diffused link technology which was developed at the MIT Lincoln Laboratories [2,3] and was transferred to the University of South Florida under DARPA support.

<sup>\*</sup> This research is being supported by the Defense Advanced Research Projects Agency under DARPA Grant No. MDA 972-88-J-1006.

Figure 1. Alpha Site Processing Element Cell

### TABLE 1. Alpha Site WSI Applications

- UCF: Feed Forward Artificial Neural Network data processor; primary application to small field target recognition.
- FAU: Image perception system (ALOPEX biologically influenced computational paradigm); application to pattern matching & handwritten character recognition.
- FIT: Parallel Convolution signal processor; application to automatic target recognition, machine vision, and autonomous vehicles or robots.

### TABLE 2. Alpha-Site PE Cell Inputs and Outputs

| Signal | Description |
|-----------|-------------|
| DA | 8-bit parallel A-input data bus. |
| DB | 8-bit parallel B-input data bus. |
| addr a | 5-bit parallel input address for port A of the dual port RAM. |
| addr b | 5-bit parallel input address for port B of the dual port RAM. |
| we l | local (row) write enable RAM input (active low) |
| we g | global (column) write enable RAM input (active low) |
| msb inout | most significant bit data input/output to the RAM shifter |
| instr | 6-bit instruction input to the PE. |
| cin | external carry input |
| oen g | global (column) output tristate enable for both PE and mult outputs. |
| oen 1l | local (row) output tristate enable for the PE OUT B tristate (active low). |
| oen 2l | local (row) output tristate enable for the PE OUT A tristate (active low). |
| P1 | phase 1 of non-overlapping 2-phase clock input |
| P2 | phase 2 of non-overlapping 2-phase clock input |
| PE out A | 8-bit parallel tristate-able output. (primary data output) |
| PE out B | 8-bit parallel tristate-able output. (multiplier and buffer output) |
| cout | single bit carry output signal from the ALU |
| lsb inout | least significant bit input/output from the RAM input shifter |

An instruction set definition is being developed by CMR for the alpha-site PE cell based upon requirement inputs from each of the alpha sites. In order to minimize the number of global (wafer level) control lines, an encoded set of control lines will be decoded by the PE cell, corresponding to this pre-defined instruction set. The VHDL cell model as well as the cell physical design are both being developed at USF/CMR. The alpha sites will ultimately incorporate the detailed cell level VHDL model into their complete architectural level VHDL simulation activities. In this way, final system level verification can be performed for all three architectures prior to submitting the wafer design to fabrication.

In addition, FIT has developed a software translation tool which will create the required WSI system netlist directly from the VHDL architectural model.
This tool automatically creates the input file needed by the MIT Lincoln Labs SLASH tools [4] which will be used for WSI reconfiguration.

In a wafer scale system design, individual cell (chip) sites must be tested prior to wafer level restructuring in order to identify, and avoid connection to, defective cells. This could be accomplished using a conventional wafer probe operation on each cell utilizing full size probe pads at each I/O location. However, this wastes wafer area since these pads will not be needed later for package level wire bonding. An alternative is to use the IEEE 1149.1 (JTAG) [5] boundary scan method, which provides serial access to the cell inputs and outputs for complete I/O controllability and observability. In addition, system level access to the cell (chip) level boundary can be provided using the 1149.1 Test Access Port and test bus capability to address each cell site on the wafer.

A Standard Test Interface (STI) for WSI has been developed at the University of South Florida, based on the IEEE 1149.1 test bus standard [6]. It consists of an 1149.1 Test Access Port along with serial pattern generator and signature analyzer circuits to facilitate Built-In Self-Test. This interface will be used in the alpha-site design to identify good cells through a serial test using a standardized 6-pin probe card (4-wire test bus plus Vdd and Gnd). Wafer probe tests as well as laser restructuring tests, using the Lincoln Laboratories RVLSI wafer restructuring techniques, are easily coordinated through this interface.

USF will also be responsible for the full wafer layout, floorplan, and design-for-test aspects of the alpha-site design. A critical task is to identify the number of wiring tracks needed to ensure wireability for each of the three designs, based upon feedback from each alpha site.
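The Built-In Self-Test idea behind the STI's serial pattern generator and signature analyzer can be illustrated with a short Python sketch. It is only a behavioral illustration under stated assumptions: the register width, the LFSR taps, the signature polynomial (`0x1021`), and the `good_cell` / `faulty_cell` models are hypothetical choices made here, since the paper does not give the STI's actual circuits.

```python
def lfsr_bits(seed, count):
    """Return `count` pseudo-random bits from a 16-bit Fibonacci LFSR.

    Taps 16, 14, 13, 11 are a commonly used maximal-length choice; the
    taps of the real STI pattern generator are an assumption here.
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    bits = []
    for _ in range(count):
        bits.append(state & 1)
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (feedback << 15)
    return bits


def signature(bits, width=16, poly=0x1021):
    """Compress a serial response stream into a `width`-bit signature.

    This is a serial-input LFSR (polynomial division); `poly` is an
    assumed feedback polynomial, not one taken from the paper.
    """
    mask = (1 << width) - 1
    reg = 0
    for b in bits:
        msb = (reg >> (width - 1)) & 1
        reg = (reg << 1) & mask
        if msb ^ b:
            reg ^= poly
    return reg


def good_cell(stimulus):
    """Hypothetical fault-free cell: output = current XOR previous input bit."""
    prev = 0
    out = []
    for b in stimulus:
        out.append(b ^ prev)
        prev = b
    return out


def faulty_cell(stimulus, bad_index=10):
    """Same hypothetical cell with one response bit flipped by a defect."""
    out = good_cell(stimulus)
    out[bad_index] ^= 1
    return out


stimulus = lfsr_bits(seed=0xACE1, count=64)
golden = signature(good_cell(stimulus))      # reference signature
observed = signature(faulty_cell(stimulus))  # signature of a defective cell
print("cell passes" if observed == golden else "cell fails")
```

Only the final signature has to be shifted out and compared, which is what makes a narrow serial test bus sufficient for identifying good cells.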
Each university is evaluating the wafer floorplan, using the place and route capabilities of the SLASH tools, to determine if it meets their system interconnect requirements.

This wafer floorplan uses a sub-wafer cell partitioning organization to enhance testability. Figure 2 illustrates the wafer floorplan, which incorporates a 3X3 cluster of PE cells and STIs to facilitate wafer probe and test-restructure-test operations. Additional USF responsibilities include tile definition, layout, and simulation (where a tile represents a 3X3 PE cluster including associated test circuitry). Physical design of this wafer is being performed using 2 micron sCMOS rules from MOSIS so that wafer fabrication will not be locked in to any single foundry.

Figure 2. Alpha Site Wafer Layout

Figure 3 shows the floorplan of the PE cell, which measures approximately 3 mm on a side. The major functional blocks (RAM, Multiplier, ALU, etc.) are identified in this figure. The STI cell, which measures only 2 mm on a side, is illustrated in figure 4. The additional tile area adjacent to the STI, which can be seen in the wafer floorplan of figure 2, will be used for global clock and signal buffers.

Figure 3. PE Cell Floorplan

Input, output, and control signals are distributed around the periphery of the PE as shown in the cell pin-out diagram of figure 5. Note that the data inputs are located at the top of the cell and data outputs at the bottom. This will simplify the data path interconnections between PEs. The actual wafer tile structure will contain both true and mirrored versions of this cell, in adjacent columns. This will allow the instruction, address, and control bus connections to be routed such that they can be easily shared in the common vertical wiring channel between PE cells.

The PE cell is scheduled to be submitted to MOSIS for fabrication as a functional verification test chip in August 1990.
The STI is scheduled for verification as part of a complex multiplier chip which will be fabricated through MOSIS in the same time frame. The complete alpha-site wafer should be released to fabrication in December 1990.

Figure 4. STI Cell Layout

## Neural Network Applications of the Alpha-Site WSI Design

Neural networks have been shown to solve classes of problems which are intractable using conventional algorithmic approaches. Neural network techniques are suitable for large scale parallel processing which can achieve computational power that exceeds, by many orders of magnitude, the capabilities of the sequential computing architectures employed in conventional digital computers. However, hardware implementations of neural networks have certain constraints that have bounded the promise shown in theoretical studies.

Because neural networks are massively parallel in nature, wafer scale technology offers a number of advantages for implementation. With a large number of processing units on a single wafer, the high communication bandwidth between elements allows for higher computation speed. Power dissipation is also reduced when compared to similar circuit implementations using other technologies and packaging techniques.

Figure 5. PE Cell Pin-Out Diagram

The neural network design developed by UCF as their WSI alpha-site application consists of a three-layer network, with each layer containing a maximum of 32 neural network processing elements. This architecture is oriented toward small field target recognition applications which can benefit from the rapid prototyping capability and high density available using the RVLSI wafer scale technology. Software support tools under development at UCF will allow a user to select a specific set of application driven components and quickly generate a hardware prototype system. In this way, target recognition systems could be rapidly configured to various types of inputs and information capacities.
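Per processing element, a three-layer network of this kind reduces to a multiply-accumulate over the layer inputs followed by a saturating output stage, as the requirements that follow spell out. The Python sketch below is a behavioral illustration only. It assumes signed 8-bit inputs and weights, and it introduces a hypothetical right shift (`shift`) to scale the accumulated sum before saturation, because the paper does not say how the sum is scaled.

```python
def saturate(value, limit=127):
    """Clamp `value` to the range [-limit, +limit]."""
    return max(-limit, min(limit, value))


def pe_output(inputs, weights, shift=7):
    """One processing element: dot product, assumed scaling, saturation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    acc = 0
    for x, w in zip(inputs, weights):
        acc += x * w              # 8 X 8 two's complement multiply-accumulate
    return saturate(acc >> shift)  # `shift` is a hypothetical scaling step


def layer_output(inputs, weight_rows, shift=7):
    """One layer: every PE sees the same inputs and holds its own weights."""
    return [pe_output(inputs, row, shift) for row in weight_rows]


def network_output(inputs, layers, shift=7):
    """Feed the saturated outputs of each layer into the next one."""
    values = list(inputs)
    for weight_rows in layers:
        values = layer_output(values, weight_rows, shift)
    return values


# Worst case for 32 signed 8-bit products: 32 * 128 * 128 = 524288,
# which fits comfortably in a signed 24-bit accumulator.
zero_layer = [[0] * 32 for _ in range(32)]
result = network_output([100] * 32, [zero_layer, zero_layer, zero_layer])
print(len(result), "outputs from the third layer")
```

In hardware the 32 elements of a layer would run in parallel; the sequential loops here only make the arithmetic explicit.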
For small field target recognition, each wafer must process a 32-byte input, or one half of an 8X8 pixel field. This input pixel field size corresponds to a Forward Looking InfraRed (FLIR) targeting application being developed as a cooperative effort between UCF and their industrial partner, Martin Marietta. The total number of processing elements available per layer will be set by technology constraints. Each processing element must perform a 32 X 32 byte dot product, whose result is then passed to saturation logic with an output limited to $\pm 127$. The final saturated result is then passed to the next layer or, in the case of the third layer, becomes the network output. To meet these requirements, each neural network cell will utilize the following alpha-site processing element features:

1. 32 X 8 RAM (for holding connection weight data)
2. 8 X 8 Two's Complement Multiplier (for connection weighting)
3. 16-bit adder with an Accumulator Register (for synapse integration)
4. Saturation Logic (to emulate a tanh(x) function)

FAU has developed a wafer scale architecture of an image perception system based on a biologically influenced computational paradigm (ALOPEX). ALOPEX uses a stochastic procedure to find the global optimum of linear and nonlinear functions. Using the Boltzmann probability distribution function, the ALOPEX algorithm generates probabilities of taking positive or negative steps away from its current position on the path toward the global optimal solution. Extensive simulations have verified the use of ALOPEX for solving a variety of problems [7]. FAU has developed approximations, simplifications and level quantizations for the algorithm to yield simple, space-efficient and fast processing elements [8,9]. These modifications are necessary in order to implement ALOPEX in the digital domain.
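The stochastic step selection described above can be sketched in a few lines of Python. This follows one common textbook formulation of ALOPEX, in which the probability of a positive step depends on the correlation between the previous parameter change and the previous change in cost. It is an assumption that this matches the FAU variant; the fixed step size, the constant temperature, and the function names are illustrative choices, not taken from the paper.

```python
import math
import random


def p_positive_step(dx, d_cost, temperature):
    """Boltzmann-style probability of a positive step when minimizing.

    If the last move (dx) raised the cost (d_cost > 0), the probability of
    repeating that direction drops below one half.
    """
    arg = dx * d_cost / temperature
    arg = max(-50.0, min(50.0, arg))  # keep math.exp within range
    return 1.0 / (1.0 + math.exp(arg))


def alopex_minimize(cost, x0, step=0.05, temperature=0.01,
                    iterations=2000, seed=0):
    """Minimize `cost` with fixed-size stochastic steps (illustrative only)."""
    rng = random.Random(seed)
    x_prev = list(x0)
    c_prev = cost(x_prev)
    x_curr = [x + rng.choice((-step, step)) for x in x_prev]  # random first move
    c_curr = cost(x_curr)
    if c_prev <= c_curr:
        best_x, best_c = list(x_prev), c_prev
    else:
        best_x, best_c = list(x_curr), c_curr
    for _ in range(iterations):
        d_cost = c_curr - c_prev
        x_next = []
        for xp, xc in zip(x_prev, x_curr):
            p_plus = p_positive_step(xc - xp, d_cost, temperature)
            delta = step if rng.random() < p_plus else -step
            x_next.append(xc + delta)
        x_prev, c_prev = x_curr, c_curr
        x_curr, c_curr = x_next, cost(x_next)
        if c_curr < best_c:
            best_x, best_c = list(x_curr), c_curr
    return best_x, best_c


def bowl(x):
    """Simple test cost with its minimum at the origin."""
    return sum(v * v for v in x)


start = [1.0, -1.0]
best_x, best_c = alopex_minimize(bowl, start)
print("best cost found:", best_c)
```

Every parameter needs only the single scalar change in cost plus its own previous step, which is why the per-element processing can stay simple.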
FAU initially performed algorithmic simulations at the behavioral level using DABL, a hardware description language supported on DAISY CAE systems, and is currently in the process of transferring these simulations into VHDL. This will allow them to use the VHDL to SLASH tool translation necessary for WSI structural design mapping. Unlike other neural networks, the FAU ALOPEX architecture is not a connectionist network with dense interconnections, and may be implemented in digital ULSI/WSI since the processing at the neuronal processing element (NPE) level is simple and all NPEs can be operated in synchrony.

The hierarchical SIMD architecture for WSI implementation of the ALOPEX machine is organized as an N X M array of NPEs (each constructed from a single WSI Processing Element) which synchronously process inputs. There are no direct interconnections among PEs. During any cycle of iteration, the outputs of a row of NPEs are transmitted to its partial cost function (PCF) element, also implemented using a single WSI Processing Element. The cost function for each row can be determined separately because of the symmetry in the cost function equation. The processing in all the stages of the pipeline will be performed with the alpha-site processing element (PE) cell previously described. Thus, for a 16 X 16 image size, the number of PE cells needed will be 16 X 16 + 16 + 1 = 273, an optimistic number for our WSI project. Cell dimensions and yield constraints will limit the number to half as many. However, the FAU system can be operated at different levels of resolution (from coarse to fine) and with different task partitions, such that lower numbers of good PEs will still allow a working system to be configured.

The ALOPEX algorithm will use most of the resources of the PE cell, including the capability to generate a cellular-automata based probability function [10], by providing one communication link between neighboring PE cells.
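A cellular-automata based random source of the kind cited in [10] can be sketched as a one-dimensional hybrid of rule 90 and rule 150 cells with null boundaries, where each cell needs only its nearest neighbors. The Python sketch below is an assumption-laden illustration: the 8-cell width, the `RULE150_MASK` pattern, and the threshold comparison used to turn a CA word into a probabilistic decision are all hypothetical, and the paper does not describe how the PE cells actually realize this function.

```python
# 1 = rule 150 cell, 0 = rule 90 cell. This mask is an arbitrary
# illustrative choice for which the update is invertible, so a non-zero
# state cannot collapse to all zeros. It is not claimed to be a
# maximal-length configuration.
RULE150_MASK = (1, 0, 0, 1, 0, 1, 1, 0)


def ca_step(state, rule150_mask=RULE150_MASK):
    """Advance a hybrid rule 90/150 cellular automaton by one clock."""
    n = len(state)
    if len(rule150_mask) != n:
        raise ValueError("mask and state must have the same length")
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0       # null boundary
        right = state[i + 1] if i < n - 1 else 0  # null boundary
        bit = left ^ right                        # rule 90
        if rule150_mask[i]:
            bit ^= state[i]                       # rule 150 adds the cell itself
        nxt.append(bit)
    return nxt


def to_word(state):
    """Pack the cell bits into an integer, first cell as the MSB."""
    word = 0
    for bit in state:
        word = (word << 1) | bit
    return word


def random_decisions(seed_state, threshold, count):
    """Compare successive CA words with `threshold` (hypothetical scheme)."""
    state = list(seed_state)
    decisions = []
    for _ in range(count):
        state = ca_step(state)
        decisions.append(to_word(state) < threshold)
    return decisions, state


decisions, final_state = random_decisions([0, 0, 0, 0, 0, 0, 0, 1],
                                          threshold=64, count=100)
print(sum(decisions), "of", len(decisions), "decisions were positive")
```

The statistical quality of this particular mask has not been evaluated here; the point is only that the update rule uses nearest-neighbor links.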
The three stages of the pipeline will be controlled with three Am2910 microsequencers, with data transfer occurring between the stages with the aid of a full handshake. Simulation of the Am2910 has been incorporated in the overall architectural VHDL model. A Motorola 68000 microprocessor system will act as the host for downloading the images and templates and for reading the results of the processing.

FAU has identified IBM - Entry Systems Division (Boca Raton) as their industrial partner and is continuing discussions on the potential use of ALOPEX by IBM for character recognition. Other possible applications for this system include tactical target recognition using laser radar images such as those contained in the AFWAL database.

## Signal Processing Applications of the Alpha-Site WSI Design

It is clear from the block diagram of the Processing Element cell that our alpha-site WSI design is well suited for implementing a wide variety of digital signal processing algorithms. The single cycle parallel multiplier/accumulator will support high speed digital filtering and FFT operations, and the programmable ALU adds flexibility for application customization.

FIT has developed a novel architecture which will exploit many of the features of our WSI PE and rapid prototype wafer system architecture. This architecture is capable of computing the cross-correlation of an image with a template. The correlation between local regions of two or more images must be computed for a variety of image analysis tasks, including pattern recognition. The basic requirements of this application are repetitive multiply and accumulate operations. In this approach, a systolic array architecture is used which consists of an array of densely connected identical processing elements. The system works on input data from an image and calculates the cross-correlation at each pixel in one clock cycle.
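The multiply and accumulate structure of this computation can be seen in a plain sequential reference version. The Python sketch below is not the FIT systolic architecture; it only shows the arithmetic that such an array would distribute across processing elements. The function name and the choice to evaluate only positions where the template fits entirely inside the image are assumptions made for the example.

```python
def cross_correlate(image, template):
    """Cross-correlate `template` with `image` at every fully overlapping position.

    Both arguments are lists of equal-length rows of integers. Each output
    value is one multiply-accumulate sum, the operation a PE performs.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    if th > ih or tw > iw:
        raise ValueError("template must not be larger than the image")
    result = []
    for r in range(ih - th + 1):
        row = []
        for c in range(iw - tw + 1):
            acc = 0
            for i in range(th):
                for j in range(tw):
                    acc += image[r + i][c + j] * template[i][j]
            row.append(acc)
        result.append(row)
    return result


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
template = [[1, 0],
            [0, 1]]
correlation = cross_correlate(image, template)
print(correlation)  # prints [[6, 8], [12, 14]]
```

Each inner sum is independent of the others, which is the parallelism a wafer full of PEs can exploit.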
The performance of this WSI architecture will allow real-time local correlations and data fusion to be performed. Nearly all functional Processing Element cells on the wafer can be utilized to speed up the correlation operation by exploiting parallelism on the wafer. The major applications for this architecture are in automatic target recognition, machine vision, and autonomous vehicles or robots. FIT has teamed with Harris (Melbourne) as their industrial partner.

## REFERENCES:

- [1] D. Landis, S. Athan, H. Brown, T. Sanders, S. Kozaitis, M. Shahsavari, R. Shankar, "A Multi-function Wafer Scale System Architecture for Signal and Image Processing Applications", Proceedings of the 1990 Florida Microelectronics Conference, May 10-11, 1990, pp. 21-24.
- [2] P. A. Wyatt & J. I. Raffel, "Restructurable VLSI - A Demonstrated Wafer Scale Technology", Proceedings of the 1989 International Conference on Wafer Scale Integration, January 3-5, 1989, pp. 13-20.
- [3] A. H. Anderson & R. Berger, "RVLSI Applications and Physical Design", Proceedings of the 1989 International Conference on Wafer Scale Integration, January 3-5, 1989, pp. 39-45.
- [4] R. Frankel, et al., "SLASH - An RVLSI CAD System", Proceedings of the 1989 International Conference on Wafer Scale Integration, January 3-5, 1989, pp. 31-38.
- [5] Test Bus Standardization Committee Working Group of the IEEE Computer Society's Test Technology Technical Committee, "Standard Test Access Port and Boundary-Scan Architecture", IEEE Std 1149.1/D6, November 22, 1989 (Draft Version 6 approved as IEEE standard 2/90).
- [6] D. Landis, "A Self-Test Methodology for Restructurable WSI", Proceedings of the 1990 Wafer Scale Integration Conference, January 23-25, 1990, pp. 258-264.
- [7] E. Harth and A. S. Pandya, "Dynamics of Alopex Process: Applications to Optimization Problems", Biomathematics and Related Computational Problems, L. M. Ricciardi, ed., Kluwer Academic Publ., 1988, pp. 459-471.
- [8] A. Pandya, R. Shankar, and L. Freytag, "An SIMD Architecture for Alopex Neural Network", Proc. of SPIE/SPSE Conf. on Parallel Architectures for Image Processing, Feb. 1990.
- [9] E. E. Pesulima, R. Shankar, and A. S. Pandya, "Digital Implementation of the Sigmoidal Transfer Function for Stochastic Neural Networks", Proceedings of the Florida Microelectronics Conference, Melbourne, FL, May 1990.
- [10] P. D. Hortensius, R. D. McLeod, W. Pries, D. M. Miller, and H. C. Card, "Cellular Automata-Based Pseudorandom Number Generators for Built-in Self-Test", IEEE Trans. Computer-Aided Design, Vol. 8, No. 8, pp. 842-859, August 1989.

Unnumbered drawings (content not recoverable from the scan): Metal 1 / Metal 2 track and via spacing diagrams with dimensions in microns, and a 1:1 drawing of the 320 lead WSI package (w/o lead frame), DWG. # PACK2, dated 1/8/90.