1994

Hardware realization of real-time two-dimensional IIR filters for broadcast TV images.

Herbert Joseph Kaufman

University of Windsor

Recommended Citation

https://scholar.uwindsor.ca/etd/2273

This online database contains the full-text of PhD dissertations and Masters’ theses of University of Windsor students from 1954 forward. These documents are made available for personal study and research purposes only, in accordance with the Canadian Copyright Act and the Creative Commons license—CC BY-NC-ND (Attribution, Non-Commercial, No Derivative Works). Under this license, works must always be attributed to the copyright holder (original author), cannot be used for any commercial purposes, and may not be altered. Any other use would require the permission of the copyright holder. Students may inquire about withdrawing their dissertation and/or thesis from this database. For additional inquiries, please contact the repository administrator via email (scholarship@uwindsor.ca) or by telephone at 519-253-3000 ext. 3208.
NOTICE

The quality of this microform is heavily dependent upon the quality of the original thesis submitted for microfilming. Every effort has been made to ensure the highest quality of reproduction possible.

If pages are missing, contact the university which granted the degree.

Some pages may have indistinct print especially if the original pages were typed with a poor typewriter ribbon or if the university sent us an inferior photocopy.

Reproduction in full or in part of this microform is governed by the Canadian Copyright Act, R.S.C. 1970, c. C-30, and subsequent amendments.

HARDWARE REALIZATION OF REAL-TIME TWO-DIMENSIONAL IIR FILTERS FOR BROADCAST TV IMAGES

by

Herbert J. Kaufman

A Dissertation submitted to the Faculty of Graduate Studies and Research through the Department of Electrical Engineering in Partial Fulfilment of the requirements for the degree of Doctor of Philosophy at the University of Windsor

Windsor, Ontario, Canada 1993
The author has granted an irrevocable non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of his/her thesis by any means and in any form or format, making this thesis available to interested persons.

The author retains ownership of the copyright in his/her thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without his/her permission.

ABSTRACT

In this dissertation, architectures, hardware design and prototypes for the realization of 2-D filters are presented. These filtering architectures are capable of attaining real-time processing rates for advanced television systems and are economical in terms of hardware cost, fabrication cost, and power consumption.

Sample-and-hold type realizations, operating on 2-D sampled data, based on the standard 2-D discrete-time transfer function $H(z_1,z_2)$ are presented. Both IIR and FIR realizations are developed in terms of high-speed systolic architectures. The design process culminates in the development of a $2 \times 2$ recursive prototype.

Instead of using the standard discrete-time transfer function it is also possible to develop 2-D filters based on a 2-D hybrid transfer function $H(z,s)$ which involves both $z$-domain and $s$-domain variables. These are highly suitable for filtering a raster scanned image, which can be characterized as an input signal $X(z,s)$, which is a function of these same two variables. Design considerations are presented which culminate in the development of a $1 \times 1$ recursive prototype.

The sample-and-hold systolic architecture was employed together with switched-capacitor circuit techniques to develop a 2-D real-time switched-capacitor recursive filter. This type of filter features greater accuracy than a conventional analog circuit as well as advantages for VLSI implementation.

In addition to presenting novel design methodologies for hardware prototypes, a novel function block approach for the SPICE simulation of 2-D modular systems with true 2-D data is provided. This approach will serve to greatly facilitate 2-D filter development and improve the efficiency of the design cycle.
ACKNOWLEDGEMENTS

I would like to express my deep gratitude and sincere appreciation to Dr. M.A. Sid-Ahmed, my supervisor, for his guidance, support, advice, encouragement and commitment throughout the course of this research.

I would also like to thank Professors Soltis, Alexander and Toews for serving on my committee and providing helpful suggestions and comments.

Finally, I wish to thank my colleague, Dr. John Cardillo, for his support, encouragement and helpful suggestions.
TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES

I:  INTRODUCTION
  1-1 Background
  1-2 Literature Survey
  1-2.1 Fundamental Concepts
  1-2.2 Direct Realizations
  1-2.3 Realization from Continued and Partial Fraction Expansion
  1-2.4 Architectures for 2-D Filtering
  1-2.4.1 High-Speed Delayed Multipath 2-D Digital Filtering
  1-2.4.2 2-D Systolic Realizations

II: REALIZATION OF 2-D IIR FILTERS USING SAMPLE-AND-HOLD TECHNIQUES
  2-1 Introduction
  2-2 Details of Hardware Design with Application to Homomorphic Filtering
  2-2.1 2-D Semi-Systolic Filter
  2-2.2 Design of 1H Delay Line
  2-2.2.1 The CCD IC
  2-2.2.2 Multiplexed vs. Serial Mode Operation
  2-2.2.3 Clock Driver Circuit
  2-2.2.4 Input Coupling Circuit and Biasing
  2-2.2.5 Output Coupling Circuit
  2-2.2.6 Low Pass Filter
  2-2.3 Design of PE
  2-3 Homomorphic Filtering
  2-4 SPICE Simulation of the 2-D Semi-Systolic Filter Structure
  2-4.1 SPICE Modelling of a Single PE
  2-4.2 SPICE Modelling of the Overall 2-D Sample-and-Hold Semi-Systolic Structure
  2-4.3 Factors Affecting Throughput Rate: 2-D Systolic Structure
  2-5 The Hardware Assembly
  2-5.1 Logarithmic Converter
  2-5.2 Antilog Circuit
  2-6 Filtering of Images

III: REALIZATION OF TWO-DIMENSIONAL HYBRID IIR FILTERS
  3-1 Introduction
  3-1.1 A Realization
  3-2 Details of the Hardware Design
  3-2.1 Design of Analog Processor Section
  3-2.2 Design of 1H Delay Line
  3-3 SPICE Simulation
  3-3.1.1 Simulation of the SPICE PE
  3-4 A Hardware Prototype
  3-4.1 Evaluation
  3-5 Conclusion

IV: SWITCHED-CAPACITOR IMPLEMENTATIONS OF 2-D FILTERS FOR VIDEO RATES
  4-1 Introduction
  4-2 The 2-D Semi-Systolic Realization
  4-3 A Switched-Capacitor Realization
  4-3.1 Design of Line Delay
  4-3.2 Realization of PE in SC Circuitry
  4-3.3 Bank Output Summer
  4-4 Conclusion

V: CONCLUSION

References
Vita Auctoris
## LIST OF FIGURES

| Figure | Description |
|---|---|
| 1-1 | The basic elements for signal processing |
| 1-2 | Input and output masks for 2nd order filter |
| 1-3 | A direct form realization |
| 1-4 | A direct form 2-D Canonic realization (2 x 2) case, type I |
| 1-5 | An alternate direct form 2-D Canonic realization (2 x 2) case, type II |
| 1-6 | Realization of 5 x 5 FIR filter example, High Speed Delayed Multipath 2-D |
| 1-7 | Realization of the general sub-block $A_{ij}(z_1^{-2}, z_2^{-1})$ |
| 1-8 | Systolic implementation of polynomial evaluation |
| 1-9 | 2-D systolic architecture (Sid-Ahmed [12]) |
| 1-10 | 2-D systolic architectures given in [14] |
| 2-1 | Partial realization of $z_2^{-1} Y(z_1, z_2)$ |
| 2-2 | Processing element (PE) |
| 2-3 | Realization of $M \times N$ IIR semi-systolic filter |
| 2-4 | A 1-D realization |
| 2-5 | "Coefficient" function realizations |
| 2-6 | 2 x 2 semi-systolic filter structure for prototype |
| 2-7 | The two types of PE for use in practical hardware prototype |
| 2-8 | Block diagram of CCD type 1H line delay |
| 2-9 | Block diagram of CCD321 IC |
| 2-10 | Clock driver circuit for multiplex CCD |
| 2-11 | Timing diagram for the CCD IC in the multiplexed mode |
| 2-12 | Input coupling circuit for multiplexed CCD |
| 2-13 | Output coupling circuit for multiplexed CCD |
| 2-14 | Low Pass Filter (removes CCD clock feedthrough) |
| 2-15 | Op-amp circuit for PE (type I shown) |
| 2-16 | Homomorphic Filter |
| 2-17 | Cross-section of circularly symmetric 2-D Butterworth filter function used in homomorphic filtering. $D(u, v)$ is the distance from the origin. |
| 2-18 | Magnitude and group delay response, designed 2-D filter |
| 2-19 | Filter coefficients, designed 2-D filter |
| 2-20 | PE realization in analog form |
| 2-21 | SPICE model for the PE of Figure 2-20 |
| 2-22 | SPICE netlist for single PE simulation using idealized op-amp model |
| 2-23 | SPICE graphics plots for single PE simulation |
| 2-24 | SPICE netlist for single PE simulation using CLC400 macro-model |
| 2-25 | SPICE netlist for Comlinear CLC400 op-amp macro-model |
| 2-26 | SPICE model of the 2 x 2 semi-systolic structure |
| 2-27 | Legend of symbols used in SPICE simulation model of Figure 2-26 |
| 2-28 | SPICE simulation of the 2-D sample-and-hold semi-systolic filter structure |
| 2-29 | High-speed logarithmic converter circuit |
| 2-31 | High-speed antilog circuit |
| 2-32 | Insertion of filter prototype into TV receiver circuitry |
| 2-33 | (a) Original broadcast TV image. (b) Image homomorphically filtered by prototype |
| 2-34 | (a) Line delays on dual delay CCD boards. (b) Homomorphic filter prototype. |
| 2-35 | Cost of realization - digital vs. analog components. |
| 3-1 | Raster scanned signal as a 2-D semi-discrete-time signal |
| 3-2 | A 2-D analog filter realization |
| 3-3 | 2-D recursive 1 x 1 filter realization |
| 3-4 | Fast Inverter |
| 3-5 | Fast summing amplifier |
| 3-6 | Fast integrator |
| 3-7 | Sync separator circuit |
| 3-8 | SPICE model of the 2 x 2 semi-systolic structure based on the 2-D hybrid transfer function |
| 3-9 | Function block SPICE models for PE types I and II |
| 3-10 | SPICE netlist for simulation of 2-D IIR semi-systolic hybrid filter structure |
| 3-11 | SPICE PE (type I) simulation model - 2-D hybrid filter |
| 3-12 | SPICE PE (type I) netlist |
| 3-13 | 2-D hybrid analog filter prototype (photo) |
| 3-14 | Insertion of 2-D hybrid analog filter prototype into TV luminance channel |
| 3-15 | Image filtered with 2-D hybrid analog filter prototype |
| 4-1 | Block diagram representation of a 2-D semi-systolic structure (M x N = 2 x 2) |
| 4-2 | PE symbol and its block diagram realization |
| 4-3 | Serial-Parallel-Serial configuration for 1H line delay |
| 4-4 | Analog feedback-readout capacitor memory for use in SPS type 1H line delay |
| 4-5 | A CMOS folded-cascode one stage op-amp |
| 4-6 | SC PE realization |
| 4-7 | 2 x 2 semi-systolic structure with dual broadcast lines |
| 4-8 | Switched-capacitor addition of partial results (2 x 2 case) |
INTRODUCTION

1-1 BACKGROUND

In the development of new receivers for television, more advanced signal processing techniques will be implemented in the circuitry of the video processing section. It has been proposed that such advanced video signal processing can be carried out by high-speed two-dimensional (2-D) spatial filtering or by so-called motion adaptive temporal filtering [1]. Newer receivers will display pictures that have higher pixel resolution. For what has been termed High Definition Television (HDTV), if the processing is done in real-time, that is, at the same rate as the effective sampling rate of the picture, processing rates in excess of 40 million pixels/sec would be required (e.g. 1000 lines/field x 50 fields/sec, 5:3 aspect ratio, interlaced).
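The quoted processing rate can be checked with a short calculation. The sketch below is one possible reading of the parenthetical example, not the author's own derivation: it assumes that the 1000 lines make up a full frame (so each interlaced field carries half of them), that pixels are square, and that blanking intervals are ignored. All variable names are illustrative.

```python
# Rough pixel-rate estimate for the HDTV example (assumptions stated above).
lines_per_frame = 1000                        # assumed: full-frame line count
pixels_per_line = lines_per_frame * 5 / 3     # 5:3 aspect ratio, square pixels assumed
fields_per_sec = 50
frames_per_sec = fields_per_sec / 2           # interlaced: two fields per frame

pixel_rate = lines_per_frame * pixels_per_line * frames_per_sec
print(f"{pixel_rate / 1e6:.1f} million pixels/sec")
```

Under these assumptions the estimate comes out slightly above 40 million pixels/sec, in line with the figure given in the text.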

In the past, analog signal processing techniques have been based on one-dimensional time-domain approaches implemented as simple Finite Impulse Response (FIR) signal processing structures. The apparatus resulting from that approach has been quite limited in the type of signal processing and enhancement operations it could perform.

More sophisticated signal processing is possible by means of the direct application of techniques for filtering 2-D data, known from the field of Mathematical Image Processing Theory. Various techniques for the enhancement of digital images are usually implemented by programming a general purpose computer to operate on stored digitized images off-line. These 2-D filtering techniques can be implemented in real-time by means of dedicated hardware [2]. Due to hardware constraints, conventional designs of dedicated 2-D filters have favored FIR filters, even though IIR (Infinite Impulse Response) filters are known to be more efficient, allowing lower order realizations than their FIR counterparts.

Considerations of hardware complexity, physical size of apparatus, power consumption, and economical manufacture are all of vital importance in any practical signal processing apparatus intended for use in consumer products.

The present thesis introduces dedicated hardware for real-time 2-D filtering which is based on a semi-systolic analog structure realization. The hardware realizations considered are based directly on the 2-D digital transfer function \( H(z_1,z_2) \), and 2-D hybrid transfer function \( H(z,s) \) and are economical and efficient. Both IIR and FIR filtering structures can be realized with the approaches considered and high real-time rates can be attained through the use of analog components that have inherently small delay times. The development of systolic filtering architectures offers advantages for VLSI implementation such as modularity, regularity, and high parallelism. Even greater amenability to VLSI implementation along with high accuracy are obtained through the application of switched-capacitor circuit techniques to these architectures.

Recently, motion adaptive digital filters have been proposed for use in high-definition television video signal processing. They require delays of one or more field periods, such delays being accomplished by means of frame stores. Since pixels in separate fields are combined, this type of signal processing is referred to as temporal filtering [1] and can only be performed on those pixels for which no motion has occurred between fields in the scene being displayed.

The approaches considered in the present thesis do not require analog to digital (A/D) and digital to analog (D/A) converters, often used in conjunction with analog pre-filters and post-filters, to convert video signal data from analog raster scanned form to sampled digital data for processing; nor do they require expensive frame stores or motion detection circuitry.

Other methods of performing real-time 2-D signal processing have been based on elaborate algorithms such as the Burt Pyramid [3], which separates an image into a number of 2-D spatial frequency bandpass images. This method, although capable of real-time operation, has the disadvantage of greater complexity and correspondingly higher cost relative to the methods considered in the present dissertation, due to the need to process multiple bands and to generate a set of component images. When this approach is applied using digital hardware, a large amount of circuitry is required, along with A/D and D/A converters.
1-2 LITERATURE SURVEY

This section introduces the basic concepts of 2-D filter realization as they appear in the literature.

1-2.1 FUNDAMENTAL CONCEPTS

A 2-D FIR (finite impulse response) filter of general order $M \times N$ is given by the transfer function

$$H(z_1, z_2) = \sum_{i=0}^{M} \sum_{j=0}^{N} a_{ij} z_1^{-i} z_2^{-j} \quad (1-1)$$

where $\{a_{ij}\}, \{b_{ij}\}$ are the filter coefficients, and

$$H(z_1, z_2) = \frac{Y(z_1, z_2)}{X(z_1, z_2)} \quad (1-2)$$

is the ratio of output to input in the z-domain. If the upper limits of the summations in equation (1-1) are both equal to $N$, then the input and output data are square arrays.

Throughout this thesis, the z-transform of a 2-D sequence $x(m,n)$ is defined by

$$X(z_1, z_2) = \sum_{m=0}^{\infty} \sum_{n=0}^{\infty} x(m, n) z_1^{-m} z_2^{-n} \quad (1-3)$$

A 2-D IIR (infinite impulse response) filter of order $M \times N$ has the transfer function

$$H(z_1, z_2) = \frac{Y(z_1, z_2)}{X(z_1, z_2)} = \frac{\sum_{i=0}^{M} \sum_{j=0}^{N} a_{ij} z_1^{-i} z_2^{-j}}{1 + \sum_{i=0}^{M} \sum_{j=0}^{N} b_{ij} z_1^{-i} z_2^{-j}} \quad (1-4)$$

The recursive equation governing the relationship of the input $x$ to the output $y$ is given by

$$y(m, n) = \sum_{i=0}^{M} \sum_{j=0}^{N} a_{ij} x(m-i, n-j) - \sum_{i=0}^{M} \sum_{j=0}^{N} b_{ij} y(m-i, n-j) \quad (1-5)$$
A 2-D filter configuration can be conveniently represented in a block-diagram form, the basic elements of which are the constant multipliers and adders (similar to the 1-D case), and two distinct types of delay elements with transfer functions $z_1^{-1}$ and $z_2^{-1}$, respectively. These are depicted in Figure 1-1. Figure 1-2 illustrates the nature of the 2-D filtering process as governed by equation (1-5) for the case $N=M=2$. The input mask $a_{ij}$ is superimposed on the input image, and the sum of the products of the mask coefficients and the underlying samples gives the 2-D convolution. The output mask $b_{ij}$ is similar to the input mask except for the missing element corresponding to the output sample to be computed.
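The recursion in equation (1-5) can be sketched in software as follows. This is a minimal illustration rather than any of the hardware realizations discussed in this thesis. It assumes zero initial conditions, processes the samples in raster order, and skips the $(0,0)$ position of the output mask, since that position corresponds to the output sample being computed. The function name `iir2d` and the nested-list data layout are illustrative choices.

```python
def iir2d(x, a, b):
    """Apply the 2-D recursion of equation (1-5) in raster order.

    x    : input image as a list of rows (index m = line, n = sample)
    a, b : (M+1) x (N+1) coefficient arrays as nested lists
    b[0][0] is ignored: it is the "missing element" of the output mask.
    Samples outside the image are taken as zero (assumed initial conditions).
    """
    rows, cols = len(x), len(x[0])
    M, N = len(a) - 1, len(a[0]) - 1
    y = [[0.0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            acc = 0.0
            for i in range(M + 1):
                for j in range(N + 1):
                    if m - i < 0 or n - j < 0:
                        continue  # outside the image: contributes zero
                    acc += a[i][j] * x[m - i][n - j]
                    if (i, j) != (0, 0):
                        acc -= b[i][j] * y[m - i][n - j]
            y[m][n] = acc
    return y


# Impulse response of a small first-order example (coefficients chosen arbitrarily)
impulse = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
a_demo = [[1.0, 0.0], [0.0, 0.0]]
b_demo = [[0.0, 0.5], [0.5, 0.0]]
y_demo = iir2d(impulse, a_demo, b_demo)
```

Because every output sample depends only on samples above it or to its left, the raster order guarantees that all required values of $y$ are available when they are needed.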
Figure 1-1

The basic elements for signal processing

(a) multiplication by a scalar
(b) addition
(c) line delay
(d) sample (pixel) delay
Figure 1-2

Input and output masks for 2nd order filter
1-2.2 DIRECT REALIZATIONS

A direct realization of an IIR filter from the transfer function follows from the algorithm given by Shanks et al. [4]. There it is shown that equation (1-4) can be written as

\[
Y(z_1,z_2) = \left[ \sum_{i=0}^{M} \sum_{j=0}^{N} a_{ij} z_1^{-i} z_2^{-j} \right] X(z_1,z_2)
- \left[ \sum_{i=0}^{M} \sum_{j=0}^{N} b_{ij} z_1^{-i} z_2^{-j} \right] Y(z_1,z_2)
\]  

(1-6)

An example of the direct form realization for the 2 x 2 case is shown in Figure 1-3.

Another direct realization is the 2-D Canonic form (Figure 1-4), which was first introduced in the literature by Mitra et al. [5]. An alternate 2-D Canonic form which is referred to here as type II is shown in Figure 1-5. These structures require only two line delays (denoted \( z_1^{-1} \)) for a 2 x 2 filter.
Figure 1-3

A direct form realization
Figure 1-4

A direct form 2-D Canonic realization (2 x 2) case – type I
Figure 1-5

An alternate direct form 2-D Canonic realization (2 x 2) case — type II
1-2.3 REALIZATION FROM CONTINUED AND PARTIAL FRACTION EXPANSION

In the field of 1-D signal processing, continued fraction and partial fraction expansion realizations are well known. In some instances, it is possible to extend these realizations to the 2-D case, as discussed in [5] and [6,7]. Both approaches are limited: they cannot realize an arbitrary transfer function of a 2-D recursive filter, only those that satisfy certain restrictions. A continued fraction form requires a number of strong relationships to be satisfied among the design coefficients. Also, a given continued fraction expansion must be checked for existence [5]. A partial fraction expansion will exist for the case of a denominator separable 2-D digital filter transfer function (of order \(M \times N\)):

\[
H(z_1,z_2) = \frac{\sum_{i=0}^{M} \sum_{j=0}^{N} a_{ij} z_1^{-i} z_2^{-j}}{Q_1(z_1)\, Q_2(z_2)} = \frac{\sum_{i=0}^{M} \sum_{j=0}^{N} a_{ij} z_1^{-i} z_2^{-j}}{\left( \sum_{i=0}^{M} q_{1i} z_1^{-i} \right) \left( \sum_{j=0}^{N} q_{2j} z_2^{-j} \right)}
\]

(1-7)

However, in general, 2-D systems have no partial fraction expansion since the fundamental theorem of algebra does not apply as in the 1-D case.
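A small worked example may help to show why separability is a genuine restriction. The denominator below is chosen here purely for illustration and is not taken from the cited references. Consider

\[
D(z_1,z_2) = 1 + b_{10} z_1^{-1} + b_{01} z_2^{-1}, \qquad b_{10} \neq 0, \; b_{01} \neq 0 .
\]

If $D$ were the product of a polynomial in $z_1^{-1}$ and a polynomial in $z_2^{-1}$, as required by (1-7), each factor would have to be of first order, giving

\[
(q_{10} + q_{11} z_1^{-1})(q_{20} + q_{21} z_2^{-1}) = q_{10} q_{20} + q_{11} q_{20} z_1^{-1} + q_{10} q_{21} z_2^{-1} + q_{11} q_{21} z_1^{-1} z_2^{-1} .
\]

Matching the $z_1^{-1}$ and $z_2^{-1}$ terms requires $q_{11} q_{20} = b_{10} \neq 0$ and $q_{10} q_{21} = b_{01} \neq 0$, so that $q_{11} \neq 0$ and $q_{21} \neq 0$. The cross term $q_{11} q_{21} z_1^{-1} z_2^{-1}$ is then non-zero, whereas $D$ contains no such term. Hence even this very simple denominator is not separable.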
1-2.4 ARCHITECTURES FOR 2-D FILTERING

A number of realization structures appearing in the literature are described as "architectures" because they are optimal in terms of some desired properties such as processing speed, efficiency, and modularity.

1-2.4.1 HIGH SPEED DELAYED MULTIPATH 2-D DIGITAL FILTERING

This type of structure is a delayed multipath realization for 2-D recursive (and non-recursive) filters due to Kwan and Hirano [8]. This work overcomes the disadvantages of those techniques which involve the application of the decomposition theorems to the 2-D transfer function as in [9], and those using LU triangular decomposition of the matrix coefficients of the 2-D polynomials of the transfer function [10].

In the implementation of these structures given in [8], so-called $N_1$-path structures are used to implement the sub-blocks. The multiplex switching used in that implementation is not strictly necessary and would complicate the discussion. Thus we will continue to view realizations only in terms of the fundamental blocks as given in Section 1-2.1, i.e. adders, multipliers, and delays. The method can be used to implement FIR or IIR 2-D filters, so we can discuss the implementation of IIR filters, with the FIR being a specific case in which the denominator of the transfer function is equal to one (no poles).

Consider again the transfer function of a 2-D IIR digital filter (equation (1-4)). In general, it can be decomposed in terms of sub-blocks $A_{ij}$, $B_{ij}$ as follows:

\[ H(z_1, z_2) = \frac{\sum_{j=0}^{N_2-1} z_2^{-jL_2} \sum_{i=0}^{N_1-1} z_1^{-i} A_{ij}(z_1^{-N_1}, z_2^{-1})}{\sum_{j=0}^{N_2-1} z_2^{-jL_2} \sum_{i=0}^{N_1-1} z_1^{-i} B_{ij}(z_1^{-N_1}, z_2^{-1})} \]

(1-8)

where

\[ A_{ij}(z_1^{-N_1}, z_2^{-1}) = \sum_{k=0}^{L_1-1} \sum_{l=0}^{L_2-1} a(kN_1 + i, l + jL_2) z_1^{-kN_1} z_2^{-l} \]

\[ B_{ij}(z_1^{-N_1}, z_2^{-1}) = \sum_{k=0}^{L_1-1} \sum_{l=0}^{L_2-1} b(kN_1 + i, l + jL_2) z_1^{-kN_1} z_2^{-l} \]

\[ M_i + 1 = N_i \times L_i \text{ for } i = 1, 2 \]

\[ \overline{M_i} + 1 = \overline{N_i} \times \overline{L_i} \text{ for } i = 1, 2 \]

(1-9)

\[ 1 \leq N_2 \leq M_2 + 1 \]

\[ 1 \leq \overline{N_2} \leq \overline{M_2} + 1 \]

\[ 1 < N_1; \quad 1 < L_1 \]

\[ 1 < \overline{N_1}; \quad 1 < \overline{L_1} \]

Let us develop the realization of a 5 x 5 FIR filter by this method (as an example).

If we choose \( N_2 = 3, N_1 = 2, L_2 = 2, L_1 = 3 \), then (1-8) (with denominator = 1) becomes

\[ H(z_1, z_2) = \sum_{j=0}^{2} z_2^{-2j} \sum_{i=0}^{1} z_1^{-i} A_{ij}(z_1^{-2}, z_2^{-1}) \]

\[ = \left[ A_{00}(z_1^{-2}, z_2^{-1}) + z_1^{-1} A_{10}(z_1^{-2}, z_2^{-1}) \right] + z_2^{-2} \left[ A_{01}(z_1^{-2}, z_2^{-1}) + z_1^{-1} A_{11}(z_1^{-2}, z_2^{-1}) \right] \]

\[ + z_2^{-4} \left[ A_{02}(z_1^{-2}, z_2^{-1}) + z_1^{-1} A_{12}(z_1^{-2}, z_2^{-1}) \right] \]

(1-10)

where each sub-block $A_{ij}$ is given by

$$A_{ij} = \sum_{k=0}^{2} \sum_{l=0}^{1} a(2k + i, l + 2j) z_1^{-2k} z_2^{-l}$$

$$= \sum_{k=0}^{2} z_1^{-2k} \sum_{l=0}^{1} a(2k + i, l + 2j) z_2^{-l}$$

for example,

$$A_{02} = (a_{04} + a_{05} z_2^{-1}) + z_1^{-2} (a_{24} + a_{25} z_2^{-1}) + z_1^{-4} (a_{44} + a_{45} z_2^{-1})$$

The block diagram realization of (1-10) is shown in Figure 1-6, while the realization of the general sub-block $A_{ij} (z_1^{-2}, z_2^{-1})$ is shown in Figure 1-7.
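The index bookkeeping in the decomposition (1-8) is easy to get wrong, so a numerical check can be useful. The sketch below is an illustration only and is not taken from [8]. It evaluates a 5 x 5 FIR transfer function (a 6 x 6 coefficient array) at one point on the unit bicircle, once directly from (1-1) and once through the sub-blocks, using the split $N_1 = 2$, $L_1 = 3$, $N_2 = 3$, $L_2 = 2$ of the example. The function names and the random test coefficients are assumptions made for the illustration.

```python
import random

N1, L1 = 2, 3   # split of the first index:  M1 + 1 = N1 * L1 = 6
N2, L2 = 3, 2   # split of the second index: M2 + 1 = N2 * L2 = 6


def direct(a, z1, z2):
    """Evaluate H(z1, z2) directly as the double sum of equation (1-1)."""
    return sum(a[p][q] * z1 ** (-p) * z2 ** (-q)
               for p in range(N1 * L1) for q in range(N2 * L2))


def sub_block(a, i, j, z1, z2):
    """Evaluate the sub-block A_ij(z1^-N1, z2^-1)."""
    return sum(a[k * N1 + i][l + j * L2] * z1 ** (-k * N1) * z2 ** (-l)
               for k in range(L1) for l in range(L2))


def multipath(a, z1, z2):
    """Evaluate H(z1, z2) through the sub-block decomposition (numerator of (1-8))."""
    return sum(z2 ** (-j * L2) * z1 ** (-i) * sub_block(a, i, j, z1, z2)
               for j in range(N2) for i in range(N1))


random.seed(0)
coeffs = [[random.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(6)]
z1, z2 = complex(0.6, 0.8), complex(0.28, -0.96)   # both have unit magnitude
difference = abs(direct(coeffs, z1, z2) - multipath(coeffs, z1, z2))
```

The two evaluations agree to within rounding error, which confirms that every coefficient $a(p,q)$ with $0 \le p, q \le 5$ is visited exactly once by the sub-block sums.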
Figure 1-6

Realization of 5 x 5 FIR filter example — High Speed Delayed Multipath 2-D
Figure 1-7

Realization of the general sub-block $A_{ij}(z_1^{-2}, z_2^{-1})$
1-2.4.2 2-D SYSTOLIC REALIZATIONS

Due to the recent advances in VLSI technology, it has become highly desirable to develop digital filter realizations in such a way that the repetitive use of common structures (i.e. modules) is maximized and the number of "random" connections is minimized in order to reduce the design effort and create producible integrated circuits. Systolic structures also have a speed advantage over other known forms of realization, due to the high degree of parallelism.

Kung [11] has given the following definition of a systolic system: "A systolic system is a network of processors that rhythmically compute and pass data through the system." Systolic arrays may be thought of as networks of (ideally) locally interconnected processing elements (PE's) in which each PE regularly "pumps" data in and out, each time performing some short computation involving the data input to the PE and local variables associated with storage within the PE, in order that a regular flow of data is kept up in the network. In general, systolic array processing has applications to many computationally intensive problems. The wide range of applications includes not only signal and image processing and matrix arithmetic, which involve numeric computations, but also symbolic tasks such as searching and sorting, graph theoretical algorithms, and relational database processing.

As an introduction to the method of systolic array computation, let us consider a simple example: Suppose we are given the following polynomial

\[ P(x) = C_m x^m + C_{m-1} x^{m-1} + \ldots + C_0 \]  

(1-13)
which is to be evaluated at the points $x_i$, with $i$ in the interval $[1,n]$. Using Horner's rule, the formula for $P(x)$ becomes

$$P(x) = (\cdots((C_m x + C_{m-1}) x + C_{m-2}) x + \cdots + C_1) x + C_0$$

(1-14)

A possible systolic implementation is shown in Figure 1-8. The coefficients are stored in individual "cells" or processing elements (PE's). (We can think of the coefficients as the local variables of the computation.) The $x_i$ (inputs) and $p_i$ (partial results) data "flow" through the PE's. On every clock cycle, each PE receives as inputs $x_i$ and $p_{in}$, multiplies them and adds its internal coefficient $C_i$, producing an output $p_{out}$, while passing through $x_i$ unchanged. Each result $P(x_i)$ appears at the output (the rightmost PE) $m$ clock cycles after $x_i$ is input at the leftmost PE.

Figure 1-8

Systolic implementation of polynomial evaluation

PE symbol: inputs $P_{in}$ and $X_{in}$, outputs $P_{out}$ and $X_{out}$, stored coefficient $C_i$, with

\[P_{out} := P_{in} \cdot X_{in} + C_i\]

\[X_{out} := X_{in}\]
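The behaviour of this array can be mimicked by a short clocked simulation. The sketch below is an illustration under stated assumptions, since the figure itself is not reproduced here: it uses one PE for each of the coefficients $C_{m-1}, \ldots, C_0$, feeds the constant $C_m$ into the $p$ input of the leftmost PE, and latches each PE output once per clock cycle. The function name and the register representation are illustrative.

```python
def systolic_horner(coeffs, xs):
    """Clocked simulation of a linear systolic array for Horner's rule.

    coeffs : [C_m, C_(m-1), ..., C_0] with m >= 1
    xs     : evaluation points, fed to the leftmost PE one per clock cycle
    Returns P(x) for each x, in the order the results leave the rightmost PE.
    """
    c_m, cells = coeffs[0], coeffs[1:]      # one PE per coefficient C_(m-1)..C_0
    m = len(cells)
    regs = [None] * m                        # (x, p) pair latched at each PE output
    results = []
    for x_in in list(xs) + [None] * m:       # m extra cycles flush the pipeline
        feed = None if x_in is None else (x_in, c_m)
        inputs = [feed] + regs[:-1]          # each PE reads its left neighbour
        # All PEs fire together: p_out = p_in * x_in + C_i, x_out = x_in
        regs = [None if s is None else (s[0], s[1] * s[0] + c)
                for s, c in zip(inputs, cells)]
        if regs[-1] is not None:
            results.append(regs[-1][1])
    return results


# P(x) = 2x^3 - 3x^2 + 5 evaluated at x = 1, 2, 3
values = systolic_horner([2, -3, 0, 5], [1, 2, 3])
print(values)   # [4, 9, 32]
```

In this model a new point enters the array on every clock cycle, and each result leaves the rightmost PE after the point has been latched $m$ times, so the array delivers one result per cycle once the pipeline is full.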
We see from this example, that a systolic array processor, unlike a von Neumann type machine, does not require a data item to be retrieved from a central memory store every time it is used. Thus a systolic array processor is said to have low external memory bandwidth.

Systolic processing offers many additional advantages. The modularity of the design, involving only a few simple PE types, used repetitively, as well as the local pattern of interconnection between PE’s, simplifies circuit layout, and is cost-effective. The low fanout in a systolic array allows the output of signal line drivers to be independent of the number of cells in the array. Moreover, systolic architectures can be readily scaled up to handle larger problems, a property of considerable importance in the case of implementing filters of arbitrary order.

A representative example of a systolic architecture that can implement either IIR or FIR filters and which makes use of a feature known as “broadcasting” i.e. data is distributed to several PE’s in parallel via a single take-off point, is the one developed by Sid-Ahmed in [12, 13]. The Sid-Ahmed structure, which can be realized directly in terms of the transfer function coefficients, is shown in Figure 1-9. A detailed derivation of this structure is given in Chapter 2. Note that in this figure one of the line delays is denoted by $z_i^{-1}$ (with an asterisk), due to the fact that the line delay period is shortened by one sample time. This is sometimes alternatively denoted $z_i^{-1} z_2$.
Figure 1-9

2-D systolic architecture (Sid-Ahmed [12])
Other systolic architectures for 2-D filtering are described in the literature. In [14] two
other architectures are presented, which are both based on the same type of PE as [12]. These
are shown in Figure 1-10. In general, systolic architectures will differ in:

1) Number of adders and multipliers needed

2) Critical path length

3) Number of storage elements, \( z_1^{-1} \) (line delay), \( z_2^{-1} \) (sample delay)

4) Speed-up factor (SUF)

5) Latency

For real-time image processing it is highly desirable for a 2-D filter architecture to
have a short critical path length. The critical path is the longest delay-free arithmetic path, i.e.
the signal path with the longest arithmetic operation time, leading from the output of any delay
element or the system input to the input of any delay element or the system output. A short
critical path length allows a systolic filter to operate at a high clock rate, and thus a high
real-time throughput rate. In general, the minimum achievable path length for a 2-D filter is
1-multiply and 1-add time (1m-1a). Assuming two-input additions, the critical path length of
the architectures shown in Figure 1-10 is \( T_m + 2T_a \) (where \( T_m \) and \( T_a \) are the multiplication
and addition times, respectively) and is

\[
(1-15)
\]

for the architecture in Figure 1-9. The situation improves in the case of analog component
realization of the latter architecture, in which multi-input additions can be done.
Figure 1-10

2-D systolic architectures given in [14]
The SUF is a measure used to compare the speed efficiencies of systolic arrays. It is defined [15] as

\[
\text{SUF} = \frac{\text{Processing Time in a PE}}{\text{Processing Time in the Array Processor}}
\]  

(1-16)

For the purposes of comparison it is usually assumed that all adders are two operand adders. Using this assumption we would find that the structures in Figure 1-10 both have SUF=1.0, while that of Figure 1-9 has an SUF given by

\[
\text{SUF} = \frac{T_m + T_a}{\max((T_m + T_a), T_a \log_2(N+1))}
\]  

(1-17)

This penalty in SUF, incurred by the latter, can be avoided in the case of analog component realization utilizing multi-input adders, as will be seen in Chapter 2.
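As a rough numerical illustration of equation (1-17), the sketch below evaluates the SUF for a given multiply time, add time and $N$. The timing values in the example are hypothetical placeholders, not measurements from this work, and the function name `speed_up_factor` is introduced here only for illustration.

```python
import math


def speed_up_factor(t_mul, t_add, n):
    """SUF per equation (1-17): PE time over array time, two-input adders assumed."""
    pe_time = t_mul + t_add
    # The array time is limited either by one PE or by the two-input adder tree.
    array_time = max(pe_time, t_add * math.log2(n + 1))
    return pe_time / array_time


if __name__ == "__main__":
    # Hypothetical timings in nanoseconds.
    print(speed_up_factor(t_mul=50.0, t_add=10.0, n=3))   # adder tree is not the bottleneck
    print(speed_up_factor(t_mul=10.0, t_add=10.0, n=7))   # adder tree dominates
```

The SUF stays at 1 as long as the adder-tree term does not exceed the time of a single PE, and drops below 1 once it does.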

Latency is defined as the time interval separating the appearance of an input sample on the input port from the appearance of the corresponding output sample at the output port. The architectures shown in Figures 1-9 and 1-10 all have a latency of one sample period.
REALIZATION OF 2-D IIR FILTERS USING SAMPLE-AND-HOLD TECHNIQUES

2-1 INTRODUCTION

In this chapter we introduce systolic realizations of 2-D filters for real-time applications. The structures which we will consider can implement either IIR or FIR filters. Since the FIR realizations follow directly as special cases of the IIR realizations, we consider only the latter. The structures presented here make use of a feature known as "broadcasting", i.e. data is distributed to several PE's in parallel via a single take-off point. These structures can also be realized directly in terms of the transfer function coefficients, as opposed to the local state space approach [16]. The hardware development and SPICE simulation of structures based on the semi-systolic architecture proposed by Sid-Ahmed [13] are presented.

This development culminates in the production of a hardware prototype that has been applied to the real time filtering of video rate broadcast images in a representative application, namely that of the non-linear homomorphic filtering approach of Oppenheim et al. [18].

In addition to hardware development and prototyping, a new approach to SPICE simulation of discrete time filtering based on functional block simulation [19] is introduced. This approach exploits the modularity of systolic structures to reduce the complexity of the simulation process. Hence a direct simulation with actual 2-D data input can readily be done in
the time domain, eliminating the need to introduce z-domain equivalent circuits as has been
done for 1-D filter simulation in [20].

In general, a 2-D recursive digital filter transfer function of order M x N is given by

\[ H(z_1, z_2) = \frac{Y(z_1, z_2)}{X(z_1, z_2)} = \frac{\sum_{i=0}^{M} \sum_{j=0}^{N} a_{ij} z_1^{-i} z_2^{-j}}{1 + \underset{(i,j) \neq (0,0)}{\sum_{i=0}^{M} \sum_{j=0}^{N}} b_{ij} z_1^{-i} z_2^{-j}} \quad (2-1) \]

where \( \{a_{ij}\} \) and \( \{b_{ij}\} \) are filter coefficients obtained from a design procedure that determines
the coefficients according to some given criteria to approximate the desired frequency response
of \( H(z_1, z_2) \) (i.e. high-pass, low-pass, etc.).

Corresponding to equation (2-1) in the time domain is the so called recursive equation which
relates the output sample of the filter \( y(m,n) \) to the input sample \( x(m,n) \) occurring at the same
point in time.

\[ y(m,n) = \sum_{i=0}^{M} \sum_{j=0}^{N} a_{ij} x(m-i, n-j) - \underset{(i,j) \neq (0,0)}{\sum_{i=0}^{M} \sum_{j=0}^{N}} b_{ij} y(m-i, n-j) \quad (2-2) \]
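A direct, non-systolic evaluation of the recursive equation can serve as a reference against which a hardware structure is checked. The sketch below is a minimal software model under these assumptions: the image is a list of rows, samples outside the image are zero, `a` and `b` have the same shape, and the $b_{00}$ term is left out of the feedback sum (the normalization $b_{00}=1$). The name `iir2d` is hypothetical.

```python
def iir2d(x, a, b):
    """Apply the 2-D recursive difference equation in raster-scan order.

    x    : list of rows of input samples x(m, n)
    a, b : (M+1) x (N+1) coefficient tables; b[0][0] is assumed to be 1
           and is not used in the feedback sum.
    """
    rows, cols = len(x), len(x[0])
    y = [[0.0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            acc = 0.0
            for i in range(len(a)):
                for j in range(len(a[0])):
                    if m - i < 0 or n - j < 0:
                        continue  # samples before the image start are taken as zero
                    acc += a[i][j] * x[m - i][n - j]
                    if (i, j) != (0, 0):
                        acc -= b[i][j] * y[m - i][n - j]
            y[m][n] = acc
    return y


if __name__ == "__main__":
    impulse = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
    a = [[1.0, 0.0], [0.0, 0.0]]
    b = [[1.0, -0.5], [0.0, 0.0]]   # y(m,n) = x(m,n) + 0.5 y(m,n-1)
    print(iir2d(impulse, a, b))
```

Because only present and past samples in raster order are used, the loop order above is sufficient to guarantee that every fed-back output already exists when it is needed.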

Here we have made the assumption of a causal system so that the output signal \( y(m,n) \) is
written as a function of only the present and past values of the input and output signals \( x(m,n) \) and
\( y(m,n) \) respectively. We can write equation (2-2) in the z-domain as

\[ Y(z_1, z_2) = \sum_{i=0}^{M} \sum_{j=0}^{N} a_{ij} X(z_1, z_2) z_1^{-i} z_2^{-j} - \underset{(i,j) \neq (0,0)}{\sum_{i=0}^{M} \sum_{j=0}^{N}} b_{ij} Y(z_1, z_2) z_1^{-i} z_2^{-j} \quad (2-3) \]

Multiplying both sides of this equation by \( z_2^{-1} \) (thus introducing one pixel of latency in the
output) and rearranging yields

\[ z_2^{-1} Y(z_1, z_2) = \sum_{i=0}^{M} z_2^{-1} Y_i(z_1, z_2) \quad (2-4) \]
where

\[
\begin{aligned}
z_2^{-1} Y_0(z_1, z_2) = z_2^{-1} \{ \, & a_{00} X(z_1, z_2) \\
& + z_2^{-1} ( [a_{01} X(z_1, z_2) - b_{01} Y(z_1, z_2)] \\
& + z_2^{-1} ( [a_{02} X(z_1, z_2) - b_{02} Y(z_1, z_2)] \\
& + z_2^{-1} ( \cdots \\
& + z_2^{-1} [a_{0N} X(z_1, z_2) - b_{0N} Y(z_1, z_2)] ) \cdots ) ) \, \}
\end{aligned}
\quad (2-5)
\]

and

\[
\begin{aligned}
z_2^{-1} Y_i(z_1, z_2) = z_2^{-1} \{ \, & [a_{i0} X(z_1, z_2) z_1^{-i} - b_{i0} Y(z_1, z_2) z_1^{-i}] \\
& + z_2^{-1} ( [a_{i1} X(z_1, z_2) z_1^{-i} - b_{i1} Y(z_1, z_2) z_1^{-i}] \\
& + z_2^{-1} ( [a_{i2} X(z_1, z_2) z_1^{-i} - b_{i2} Y(z_1, z_2) z_1^{-i}] \\
& + z_2^{-1} ( \cdots \\
& + z_2^{-1} [a_{iN} X(z_1, z_2) z_1^{-i} - b_{iN} Y(z_1, z_2) z_1^{-i}] ) \cdots ) ) \, \}
\end{aligned}
\quad (2-6)
\]

for \( i=1, \ldots, M \)

The realization of the partial result \( z_2^{-1} Y_i(z_1, z_2) \) based on equation (2-6) is shown in Figure 2-1. The general block diagram symbol for a processing element (PE) is shown in the lower part of this figure.

Figure 2-2 shows each PE consisting of two coefficient multipliers, an adder, and a pixel delay, connected as shown. Note that \( w_1 \) and \( w_2 \) correspond to some specific filter design coefficients in the set \( \{\{a_{ij}\}, \{b_{ij}\}\} \) that are required in a particular PE.
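The nested form of equation (2-6) maps onto a chain of identical PE's, and this can be checked with a short behavioural model. The sketch below assumes that the pixel delay sits at the output of each PE's adder and that the leftmost PE holds the highest-index coefficients; `u` and `v` stand for the row-delayed input and output sequences $X z_1^{-i}$ and $Y z_1^{-i}$, supplied here as plain test data rather than through feedback. The names `pe_bank` and `direct_sum` are hypothetical.

```python
def pe_bank(u, v, a_row, b_row):
    """One bank of PEs: each PE forms w1*u + w2*v + p_in, then delays it one pixel.

    Here w1 = a_ij and w2 = -b_ij; the leftmost PE holds the j = N coefficients.
    """
    n_pe = len(a_row)
    regs = [0.0] * n_pe              # contents of the pixel delay in each PE
    out = []
    for un, vn in zip(u, v):
        out.append(regs[-1])         # bank output = delay content of the last PE
        new_regs = []
        for k in range(n_pe):
            j = n_pe - 1 - k         # coefficient index held by PE k
            p_in = regs[k - 1] if k > 0 else 0.0
            new_regs.append(a_row[j] * un - b_row[j] * vn + p_in)
        regs = new_regs
    return out


def direct_sum(u, v, a_row, b_row):
    """Reference: sum over j of (a_j u - b_j v) delayed by j + 1 pixels."""
    out = []
    for n in range(len(u)):
        s = 0.0
        for j in range(len(a_row)):
            t = n - 1 - j
            if t >= 0:
                s += a_row[j] * u[t] - b_row[j] * v[t]
        out.append(s)
    return out


if __name__ == "__main__":
    u = [1.0, 0.0, 0.0, 0.0]
    v = [0.0, 0.0, 0.0, 0.0]
    print(pe_bank(u, v, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

With an impulse on `u` and no feedback, the bank returns the `a` coefficients one pixel late, which is the one-pixel latency introduced by multiplying through by $z_2^{-1}$.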

The overall 2-D semi-systolic filter structure of order \( M \times N \) based on equations (2-4), (2-5), and (2-6) is realized as shown in Figure 2-3.
In Figure 2-3, the output signal of the structure $Y(z_1,z_2)$ has been denoted with an asterisk because it is produced with one data sample (one pixel time) of latency. It is actually $z_2^{-1}Y(z_1,z_2)$, which corresponds to $y(m,n-1)$ in the time domain rather than $y(m,n)$. To remove the effects of this latency on the feedback, the first line delay in the feedback path must, in practice, be shortened by one pixel (sample) time and is thus denoted $z_1^{-1*}$. Also, because a signal delayed by one pixel time is fed back to the lowermost bank of PE's, the "b" coefficients occur there in the order 0, $-b_{02}$, $-b_{01}$, from left to right. In this figure, $z_1^{-1}$ is used to represent a row delay, which in the case of a raster scanned image is a delay of one line scanning period (63.5 μs in the NTSC television standard).
Figure 2-1

Partial realization of $z^{-1}_2Y(z_1,z_2)$

Figure 2-2

Processing element (PE)
Figure 2-3

Realization of $M \times N$ IIR semi-systolic filter
We can alternatively derive the 2-D semi-systolic structure of Figure 2-3, without using Horner’s rule, by employing a modified version of the method proposed in [21].

Consider again the transfer function of a 2-D linear, causal, shift-invariant recursive discrete-time filter (equation 2-1) for the case $M=N=2$. We can rewrite this as follows:

$$H(z_1, z_2) = \frac{Y(z_1, z_2)}{X(z_1, z_2)} = \frac{\sum_{i=0}^{2} f_i(z_2) z_1^{-i}}{\sum_{i=0}^{2} g_i(z_2) z_1^{-i}} \quad (2-7)$$

where

$$f_i(z_2) = \sum_{j=0}^{2} a_{ij} z_2^{-j} \quad (2-8)$$

and

$$g_i(z_2) = \sum_{j=0}^{2} b_{ij} z_2^{-j} \quad (2-9)$$

We can now think of the 2-D transfer function as a 1-D transfer function in which the coefficients $f_i(z_2)$, $g_i(z_2)$ are not constants but functions of $z_2$. Without loss of generality, we can assume that $b_{00}=1$ (as we have been doing so far). Accordingly, let

$$g'_i(z_2) = \sum_{j=0}^{2} b_{ij} z_2^{-j} \quad \text{for } i = 1, 2 \quad (2-9)$$

and

$$g'_0(z_2) = \sum_{j=1}^{2} b_{0j} z_2^{-j}$$

in equation (2-7) above so that

$$H(z_1, z_2) = \frac{Y(z_1, z_2)}{X(z_1, z_2)} = \frac{\sum_{i=0}^{2} f_i(z_2) z_1^{-i}}{1 + \sum_{i=0}^{2} g'_i(z_2) z_1^{-i}} \quad (2-10)$$
Thus we obtain the 2-D difference equation

$$Y(z_1, z_2) = \sum_{i=0}^{2} f_i(z_2)z_1^{-i} X(z_1, z_2) - \sum_{i=0}^{2} g_i'(z_2)z_1^{-i} Y(z_1, z_2)$$

(2-11)

A 1-D realization based on equation (2-11) above is shown in Figure 2-4. In order to obtain a 2-D realization it remains to develop realizations of the "coefficient" functions, $f_i, g_i'$. This has been done for $f_i$ and $g_i'$ in Figure 2-5(a). The other $f_i$'s and $g_i'$'s are similarly realized. Note that the realizations of $f_i (z_2)$ and $g_i'(z_2)$ can be combined into one block, by means of block diagram transformations, as shown in Figure 2-5(b). A grouping of elements to be regarded as a PE is indicated by the dashed line in this figure.

As it now stands, this derivation yields the realization of Figure 2-6, to be presented in Section 2-2.1. An extra delay block may be added to the last adder of Figure 2-5(b) to obtain full modularity, so that all PE's are identical, resulting in the structure of Figure 2-3 for the 2 x 2 case. However, as before, this results in a latency of one pixel delay in the output and necessitates a rearrangement of the $b_{0j}$ coefficients in the lowermost bank of PE's.
Figure 2-4

A 1-D realization
Figure 2-5

"Coefficient" function realizations

(a) Realizations of $f_i(z_2)$ and $g_i'(z_2)$
(b) Realizations of (a) combined
2-2 DETAILS OF HARDWARE DESIGN WITH APPLICATION TO HOMOMORPHIC FILTERING

2-2.1 2-D SEMI-SYSTOLIC FILTER

The realization as described in [13] requires line delays of differing lengths. For practical implementation using standard commercially available components, the structure described in Figure 2-3 was modified to use two types of PE instead of only one type throughout and the resulting revised structure is shown in Figure 2-6 (2 x 2 case). Type I PE's are used everywhere except at the end of each bank of PE's, where a type II PE is used instead. Both PE types are shown in Figure 2-7. Note that the type II PE is a modified version of the type I PE in which the pixel delay element denoted $z^{-1}$ is excluded. Having two types of PE in the structure allows the use of standard length line delays. Since the end of stage PE's do not contain storage elements, the latency of one pixel time in processing is removed, so that the feedback connections to the lowermost bank of PE's (i.e. the bank whose PE's contain the local variables $b_{01}$, $b_{02}$) are different.

Another change which was made for greater ease of construction with off-the-shelf components is the introduction of dual broadcast data lines, allowing either non-inverted or inverted data to be sent to each PE. (Note that in the literature on systolic structures, lines through which PE's receive the same data in parallel are called broadcast lines).

Given the high degree of modularity in systolic structures, a practical hardware realization of the structure will result if hardware is designed for the PE's and line delays (and adder for partial results).
Figure 2-6

2 x 2 semi-systolic filter structure for prototype
Figure 2-7

The two types of PE for use in practical hardware prototype.

(a) Type I PE

(b) Type II PE
2-2.2 DESIGN OF 1H DELAY LINE

As is seen in Figure 2-6, a 2 x 2 IIR filter structure requires four line delay elements, which are denoted $z_1^{-1}$ in the z-domain. When processing a raster scanned signal, these line delays correspond to one horizontal line scanning period (1H), which is 63.5 μs in the NTSC system. A 1H delay was designed using an electronic charge-coupled device (CCD) type analog shift register as shown in the block diagram of Figure 2-8. The device used was the Fairchild-Weston CCD321, which is intended specifically for NTSC applications [22]. The CCD device has an insertion loss of 0 dB compared to $>6$ dB for glass block devices. Glass block devices have the further disadvantage of requiring the modulation of a carrier due to bandwidth limitations. The delay time of analog signals through the CCD is precisely controlled by clock signals, which are two-phase symmetrical square wave signals derived from a crystal oscillator signal by means of clock driver circuits.

The type of clock driver circuits used will be a function of the type of CCD chosen and are typically based on TTL or CMOS family integrated circuit devices. For the Fairchild-Weston CCD321, which has a charge injection port at its input and a sample-and-hold circuit in its output amplifier section, the two-phase system of clocks is applied to the device to effect charge injection at the input as well as interstage charge transport and clocking of the internal sample-and-hold circuit.

A sample-and-hold device in the output stage of a CCD offers the advantage of reducing clock frequency feedthrough components in the output signal. Any of these undesirable frequency components that remain in the output may be further suppressed by a 5 MHz low-pass filter circuit.
Figure 2-8

Block diagram of CCD type 1H line delay
Figure 2-9

Block Diagram of CCD321 IC

(continued next page)
## PIN NAMES

<table>
<thead>
<tr>
<th>Symbol</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\phi_{1A}, \phi_{1B}$</td>
<td>Analog Shift Register Transport Clocks</td>
</tr>
<tr>
<td>$\phi_{2A}, \phi_{2B}$</td>
<td>Input Sampling Clocks</td>
</tr>
<tr>
<td>$\phi_{RA}, \phi_{RB}$</td>
<td>Output Sample and Hold Clocks</td>
</tr>
<tr>
<td>$V_2$</td>
<td>Analog Shift Register DC transport Phase</td>
</tr>
<tr>
<td>$V_{1A}, V_{1B}$</td>
<td>Analog Inputs</td>
</tr>
<tr>
<td>$V_{RA}, V_{RB}$</td>
<td>Analog Reference Inputs</td>
</tr>
<tr>
<td>$V_{OA}, V_{OB}$</td>
<td>Analog Outputs</td>
</tr>
<tr>
<td>$V_{DD}$</td>
<td>Output Drain</td>
</tr>
<tr>
<td>$V_{GG}$</td>
<td>Signal Ground</td>
</tr>
<tr>
<td>$V_{SS}$</td>
<td>Substrate Ground</td>
</tr>
</tbody>
</table>

Figure 2-9 (continued)
2-2.2.1  THE CCD IC

The block diagram of the CCD 321 IC is shown in Figure 2-9. (A legend of pin names is also given.) It contains two identical 455 bit analog shift registers A and B. (In the literature it is conventional to refer to each register stage as a "bit" whether the samples are analog or digital.) Each of the shift registers A (or B respectively) has a charge injection port and an output amplifier.

i) Charge Injection Port

A charge packet, linearly dependent on the voltage at the input $V_{in}$, with respect to the reference voltage $V_r$ is injected into the analog shift register. Charge packet injection occurs with each activation of the sampling clock $\phi_{sa}$.

ii) 455 Bit Analog Shift Register

Charge packets arriving at the injection port are transported successively from one bit to the next. A shift occurs for each cycle of the transport clock $\phi_t$. Note that channels A and B have independent transport clocks $\phi_{ta}$ and $\phi_{tb}$. For a given clock frequency $f$, a charge packet arriving at the injection port will reach the output amplifier after a delay time $T$, where $T = 455/f$.

iii) Output Amplifier

Each output amplifier consists of three source follower stages with constant current source bias. A sample and hold transistor is located between the second and third stage of the amplifier. The sample and hold transistor is clocked by $\phi_h$ to obtain a continuous output waveform $V_{oA}$ which is a delayed version of the input waveform (insertion loss is 0 dB). $V_o$ will contain sampling clock feedthrough that can be removed by low-pass filtering.

2-2.2.2 MULTIPLEXED VS. SERIAL MODE OPERATION

To provide a 1H delay with 910 sample resolution, the two 455 bit shift registers can be cascaded with each register operated at the same clock frequency $f$. In this case

$$T = 1H \text{ period } = 63.5\mu s = 910 / f.$$ 

Thus serial mode requires a 14.31818 MHz clock.

Alternatively a 910 sample resolution 1H delay can be obtained by multiplexed mode operation. Two 455 bit shift registers are connected in parallel. Alternate samples of the common input are carried by registers A and B, and combined at the output. Multiplex mode requires a clock frequency

$$f = 455 / 63.5\mu s = 7.159 \text{ MHz}$$

but has an effective sampling rate of 14.31818 MHz.

Since each clock input is a 30pF load, the performance of the clock driver circuits is less critical for multiplex mode. Also layout and shielding requirements to contain the high frequency harmonic radiation caused by clock driver switching are eased. In the present design multiplex mode was used.
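The two clock frequencies can be reproduced with a short calculation. The sketch below assumes the exact NTSC colour line rate of 4.5 MHz / 286 (a line period of about 63.556 μs, which the text rounds to 63.5 μs); the function name `ccd_clock_hz` and its parameters are hypothetical.

```python
# NTSC colour horizontal line rate in Hz (assumed exact value, about 15734.27 Hz)
NTSC_LINE_RATE_HZ = 4.5e6 / 286


def ccd_clock_hz(stages_per_register=455, multiplexed=True,
                 line_rate_hz=NTSC_LINE_RATE_HZ):
    """Clock frequency needed for a 1H delay built from two shift registers."""
    samples_per_line = 2 * stages_per_register      # 910 samples per line
    sample_rate = samples_per_line * line_rate_hz   # effective sampling rate
    # In multiplexed mode each register carries every other sample,
    # so it is clocked at half the effective sampling rate.
    return sample_rate / 2 if multiplexed else sample_rate


if __name__ == "__main__":
    print("serial mode      : %.5f MHz" % (ccd_clock_hz(multiplexed=False) / 1e6))
    print("multiplexed mode : %.5f MHz" % (ccd_clock_hz(multiplexed=True) / 1e6))
```

Using the rounded 63.5 μs line period instead would give slightly higher figures than the 14.31818 MHz and 7.159 MHz quoted above.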

2-2.2.3 CLOCK DRIVER CIRCUIT

The clock driver circuitry of Figure 2-10 was designed to provide the waveforms needed for multiplexed mode operation, given in the timing diagram of Figure 2-11. The clock drivers
provide two-phase symmetrical waveforms to the CCDs at an amplitude of 15 Vpk at 7.159 MHz. (Note that \( \phi_{sa} \) and \( \phi_{sb} \) are drawn with less than 50% duty cycle in Figure 2-11; however, a 50% duty cycle will suffice.) The crystal oscillator produces a square wave at TTL levels, which is buffered and then divided by two at the D flip-flop, where the Q and \( \bar{Q} \) outputs provide antiphase signals. Each of these signals is in turn converted to a 15 Vpk level by 7406 type open collector inverters with 470 \( \Omega \), 1/4 W pull-up resistors. Rise and fall times are on the order of 10 ns. If rise and fall times are made too fast, negative transients below ground may cause charge injection from the substrate to the shift registers. This injection can be eliminated with a negative bias on \( V_{ss} \) (-2.0 to -5.0 V) with respect to signal ground, \( V_{gg} \).
Figure 2-10

Clock driver circuit for multiplex CCD
Figure 2-11

Timing diagram for the CCD IC in the multiplexed mode
2-2.2.4 INPUT COUPLING CIRCUIT AND BIASING

As indicated in Figure 2-12, inputs to the CCD are attenuated to a level of 400 - 500mVpk and AC coupled. (Inputs must be less than 1V pk-pk for distortionless operation.) The DC bias level at an input is established by a resistive divider (approx. 1.5 VDC). The reference voltage inputs $V_{RA}$ and $V_{RB}$ are set to DC levels by the same type of resistive divider (approx. 4.5 VDC).

Since signal charge injection is proportional to the difference between $V_1$ and $V_R$, adjustment of either $V_1$ or $V_R$ is necessary to assure proper operation.

2-2.2.5 OUTPUT COUPLING CIRCUIT

This circuit recombines the multiplexed outputs into one signal. Outputs $V_{OA}$ and $V_{OB}$ are AC coupled to the summing node of a summing amplifier (Figure 2-13). The amplifier shown is a high speed operational amplifier (LM318). A variable amount of gain is provided by adjusting $R_T$. This compensates for the attenuation introduced by the input coupling circuit. $R_2$ and $R_1$ must be as closely matched as possible so that equal gain is given to each CCD output.

2-2.2.6 LOW PASS FILTER

An optional low pass filter with a zero in its frequency response at 14.3 MHz may be used to remove any clock signal component remaining in the summed output. The -3dB point of this filter is chosen to be somewhat above the highest video frequency to ensure that its presence causes no amplitude or phase changes within the video bandwidth. An example of
such a filter designed with discrete components is shown in Figure 2-14. The 39\(\mu\)H inductors can be obtained in packages comparable in size to a 1/4 W carbon composition resistor. (LC filters are commonly used in video circuitry and are practical because of the small size of the inductors required at video frequencies.) In the case in which sampled data signals are being processed the optional LC filter is not required in the CCD circuitry and low pass filtering may be done exclusively at the filter output.
Figure 2-12

Input Coupling Circuit for Multiplex CCD

Figure 2-13

Output Coupling Circuit for Multiplexed CCD
Figure 2-14

Low Pass Filter - Removes CCD clock feedthrough
2-2.3 DESIGN OF PE

Type I and type II PE's, as defined in Figure 2-7, were constructed using high-speed op-amp circuits. This was done for the type I PE as shown in Figure 2-15. The op-amp is configured as an inverting weighted summer and drives a passive delay line with resistive termination. The circuitry for a type II PE is the same as for a type I PE except that it consists solely of the inverting summer op-amp circuit excluding the delay line.

With reference to Figure 2-15, the summing amplifier configuration produces the sum \((-R_f/R_a)X_{IN} + (-R_f/R_b)Y'_{IN} - Y'_{IN}\). Comparing with Figure 2-7, \(w_1 = (-R_f/R_a)\) and \(w_2 = (-R_f/R_b)\).

For greater ease in setting coefficient values in the experimental prototype, \(R_a\) and \(R_b\) are variable resistors.
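Setting a coefficient then amounts to choosing an input resistor from the relation $w = -R_f/R$. The sketch below is one way to tabulate the values, assuming a hypothetical feedback resistor of 1 kΩ and assuming that a positive weight is obtained by taking the signal from the inverted broadcast line described in Section 2-2.1. The name `input_resistor` is illustrative only.

```python
def input_resistor(weight, r_f=1000.0):
    """Return (R, use_inverted_line) for a desired PE weight.

    The inverting summer gives a gain of -r_f / R, so a negative weight is
    realized directly, while a positive weight is assumed to be taken from
    the inverted broadcast line. A zero weight means no connection.
    """
    if weight == 0:
        return None, False
    return r_f / abs(weight), weight > 0


if __name__ == "__main__":
    for w in (-0.5, 2.0, 0.0):
        print(w, input_resistor(w))
```

Small coefficient magnitudes call for large resistor values, which is one practical reason for making $R_a$ and $R_b$ variable in an experimental prototype.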

![Op-amp circuit for PE (type I shown)](image)

The op-amp used in the PE's of the prototype is an EL2020C (Elantec). This is a high-speed monolithic current feedback type op-amp (50 MHz, -3 dB bandwidth), specifically
designed for video and other applications that require high slew rates, fast settling times and low power consumption. In contrast with conventional op-amps, those such as the EL2020C that use the current feedback topology provide nearly constant bandwidth and settling time over a wide range of closed-loop voltage gains [23,24,25]. Higher performance current feedback op-amps (200 MHz, -3 dB bandwidth) specifically designed for HDTV and other comparable high-speed applications, such as the EL400 (Elantec), are available in an 8 pin mini-DIP package that is pinout compatible with the EL2020C. These higher performance op-amps could be used instead of the EL2020C in Figure 2-15 for HDTV application.

A passive delay line is used to implement a pixel delay (denoted $z^{-1}$ in the z-domain). In the time domain, a signal $x(m,n)$ is delayed by one pixel time, resulting in the delayed signal $x(m,n-1)$. The passive delay line used in the PE’s of the prototype is a type 1514-70D (Data Delay Devices). Lumped constant delay lines of this type are widely used in a number of electronic applications [26]. They are generally designed using low-pass LC filters as basic units, cascaded in stages having linear phase characteristics overall. The type 1514-70D delay line has a delay time $T_d = 70$ ns and a characteristic impedance $Z_0 = 250\,\Omega$, and accordingly is terminated in a 250 $\Omega$ resistor. This delay line is available in a four pin SIP package that does not take up much circuit board space. Lumped constant delay lines with a clock input and sample-and-hold output are available in 8 pin DIP packages (e.g. Reticon 5100 series) and can be used instead if desired. In the 2 x 2 filter prototype the delay lines used were found adequate to establish an accurate 70 ns pixel delay. (The pixel delay time is 70 ns for NTSC.)
2-3 HOMOMORPHIC FILTERING

An image \( f(x, y) \) is defined as a two-dimensional light intensity function of spatial coordinates \((x, y)\). In the illumination-reflectance model of an image, the illumination component \( i(x, y) \) due to the amount of source light incident on the scene being viewed and the reflectance component \( r(x, y) \) due to light reflected by objects in the scene are considered to vary independently, so that the image is formed multiplicatively, \( f(x, y) \) being expressed by

\[
f(x, y) = i(x, y) \cdot r(x, y)
\] (2-12)

Illumination is directly responsible for the dynamic range achieved by the pixels of an image while contrast is a function of the reflectance.

By means of homomorphic filtering, [18], as depicted in the block diagram of Figure 2-16, it is possible to process the illumination and reflectance components of an image separately. In this scheme, input data is logarithmically converted prior to performing a linear filtering operation (usually high pass filtering) and the result is converted back to the linear scale. In the present work the linear operation selected is a 2-D high frequency emphasis Butterworth (near linear phase) recursive filter of order 2 x 2. A cross-sectional plot of the 2-D magnitude response of this function is shown in Figure 2-17. Magnitude and group delay responses of the designed 2-D filter are plotted in Figure 2-18, while Figure 2-19 gives the coefficients of the designed filter. Group delay is essentially constant considering the scale used in the drawing.

To a first approximation we can assume as in [18] that the illumination component contains mostly low frequencies in its logarithm and that the logarithm of the reflectance component contains mostly high frequencies. Choosing the parameters \( \gamma_L \) and \( \gamma_H \) so that
\( \gamma_L = 0.5 \) and \( \gamma_H = 2 \), for the Butterworth filter function will tend to attenuate the low frequencies and amplify the high frequencies. This results in an enhancement of the reflectance components, which represent the objects in the scene, and attenuation of the illumination component, which represents the light distribution. The net result is a clearer image.
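The log, filter, exponentiate chain can be sketched in software as follows. This is only an outline of the signal flow: the linear stage is passed in as a function, and the `flat_gain` helper is a hypothetical stand-in that applies one gain to all frequencies, not the 2-D Butterworth filter designed in this work. Pixel values are assumed to be strictly positive.

```python
import math


def homomorphic(image, linear_filter):
    """Log-convert, apply a linear filter, then convert back to the linear scale."""
    log_img = [[math.log(p) for p in row] for row in image]   # product -> sum
    filtered = linear_filter(log_img)
    return [[math.exp(v) for v in row] for row in filtered]


def flat_gain(gain):
    """Hypothetical linear stage with the same gain at every frequency."""
    return lambda img: [[gain * v for v in row] for row in img]


if __name__ == "__main__":
    image = [[4.0, 16.0], [1.0, 9.0]]
    # A gain of 0.5 on the logarithm compresses the range like a square root.
    print(homomorphic(image, flat_gain(0.5)))
```

Because the logarithm turns the product $i \cdot r$ into a sum, a gain below one applied to the low frequencies compresses the illumination range, while a gain above one at high frequencies expands reflectance detail.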
**Figure 2-16**

Homomorphic Filter

**Figure 2-17**

Cross-section of a circularly symmetric 2-D Butterworth filter function used in homomorphic filtering. $D(u,v)$ is the distance from the origin.
Figure 2-18

Magnitude and group delay response — designed 2-D filter

(a) Magnitude response
(b) Group delays: $\tau_1, \tau_2$ (both identical)
\[
(a_{ij}) = \begin{bmatrix}
-0.09347 & -0.07165 & -0.04295 \\
-0.07165 & 1.79010 & -0.87585 \\
-0.04295 & -0.87585 & 0.46525 \\
\end{bmatrix}
\]

\[
(b_{ij}) = \begin{bmatrix}
1.0 & -0.45083 & -0.00833 \\
-0.45083 & 0.26831 & 0.00681 \\
-0.00833 & 0.00681 & 0.00292 \\
\end{bmatrix}
\]

**Figure 2-19**

Filter coefficients - designed 2-D filter
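As a quick sanity check on the tabulated coefficients, the gain of the filter at zero frequency ($z_1 = z_2 = 1$) and at the highest frequency in both directions ($z_1 = z_2 = -1$) can be evaluated directly from the two matrices. The sketch below assumes that the $(b_{ij})$ matrix, including its leading 1.0, forms the complete denominator; the function name `gain_at` is hypothetical.

```python
A = [[-0.09347, -0.07165, -0.04295],
     [-0.07165,  1.79010, -0.87585],
     [-0.04295, -0.87585,  0.46525]]

B = [[ 1.0,     -0.45083, -0.00833],
     [-0.45083,  0.26831,  0.00681],
     [-0.00833,  0.00681,  0.00292]]


def gain_at(z1, z2, a=A, b=B):
    """Evaluate H(z1, z2) as the ratio of the two coefficient polynomials."""
    num = sum(a[i][j] * z1 ** (-i) * z2 ** (-j)
              for i in range(len(a)) for j in range(len(a[0])))
    den = sum(b[i][j] * z1 ** (-i) * z2 ** (-j)
              for i in range(len(b)) for j in range(len(b[0])))
    return num / den


if __name__ == "__main__":
    print("gain at zero frequency    :", gain_at(1.0, 1.0))
    print("gain at highest frequency :", gain_at(-1.0, -1.0))
```

The zero-frequency gain comes out at roughly 0.49, close to $\gamma_L = 0.5$, and the gain at the highest frequency is roughly 1.85, approaching $\gamma_H = 2$, which is the high frequency emphasis behaviour described above.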
2-4 SPICE SIMULATION OF THE 2-D SEMI-SYSTOLIC FILTER STRUCTURE

In this section, we present a simulation of the 2-D semi-systolic analog filter structure realization presented in Section 2-1. The particular version of SPICE being used herein is Microsim PSPICE [27].

2-4.1 SPICE MODELLING OF A SINGLE PE

The method of filter realization given in [13] results in a modular structure (Figure 2-3) in which all PE’s are identical. The fixed coefficient PE realization in analog form, as given in [13], is shown in Figure 2-20. A SPICE simulation model of this PE circuitry is shown in Figure 2-21 and its corresponding netlist is shown in Figure 2-22. (Figure 2-21 does not show the parasitic capacitances included in the netlist, so as not to clutter the diagram.)

Each PE generates a weighted sum of inputs which is delayed by one sampling clock period. With reference to the SPICE netlist in Figure 2-22, the subcircuit call to x3 introduces an op-amp inverting summer configuration with input resistors rx, ry, rz and feedback resistor rf3. Signals are input to the PE at nodes 24, 25, 26 and the (inverted) summed output appears at node 5. Independent sources vix, viy and viz are driving waveforms to test the simulated single PE. The summed output at node 5 is delayed by one sampling clock period by means of the circuitry connected between nodes 5 and 12. The op-amp circuit associated with subcircuit x1 and external components rf and rg buffers the voltage signal held on capacitor cs. Similarly the signal from cs2 is buffered by the op-amp circuit associated with x2. S1 and S2 are series sampling switches for capacitors cs and cs2, respectively, while switches sclr1 and sclr2 are used to reset node voltages $V(1)$ and $V(11)$ to zero. Both the series and shunt switches are based on switch model "vswitch", which has a resistance of $50\,\Omega$ in the on state and $1\,\text{M}\Omega$ in the off state. Parameters $V_{\text{on}}=1.0$ and $V_{\text{off}}=0.0$ determine the upper and lower switching thresholds. Switches S1 and S2 are controlled by switching control waveforms $V_c$ and $V_{c2}$, respectively, which represent the (non-overlapping) sample timing clocks, while reset switches sclr1 and sclr2 are both

![Diagram of an analog circuit](image)

Figure 2-20

PE realization in analog form
Figure 2-21

SPICE model for PE of Figure 2-20
SINGLE PE
*
* simulates operation of a single PE
* (using idealized op-amp model)
* vc and vc2 are control voltages for switches s1, s2 resp.
vc 8 0 dc 0 pulse(0 1. 105ns 5ns 5ns 25ns 70ns)
vc2 18 0 dc 0 pulse(0 1. 70ns 5ns 5ns 25ns 70ns)
* vsync is a control voltage for switches sclr1, sclr2
vsync 19 0 dc 0 pulse(0 1. 0 5ns 5ns 60ns 63.5us)
rc 8 0 200k
rc2 18 0 200k
rc3 19 0 200k
*
* vix, viy, viz are sample input waveforms
vix 24 0 pwl(0ns -2 68ns -2 72ns -3 138ns -3 142ns 2 208ns 2 212ns 4 278ns
+ 4)
viy 25 0 pwl(0ns 0 68ns 0 72ns 0 138ns 0 142ns -7 208ns -7 212n 1 278ns
+ 1)
viz 26 0 pwl(0ns -2 68ns -2 72ns -3 138ns -3 142ns 8 208ns 8 212n -8 278n
+ -8 282n -6)
*
* subcircuit call for op-amp
* summing amplifier
x3 0 23 5 uAxxxx
* cf3 and cg3 are parasitic capacitances
cf3 5 23 .2pf
cg3 23 0 .2pf
rf3 5 23 1k
* input resistors are rx ry rz
rx 24 23 2k
ry 25 23 1k
rz 26 23 2k
*
* first sample-and-hold buffer
* uAxxxx is an idealized op-amp model
x1 1 3 2 uAxxxx
*
rf 2 3 250
* cf and cg are parasitic capacitances
cf 2 3 .2pf
rg 3 0 100k
cg 3 0 .2pf
cs 1 0 47pf
s1 5 1 8 0 sw
sclr1 1 0 19 0 sw
*
* second sample-and-hold buffer
* subcircuit call for op-amp
x2 11 13 12 uAxxxx
rf2 12 13 250
* cf2 and cg2 are parasitic capacitances
cf2 12 13 .2pf
rg2 13 0 100k
cg2 13 0 .2pf
cs2 11 0 47pf
s2 2 11 18 0 sw
sclr2 11 0 19 0 sw
*

Figure 2-22  SPICE netlist for single PE simulation using idealized op-amp model (continued on following page)
controlled by \( V(19) \), denoted "vsync", which is a single pulse occurring during the horizontal retrace interval.

Waveforms resulting from SPICE simulation of the single PE, plotted in Figure 2-23(a), clearly indicate the clock timing relationships. The trivial summation of voltages \( vix, viy, \) and \( viz \) to produce \( V(5) \) at the onset of processing within the PE has been left out so as not to clutter the diagram unnecessarily. Also, waveforms have been drawn offset by the specified DC levels to make the plot more readable. Waveform \( V(5) \), the output of the summer, is sampled by \( V(8) \). The sample-and-hold type action results in waveform \( V(2) \) at the output of the first buffer amplifier. This waveform, \( V(2) \), is in turn sampled by \( V(18) \), resulting in waveform \( V(12) \), which is essentially a delayed (by one clock period) version of \( V(5) \). Sync pulse \( V(19) \) ensures zero initial conditions on capacitors \( cs \) and \( cs2 \).
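The one-clock delay produced by the two cascaded sample-and-hold stages can also be seen in a purely behavioural model that ignores all analog effects. The sketch below assumes ideal switches and unity-gain buffers, treats each clock as high for a nominal 30 ns window starting at the delays used in the netlist (105 ns for the first switch, 70 ns for the second, period 70 ns), and uses a 5 ns time step. The function names and the sample values are hypothetical.

```python
def pulse_high(t, delay, width, period):
    """Idealized clock: high for `width` ns after `delay`, repeating every `period` ns."""
    return t >= delay and (t - delay) % period < width


def two_phase_delay(samples, period=70, step=5, width=30):
    """Track V(5) through two sample-and-hold stages; returns (t, v5, v12) tuples."""
    cs = cs2 = 0.0                      # hold capacitors start discharged
    trace = []
    for t in range(0, len(samples) * period, step):
        v5 = samples[t // period]       # summer output during this pixel time
        if pulse_high(t, 105, width, period):
            cs = v5                     # first switch closed: cs tracks the summer
        if pulse_high(t, 70, width, period):
            cs2 = cs                    # second switch closed: cs2 tracks the first buffer
        trace.append((t, v5, cs2))
    return trace


if __name__ == "__main__":
    for t, v5, v12 in two_phase_delay([1.0, 2.0, 3.0, 4.0]):
        if t % 35 == 25:
            print(t, v5, v12)
```

Because the two sampling windows never overlap, the value captured in the middle of one pixel time is transferred to the output only at the start of the next, so the output lags the summer by one clock period.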

In this simulation the subcircuit calls for the op-amps are based on the ideal op-amp model contained in the library file "genamp.lib" which is also listed. The use of this model allows short transient analysis run times and uses less computer memory in a case such as this involving multiple op-amp subcircuit calls. Note that the value of the feedback resistors \( rf \) and \( rf2 \) in the unity gain buffers could be reduced to zero for the majority of op-amp types that could be substituted for the ideal op-amp. A practical op-amp suitable for realization of the PE would be the Comlinear CLC400 wideband operational amplifier, which is a current feedback amplifier having specifications comparable to those of the Elantec EL2020CN, used for PE realization in Section 2-2.3. Figure 2-24 shows the SPICE netlist for the simulation done using the CLC 400 and Figure 2-25 gives the SPICE listing for the macro-model of this op-amp from the Comlinear Data Book [25].
Waveforms resulting from PE simulation with the CLC400 macro-model are shown in Figure 2-23(b). This SPICE output plot shows the possibility of high frequency noise being superimposed on waveforms within the PE, due to the effects of high-speed switching with a practical op-amp. High frequency noise components introduced into the filtered signal due to this effect may be removed with a low pass filter. This effect diminishes with lower clock signal rise times. The output signal of the 2-D filter will always need to be low pass filtered in the case of an input signal which is a true sampled data signal, in order to convert it back to an analog signal.
Figure 2-23 (a)  SPICE graphic plot for single PE simulation (using ideal op-amp model)

Figure 2-23 (b)  SPICE graphic plot for single PE simulation (using CLC400 op-amp macro-model)
SINGLE PE
* * simulates operation of a single PE
* (using CLC 400)
* vc and vc2 are control voltages for switches s1, s2 resp.
vc 8 0 dc 0 pulse(0 1. 105ns 5ns 5ns 25ns 70ns)
vc2 18 0 dc 0 pulse(0 1. 70ns 5ns 5ns 25ns 70ns)
* vsync is a control voltage for switches sclr1, sclr2
vsync 19 0 dc 0 pulse(0 1. 0 5ns 5ns 60ns 63.5us)
rc 8 0 200k
rc2 18 0 200k
rc3 19 0 200k
* vix, viy, viz are sample input waveforms
vix 24 0 pwl(0ns -2 68ns -2 72ns -3 138ns -3 142ns 2 208ns 2 212ns 4 278ns 4)
viy 25 0 pwl(0ns 0 68ns 0 72ns 0 138ns 0 142ns -7 208ns -7 212n 1 278ns 1)
viz 26 0 pwl(0ns -2 68ns -2 72ns -3 138ns -3 142ns 8 208ns 8 212n -8 278n -8 282n -6)
* * summing amplifier
* c400 is the Comlinear CLC400 op-amp macro-model
x3 0 23 5 c400
* cf3 and cg3 are parasitic capacitances
cf3 5 23 .2pf
cg3 23 0 .2pf
rf3 5 23 1k
* input resistors are rx ry rz
rx 24 23 2k
ry 25 23 1k
rz 26 23 2k
*
x1 1 3 2 c400
rf 2 3 250
* cf and cg are parasitic capacitances
cf 2 3 .2pf
rg 3 0 100k
cg 3 0 .2pf
cs 1 0 47pf
s1 5 1 8 0 sw
sclr1 1 0 19 0 sw
*
x2 11 13 12 c400
rf2 12 13 250
* cf2 and cg2 are parasitic capacitances
cf2 12 13 .2pf
rg2 13 0 100k
cg2 13 0 .2pf
cs2 11 0 47pf
s2 2 11 18 0 sw
sclr2 11 0 19 0 sw
.lib CLC400.lib
.model sw vswitch (RON=50 ROFF=1E6 VON=1.0 VOFF=0.0)
.options limpts=5200 reltol=.001
.tran 1ns 348ns
.probe v(5) v(2) v(8) v(11) v(12) v(18) v(19)
.end

Figure 2-24 SPICE netlist for single PE simulation using CLC400 macro-model
* CLC400.lib
* Comlinear CLC400 current feedback op-amp macro-model

* clc400 small-signal lumped element model using topology 1
* pin def. V+  V-  Vout
  .subckt c400 1 3 2
*
* define the non-inverting input impedance elements
rin 1 0 200k
cn 1 16 5.5pf
rn 16 0 0.01
*
* define the buffer frequency response determining elements
ra 4 5 1.0
la 5 6 11ph
ca 6 0 180pf
*
* define the inverting input impedance elements
ri 3 8 59
ci 3 8 5.3pf
li 9 10 33nh
*
* define the parasitic input capacitances
cx 1 3 .91pf
cy 3 0 1.8pf
*
* define the dependent sources to isolate between
* stages and sense the inverting node error current
e1 10 0 6 0 .9957
e4 4 0 1 0 1.00
f1 11 0 vcs 1.00
vcs 8 9 dc 0
e2 12 0 11 0 1.00
e3 14 0 13 0 1.00
*
* define the dc transimpedance gain and the
* dominant pole
rb 11 0 125k
cb 11 0 3.9pf
*
* define the high frequency poles for the
* transimpedance gain
lc 12 13 62ph
rc 13 0 .29
cc 13 0 470pf
*
* define the output impedance elements
lo 14 15 13nh
ro 15 2 7.3
*
.ends c400

Figure 2-25

SPICE netlist for Comlinear CLC400 op-amp macro-model
2-4.2 SPICE MODELLING OF THE OVERALL 2-D SAMPLE-AND-HOLD SEMI-SYSTOLIC STRUCTURE

The simulation of the overall 2-D filter structure was developed using a functional block simulation approach [19]. This approach allows the designer to develop a system level design in terms of functional blocks. It is gaining popularity, especially for analog design, where significant time savings often result from the use of functional blocks.

A SPICE model of the 2 x 2 IIR semi-systolic filter corresponding to the block diagram of Figure 2-3 is given in Figure 2-26. Symbols used in this figure are explained in the legend (Figure 2-27). The SPICE netlist for the simulation model is listed in Figure 2-28. The structure of the filter is highly modular, all PE's being identical, and so the PE block is the natural choice for development as a functional block in SPICE.

The operations involved in the general processing element PEij are represented as in the following set of statements:

Eij sum 0 poly(3) (XIN,0) (YIN,0) (Y'IN,0) 0 w1 w2 1
Tij sum 0 YPE 0 z0=250 td=70ns
Rij YPE 0 250

The first statement introduces a polynomial type voltage-controlled voltage source (VCVS) Eij, which is a function of multiple inputs. First, the dimension of the polynomial is specified i.e. poly(3). Then the set of voltages on which the controlled source depends are specified. Finally, the coefficients of the polynomial are specified in ascending order, where every coefficient up to the last non-zero coefficient must be specified. In general for a poly(3) type VCVS with assumed inputs v1, v2, v3, and a list of coefficients called k0, k1, k2, ..., the polynomial form for three inputs is
\[ k_0 + \left(k_1 v_1 + k_2 v_2 + k_3 v_3\right) + \left(k_4 v_1^2 + k_5 v_1 v_2 + k_6 v_1 v_3 + \cdots\right) \]

that is, the constant term $k_0$, plus the linear terms, plus the second-order (square and cross-product) terms, and so on.

Thus with the coefficient list 0 w1 w2 1, as in the first SPICE statement above for element $E_{ij}$, only the linear terms are included and the appropriate weighting factors are assigned to the input (controlling) sources.

The second of the three SPICE statements listed above introduces an ideal transmission line, serving as a pixel delay, having a characteristic impedance $z_0=250\Omega$ and a time delay $t_d=70\text{ns}$, connected to the output of the VCVS $E_{ij}$. The third statement simply provides a terminating resistance for the transmission line.
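As a rough illustration of how a poly(n) coefficient list maps onto polynomial terms, the following Python sketch evaluates a polynomial source in the ascending order described above. The function name `poly_vcvs` and the sample input voltages are hypothetical; the coefficient list is that of source E01 in Figure 2-28. It is a sketch of the ordering convention only, not a substitute for the simulator.

```python
from itertools import combinations_with_replacement
from math import prod


def poly_vcvs(coeffs, inputs):
    """Evaluate a SPICE-style poly(n) controlled source (illustrative sketch).

    coeffs : [k0, k1, k2, ...] in ascending order; coefficients omitted
             at the end of the list are treated as zero.
    inputs : controlling voltages [v1, ..., vn].
    """
    n = len(inputs)
    if n < 1:
        raise ValueError("at least one controlling voltage is required")
    total = 0.0
    index = 0
    degree = 0
    while index < len(coeffs):
        # All terms of one degree, e.g. degree 2 with n = 3 gives
        # v1*v1, v1*v2, v1*v3, v2*v2, v2*v3, v3*v3 in that order.
        for combo in combinations_with_replacement(range(n), degree):
            if index >= len(coeffs):
                break
            total += coeffs[index] * prod(inputs[i] for i in combo)
            index += 1
        degree += 1
    return total


# E01 of Figure 2-28: inputs are (previous PE output, x, y).
# The input voltages 2.0, 3.0, 4.0 are arbitrary test values.
v_sum = poly_vcvs([0, 1, -0.1433, -0.0167], [2.0, 3.0, 4.0])
print(round(v_sum, 4))
```

Because only the constant and first-order coefficients are supplied, the result is simply the weighted sum of the three controlling voltages, which is the behaviour required of the PE summer.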
Figure 2-26

SPICE model of the 2 x 2 semi-systolic structure
Three input voltage-controlled voltage source. If the nodes are labeled as shown, the voltage relationship is
\[ V(\text{OUT}) = w_1 \cdot V(A) + w_2 \cdot V(B) + w_3 \cdot V(C) \]
where \( w_1, w_2, \) and \( w_3 \) are polynomial coefficients associated with the controlling nodes.
The SPICE statement for this source is
\[ \text{Eij OUT 0 poly(3) (A,0) (B,0) (C,0) 0 w1 w2 w3} \]

Two input voltage-controlled voltage source.
(Similar to three input source above.)
In this case
\[ V(\text{OUT}) = w_1 \cdot V(A) + w_2 \cdot V(B) \]
SPICE statement:
\[ \text{Eij OUT 0 poly(2) (A,0) (B,0) 0 w1 w2} \]

SPICE transmission line device.
- an ideal delay line having two ports A,B.
Parameters \( z_0 \) and \( td \) are specified in the simulation
- \( z_0 \) is the characteristic impedance
- \( td \) is the delay in seconds
SPICE statement:
\[ \text{Tij A 0 B 0 z0=<z0 value> td=<td value>} \]

Figure 2-27
Legend of symbols used in SPICE simulation model of Figure 2-26
2D SAMPLE-AND-HOLD SEMI-SYSTOLIC STRUCTURE
* 2-D filter simulation for semi-systolic structure (2 x 2)
.option ITL5=0
.include vsource.cir
*
* Line delays for x (input) signal
Tx1 x 0 x1 0 z0=250 td=5us ; ideal transmission line simulating 1H delay
Tx2 x1 0 x2 0 z0=250 td=5us ; second delay line
Rx2 x2 0 250 ; matched termination
*
* In the following simulation of banks of PE's the poly type voltage
* controlled voltage source statement is used to obtain a (non-inverting)
* sum of specified voltages.
* (The constant term of the polynomial is zero and coefficients are
* specified for only the first order terms.)
*
* BANK 0
*
E02 3 0 poly(2) (x,0) (y,0) 0 -0.0859
T2 3 0 4 0 z0=250 td=70ns
R2 4 0 250
*
E01 7 0 poly(3) (4,0) (x,0) (y,0) 0 1 -0.1433 -0.0167
T1 7 0 8 0 z0=250 td=70ns
R1 8 0 250
*
E00 11 0 poly(3) (8,0) (x,0) (y,0) 0 1 -0.1075 0.9017
T00 11 0 bank0 0 z0=250 td=70ns
R00 bank0 0 250
*
* BANK 1
*
E12 14 0 poly(2) (x1,0) (y1,0) 0 -1.7517 -0.0136
T4 14 0 15 0 z0=250 td=70ns
R4 15 0 250
*
E11 18 0 poly(3) (15,0) (x1,0) (y1,0) 0 1 3.5802 -0.5366
T3 18 0 19 0 z0=250 td=70ns
R3 19 0 250
*
E10 22 0 poly(3) (19,0) (x1,0) (y1,0) 0 1 -0.1433 0.9017
T10 22 0 bank1 0 z0=250 td=70ns
R10 bank1 0 250
*
* BANK 2
*
E22 25 0 poly(2) (x2,0) (y2,0) 0 0.9305 -0.0058
T6 25 0 26 0 z0=250 td=70ns
R6 26 0 250
*
E21 29 0 poly(3) (26,0) (x2,0) (y2,0) 0 1 -1.7517 -0.0136
T5 29 0 30 0 z0=250 td=70ns
R5 30 0 250
*
E20 33 0 poly(3) (30,0) (x2,0) (y2,0) 0 1 -0.0859 -0.0167
T20 33 0 bank2 0 z0=250 td=70ns
R20 bank2 0 250

Figure 2-28 SPICE netlist - simulation of overall 2-D filter structure
(continued on following page)
* Sum of bank outputs is obtained at node y using "Esum"
  *(Note divide by two included because coefficients aij, bij
  * are really 2X scaled.)
  Esum y 0 poly(3) (bank0,0) (bank1,0) (bank2,0) 0 0.5 0.5 0.5
  *
  * Line delays for y signal
  Ty1 y 0 y1 0 z0=250 td=4.93us
  Ty2 y1 0 y2 0 z0=250 td=5us
  Ry2 y2 0 250 ; matched termination of the two lines
  *
  * Transient analysis is done to simulate the processing of the first
  * 5 us of an NTSC line period.
  *
  .tran 5ns 15us 10us
  .probe
  .end

* Listing of include file vsource.cir follows:

* Listing of vsource.cir

vsource x 0 PWL(0 0 4ns 6.00 70ns 6.00 74ns 3.00 140ns 3.00 144ns 5.00
+ 210ns 5.00 214ns 3.00 280ns 3.00 284ns 4.00 350ns 4.00 354ns 5.00
+ 420ns 5.00 424ns 4.00 490ns 4.00 494ns 6.00 560ns 6.00 564ns 6.00
+ 630ns 6.00 634ns 3.00 700ns 3.00 704ns 4.00 770ns 4.00 774ns 0.
+ 5.000us 0.  5.004us 4.00 5.070us 4.00 5.074us 3.00 5.140us 3.00
+ 5.144us 4.00 5.210us 4.00 5.214us 2.00 5.280us 2.00 5.284us 2.00
+ 5.350us 2.00 5.354us 4.00 5.420us 4.00 5.424us 4.00 5.490us 4.00
+ 5.494us 5.00 5.560us 5.00 5.564us 3.00 5.630us 3.00 5.634us 4.00
+ 5.700us 4.00 5.704us 3.00 5.770us 3.00 5.774us 0.
+ 10.000us 0. 10.004us 4.00 10.070us 4.00 10.074us 2.00 10.140us 2.00
+ 10.144us 5.00 10.210us 5.00 10.214us 4.00 10.280us 4.00 10.284us 3.00
+ 10.350us 3.00 10.354us 4.00 10.420us 4.00 10.424us 3.00 10.490us 3.00
+ 10.494us 5.00 10.560us 5.00 10.564us 4.00 10.630us 4.00 10.634us 4.00
+ 10.700us 4.00 10.704us 4.00 10.770us 4.00 10.774us 0.)

Figure 2-28

SPICE netlist - simulation of overall 2-D filter structure
(continued from previous page)
In the simulation of the overall 2-D filter structure, each PE is written as a functional block and incorporated into the overall structure. PE's with only two inputs are modelled in the same way as the general PE described above, the only difference being the use of a two-dimensional polynomial (poly(2)) in the statement containing the VCVS for the PE. A function block for the three-input summer that adds the PE bank outputs was derived from a polynomial type VCVS. Line delays are modelled as ideal transmission lines having a characteristic impedance of 250Ω and a delay time of 5μs. A shortened version of the line delay period was selected in order to obtain a transient analysis simulation showing the processing of eleven samples within the first 5μs of an NTSC line period. The pixel (sample) time in this simulation is 70ns. A piecewise linear (PWL) independent source representing three consecutive lines of sampled data signals, used to drive the simulation for testing, is defined by a PWL statement in the include file "vsource.cir". Include files are simply substituted in-line by the SPICE executive program at run time.
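The behaviour of one bank of PE function blocks can also be sketched outside SPICE. The Python fragment below is a minimal discrete-time model under stated assumptions: each 70ns transmission line is replaced by an ideal one-sample delay, the feedback signal y is supplied as a known sequence rather than computed in a closed loop, and the name `pe_bank` is hypothetical. The coefficients are those of bank 2 in Figure 2-28.

```python
def pe_bank(x, y, coeffs):
    """Functional model of one bank of PEs (illustrative sketch).

    x, y   : input and feedback samples, one per pixel period.
    coeffs : [(a, b), ...] ordered from the first PE in the chain to the last.
    Each PE forms prev + a*x[n] + b*y[n], where prev is the delayed output
    of the preceding PE, and holds the sum for one pixel period.
    """
    state = [0.0] * len(coeffs)  # pixel-delay contents, zero initial conditions
    out = []
    for xn, yn in zip(x, y):
        out.append(state[-1])    # bank output is the last PE's delayed sum
        prev = 0.0               # the first PE has no preceding PE
        new_state = []
        for (a, b), held in zip(coeffs, state):
            new_state.append(prev + a * xn + b * yn)
            prev = held          # the next PE sees this PE's delayed output
        state = new_state
    return out


# Bank 2 of Figure 2-28: E22, E21, E20 in chain order.
bank2 = [(0.9305, -0.0058), (-1.7517, -0.0136), (-0.0859, -0.0167)]
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
zeros = [0.0] * 5
print(pe_bank(impulse, zeros, bank2))
```

With a unit impulse on x and y held at zero, the x coefficients of the last, middle and first PE appear at the bank output after one, two and three pixel periods respectively, which is the shift-and-add pattern the function block approach relies on.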

A transient analysis was run over a time period of 15μs with printing of output data suppressed for all but the last 5μs, during which time all of the four line delays will contain stored data. The result of this transient analysis is shown in the SPICE plot of Figure 2-29. Once again, as in Section 2-4.1, DC offset values have been used to provide a vertical arrangement of waveforms for a more readable plot. Phase relationships of waveforms in the filter are clearly evident from this plot.
Figure 2-29

SPICE simulation of the 2-D sample-and-hold semi-systolic filter structure
2-4.3. FACTORS AFFECTING THROUGHPUT RATE -- 2D SYSTOLIC STRUCTURE

The throughput rate of the structure given in [13], which has been modelled in SPICE in Sections 2-4.1 and 2-4.2, will now be assessed.

Basically the throughput rate of the structure is dependent on the rate at which samples are clocked through the line stores and PE's, limited by the performance of the components which are selected to realize the structure in hardware.

In order to output sampled data signals at high rates without degradation, op-amps must be selected to have adequate slew rates and settling times. Here slew rate refers to the maximum possible rate of change (V/μs) at the op-amp output terminal under large-signal output conditions. There must be a sufficient slew-rate to avoid degrading the rise and fall times of sampled data signals. This becomes more critical when sampling times become shorter at higher data rates and with increasing peak-to-peak signal levels. High speed op-amps such as the Burr-Brown OPA620 are capable of slew rates of 1000V/μs and settling times to 0.1% of less than 50ns.

If CCD's are used as the line stores, the data rate within a CCD will depend on the number of "bits" (i.e. samples) which the CCD is capable of storing and the rate at which it is clocked. The clock frequency is related to the horizontal line period (denoted 1H in CCD literature) and the number of bits stored R, according to \( f = \frac{R}{1H} \). For example, given a CCD for which \( R = 910 \) in an NTSC television application requiring \( 1H = 63.5\,\mu\text{s} \), the clock frequency is \( f = \frac{910}{63.5 \times 10^{-6}\,\text{s}} = 14.33 \text{ MHz} \).
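The arithmetic of this example can be checked with a short Python sketch. The function name `ccd_clock_hz` is hypothetical; the values R = 910 and 1H = 63.5μs are those quoted above.

```python
def ccd_clock_hz(samples_per_line, line_period_s):
    """Clock frequency f = R / 1H for a CCD line store."""
    return samples_per_line / line_period_s


f_clock = ccd_clock_hz(910, 63.5e-6)
print(f"clock frequency: {f_clock / 1e6:.2f} MHz")
print(f"sample period:   {1e9 / f_clock:.1f} ns")
```

The corresponding sample period is a little under 70ns, which is close to the 70ns pixel delay used in the SPICE simulations of Section 2-4.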
Let us consider the PE realization given in Figure 2-20. Within the PE, the data rate is limited by the op-amp slew rates and settling times and the switching speed of the switches. Each PE generates a weighted sum of inputs using either an op-amp weighted summer, or an op-amp summer used in conjunction with analog multipliers. (Analog multipliers are commercially available having bandwidth up to 500MHz). Each PE also includes a delay stage -- a sample is stored within the PE for one sample period, in an arrangement which is, excluding the reset switches across the hold capacitors, essentially a cascade of two sample-and-hold devices. IC devices which provide the sample-and-hold function at high sampling rates are commercially available. For example, the Comlinear Corp. CLC940 Fast Sampling, Wideband Track-and-Hold Amplifier has a 10ns acquisition time. (The device is described as "track-and-hold" rather than "sample-and-hold" to emphasize that the output will faithfully track the input signal, if the sample command is given for long periods of time between hold states.) We see then that a high sampling rate sample-and-hold device is within the capabilities of present IC technology and an additional reset switch across the hold capacitor, that is switched at the much slower rate of once for every line of samples, does not add a more stringent requirement.

The drawing of the CLC940 equivalent circuit given in the manufacturer's data sheets (Comlinear Databook [25]) shows that a diode bridge is used as the sampling switch within the device. The use of diode bridges vs. FET device switches in sample-and-hold type applications is explained in [28]. In general, a high performance wideband sample-and-hold is designed with either junction FET or diode bridge type switches. Diode bridge devices readily accommodate high speed applications that call for short RC time constants and are commonly used in applications that involve 6-bit to 10-bit data acquisition systems with sampling rates of 1MHz to 50MHz. For the most demanding applications requiring both high speed and high accuracy, as in 10-bit to 13-bit data acquisition systems, designers often use an arrangement of two FET switches to help cancel the charge-injection error, permitting a low hold capacitor value. The vast majority of sampled-data systems described in the literature are switched-capacitor filters designed for TELECOM applications, where typical bandwidth is 5kHz (audio frequency). When these lower data rate applications are encountered, the choice of sampling switch is less critical and simple FET or CMOS switches are adequate.
2-5. THE HARDWARE ASSEMBLY

A prototype of a 2-D real-time homomorphic filter was constructed using off-the-shelf components. The linear filter block, performing high pass Butterworth filtering, was realized as a 2-D sample-and-hold IIR semi-systolic filter of the type discussed in Section 2-2.1. The prototype filter is of order 2 x 2. Although a higher order structure could be used, the 2 x 2 structure provides an adequate approximation to the filter specifications. A 2 x 2 structure is economical in hardware, requiring only three banks of PE's and only four line delays, each bank containing only three PE's.

Essentially, what has been done in the prototype is to utilize log preprocessing and antilog postprocessing circuits in conjunction with a high frequency emphasis 2-D IIR filter, realized in real time by the methods of realization discussed in this thesis, to make a complex operation such as homomorphic filtering possible in real-time.

2-5.1 LOGARITHMIC CONVERTER

The prototype circuitry for the logarithmic converter section is shown in Figure 2-30. The input signal is assumed to be an analog baseband video signal containing only the luminance component. In the NTSC system such a signal would be obtained by removing sync and chrominance components from the baseband video signal.

A logarithmic converter was designed based on the transdiode configuration. The expression for the output voltage is derived in [29] and is given by

$$v_o = -(1\,\text{V}) \log_{10} (v_i) \quad (2-13)$$

With the values of R1 and R2 as indicated, the scale factor is 1V/decade. Op-amps X1 and X2 are LM318 and Q1 and Q2 are 2N2920.

The logarithmic converter circuitry should be followed by an inverting stage to remove the inversion in the equation. Also provision should be made for scaling the logarithmically converted signal to an appropriate peak-to-peak level and to set the DC level to a desired value.
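For orientation, the ideal transfer characteristic of equation (2-13) can be tabulated with a few lines of Python. This is only a sketch of the ideal law: the function name `log_converter`, the 1V reference used to make the argument of the logarithm dimensionless, and the sample input voltages are assumptions and not part of the prototype design.

```python
import math


def log_converter(v_in, volts_per_decade=1.0, v_ref=1.0):
    """Ideal log converter output, v_o = -K * log10(v_in / v_ref).

    volts_per_decade : scale factor K (1 V/decade in the text).
    v_ref            : assumed 1 V reference normalizing the input.
    """
    if v_in <= 0:
        raise ValueError("the logarithm requires a positive input voltage")
    return -volts_per_decade * math.log10(v_in / v_ref)


# Each decade of input change moves the output by one volt, with inversion.
for v in (0.01, 0.1, 1.0):
    print(f"v_i = {v:5.2f} V  ->  v_o = {log_converter(v):+.2f} V")
```

The sign of the result shows why an inverting stage is needed after the converter: inputs below the reference give positive outputs and the output falls as the input rises.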

2-5.2 ANTILOG CIRCUIT

The antilog circuit shown in Figure 2-31 operates on a logarithmically compressed input signal that has been filtered by the linear operator of the homomorphic filter.

By a similar derivation to that given for the logarithmic converter it can be shown [29] that the output of operational amplifier X1 is given by

$$v_o = 10^{(v_{in} - v_{off})} \quad (2-14)$$

With the values selected for R1 and R2 the scale factor is 1V/decade. The input signal to the antilog circuit must be inverted to remove the effect of the minus sign in the exponent of the equation.

All op-amps in Figure 2-31 are LM318 and Q1 and Q2 are 2N2920.
Figure 2-30

High-speed logarithmic converter circuit

Figure 2-31

High-speed antilog circuit
2-6 FILTERING OF IMAGES

In order to demonstrate real-time operation of the filter on TV images, the 2 x 2 homomorphic filter prototype was inserted into the television receiver circuitry as shown in Figure 2-32. The detected video signal is available at the emitter follower at approximately a one volt peak-to-peak level. This signal is 2-D filtered by the prototype and sent to the final video stage, resulting in a homomorphically filtered image on the TV screen. For an IDTV (Improved NTSC Television) receiver, which is designed for separate luminance-chrominance (Y-C) processing [30], the filter would be inserted into the luminance channel. Before and after pictures showing the results of homomorphic filtering are shown in Figures 2-33(a) and (b) respectively.

A photograph of the filter prototype is shown in Figure 2-34. The filter circuit prototype boards with log and antilog circuits, but without the line delays, are shown in the photograph of Figure 2-34(b). Figure 2-34(a) shows two separate circuit boards, each board containing two line delays of the filter.

Figure 2-32

Insertion of filter prototype into TV receiver circuitry
Figure 2-33

(a) Original broadcast TV image
(b) Image homomorphically filtered by prototype
Figure 2-34

(a) Line delays on dual delay CCD boards
(b) Homomorphic filter prototype
2-7 COST OF REALIZATION - DIGITAL VS. ANALOG

Figure 2-35 compares the estimated cost of realizing the 2-D semi-systolic
structure (Figure 2-3) with analog components and with digital components.

An estimate of system cost is given based on the costs of components needed for the
major subsystems of the filter i.e. line delays and PE's. For the digital case we consider
high-speed multipliers and adders that are suitable for 8-bit operands, and which allow
processing at a speed comparable to that of an analog realization with the components
indicated.

We see that the end result of the comparison was a total component cost estimated at
$1558.00 / digital, vs. $165.00 / analog. (We are assuming 1992 U.S. dollars and quantity
pricing in the figures given.)
Cost of Realization - Digital vs. Analog

- Digital Multiplier - Logic Devices Inc. (CMOS) forms product in 35ns $30.00
- Digital Addn. 2 X 74AS181 ALU/FCN GEN. 8-bit word addition (10ns) $2.00
- Digital Pixel Delay - 74ALS174 hex D-Type (17ns + 5ns) $0.50/IC
- Cost of Digital PE = $62.00 (9 x $62 = $558 /PE's for 2 x 2 case)
- Analog Mult./Addn. using EL2020CN op-amps (10ns settling time) $3.00
- Analog Pixel Delay - Data Delay Devices 1514-70D (70ns) $2.00
- Cost of Analog PE = $5.00 (9 x $5 = $45 /PE's for 2 x 2 case)
- Digital Line Delay - using TRW TDC 1006J (30ns) 512 x 1 $1000.00
  (4 line delays 2 x 2 case)
- CCD type Line Delay - using Fairchild-Weston CCD321 $120.00
  (4 line delays 2 x 2 case)

TOTAL COMPONENT COST:

  Digital $1558.00
  Analog $ 165.00

Figure 2-35

Cost of realization – digital vs. analog components
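The totals in Figure 2-35 follow directly from the per-PE and line delay figures listed there. The short Python sketch below reproduces the arithmetic. The variable names are hypothetical, and the PE count of nine assumes three banks of three PE's, as stated for the 2 x 2 structure.

```python
PE_COUNT = 9  # three banks of three PEs for the 2 x 2 structure

# Figures taken from Figure 2-35 (1992 U.S. dollars, quantity pricing).
DIGITAL_PE_COST = 62.00
DIGITAL_LINE_DELAYS = 1000.00  # four line delays
ANALOG_PE_COST = 5.00
ANALOG_LINE_DELAYS = 120.00    # four CCD line delays

digital_total = PE_COUNT * DIGITAL_PE_COST + DIGITAL_LINE_DELAYS
analog_total = PE_COUNT * ANALOG_PE_COST + ANALOG_LINE_DELAYS

print(f"digital: ${digital_total:.2f}")
print(f"analog:  ${analog_total:.2f}")
print(f"ratio:   {digital_total / analog_total:.1f} to 1")
```

On these figures the line delays account for most of the cost in both cases, so the comparison is sensitive mainly to the choice of line store.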
2-8 CONCLUSION

In this chapter we have presented the development of a practical hardware prototype, and a SPICE simulation of the 2-D IIR semi-systolic filter structure. A SPICE simulation was done which reveals the nature of data flow in the structure along with the details of clock timing for the processing elements. Throughput rate limitations are due mainly to the slew rate and the settling time of the op-amp relative to the sampling rate, as is generally expected for an analog system. A hardware prototype of a 2-D recursive filter of order 2 x 2 was developed using commercially available high-speed analog IC's: operational amplifiers for PE realization and CCD IC's for line delays. The prototype was further developed as a 2-D homomorphic filtering application by the addition of suitably designed high-speed pre- and post-processing circuitry and by the setting of the designed filter coefficients within the PE's. Real-time operation of the prototype was demonstrated by the insertion of the filter into the video circuitry of a television receiver. Filtered images of off-the-air broadcast signals showed the results of homomorphic image enhancement, providing simultaneous contrast enhancement and dynamic range compression.
REALIZATION OF TWO-DIMENSIONAL HYBRID IIR FILTERS

3-1 INTRODUCTION

All 2-D recursive filter realizations presented in this work so far have been based on the standard 2-D discrete-time transfer function given by

\[ H(z_1, z_2) = \frac{Y(z_1, z_2)}{X(z_1, z_2)} = \frac{\sum_{i=0}^{N} \sum_{j=0}^{N} a_{ij} z_1^{-i} z_2^{-j}}{1 + \sum_{i=0}^{N} \sum_{j=0}^{N} b_{ij} z_1^{-i} z_2^{-j}} \]  \hspace{1cm} (3-1)

It has been proposed [31] that 2-D analog realizations could alternatively be based on a hybrid 2-D transfer function given by

\[ H(z, s) = \frac{Y(z, s)}{X(z, s)} = \frac{\sum_{i=0}^{N} \sum_{j=0}^{N} a_{ij} z^{-i} s^{-j}}{1 + \sum_{i=0}^{N} \sum_{j=0}^{N} b_{ij} z^{-i} s^{-j}} \]  \hspace{1cm} (3-2)

The input and output signals \(X(z, s), Y(z, s)\) respectively, referenced in the above transfer function, are semi-discrete-time 2-D signals denoted respectively in the time domain as \(x(nT, t), y(nT, t)\). An example of a signal of this type would be a television luminance channel baseband signal which is said to have a raster scanned format (Figure 3-1).
Figure 3-1

Raster scanned signal as a 2-D semi-discrete-time signal

A raster scanned image may be either progressive scan or interlaced. Interlacing is often introduced in TV signals to obtain a higher frame rate to reduce flicker with lower video bandwidth requirements. In the case of an interlaced image we consider each field of the interlaced picture to be a separate image.

It can be noted that an initial investigation of 2-D FIR analog filters was reported by J.J. Soltis in [45], [46], where the implementation of a 3 x 3 kernel for edge detection was proposed.
3-1.1 A REALIZATION

The 2-D difference equation corresponding to (3-2) is

\[ Y(z,s) = \sum_{i=0}^{M} \sum_{j=0}^{N} a_{ij} X(z,s) z^{-i} s^{-j} - \sum_{i=0}^{M} \sum_{j=0}^{N} b_{ij} Y(z,s) z^{-i} s^{-j} \]  \hspace{1cm} (3-3)

which can be rearranged as follows:

\[ Y(z,s) = Y_0(z,s) + Y_1(z,s) + Y_2(z,s) + \cdots + Y_M(z,s) \]  \hspace{1cm} (3-4)

where

\[ Y_0(z,s) = \sum_{j=0}^{N} a_{0j} s^{-j} X(z,s) - \sum_{j=0}^{N} b_{0j} s^{-j} Y(z,s) \]

\[ Y_1(z,s) = \sum_{j=0}^{N} a_{1j} z^{-1} s^{-j} X(z,s) - \sum_{j=0}^{N} b_{1j} z^{-1} s^{-j} Y(z,s) \]  \hspace{1cm} (3-5)

\[ Y_2(z,s) = \sum_{j=0}^{N} a_{2j} z^{-2} s^{-j} X(z,s) - \sum_{j=0}^{N} b_{2j} z^{-2} s^{-j} Y(z,s) \]

\[ \vdots \]

\[ Y_M(z,s) = \sum_{j=0}^{N} a_{Mj} z^{-M} s^{-j} X(z,s) - \sum_{j=0}^{N} b_{Mj} z^{-M} s^{-j} Y(z,s) \]

This leads to the hybrid analog realization proposed by Sid-Ahmed [31], as shown in Figure 3-2.
Figure 3-2  A 2-D hybrid analog filter realization

(a) Block diagram
(b) Realization of $Y_1(z,s)$
In Figure 3-2 the blocks denoted in the z-domain by $z^{-1}$ are line delays each storing one line of a raster scanned signal. Here the symbol $z^{-1}$ has a role analogous to the use of $z_1^{-1}$ in the standard 2-D discrete time transfer function $H(z_1,z_2)$. Each of the processing blocks, $Y_0$, $Y_1$, $Y_2$, operates on either direct or (once or twice) delayed versions of the input and output signals $X$ and $Y$. (Output feedback always occurs of necessity in IIR (recursive) filters.) Block processor component outputs $Y_0$, $Y_1$, $Y_2$ are summed to produce the filter output signal $Y$.

In the rest of this chapter further realizations will be explored in terms of hardware design and simulation models.
3-2 DETAILS OF THE HARDWARE DESIGN

The 2-D recursive $1 \times 1$ filter shown in Figure 3-3 is considered here for the purposes of introducing hardware design for real-time applications. Such a structure also provides for the most economical realization in hardware. Hardware design considerations presented here are readily extendible to higher orders. The block diagram (Figure 3-3(a)) satisfies the recursive equation (3-3) for the case $M=N=1$. Note that $x(nT,t)$ corresponds to $X(z,s)$ in the $z$-domain, while $x((n-1)T,t)$ corresponds to $z^{-1}X(z,s)$, etc.

The structure is seen to consist of two line delays and a processor block that produces the output signal $y(nT,t)$.
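
As a worked instance, offered only as a sketch, setting $M=N=1$ in (3-3) and writing out the sums exactly as they are given there yields

\[ Y(z,s) = \left(a_{00} + a_{01} s^{-1}\right) X(z,s) + \left(a_{10} + a_{11} s^{-1}\right) z^{-1} X(z,s) - \left(b_{00} + b_{01} s^{-1}\right) Y(z,s) - \left(b_{10} + b_{11} s^{-1}\right) z^{-1} Y(z,s) \]

Each factor $s^{-1}$ corresponds to an integration along the scan line and each factor $z^{-1}$ to a delay of one line, so the four bracketed groups act on the direct input, the line-delayed input, the direct output and the line-delayed output respectively.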

3-2.1 DESIGN OF ANALOG PROCESSOR SECTION

The analog processor block computes the output signal $y(nT,t)$ according to the difference equation, having as inputs both the direct and line-delayed versions of the input and output (feedback) signals. The computational elements of the hardware are analog summers (that apply weighting values to their input lines), inverters, and integrators. These elements can be realized with readily available commercial components, when designed as op-amp circuits. The speed of operation of these circuits, which bears directly on the attainable throughput rate of the filter, is primarily governed by settling time [32]. Suitable op-amp devices for the real-time circuitry to be developed here are the LM318 wideband operational amplifier (available from National Semiconductor) and the Elantec EL2020C. The Elantec EL2020C, which was described in Section 2-2, is based on a proprietary current-feedback topology and can be substituted directly for conventional op-amps in all of the op-amp circuits to be described in this section, with the exception of the integrator.

Like a conventional op-amp circuit, a current feedback amplifier can be stabilized against the effect of input capacitance by using a small feedback capacitance (as in the circuits of Figures 3-4, 3-5) to introduce sufficient phase-lead to compensate for the phase-lag due to the input capacitance. Larger amounts of feedback capacitance, as required in the Miller feedback integrator circuit configuration, can cause either instability or intolerable peaking and ringing. An integrator circuit based on a current feedback amplifier would require an additional op-amp in the feedback loop [25]. Such circuits have the drawback of requiring tightly matched resistances, if lossless integration is required.
Figure 3-3

2-D recursive 1 x 1 filter realization

(a) Block diagram of structure

(b) Block diagram of processor section
i) Eliminating DC Offsets

DC offset voltages are added to signals by the DC errors of amplifiers and by bias level shifts. Since in a filtering application the required signals are AC, all elements in the design are AC coupled as a straightforward method of removing DC offsets.

ii) Design of Inverter

An inverter is required in the processor section wherever a signal must undergo a sign change with unity gain. The difference equation (3-3) should be satisfied with regard to correct algebraic signs of component signals, while minimizing the number of inverters in the filter overall. The inverter, shown in Figure 3-4, requires a 4.7 pF capacitor in the feedback loop to prevent oscillations in the output when a high-speed op-amp such as the LM318 is used. The power supply bus lines are decoupled at pins 4 and 7 of the IC with 0.1 μF capacitors to ground.

iii) Design of Summing Amplifier

The summing amplifier, a special case of the inverting amplifier, is shown in Figure 3-5. The node at pin 2 is the summing node. If the voltages set by input attenuators are $\alpha_1 V_1$, $\alpha_2 V_2$, ..., $\alpha_n V_n$, the summed output voltage at pin 6 is $-(R_i/10k)(\alpha_1 V_1 + \alpha_2 V_2 + ... + \alpha_n V_n)$. A given filter coefficient of the set $\{a_i, b_i\}$ corresponds to the factor $(R_i/10k)$. Note that equation (3-3) may have both sides multiplied by a common scale factor, both to scale the coefficients and to improve the dynamic range of internal signals of the processor.
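The summing relationship can be illustrated with a small Python sketch of the ideal expression. The function name `summing_amp_output`, the 20k resistor value and the attenuator settings are hypothetical and chosen only to make the arithmetic easy to follow; op-amp non-idealities are ignored.

```python
def summing_amp_output(r_ohms, attenuated_inputs, r_in_ohms=10e3):
    """Ideal inverting summer: -(R / 10k) * sum(alpha_k * V_k).

    r_ohms            : resistor setting the gain factor (R_i in the text).
    attenuated_inputs : iterable of (alpha_k, V_k) pairs.
    r_in_ohms         : input resistor value, 10k in the text.
    """
    weighted_sum = sum(alpha * v for alpha, v in attenuated_inputs)
    return -(r_ohms / r_in_ohms) * weighted_sum


# Gain factor 20k/10k = 2 applied to 0.5*1.0 V + 0.25*2.0 V = 1.0 V.
print(summing_amp_output(20e3, [(0.5, 1.0), (0.25, 2.0)]))
```

The negative sign of the result is the inversion that the separate inverter stage is used to correct wherever the difference equation calls for a positive term.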
iv) Design of Integrator

The circuit of Figure 3-6 performs the operation of integration for the processing unit. It employs a conventional single pole op-amp circuit. For an input signal \( V_i \), the output is given by

\[ V_o = -\frac{1}{R_i C_i} \int V_i \, dt \]  \hspace{1cm} (3-6)

Since the gain of the op-amp from input to output is proportional to \( 1/(R_i C_i) \) and the value of \( \int V_i dt \) is inversely proportional to frequency for a given frequency component in the signal, the value of the time constant \( R_i C_i \) is selected so that the peak output voltage falls within the dynamic range of the op-amp for the lowest video frequency in the input signal. The integrator is set to a zero initial condition at the start of each line scan (every 63.5\( \mu \)s in the NTSC system) by means of a fast analog switch connected in parallel with capacitor \( C_i \). The analog switch is of the 4066 CMOS IC type (one of four in the package is used). It has a bandwidth of 40 MHz, an on resistance of 80\( \Omega \) and a maximum repetition rate of 10 MHz at the control input. The sync pulse, which occurs at the beginning of each line scan period, is separated from the video signal, limited to 12 Vpk and applied to the control input of the analog switch. A composite sync signal suitable for resetting the integrator can be derived from the video input by means of the sync separator circuit given in Figure 3-7.
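The choice of time constant can be illustrated with a short Python sketch. It assumes a steady-state sinusoidal input of peak value V at frequency f, for which the ideal integrator output has peak value V/(2πf R_i C_i), and it ignores the offset introduced by resetting the integrator at the start of each line. The function names and the numerical values (1V peak, 10kHz, 10V output limit) are hypothetical.

```python
import math


def integrator_peak(v_peak, freq_hz, r_ohms, c_farads):
    """Steady-state peak output of an ideal integrator for a sinusoid."""
    return v_peak / (2 * math.pi * freq_hz * r_ohms * c_farads)


def min_time_constant(v_peak, freq_hz, v_out_max):
    """Smallest R*C keeping the peak output within v_out_max at freq_hz."""
    return v_peak / (2 * math.pi * freq_hz * v_out_max)


# Hypothetical case: 1 V peak at a lowest frequency of 10 kHz, 10 V limit.
rc = min_time_constant(1.0, 10e3, 10.0)
print(f"minimum R*C: {rc * 1e6:.2f} us")
print(f"peak at 100 kHz with that R*C: {integrator_peak(1.0, 100e3, rc, 1.0):.2f} V")
```

Because the output amplitude falls in proportion to frequency, a time constant that is adequate at the lowest video frequency leaves higher frequency components well inside the output range.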
Figure 3-4

Fast Inverter

Figure 3-5

Fast summing amplifier
Figure 3-6

Fast integrator
Figure 3-7

Sync separator circuit
3-2.2 DESIGN OF 1H DELAY LINE

The 1 x 1 IIR filter structure of Figure 3-3 requires two 1H line delays. These line delays can be realized in hardware based on CCD’s as was done in Section 2-2. By means of the multiplexed CCD approach described there, a high sample resolution delay can be achieved at lower clock frequency (making hardware design requirements less critical). Inputs and outputs of the line delays are AC coupled, eliminating any problem with DC offsets.

Glass block type delay lines, although having a higher insertion loss, are considerably simpler to manage and control, with vastly fewer parts. At the present time of writing, glass block devices are less expensive and would merit consideration especially in an application such as that of Figure 3-3 where a 1 x 1 structure has been introduced for the sake of economy.

Recall that the CCD design given in Section 2-2 includes an optional low pass filter to remove any clock signal components in the CCD output signal. The use of a low pass filter circuit in conjunction with the CCD circuit is desirable here since the analog processor circuitry that receives CCD stored signals is designed to operate on continuous rather than sampled-data signals.
3-3 SPICE SIMULATION

We have seen how SPICE simulation was used to model 2-D recursive filter structures, as presented in Section 2-4. There a so-called function block approach was introduced which involved modeling at both the overall system level and separate modeling of PE performance within the 2-D structure. The main benefit of this approach is the exploitation of the modularity of systolic structures to reduce the complexity of the simulation process with corresponding gains in run-time efficiencies. This is a benefit that becomes increasingly significant in the case of higher order filters (i.e. $M \times N = 5 \times 5$, $6 \times 6$ etc.).

3-3.1 A SPICE MODEL FOR 2-D SEMI-SYSTOLIC HYBRID FILTERS

Structures of the 2-D semi-systolic type, similar to those presented in Chapter 2, can be developed for the case of the hybrid 2-D transfer function $H(z,s)$. An example of such a structure is presented in the form of a SPICE simulation model in Figure 3-8, expressly for the purpose of presenting the "function block" simulation technique for the 2-D hybrid case.

Figure 3-8 details the overall 2-D structure in terms of PE's. The structure is, although based on a different transfer function type, essentially patterned after that shown in Figure 2-26 in Chapter 2. The structure of Figure 3-8 uses two types of PE — the "end of bank PE's" (type II) differing from those used elsewhere throughout the structure (type I). Groupings of elements that constitute a PE of a given type are called out by dashed boxes in Figure 3-9.

The symbols used in this figure have been introduced in earlier chapters. Line stores TX1 through TX4 are simulated as SPICE transmission line devices (Figure 2-27). The PE's in this figure, which are designated as certain groupings of standard block diagram symbols, are shown with their equivalent SPICE models for function block simulation in Figure 3-9.
Consider first the type I PE. Its input-output relationship is given by

\[ V_k = \frac{1}{C_k} \int_0^T (V_r + a_{ij}V_x - b_{ij}V_y) \, dt \] (3-7)

where \( V_r \) and \( V_k \) are input and output node voltages respectively, \( a_{ij} \) and \( b_{ij} \) are general coefficients of the \( ij \)th PE, and \((0,T)\) is an appropriate interval of integration corresponding to the line store duration \( 1H \). The operation of integration is realized functionally by means of a voltage-controlled current source \( G_{ij} \) in parallel with capacitor \( C_k \). Note that \( R_k \), the \( 100M\Omega \) resistor in parallel with \( C_k \) is merely an arbitrarily large resistance required to keep node \( k \) from floating during a SPICE DC operating point determination. The voltage-controlled current source \( G_{ij} \) produces an output current \( i_{ij} \) according to

\[ i_{ij} = 1 \cdot V_r + a_{ij} \cdot V_x - b_{ij} \cdot V_y \] (3-8)

This current through \( C_k \) results in the integral expression for node voltage \( V_k \) given in equation 3-7 above.

The operations involved in the general processing element \( PE_{ij} \) (type I) are represented as in the following set of statements:

Gij 0 k poly(3) (r,0) (x,0) (y,0) 0 1 w1 w2
Ck k 0 1
Rk k 0 100E6
.ic V(k)=0

The first statement introduces a polynomial type voltage-controlled current source (VCCS) \( G_{ij} \), which is a function of the voltages at the nodes labeled "r", "x" and "y". The controlling relationship is in accordance with a polynomial of dimension three, i.e. poly(3). As we saw in Section 2-4.2, such a higher dimensional polynomial has a constant term as well as linear terms and cross terms. For our present purposes we can ignore higher order and cross terms. Now consider the list of numeric values 0 1 w1 w2 in the first SPICE statement above, for the VCCS. The constant term (of \( \text{poly}(3) \)) is set to zero, and \( 1, w_1, \) and \( w_2 \) are associated with the first order terms, and can be thought of as "weighting" factors applied to inputs \( r, x, y \) respectively. If \( w_1 \) is set equal to \( a_{ij} \) and \( w_2 \) to \( -b_{ij} \), then the controlling relationship of the VCCS is in accordance with equation 3-8 as required.

The second SPICE statement introduces a capacitance of 1F in order to set the constant of proportionality to unity in equation 3-7. (If the value of 1F seems unwieldy, recall that this is a function block rather than hardware model.) The third statement adds the large resistance \( R_k \) described above. There is also an initial condition statement (.ic) that causes the voltage on the capacitor to be initially zero.
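The behavior of this function block can be checked outside SPICE with a short numerical sketch. The following is an illustration only, not part of the thesis simulation: it applies forward-Euler integration to a current source obeying equation 3-8 that charges \( C_k \) in parallel with \( R_k \), starting from zero volts. The function name and the sample values in the usage lines are hypothetical.

```python
def simulate_type1_pe(v_r, v_x, v_y, a_ij, b_ij, dt, c_k=1.0, r_k=100e6):
    """Forward-Euler model of the VCCS-plus-capacitor function block.

    v_r, v_x, v_y are equal-length lists of input samples spaced dt seconds
    apart. Returns the node voltage V_k after each step, starting from
    V_k = 0 as set by the .ic statement.
    """
    v_k = 0.0
    out = []
    for r, x, y in zip(v_r, v_x, v_y):
        i_ij = 1.0 * r + a_ij * x - b_ij * y   # equation 3-8
        # C_k is charged by i_ij; R_k in parallel leaks a negligible current.
        v_k += dt * (i_ij - v_k / r_k) / c_k
        out.append(v_k)
    return out

# Hypothetical constant inputs: the net current is 1 + 0.5*2 - 0.25*4 = 1 A,
# so V_k ramps at 1 V/s with C_k = 1 F.
steps = 1000
trace = simulate_type1_pe([1.0] * steps, [2.0] * steps, [4.0] * steps,
                          a_ij=0.5, b_ij=0.25, dt=1e-6)
print(trace[-1])
```

With \( C_k = 1 \)F the node voltage equals the running integral of the weighted input sum, which is the relationship stated in equation 3-7; the leakage through the 100 MΩ resistor is many orders of magnitude smaller than the signal current.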

A Type II PE occurs at the end of each bank of PE's in the structure. These PE's are modeled, in terms of function blocks, using a VCVS model:

Eij k 0 poly(3) (r,0) (x,0) (y,0) 0 1 w1 w2

Here "k" is a label for the output node and "r", "x", and "y" are labels for the input nodes (Figure 3-9). Only the first order linear terms of \( \text{poly}(3) \), the three dimensional polynomial, are retained and the input output relationship is given by

\[
V_k = 1 \cdot V_r + w_1 \cdot V_x + w_2 \cdot V_y \tag{3-9}
\]

with \( w_1 \) and \( w_2 \) being the numeric values of filter coefficients in the PE.

A complete listing of the SPICE function block simulation model of Figure 3-8 is given in Figure 3-10.
Figure 3-8

SPICE model of the 2 x 2 semi-systolic structure based on the 2-D hybrid transfer function
Figure 3-9

Function block SPICE models for PE types I and II. Type I PE: \( i_{ij} = V_r + a_{ij} V_x - b_{ij} V_y \) and \( V_k = \frac{1}{C_k} \int_0^T (V_r + a_{ij} V_x - b_{ij} V_y) \, dt \). Type II PE: \( V_k = V_r + a_{ij} V_x - b_{ij} V_y \).
anal2d.cir
* 2D filter data flow for 2D analog semi-systolic structure (2 x 2)
* -based on H(z, s) type transfer function
* Generate a test waveform, "vsource" using the pulse statement
  vsource source 0 pulse(0 0.24v 0us 220ns 220ns 3us 5us)
  vbias bias 0 dc -0.1 ; dc bias value to be added to vsource
* The following are resistors added since SPICE requires a minimum of
* two connections at each node
  RLs source 0 1E6
  RLb bias 0 1E6
  Rbank0 bank0 0 1E6
  Rbank1 bank1 0 1E6
  Rbank2 bank2 0 1E6
  Rx x 0 1E6
* Add dc bias voltage to vsource to obtain input waveform x
  Ex x 0 poly(2) (source,0) (bias,0) 0 1 1 ; input waveform, x
* (Note use of poly type voltage controlled voltage source statement.)
* Line delays for x (input) signal
  Tx1 x 0 x1 0 z0=250 td=15us ; ideal transmission line simulating 1H delay
  Tx2 x1 0 x2 0 z0=250 td=15us ; second delay line
  Rx2 x2 0 250 ; matched termination
  *
  * A poly type voltage controlled current source connected so as to charge a
  * capacitor is used to simulate an ideal multi-input integrator
  * (non-inverting) with weighted inputs.
  *
  * BANK 0
  *
  G02 0 3 poly(2) (x,0) (y,0) 0 0.0 0.0 ; PE02
  C3 3 0 1
  R3 3 0 100E6
  .ic v(3)=0
  *
  G01 0 7 poly(3) (3,0) (x,0) (y,0) 0 1 -0.865 -0.474 ; PE01
  C7 7 0 1
  R7 7 0 100E6
  .ic v(7)=0
  *
  E00 bank0 0 poly(2) (7,0) (x,0) 0 1 -3.397 ; PE00
  *

* BANK 1
*  
G12 0 14 poly(2) (x1,0) (y1,0) 0 0.0 0.0 ; PE12
C14 14 0 1
R14 14 0 100E6
.ic v(14)=0
  
G11 0 18 poly(3) (14,0) (x1,0) (y1,0) 0 1 1.042 0.211 ; PE11
C18 18 0 1
R18 18 0 100E6
.ic v(18)=0
  
E10 bank1 0 poly(3) (18,0) (x1,0) (y1,0) 0 1 2.433 0.658 ; PE10
  
* BANK 2
*  
G22 0 25 poly(2) (x2,0) (y2,0) 0 0.0 0.0 ; PE22
C25 25 0 1
R25 25 0 100E6
.ic v(25)=0
  
G21 0 29 poly(3) (25,0) (x2,0) (y2,0) 0 1 0.0 0.0 ; PE21
C29 29 0 1
R29 29 0 100E6
.ic v(29)=0
  
E20 bank2 0 poly(3) (29,0) (x2,0) (y2,0) 0 1 0.0 0.0 ; PE20
  
* Sum of bank outputs obtained at node y using "Esum"
*  
ESUM y 0 poly(3) (bank0,0) (bank1,0) (bank2,0) 0 0.5 0.5 0.5
*  
* Line delays for y signal
*  
Ty1 y 0 y1 0 z0=250 td=15us
Ty2 y1 0 y2 0 z0=250 td=15us
Ry2 y2 0 250 ; matched termination of the two lines
*  
* Transient analysis is done to simulate the processing of the first
* 15 us of an NTSC line period.
*  
*.tran 0.1us 45us UIC
*Set x range to 30us 45us in PROBE
.tran 0.1us 15us
.probe 
.end

Figure 3-10 SPICE netlist for simulation of 2-D IIR semi-systolic hybrid filter structure

3-3.1.1 SIMULATION OF THE SPICE PE

We have considered SPICE modeling of the overall 2-D structure, which provides a simulation of the operation of the modular structure in terms of data flow in a global sense. We now turn our attention to the operation of PE's within the modular structure. Figure 3-11 shows a SPICE model for the general Type I PE. This model, based on the multiple-input integrator circuit topology, is clearly hardware dependent. The SPICE netlist is given in Figure 3-12. With reference to this listing, there are three inputs each AC coupled to input resistors of the summing integrator configuration based on op-amp subcircuit call X1. As we have seen in Chapter 2, we can run the simulation with alternative op-amp macro-models simply by invoking the appropriate library calls (.lib). In the listing of Figure 3-12, "genamp.lib" is the same op-amp macro-model used in Figure 2-22.
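As a rough cross-check on the PE simulation, the ideal response of the summing integrator can be computed directly. The sketch below is an idealization and not the SPICE model itself: it ignores the op-amp macro-model, treats the 0.1 µF coupling capacitors as short circuits over the sub-microsecond run, and uses the 10 kΩ and 50 pF values and the piecewise-linear test waveforms from the listing of Figure 3-12. The helper names are hypothetical.

```python
def pwl(points, t):
    """Linear interpolation through (time, value) breakpoints, held flat outside."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]

def ideal_summing_integrator(inputs, r_ohms, c_farads, t_end, dt):
    """Ideal inverting summing integrator: V_k = -(1/RC) * integral of summed inputs."""
    v_k, t = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        # Trapezoidal rule on the summed input over [t, t + dt].
        s0 = sum(pwl(p, t) for p in inputs)
        s1 = sum(pwl(p, t + dt) for p in inputs)
        v_k -= 0.5 * (s0 + s1) * dt / (r_ohms * c_farads)
        t += dt
    return v_k

def ns(points):
    """Convert breakpoint times given in nanoseconds to seconds."""
    return [(t * 1e-9, v) for t, v in points]

vx = ns([(0, -2), (68, -2), (72, -3), (138, -3), (142, 2), (208, 2), (212, 4), (278, 4)])
vr = ns([(0, 0), (68, 0), (72, 0), (138, 0), (142, -7), (208, -7), (212, 1), (278, 1)])
vy = ns([(0, -2), (68, -2), (72, -3), (138, -3), (142, 8), (208, 8), (212, -8),
         (278, -8), (282, -6)])

v_k_end = ideal_summing_integrator([vx, vr, vy], 10e3, 50e-12, 278e-9, 1e-9)
print(f"ideal V(k) at 278 ns: {v_k_end:.3f} V")
```

Differences between this ideal value and the SPICE result would reflect the finite gain, bandwidth and slew rate of the op-amp macro-model, which is the hardware dependence the PE-level simulation is intended to expose.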
Figure 3-11

SPICE PE (type I) simulation model -- 2-D hybrid filter
PE for 2-D hybrid filter
  *
  * simulates operation of a single PE
  *
  * vx, vr, vy are sample input waveforms
  vx x 0 pwl(0ns -2 68ns -2 72ns -3 138ns -3 142ns 2 208ns 2 212ns 4 278ns 4)
  vr r 0 pwl(0ns 0 68ns 0 72ns 0 138ns 0 142ns -7 208ns -7 212n 1 278ns 1)
  vy y 0 pwl(0ns -2 68ns -2 72ns -3 138ns -3 142ns 8 208ns 8 212n -8 278n -8
  + 282n -6)
  *
  * subcircuit call for op-amp
  * integrating amplifier
  X1 3 2 k uAxxx
  *
  C2 2 k 50pf
  Rx 10 2 10k
  Rr 11 2 10k
  Ry 12 2 10k
  CX x 10 0.1u
  Cr r 11 0.1u
  Cy y 12 0.1u
  *
  Rp 3 0 1k
  .ic v(k)=0
  .lib genamp.lib
  .options limpts=5200 reltol=.001
  .tran 1ns 348ns UIC
  .probe v(x) v(y) v(r) v(k)
  .end

Figure 3-12

SPICE PE (type I) netlist
3-4 A HARDWARE PROTOTYPE

A hardware prototype of the 2-D hybrid analog 1 x 1 recursive filter, designed in Section 3-2, was constructed. It is shown in the photograph of Figure 3-13. The prototype was constructed on two separate printed circuit boards. The board shown above in the photo contains the two 1H line delays of the structure, while the board shown below contains the analog processor circuitry.

The operation of the prototype 2-D hybrid analog filter was demonstrated with an application drawn from [33], a phase contrast filter. In the phase contrast filtering technique, which enhances high frequency components in the image, the filter transfer function $H(z,s)$ has a flat magnitude response and a phase response that shifts those frequency components of the input signal above a given critical frequency $\omega_c$ by -180 degrees. After the original image is subtracted, the frequency components below $\omega_c$ are therefore removed, while those above are doubled in magnitude.
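The effect of the subtraction can be written out for an idealized response. The following is a simplified single-frequency illustration, assuming unit magnitude, zero phase shift below \( \omega_c \) and exactly -180 degrees above it; the symbol \( \omega \) stands for a generic frequency variable and is not a design formula for \( H(z,s) \).

\[
H(\omega) =
\begin{cases}
1, & |\omega| < \omega_c \\
e^{-j\pi} = -1, & |\omega| > \omega_c
\end{cases}
\qquad \Rightarrow \qquad
H(\omega) - 1 =
\begin{cases}
0, & |\omega| < \omega_c \\
-2, & |\omega| > \omega_c
\end{cases}
\]

Components below \( \omega_c \) cancel, while components above \( \omega_c \) appear with twice their original magnitude, which is the high frequency enhancement described above.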

In order to demonstrate the real time operation of the 2-D hybrid analog filter on TV images, the prototype was inserted into the luminance channel of television receiver circuitry as shown in Figure 3-14. The separated sync signal is brought out from the circuitry, limited to 12 Vpk and connected to the control input of the analog switch in the integrator section of the filter. The detected video signal is available at the emitter follower at approximately a one volt pk-pk level. This signal is 2-D filtered by the prototype and sent to the final video stage, resulting in an enhanced image on the TV screen. Before and after pictures showing the results of filtering are given in Figure 3-15. Phase contrast filtering of stored images with the filter prototype resulted in filtered images identical to those produced by a computer simulation of the equivalent digital realization.

Figure 3-13

2-D hybrid analog filter prototype (photo)
Figure 3-14

Insertion of 2-D hybrid analog filter prototype into TV luminance channel
Figure 3-15

Image filtered with 2-D hybrid analog filter prototype

(Original image above — filtered image below)
3-4.1 EVALUATION

A 2-D hybrid analog filter can be constructed with conventional components and applied to the processing of TV images in real time. The resolution of the filtered picture is 910 x 525. A digital filter architecture which can be realized with hardware of approximately the same order of complexity as for a 2-D hybrid analog filter is the distributed arithmetic architecture [34,35,36,37]. A prototype of a 2-D distributed arithmetic implementation is described in [37]. A comparison between the analog and digital approaches in terms of hardware complexity, speed, and cost is provided next.

In both approaches the hardware complexity increases linearly with order. The analog approach benefits from modularity in extending order. The analog 2-D hybrid filter was realized using only op-amps, CCDs, analog switches, TTL/CMOS gates and passive components.

The analog 2-D hybrid approach is capable of real time performance irrespective of filter order. The digital filter prototype referred to in [37], having a package count of 100 IC's, processes images of size up to 256 x 256 (pixels) at a speed of 350 kpixels/s. The 2-D analog filter processes images with 910 x 525 resolution at a rate of 910 x 525 x 30 = 14.33 Mpixels/s and requires a package count of 40 IC's for a 2 x 2 implementation with an overall power dissipation of 10 W. A 2-D analog prototype for a 2 x 2 structure would cost approximately $180 (US, 1989 prices) compared to approximately $350 (US, 1984 prices) [37] for the 2-D digital distributed arithmetic prototype, which is not capable of real time processing. Faster logic families such as ECL would be required for a higher throughput distributed arithmetic filter. This would increase the cost and power consumption considerably. Also, ECL and other high-speed logic families operate in a transmission line environment requiring more expensive circuit board designs such as stripline. Furthermore, package count increases due to the non-availability of MSI functions in ECL.
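The throughput figures quoted in this comparison can be reproduced with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the function name is hypothetical.

```python
def pixel_rate(width, height, frames_per_second):
    """Pixels per second for a raster of the given size and frame rate."""
    return width * height * frames_per_second

analog_rate = pixel_rate(910, 525, 30)   # resolution and frame rate from the text
digital_rate = 350e3                     # pixels/s quoted for the prototype in [37]

print(f"analog: {analog_rate / 1e6:.2f} Mpixels/s")
print(f"ratio analog/digital: {analog_rate / digital_rate:.2f}")
```

On these figures the analog filter processes roughly forty times as many pixels per second as the distributed arithmetic prototype, with fewer IC packages.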

3-5 CONCLUSION

A practical hardware design of a 2-D hybrid analog filter has been developed. The feasibility of the 2-D analog approach based on a 2-D hybrid transfer function $H(z,s)$ has been demonstrated. A hardware prototype of a 1 x 1 recursive filter was designed using commercially available analog IC's. The method of SPICE function block simulation developed for the 2-D semi-systolic filter in Chapter 2 has been extended to the 2-D hybrid transfer function case in which the operation of integration is involved.
SWITCHED-CAPACITOR IMPLEMENTATIONS OF 2-D FILTERS FOR VIDEO RATES

4-1 INTRODUCTION

In this chapter we present the hardware realization of a semi-systolic 2-D filter for real-time image processing using switched-capacitor (SC) circuitry. Recall that real-time processing of TV images would require throughput rates as high as \( \sim 14 \) million pixels/sec for standard NTSC TV images and \( \sim 40 \) million pixels/sec for high definition television (HDTV). Two-dimensional filters realized with SC hardware are competitive with digital filters in cost, speed, and power consumption. The frequency-dependent behavior of an SC filter is primarily determined by ratios of capacitances, which can be precisely controlled in integrated circuit technology. The accuracy of an SC circuit is higher (0.1-0.05\%) [38] than that of a conventional analog circuit. The basic operations such as multiplication, addition, and delay are performed in only one sampling clock cycle, making possible very high speed signal processing. The upper frequency limitation of monolithic SC circuits is dependent on the MOS process used. With presently available CMOS technology, switching frequencies of up to 30 MHz have been achieved and an absolute maximum frequency of 130 MHz has been predicted [39].

Due to recent advances in VLSI technology, it has become highly desirable to design filter realizations in such a way that the repetitive use of common structures is maximized, and the number of "random" interconnections is minimized, in order to facilitate the production of integrated circuits. Systolic structures have a speed advantage over other known forms of realization, due to the high degree of parallelism, and have other desirable features for VLSI implementation including modularity, global clocking, and localized data communication. The use of systolic array architecture in the design of SC filters will enable more efficient implementation and compilation of these filters in LSI/VLSI technology.
4-2 THE 2-D SEMI-SYSTOLIC REALIZATION

We proceed to develop a hardware realization of the 2-D semi-systolic structure given in [13,17] using switched-capacitor circuitry. The block diagram form of this structure is given in Figure 4-1, for the case of a 2 x 2 filter. The input-output relationship of this recursive filter is governed by the 2-D digital transfer function of order M x N given by

\[
H(z_1, z_2) = \frac{Y(z_1, z_2)}{X(z_1, z_2)} = \frac{\sum_{i=0}^{M} \sum_{j=0}^{N} a_{ij} z_1^{-i} z_2^{-j}}{1 + \sum_{i=0}^{M} \sum_{j=0}^{N} b_{ij} z_1^{-i} z_2^{-j}} \tag{4-1}
\]

where \( \{a_{ij}\} \) and \( \{b_{ij}\} \) are filter coefficients obtained from a design procedure that determines the coefficients according to some given criteria to approximate the desired frequency response of \(H(z_1, z_2)\) (i.e. high-pass, low-pass, etc.). For processing images the filter should be designed for linear phase.

The corresponding recursive equation in the z-domain is

\[
Y(z_1, z_2) = \sum_{i=0}^{M} \sum_{j=0}^{N} a_{ij} X(z_1, z_2) z_1^{-i} z_2^{-j} - \sum_{i=0}^{M} \sum_{j=0}^{N} b_{ij} Y(z_1, z_2) z_1^{-i} z_2^{-j} \tag{4-2}
\]
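In the spatial domain, equation 4-2 corresponds to a difference equation evaluated pixel by pixel in raster order. The following minimal sketch illustrates this under stated assumptions: the index \( i \) counts line delays (\( z_1^{-1} \)) and \( j \) counts pixel delays (\( z_2^{-1} \)), samples outside the image are taken as zero, and the \( b_{00} \) term is skipped so that the current output does not depend on itself. The function name and the coefficient values in the usage lines are hypothetical.

```python
def iir_2d(x, a, b):
    """Raster-order 2-D recursive filter.

    x is a list of rows; a and b are (M+1) x (N+1) coefficient lists of the
    same shape. b[0][0] is ignored (assumed zero).
    """
    rows, cols = len(x), len(x[0])
    y = [[0.0] * cols for _ in range(rows)]
    for m in range(rows):                 # line index
        for n in range(cols):             # pixel index within the line
            acc = 0.0
            for i in range(len(a)):
                for j in range(len(a[0])):
                    if m - i < 0 or n - j < 0:
                        continue          # zero initial conditions
                    acc += a[i][j] * x[m - i][n - j]
                    if (i, j) != (0, 0):
                        acc -= b[i][j] * y[m - i][n - j]
            y[m][n] = acc
    return y

# Hypothetical 1 x 1 example: y(m,n) = x(m,n) + 0.5 * y(m,n-1)
impulse = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
response = iir_2d(impulse, a=[[1.0, 0.0], [0.0, 0.0]], b=[[0.0, -0.5], [0.0, 0.0]])
print(response)
```

Only the current and previous \( M \) lines of input and output are needed at any time, which is why line delays rather than a full frame store suffice for real-time operation.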

The symbol for the general processing element (PE) and its block diagram realization in terms of scale factor multipliers \(w_1, w_2\), a three input adder and delay element \(z_2^{-1}\) are given in Figure 4-2. Each delay element denoted \(z_2^{-1}\) in the z-domain represents a delay of one pixel time (or equivalently sample time).
Figure 4-1

Block diagram representation of a 2-D semi-systolic structure ($M \times N = 2 \times 2$)

Figure 4-2

PE symbol and its block diagram realization
4-3 A SWITCHED-CAPACITOR REALIZATION

A realization of the structure in Figure 4-1 requires the development of specific hardware for:

1) line delays denoted \( z_1^{-1} \),

2) processing elements (PE's),

3) a bank output summer to produce the output signal \( Y(z_1, z_2) \).

Switches are realized as MOS transmission gates throughout. The input signal \( X_{in} \) is required to be a discrete-time version of the analog signal as obtained from a sample-and-hold (S/H) circuit. The use of an S/H circuit at the input will also optimize the settling speed of SC circuits in the filter by creating step input settling responses. An S/H circuit at the output will help to remove continuous-time transients that arise due to finite SC circuit time constants [40].

For filtering TV images in real-time the 2-D filter is inserted in the luminance channel. The filter output signal is converted to a continuous-time signal prior to application to the final video stage. (Many ATV (advanced television) receivers are designed for separate Y-C processing). A pre-filter to prevent aliasing is usually not required since video bandwidth in the luminance channel is limited by components found in the stages of circuitry ahead of the point of filter insertion.
4-3.1 DESIGN OF LINE DELAY

As is seen in Figure 4-1, four line delays are required for the 2 x 2 recursive filter. The input signal for a 2-D system is often in raster scanned form, in which case the line delays correspond to one horizontal line scanning period denoted 1H (63.5 μs in the NTSC system). In real-time 2-D filtering an image is not saved in memory. The current output pixel is obtained with the use of pixels in current and previous lines of the input and output. A line delay can be realized in hardware using charge-coupled device (CCD) type analog shift registers as described in Chapter 2. Alternatively, an analog memory 1H delay line can be developed entirely in SC circuitry by means of the serial-parallel-serial (SPS) technique given in [41]. In this technique the input signal is distributed by a serial-to-parallel converter into \( n \) parallel paths, each of which is slower by a factor of \( n \) (Figure 4-3). Within each of the parallel paths the signals are delayed by slow, but large capacity SC memories. They are then reconverted into a fast output signal by a parallel-to-serial converter.

Each of the \( n \) memory banks can be implemented by means of a new high-precision capacitor memory circuit [41], shown in Figure 4-4. This type of circuit is called a feedback-readout SC memory circuit. In the WRITE mode, the input and output of the amplifier are shorted and a voltage equal to \( V_i - V_\alpha \) is written to a selected memory capacitor. In the READ mode, the shorting of the amplifier is canceled and the upper plate of the selected memory capacitor is connected to the amplifier output. This capacitor voltage is held by the amplifier and the voltage output is \( V_i - V_\alpha \) (at the capacitor) plus \( V_\alpha \) (at the amplifier input), which is equal to \( V_i \), the value that was input during the WRITE cycle.
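The data flow of the SPS delay line can be sketched behaviorally. The model below is an illustration only and says nothing about the circuit of [41]: each of the \( n \) paths is represented by a first-in first-out memory of fixed depth that is visited once every \( n \) input samples, so the overall delay is \( n \) times the depth of one memory. The class name and the sizes in the usage lines are hypothetical.

```python
from collections import deque

class SpsDelayLine:
    """Serial-parallel-serial delay: n slow memories, each holding `depth` samples."""

    def __init__(self, n_paths, depth):
        self.n = n_paths
        self.banks = [deque([0.0] * depth, maxlen=depth) for _ in range(n_paths)]
        self.phase = 0

    def step(self, sample):
        bank = self.banks[self.phase]
        out = bank[0]          # READ the oldest sample held in this path
        bank.append(sample)    # WRITE the new sample; maxlen drops the one just read
        self.phase = (self.phase + 1) % self.n
        return out

# Hypothetical sizes: 4 paths of 3 samples each give a 12-sample delay.
line = SpsDelayLine(n_paths=4, depth=3)
outputs = [line.step(k) for k in range(30)]
print(outputs)
```

Each memory is accessed at only \( 1/n \) of the pixel rate, which is what allows slow but large capacity SC memories to be used for a delay of one full line.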
Figure 4-3

Serial-Parallel-Serial configuration for 1H line delay

Figure 4-4

Analog feedback-readout capacitor memory for use in SPS type 1H line delay
4-3.2 REALIZATION OF PE IN SC CIRCUITRY

The general PE (Figure 4-2) produces an output signal given by
\[ Y_{PE} = (w_1X_{IN} + w_2Y_{IN} + Y'_{IN}) z^{-1} \tag{4-3} \]

The PE is to be realized in MOS SC circuitry.

A. Amplifiers

One of the most critical components in the development of high-speed SC filter circuitry is the amplifier. The folded-cascode CMOS op-amp topology is adopted here because it yields high performance in high-speed SC circuits. It provides higher frequency operation and a higher power supply rejection ratio than conventional cascode and two-stage op-amps.

A newer modified version of the folded-cascode op-amp (Figure 4-5) has been proposed recently [42] which has the following advantages for use in SC circuits:

1) It is a single-stage amplifier as opposed to the conventional two-stage design. This entails that the loading capacitance itself will provide the closed-loop stability, and no additional compensation capacitance is required as with two-stage op-amps.

2) Design can be optimized for minimum settling time (MST) response. Settling times as low as 45 ns can be obtained with load capacitance \( C_L = 5 \) pF. The settling time turns out to be a strong function of load capacitance and is greater for load capacitance values below or above the designed (for MST) value of 5 pF.

Using the BiCMOS process, folded-cascode transconductance amplifiers have been developed with the following specifications: DC gain = 69 dB, \( f_{max} = 98 \) MHz, SR \( \sim \) 100 mV/ns, transconductance \( \sim \) several mA/V. Experimental results [40] show that operation at video frequencies is attainable (clock frequency to 25 MHz).
Figure 4-5

A CMOS folded-cascode one stage op-amp
B. The Switched-Capacitor PE

The PE as defined in Figure 4-2 is to be realized in high-speed SC hardware. This can be accomplished using the circuit shown in Figure 4-6(a) as proposed by Mulawka [43]. Assuming the ideal case the output expression for this circuit is given by

$$y_{PE}(nT) = \frac{C_1}{C} x_{IN}(nT-T) + \frac{C_2}{C} y_{IN}(nT-T) + y'_{IN}(nT-T) \tag{4-4}$$

where \( T = \) clock cycle period = 1 pixel time, as derived in [43].

In the z-domain

$$Y_{PE} = (w_1 X_{IN} + w_2 Y_{IN} + Y'_{IN}) z^{-1} \tag{4-5}$$

as required (equation 4-3), where \( w_1 = C_1/C \) and \( w_2 = C_2/C \).

Note that this form of realization implies that filter coefficients are less than one in magnitude. This condition can be obtained by scaling the recursive equation (4-2) so that gain stages are not required.
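The relationship between capacitor ratios and PE weights in equation 4-4 can be illustrated with a short sketch. It models only the ideal input-output expression, not the circuit of Figure 4-6; the function name and the capacitor values are hypothetical.

```python
def sc_pe_output(c1, c2, c, x_prev, y_prev, y_in_prev):
    """Ideal SC PE output per equation 4-4, using samples from the previous clock cycle.

    The weights are capacitor ratios: w1 = C1/C and w2 = C2/C.
    """
    w1 = c1 / c
    w2 = c2 / c
    return w1 * x_prev + w2 * y_prev + y_in_prev

# Hypothetical capacitor values giving w1 = 0.25 and w2 = 0.5.
value = sc_pe_output(c1=1e-12, c2=2e-12, c=4e-12,
                     x_prev=1.0, y_prev=2.0, y_in_prev=0.5)
print(value)
```

Because the weights depend only on ratios, scaling all three capacitors by the same factor leaves the PE output unchanged, which is the property that makes SC coefficients accurate in integrated form.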

This and all other SC circuits described in this work are controlled by a bi-phase 50% duty cycle clock pattern as shown in Figure 4-6(c). In a semi-systolic structure, as considered here, clocking is global, with all PE's receiving the same clock signal. The clock period is equal to the sampling period (or pixel time). The sampling rate should be chosen to satisfy the Nyquist criterion. In practice a sampling rate of approximately three to four times the video bandwidth is chosen.
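The rule of thumb for the sampling rate can be checked against the NTSC figures used elsewhere in this work. In the sketch below, the 4.2 MHz luminance bandwidth is an assumed nominal NTSC value that is not stated in the text, while the 910 samples per line and the 63.5 μs line period are taken from the text; the function name is hypothetical.

```python
def sampling_rate_range(video_bandwidth_hz, low_factor=3.0, high_factor=4.0):
    """Range of sampling rates at three to four times the video bandwidth."""
    return low_factor * video_bandwidth_hz, high_factor * video_bandwidth_hz

VIDEO_BW = 4.2e6                      # Hz, assumed nominal NTSC luminance bandwidth
f_low, f_high = sampling_rate_range(VIDEO_BW)
f_pixel = 910 / 63.5e-6               # 910 samples per 63.5 us line

print(f"rule of thumb: {f_low / 1e6:.1f} to {f_high / 1e6:.1f} MHz")
print(f"910 samples per line: {f_pixel / 1e6:.2f} MHz")
```

Under these assumptions a clock of 910 samples per line falls inside the suggested range and comfortably exceeds the Nyquist minimum of twice the video bandwidth.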

Figure 4-6(a) shows the design with single-ended op-amps, while Figure 4-6(b) is the same design based on a fully differential op-amp. The fully differential configuration has the advantage of being less sensitive to parasitic effects. (Stray capacitances associated with capacitor bottom plates can be neglected because they are either grounded or voltage driven.)
Figure 4-6  SC PE realization
(a) using single-ended op-amp
(b) using fully differential op-amp
(c) bi-phase 50% duty cycle clock pattern
C. Negative Filter Coefficients

Each PE has two filter coefficients \( a_{ij}, b_{ij} \) associated with it, which can be either positive or negative. Of course, when using fully differential op-amps, sign changes can be accomplished simply by reversing the connection to the output terminal pair that provides a given signal. In this way, for example, \( X_{IN} \) becomes \(-X_{IN}\), \( Y_{IN} \) becomes \(-Y_{IN}\) etc., and equation 4-2 can be satisfied for coefficients of either sign.

Another method of introducing sign changes is the introduction of dual broadcast lines, allowing either inverted or non-inverted data to be sent to each PE. (Note that in the literature on systolic structures, lines through which PE's receive the same data in parallel are called broadcast lines.) Figure 4-7 shows the same 2 x 2 structure as in Figure 4-1 but with dual broadcast lines. Essentially each of the line delays provides both inverted and non-inverted data outputs and this must be the case also for the input \( X \) and the output \( Y \).
Figure 4-7

2 x 2 semi-systolic structure with dual broadcast lines
4-3.3 BANK OUTPUT SUMMER

SC circuitry for the bank output summer is shown in Figure 4-8. In the case of a 2 x 2 structure a switched-capacitor adder generates the output as the sum of three partial results i.e. $Y(z_1,z_2) = Y_0(z_1,z_2) + Y_1(z_1,z_2) + Y_2(z_1,z_2)$. For the general case ($N \times N$), $N+1$ partial results are added, and the circuitry is extended by adding more input capacitor stages, each connected to the virtual ground point of the circuit.


Figure 4-8

Switched-capacitor addition of partial results (2 x 2 case)
4-4 CONCLUSION

A 2-D semi-systolic recursive filter has been realized in switched-capacitor circuitry. The application of systolic architecture together with switched-capacitor circuit techniques offers advantages for the efficient implementation of 2-D filters in VLSI circuitry. The use of the folded-cascode amplifier configuration makes it possible to design for minimum settling time response. This will allow higher speed operation of SC circuits since it is essentially the op-amp settling behavior that constrains the upper bound of the filter sample rate.
CONCLUSION

In this dissertation, new architectures for 2-D digital processing have been presented. Hardware design of these architectures for real-time applications has been discussed. These architectures are derived directly from 2-D transfer functions. Sample-and-hold type realizations operating on sampled data input, based on the general discrete-time transfer function $H(z_1, z_2)$ have been developed along with a hardware prototype. Realizations based on a hybrid type transfer function $H(z, s)$, that are suitable for operation directly on input data in raster scanned format, were developed along with a hardware prototype. Both IIR and FIR filters can be implemented with these approaches. The hardware structures developed are modular, extendible to higher order filtering, and can be economically produced using commercially available components.

A novel function block approach has been developed, which exploits the modularity and global clock timing of systolic structures, to reduce the complexity of simulation of 2-D structures operating on 2-D data, while allowing the use of a general purpose simulation program, SPICE.

A 2-D systolic architecture has been implemented in switched-capacitor circuitry, making possible the development of a 2-D signal processor that is self-contained on a single analog video LSI integrated circuit.
There are numerous possibilities for the continuation of this research in the future.

Some of these are:

1) Investigating applications of 2-D real-time filters in the area of advanced television systems (ATV), e.g. coding for data rate reduction, comb filtering, TV signal format conversion, etc.

2) The development of a switched capacitor version of a 2-D hybrid filter.

3) VLSI implementation of the PE's and systolic architecture using the SPS technique for line delay, to produce a self-contained analog video IC.

4) Investigating the application of special DSP IC's in the area of real-time or high-speed filtering for video.

Material in this dissertation has been published in journals and conferences. These publications are cited in bibliographic references [17] and [47-54]. Also, the 2-D semi-systolic and 2-D hybrid architectures are registered with the U.S. patent office as patent nos. 5,245,433 and 5,122,788 (the present author being the co-inventor).
REFERENCES


VITA AUCTORIS

Herbert Kaufman received the B.A.Sc. degree in electrical engineering from the University of Windsor, Windsor, Ont., Canada. He also holds a Diploma of Technology from St. Clair College, Windsor, Ontario. He is currently working toward the Ph.D. degree in the Electrical Engineering Department at the University of Windsor. His research work is in the area of advanced television circuits and systems.

He has eleven years of industrial experience in the area of electronics and communications.