HEWLETT-PACKARD JOURNAL
April 1995

© Copr. 1949-1998 Hewlett-Packard Co.
`April 1995 Volume 46 • Number 2
`
`
Articles

A Low-Cost, High-Performance PA-RISC Workstation with Built-in Graphics, Multimedia, and Networking Capabilities, by Roger A. Pearson

The PA 7100LC Microprocessor: A Case Study of IC Design Decisions in a Competitive Environment, by Mick Bass, Patrick Knebel, David W. Quint, and William L. Walker

Design Methodologies for the PA 7100LC Microprocessor, by Mick Bass, Terry W. Blanchard, D. Douglas Josephson, Duncan Weir, and Daniel L. Halperin

An I/O System on a Chip, by Thomas V. Spencer, Frank J. Lettang, Curtis R. McAllister, Anthony L. Riccio, Joseph F. Orth, and Brian K. Arnold

An Integrated Graphics Accelerator for a Low-Cost Multimedia Workstation, by Paul Martin

HP Color Recovery Technology, by Anthony C. Barkans

    True Color

Real-Time Software MPEG Video Decoder on Multimedia-Enhanced PA 7100LC Processors, by Ruby B. Lee, John P. Beck, Joel Lamb, and Kenneth E. Severson

    Overview of the Implementation of the PA 7100LC Multimedia Enhancements

HP TeleShare: Integrating Telephone Capabilities on a Computer Workstation, by S. Paul Tucker

    Caller-ID

    Call Progress, DTMF Tones, and Tone Detection

Product Design of the Model 712 Workstation and External Peripherals, by Arlen L. Hoesner
`
Editor, Richard P. Dolan • Associate Editor, Charles I. Leath • Publication Production Manager, Susan E. Wright • Illustration, Renee D. Pighini

Advisory Board
Rajeev Badval, Integrated Circuit Business Division, Fort Collins, Colorado • Thomas Beecher, Open Systems Software Division, Chelmsford, Massachusetts • Steven Brittenham, Disk Memory Division, Boise, Idaho • William W. Brown, Integrated Circuit Business Division, Santa Clara, California • Harry Chou, Microwave Technology Division, Santa Rosa, California • Rajesh Desai, Commercial Systems Division, Cupertino, California • Kevin G. Ewert, Integrated Systems Division, Sunnyvale, California • Bernhard Fischer, Boblingen Medical Division, Boblingen, Germany • Douglas Gennetten, Greeley Hardcopy Division, Greeley, Colorado • Gary Gordon, Semiconductor Laboratories, Palo Alto, California • Matt J. Marline, Systems Technology Division, Roseville, California • Kiyoyasu Hiwada, Hachioji Semiconductor Test Division, Tokyo, Japan • Bryan Hoog, Lake Stevens Instrument Division, Everett, Washington • Roger L. Jungerman, Microwave Technology Division, Santa Rosa, California • Ruby B. Lee, Networked Systems Group, Cupertino, California • Alfred Maute, Waldbronn Analytical Division, Waldbronn, Germany • Dona L. Miller, Worldwide Customer [illegible] Division, Mountain View, California • Michael P. Moore, VXI Systems Division, Loveland, Colorado • Shelley I. Moore, San Diego Printer Division, San Diego, California • M. Shahid Mujtaba, HP Laboratories, Palo Alto, California • Steven J. Narciso, VXI Systems Division, Loveland, Colorado • Danny J. Oldfield, Colorado Springs Division, Colorado Springs, Colorado • Garry Orsolini, Software Technology Division, Roseville, California • Han Tian Phua, Asia Peripherals Division, Singapore • [name illegible], HP Laboratories, Palo Alto, California • Gunter Riebesell, Boblingen Instruments Division, Boblingen, Germany • Marc Sabatella, Software Engineering [illegible] Division, Fort Collins, Colorado • Michael B. Saunders, Integrated Circuit Business Division, Corvallis, Oregon • Philip Stenton, HP Laboratories Bristol, Bristol, England • Stephen R. Undy, Systems Technology Division, Fort Collins, Colorado • Jim Willits, Network and System Management Division, Fort Collins, Colorado • [first name illegible] Yanagawa, Kobe Instrument Division, Kobe, Japan • Dennis C. York, Corvallis Division, Corvallis, Oregon • Barbara Zimmer, Corporate Engineering, Palo Alto, California

© Hewlett-Packard Company 1995. Printed in U.S.A.
`
Development of a Low-Cost, High-Performance, Multiuser Business Server System, by Dennis A. Bowers, Gerard M. Enkerlin, and Karen L. Murillo

HP Distributed Smalltalk: A Tool for Developing Distributed Applications, by Eileen Keremitsis and Ian J. Fuller

    Object Management Group

A Software Solution Broker for Technical Consultants, by Manny Yousefi, Adel Ghoneimy, and Wulf Hender

    HP Software Solution Broker Accessible Products

Bugs in Black and White: Imaging IC Logic Levels with Voltage Contrast, by Jack D. Benzel

Component and System Level Design-for-Testability Features Implemented in a Family of Workstation Products, by Bulent I. Dervisoglu and Michael Ricchetti

Departments

In this Issue
Cover
What's Ahead
Authors
`
The Hewlett-Packard Journal is published bimonthly by the Hewlett-Packard Company to recognize technical contributions made by Hewlett-Packard (HP) personnel. While the information found in this publication is believed to be accurate, the Hewlett-Packard Company disclaims all warranties of merchantability and fitness for a particular purpose and all obligations and liabilities for damages, including but not limited to indirect, special, or consequential damages, attorney's and expert's fees, and court costs, arising out of or in connection with this publication.

Subscriptions: The Hewlett-Packard Journal is distributed free of charge to HP research, design and manufacturing engineering personnel, as well as to qualified non-HP individuals, libraries, and educational institutions. Please address subscription or change of address requests on printed letterhead (or include the submitting card) to the HP headquarters office in your country or to the HP address on the back cover. When submitting a change of address, please include your zip or postal code and a copy of your old label. Free subscriptions may not be available in all countries.

The Hewlett-Packard Journal is available online via the World-Wide Web (WWW) and can be viewed and printed with Mosaic. The uniform resource locator (URL) for the Hewlett-Packard Journal is http://www.hp.com/hpj/Journal.html.

Submissions: Although articles in the Hewlett-Packard Journal are primarily authored by HP employees, articles from non-HP authors dealing with HP-related research or solutions to technical problems made possible by using HP equipment are also considered for publication. Please contact the Editor before submitting such articles. Also, the Hewlett-Packard Journal encourages technical discussions of the topics presented in recent articles and may publish letters expected to be of interest to readers. Letters should be brief, and are subject to editing by HP.

Copyright © 1995 Hewlett-Packard Company. All rights reserved. Permission to copy without fee all or part of this publication is hereby granted provided that 1) the copies are not made, used, displayed, or distributed for commercial advantage; 2) the Hewlett-Packard Company copyright notice and the title of the publication and date appear on the copies; and 3) a notice appears stating that the copying is by permission of the Hewlett-Packard Company.

Please address inquiries, submissions, and requests to: Editor, Hewlett-Packard Journal, 3000 Hanover Street, Palo Alto, CA 94304 U.S.A.
`
Real-Time Software MPEG Video Decoder on Multimedia-Enhanced PA 7100LC Processors

With a combination of software and hardware optimizations, including the availability of PA-RISC multimedia instructions, a software video player running on a low-end workstation is able to play MPEG compressed video at 30 frames/s.

by Ruby B. Lee, John P. Beck, Joel Lamb, and Kenneth E. Severson
`
Traditionally, computers have improved productivity by helping people compute faster and more accurately. Today, computers can further improve productivity by helping people communicate better and more naturally. Towards this end, at Hewlett-Packard we have looked for more natural ways to integrate communication power into our desktop machines, which would allow a user to access distributed information more easily and communicate with other users more readily.

We felt that adding audio, images, and video information would enrich the information media of text and graphics normally available on desktop computers such as workstations and personal computers. However, for such enriched multimedia communications to be useful, they must be fully integrated into the user's normal working environment. Hence, as the technology matured we decided to integrate increasing levels of multimedia support into both the user interface and the basic hardware platform.
`
In terms of user interface, we integrated a panel of multimedia icons into the HP VUE standard graphical user interface, which comes with all HP workstations. These multimedia icons are part of the HP MPower product.1 HP MPower enables a workstation user to receive and send faxes, share printers, access and manipulate images, hear and send voice and CD-quality stereo audio, send and receive multimedia email, share an X window or an electronic whiteboard with other distributed users, and capture and play back video sequences. The HP MPower software is based on a client/server model, in which one server can service around 20 clients, which can be workstations or X terminals.
`
In terms of hardware platforms, we integrated successive levels of multimedia support into the baseline PA-RISC workstations.2,3,4 First, we integrated support for all the popular image formats such as JPEG (Joint Photographic Experts Group)† compressed images.5 Then, we added hardware and software support for audio, starting with 8-kHz voice-quality audio, followed by support for numerous audio formats including A-law, μ-law, and 16-bit linear mode, with up to 48-kHz mono and stereo. This allowed high-fidelity, 44.1-kHz stereo, 16-bit CD-quality audio to be recorded, manipulated, and played back on HP workstations. At the same time, we supported uncompressed video capture and playback.

† JPEG is an international digital image compression standard for continuous-tone (multilevel) still images (grayscale and color).
`
In January 1994, HP introduced HP MPower 2.0 and the entry-level enterprise workstation, the HP 9000 Model 712, which is based on the multimedia-enhanced PA-RISC processor known as the PA 7100LC.6,7,8 The video player integrated in the MPower 2.0 product is the first product that achieves real-time MPEG-1 (Moving Picture Experts Group)9 video decompression via software running on a general-purpose processor. Typically, real-time MPEG-1 decompression is achieved via special-purpose chips or boards. Previous attempts at software MPEG-1 decompression did not attain real-time rates.10 The fact that this is achieved by the low-end Model 712 workstation is significant.

In this paper, we discuss the support of MPEG-compressed video as a new (video) data type. In particular, we discuss the technology that enables the video player integrated into the HP MPower 2.0 product to play back MPEG-compressed video at real-time rates of up to 30 frames per second.
`
Digital Video Standards
We decided to focus on the MPEG digital video format because it is an ISO (International Standards Organization) standard, and it gives the highest video fidelity at a given compression ratio of any of the formats that we evaluated. MPEG also has broad support from the consumer electronics, telecommunications, cable, and computer industries. The high compression capability of MPEG translates into lower storage costs and less bandwidth needed for transmitting video on the network. These characteristics make MPEG an ideal format for addressing the need for detail in the video used in technical workstation markets and computer-based training in commercial workstation markets.

MPEG is one of several algorithmically related standards shown in Fig. 1. All of these digital video compression standards use the discrete cosine transform (DCT) as a fundamental component of the algorithm. Alternatives to discrete cosine-based algorithms that we looked at include vector quantization, fractals, and wavelets. Vector quantization
`
algorithms are popular on older computer architectures because they require less computing power to decompress, but this advantage is offset by poorer image quality at low bandwidth (high compression) compared to MPEG for practical vector quantization methods. Algorithms based on wavelet and fractal technology have the potential to deliver video fidelity comparable to MPEG, but there is presently a lack of industry consensus on standardization, a key requirement for our use.

Another advantage of a high-performance implementation of MPEG is the ability to leverage the improvements to the other DCT-based algorithms. Although the relationships shown in Fig. 1 do not represent a true hierarchy of algorithms, the figure is useful for illustrating increased complexity as one moves from JPEG to MPEG-2, or from H.261 to MPEG-2.

Fig. 1. Digital video standards based on the discrete cosine transform.

All of these formats have much in common, such as the use of the DCT for encoding. The visual fidelity of the algorithms was the key selection criterion and not ease of implementation or performance on existing hardware.

Although JPEG supports both lossy and lossless compression, the term JPEG is typically associated with the lossy specification.† The primary goal of JPEG is to achieve high compression of photographic images with little perceived loss of image fidelity. Although it is not an ISO standard, by convention, a sequence of lossy JPEG images used to create a digital video sequence is called motion JPEG, or MJPEG.

† In lossless compression, decompressed data is identical to the original image data. In lossy compression, decompressed data is a good approximation of the original image data.

H.261 is a digital video standard from the telecommunications standards body ITU-TSS (formerly known as CCITT). H.261 is one of a suite of conferencing standards that make up the umbrella H.320 specification. H.261 is often referred to as P*64 (where P is an integer) because it was designed to fit into multiples of 64 kbits/s bandwidth. The first frame (image) of an H.261 sequence is for all practical purposes a highly compressed lossy JPEG image. Subsequent frames are built from image fragments (blocks) that are either JPEG-like or are differences from the image fragments in previous frames. Most video sequences have high frame-to-frame coherence. This is especially true for video conferencing. Because the encoding of the movement of a piece of an image requires less data than an equivalent JPEG fragment, H.261 achieves higher visual fidelity for a given bandwidth than does motion JPEG. Since the encoding of the differences is always based on the previous frames, the technique is called forward differencing.

The MPEG-1 specification goes even further than H.261 in allowing sophisticated techniques to achieve high fidelity with fewer bits. In addition to forward differencing, MPEG-1 allows backward differencing (which relies on information in a future frame) and averaging of image fragments. (Forward and backward differencing are described in more detail in the next section.) MPEG-1 achieves quality comparable to a professionally reproduced VHS videotape even at a single-speed CD-ROM data rate (1.5 Mbits/s).9,11 MPEG-1 also specifies encodings for high-fidelity audio synchronized with the video.

MPEG-2 contains additional specifications and is a superset of MPEG-1. The new features in MPEG-2 are targeted at broadcast television requirements, such as support for frame interleaving similar to analog broadcast techniques. With widespread deployment of MPEG-2, the digital revolution for video may be comparable to the digital audio revolution of the last decade.

The approximate bandwidths required to achieve a level of subjective visual fidelity for motion JPEG, H.261, MPEG-1, and MPEG-2 are shown in Fig. 2. Motion JPEG will primarily be used for cases in which accurate frame editing is important, such as video editing. H.261 will be used primarily for video conferencing, but it also has potential for use in video mail. MPEG-1 and MPEG-2 will be used for publishing, where fidelity expectations have been set by consumer analog video tapes, computer-based training, games, movies on CD, and video on demand.

[Figure 2: bandwidth (logarithmic axis, 10 to 30,000) plotted against visual fidelity (Poor, Adequate, Good, Very Good, Excellent, Exceptional). Use regions labeled in the figure: Production and Broadcast; Computer Based Training, Analysis, and Monitoring; Conferencing and Video Mail.]

Fig. 2. Compressed video bandwidth versus subjective visual fidelity. The ideal format achieves exceptional visual fidelity at the lowest bandwidth.

MPEG Compression
MPEG has two classes of frames: intracoded and nonintracoded frames (see Fig. 3). Intracoded frames, also called I-frames, are compressed by reducing spatial redundancy within the frame itself. I-frames do not depend on comparisons with past or "reference" frames. They use JPEG-type compression for still images.5
`
[Figure 3 labels: Forward Prediction; Bidirectional Prediction. Legend: I = intracoded frame (like JPEG); P = predicted frame; B = bidirectionally predicted frame.]
`
Nonintracoded frames are further divided into P-frames and B-frames. P-frames are predicted frames based on comparisons with an earlier reference frame (an intracoded or predicted frame). By considering temporal redundancy in addition to spatial redundancy, P-frames can be encoded with fewer bits. B-frames are bidirectionally predicted frames that require one backward reference frame and one forward reference frame for prediction. A reference frame can be an I-frame or a P-frame, but not a B-frame. By detecting the motion of blocks from both a frame that occurred earlier and a frame that will be played back later in the video sequence, B-frames can be encoded in fewer bits than I- or P-frames.

Each frame is divided into macroblocks of 16 by 16 pixels for the purposes of motion estimation† in MPEG compression and motion compensation in MPEG decompression. A frame with only I-blocks is an I-frame, whereas a P-frame has P-blocks or I-blocks, and a B-frame has B-blocks, P-blocks, or I-blocks. For each P-block in the current frame, the block in the reference frame that matches it best is identified by a motion vector. Then the differences between the pixel values in the matching block in the reference frame and the current block in the current frame are encoded by a discrete cosine transform.
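
The P-block procedure just described can be sketched in a few lines. This is an illustration only: the article does not say how the "best" match is scored, so the sum-of-absolute-differences criterion, the exhaustive search, the 4-by-4 toy block size, and all function names below are assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n-by-n block of `cur` at
    (bx, by) and the block of `ref` displaced from it by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def best_motion_vector(cur, ref, bx, by, n, search):
    """Try every displacement within +/- `search` pixels and return the
    (dx, dy) whose reference block matches best (assumed criterion: SAD)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + n <= height
                      and 0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def residual(cur, ref, bx, by, dx, dy, n):
    """Pixel differences (current minus matched reference); these are the
    values the discrete cosine transform would then encode."""
    return [
        [cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
        for y in range(n)
    ]


# Toy 8-by-8 frames: the current frame is the reference shifted right by one.
ref = [[10 * y + x * x for x in range(8)] for y in range(8)]
cur = [[row[x - 1] if x > 0 else 0 for x in range(8)] for row in ref]
mv = best_motion_vector(cur, ref, bx=2, by=2, n=4, search=2)
diff = residual(cur, ref, 2, 2, mv[0], mv[1], 4)
```

When the match is exact, as in this toy case, every difference is zero, which is why a well-predicted block costs so few bits after the transform and quantization steps.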
`
Fig. 3. MPEG frame sequencing.

The color space used is the YCbCr color representation rather than the RGB color space, where Y represents the luminance (or brightness) component, and Cb and Cr represent the chrominance (or color) components. Because human perception is more sensitive to luminance than to chrominance, the Cb and Cr components can be subsampled in both the x and y dimensions. This means that there is one Cb value and one Cr value for every four Y values. Hence, a 16-by-16 macroblock contains four 8-by-8 blocks of Y, and only one 8-by-8 block of Cb and one 8-by-8 block of Cr values (see Fig. 4). This is a reduction from the twelve 8-by-8 blocks (four for each of the three color components) if Cb and Cr were not subsampled. The six 8-by-8 blocks in each 16-by-16 macroblock then undergo transform coding.
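
The block counts above follow directly from the subsampling factor, as the short sketch below shows. The 2-by-2 averaging filter and the function names are assumptions made for illustration; the article states only that Cb and Cr are subsampled in both dimensions.

```python
def subsample_2x2(plane):
    """Halve a chrominance plane in x and y by averaging each 2-by-2 group.
    Averaging is an assumed filter; the source does not specify one."""
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1]) // 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]


def blocks_per_macroblock(subsampled, mb=16, block=8):
    """Count the 8-by-8 blocks that make up one 16-by-16 macroblock."""
    y_blocks = (mb // block) ** 2
    chroma_side = mb // 2 if subsampled else mb
    chroma_blocks = (chroma_side // block) ** 2
    return y_blocks + 2 * chroma_blocks  # Y plus Cb plus Cr


# A toy 16-by-16 Cb plane shrinks to a single 8-by-8 block.
cb = [[(x + y) % 256 for x in range(16)] for y in range(16)]
cb_small = subsample_2x2(cb)
```

With subsampling the count is 4 + 1 + 1 = 6 blocks; without it, 4 + 4 + 4 = 12, so half of the transform work is avoided before any other compression is applied.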
`
Transform coding concentrates energy in the lower frequencies. The transformed data values are then quantized by dividing by the corresponding quantization coefficient. This results in discarding some of the high-frequency values, or lower-frequency but low-energy values, since these become zeros. Both transform coding and quantization enable further compression by run-length encoding of zero values.
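
A small numeric sketch makes the effect of quantization and zero run-length encoding concrete. The coefficient values, the quantization table, truncation toward zero, and the 'EOB' end marker are all assumptions chosen for illustration; real MPEG streams also reorder coefficients and use standardized tables, which this sketch omits.

```python
def quantize(coeffs, qtable):
    """Divide each coefficient by its quantization coefficient, truncating
    toward zero (the rounding rule is an assumption)."""
    return [int(c / q) for c, q in zip(coeffs, qtable)]


def dequantize(levels, qtable):
    """Inverse quantization: scale the levels back to the original range."""
    return [lv * q for lv, q in zip(levels, qtable)]


def run_length_encode(levels):
    """Encode as (zero_run, value) pairs; trailing zeros collapse to 'EOB'."""
    pairs, run = [], 0
    for v in levels:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append("EOB")
    return pairs


coeffs = [120, 7, -6, 45, 3, 0, -2, 1]     # toy transform output, low to high frequency
qtable = [8, 16, 16, 16, 32, 32, 32, 32]   # hypothetical quantization coefficients
levels = quantize(coeffs, qtable)          # small values become zero
encoded = run_length_encode(levels)
restored = dequantize(levels, qtable)      # lossy: not equal to coeffs
```

Eight coefficients shrink to two (run, value) pairs plus an end marker, and the restored values differ from the originals, which is where the loss in lossy compression comes from.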
`
Finally, the nonzero coefficients of an 8-by-8 block used in the discrete cosine transform can be encoded via variable-length entropy encoding such as Huffman coding. Entropy encoding basically removes coding redundancy by assigning the code words with the fewest number of bits to those coefficients that occur most frequently.
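
The principle of giving the shortest code words to the most frequent values can be sketched with a textbook Huffman construction. This is an illustration of the idea only: MPEG-1 decoders work from predefined variable-length code tables rather than building a code per stream, and the function name and sample data below are assumptions.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free code (symbol -> bit string) from observed
    frequencies by repeatedly merging the two least frequent groups."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie_breaker, partial code table for the group).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


# Toy quantized coefficients: 0 is by far the most common value.
symbols = [0] * 8 + [1] * 4 + [-1] * 2 + [5]
code = huffman_code(symbols)
total_bits = sum(len(code[s]) for s in symbols)
```

For this sample the most frequent value gets a 1-bit code and the rarest values get 3-bit codes, so the 15 symbols take 25 bits instead of the 30 bits a fixed 2-bit code would need.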
`
[Figure 4 legend: X = luminance (Y); O = chrominance (Cb, Cr).]

Fig. 4. Subsampling of the chrominance components (Cb, Cr) with respect to the luminance (Y) component.

† Motion estimation uses temporal redundancy to estimate the movement of a block from one frame to the next.
`
`we found ways to reduce the execution time of the most
`frequent operation sequences. The application of algorithm
`enhancements, software tuning, and projected hardware
`enhancements was iterated until we attained our goal of
`being able to decompress at a rate greater than 15 frames/s
`via software.
`
`Algorithm and Software Optimizations
In terms of MPEG video algorithms, we improved on the Huffman decoder, the motion compensation, and the inverse discrete cosine transform. A faster Huffman decoder based on a hybrid of table lookup and tree-based decoding is used. The lookup table sizes were chosen to reduce cache misses. For motion compensation, we sped up the pixel averaging operations.
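
One way a hybrid of table lookup and tree-based decoding can work is sketched below. It is not the authors' decoder: the data structures, the function names, and the toy code are assumptions. Short codes are resolved with a single indexed lookup on the next `k` bits, longer codes fall back to a bit-by-bit tree walk, and `k` sets the table size (2^k entries), which is the knob one would tune against cache size.

```python
def build_decoder(code, k):
    """code: symbol -> bit string (prefix-free). Returns (table, tree).
    table[i] is (symbol, length) when the k-bit pattern i starts with a
    code of at most k bits, else None. tree holds every code for the
    slow path, as nested dicts keyed by '0' and '1'."""
    table = [None] * (1 << k)
    tree = {}
    for sym, bits in code.items():
        if len(bits) <= k:
            pad = k - len(bits)
            base = int(bits, 2) << pad
            for fill in range(1 << pad):   # every completion of the short code
                table[base + fill] = (sym, len(bits))
        node = tree
        for b in bits[:-1]:
            node = node.setdefault(b, {})
        node[bits[-1]] = sym
    return table, tree


def decode(bitstring, table, tree, k):
    """Decode a string of '0'/'1' characters into a list of symbols."""
    out, pos = [], 0
    while pos < len(bitstring):
        window = bitstring[pos:pos + k].ljust(k, "0")
        entry = table[int(window, 2)]
        if entry is not None and entry[1] <= len(bitstring) - pos:
            out.append(entry[0])           # fast path: one table lookup
            pos += entry[1]
        else:
            node = tree                    # slow path: walk the tree
            while isinstance(node, dict):
                node = node[bitstring[pos]]
                pos += 1
            out.append(node)
    return out


toy_code = {"a": "0", "b": "10", "c": "110", "d": "1110", "e": "1111"}
table, tree = build_decoder(toy_code, k=2)
message = "".join(toy_code[s] for s in "abcdea")
decoded = decode(message, table, tree, 2)
```

Because frequent symbols have short codes, most symbols take the fast path even with a small table, which keeps the table resident in cache.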
`
For the inverse discrete cosine transform, we use a faster Fourier transform, which significantly reduces the number of multiplies for each two-dimensional 8-by-8 inverse discrete cosine transform. In addition, we use the fact that the 8-by-8 inverse transform matrices are frequently sparse to further reduce the multiplies and other operations required.
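
The benefit of sparsity can be shown with a direct (not fast) 8-by-8 inverse discrete cosine transform that simply skips zero coefficients. This is a sketch of the sparsity idea only, not the fast transform the authors used; the function name and the standard DCT normalization are assumptions.

```python
import math


def idct_8x8_sparse(coeffs):
    """Direct 2-D inverse DCT of an 8-by-8 block (coeffs[v][u]) that sums
    only over nonzero coefficients. Returns (pixels, terms_used)."""
    n = 8

    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    nonzero = [
        (u, v, coeffs[v][u])
        for v in range(n)
        for u in range(n)
        if coeffs[v][u] != 0
    ]
    pixels = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            total = 0.0
            for u, v, f in nonzero:
                total += (scale(u) * scale(v) * f
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            pixels[y][x] = total / 4
    return pixels, len(nonzero)


# A block whose only nonzero coefficient is the DC term decodes to a flat block.
dc_only = [[0] * 8 for _ in range(8)]
dc_only[0][0] = 80
flat, terms = idct_8x8_sparse(dc_only)
```

With one nonzero coefficient, each pixel needs one term instead of 64, and in the DC-only case every pixel is just the coefficient divided by 8, so no multiplies by cosine terms are needed at all.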
`
`The MPEG audio decompression is also done in software.
`This algorithm was improved by using a 32-point discrete
`cosine transform to speed up the subband filtering.12
`
In terms of software tuning, we "flattened" the code to reduce the number of procedure calls and returns, and the frequent building up and tearing down of contexts present in the original MPEG code. We also did "strength reductions" such as reducing multiplications to simpler operations such as shift and add or table lookup.
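
Two small examples show what such strength reductions look like. Both are generic illustrations, not code from the MPEG player: the constant 10, the clip range, and the names are assumptions.

```python
def times_10(x):
    """10*x computed as (8*x) + (2*x): two shifts and one add, no multiply."""
    return (x << 3) + (x << 1)


# Table lookup in place of arithmetic: clip any value in -256..511 to 0..255.
CLIP_OFFSET = 256
CLIP_TABLE = [min(255, max(0, v)) for v in range(-CLIP_OFFSET, 512)]


def clip_pixel(v):
    """Clamp with one indexed load; valid only for -256 <= v <= 511."""
    return CLIP_TABLE[v + CLIP_OFFSET]
```

The shift-and-add form replaces a multiply with cheaper integer operations, and the table replaces two comparisons per pixel with a single load, at the cost of a small table that must stay in cache.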
`
The last column of Table I shows the percentage of execution time spent in each of the six MPEG decompression steps after the algorithm and software tuning improvements were made. The first two columns of Table I show the millions of instructions executed in each of the six decompression steps and the percent of the total instructions executed (path length) each step represents. The input video sequence was an MPEG-compressed clip of a football game. The total time taken was 7.45 seconds on an HP 9000 Model 735 99-MHz PA-RISC workstation, with 256K bytes of instruction cache and 256K bytes of data cache.
`
Table I
Instructions and Time Spent in Each MPEG Decompression Step on an HP 9000 Model 735

Step                                 Millions of    Path Length   Time
                                     Instructions   (%)           (%)
Header decode                             0.6           0.1        0.1
Huffman decode                           55.3          10.2        7.5
Inverse quantization                      8.7           1.6        2.4
Inverse discrete cosine transform       206.5          38.3       38.7
`MPEG Decompression
`MPEG decompression reverses the functional steps taken
`for MPEG compression. There are six basic steps involved
`in MPEG decompression.
`1. The MPEG header is decoded. This gives information
`such as picture rate, bit rate, and image size.
`
`2. The video data stream is Huffman or entropy decoded
`from variable-length codes into fixed-length numbers. This
`step includes run-length decoding of zeros.
`
`3. Inverse quantization is performed on the numbers to
`restore them to their original range.
`
`4. An inverse discrete cosine transform is performed on the
`8-by-8 blocks in each frame. This converts from the frequency
`domain back to the original spatial domain. This gives the
`actual pixel values for I-blocks, but only the differences for
`each pixel for P-blocks and B-blocks.
`
5. Motion compensation is performed for P-blocks and B-blocks. The differences calculated in step 4 are added to the pixels in the reference block as determined by the motion vector for P-blocks and to the average of the forward and backward reference blocks for B-blocks.
`
`6. The picture is displayed by doing a color conversion from
`YCbCr coordinates to RGB color coordinates and writing to
`the frame buffer.
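
Step 5 above can be sketched as follows. The clamping of results to the 0..255 pixel range and the round-up rule in the average are assumptions added for the sketch; the article says only that the differences are added to the reference block, or to the average of two reference blocks.

```python
def clamp(v, lo=0, hi=255):
    """Keep a reconstructed value inside the 8-bit pixel range (assumed)."""
    return max(lo, min(hi, v))


def reconstruct_p_block(ref_block, diff):
    """P-block: reference pixels plus the decoded differences."""
    return [
        [clamp(r + d) for r, d in zip(ref_row, diff_row)]
        for ref_row, diff_row in zip(ref_block, diff)
    ]


def reconstruct_b_block(fwd_block, bwd_block, diff):
    """B-block: average of the two reference blocks plus the differences.
    The +1 rounds halves upward (an assumed rounding rule)."""
    return [
        [clamp((f + b + 1) // 2 + d) for f, b, d in zip(f_row, b_row, d_row)]
        for f_row, b_row, d_row in zip(fwd_block, bwd_block, diff)
    ]


p_pixels = reconstruct_p_block([[100, 250], [0, 50]], [[5, 10], [-3, 0]])
b_pixels = reconstruct_b_block([[100, 10]], [[101, 20]], [[0, -20]])
```

The per-pixel add, average, and clip operations in this step are the kind of simple arithmetic on small integers that repeats across every pixel of every predicted block.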
`
Methodology
Our philosophy was to improve the algorithms and tune the software first, resorting to hardware support only if necessary. We set a goal of 10 to 15 frames/s for software MPEG video decompression because this is the rate at which motion appears smooth rather than jerky.

We started by measuring the performance of the MPEG software we had purchased. This software initially took two seconds to decode one frame (0.5 frame/s) on an older 50-MHz Model 720 workstation. This decoding was for video only and did not include audio. Profiling indicated that the inverse discrete cosine transform (step 4) took the largest chunk of the execution time, followed by display (step 6), followed by motion compensation (step 5). The decoding of the MPEG headers was insignificant.

With this data we set out to optimize every step in the MPEG decompression software. After we applied all the algorithm enhancements and software tuning, we measured the MPEG decode software again. While we had achieved an order of magnitude improvement, the rate of 4 to 5 frames/s was not sufficient to meet our goal.

Hence, we looked at possible multimedia enhancements to the basic PA-RISC processor and other system-level enhancements that would not only speed up MPEG decoding, but also be generally useful for improving performance in other computations. In addition, any chip enhancements we added could not adversely impact the design schedule, complexity, cycle time, and chip size of the PA-RISC processor we were targeting, the PA 7100LC, which was already deep into its implementation phase at the time. The PA 7100LC is described in detail in the article on page 12.

We approached this problem by studying the distribution of operations executed by the software MPEG decoder. Then,
`
The largest slice of execution time (38.7%) and the largest chunk of instructions executed (38.3%) were still the inverse discrete cosine transform. We studied the frequencies of generic operations in this group and attempted to execute them faster. This resulted in new PA-RISC processor instructions for accelerating multimedia software.
`
PA-RISC Processor Enhancements
The new processor multimedia instructions implemented in the PA 7100LC processor allow simple arithmetic operations to be executed in parallel on subword data in the standard integer data path. In particular, the integer ALU is partitioned so that it can execute a pair of arithmetic operations in a single cycle with a single instruction. The arithmetic operations accelerated in this way are add, subtract, average, shift left and add, and shift right and add. The latter two operations are effective in implementing multiplication by constants.

PA-RISC Multimedia Extensions 1.0. The PA 7100LC PA-RISC processor chip contains some instructions that operate independently and in parallel on two 16-bit data fields within a 32-bit register. These operations are independent in that bits carried or shifted out of one of the fields never affect the result in the other field. These operations occur in parallel in that a single instruction computes both 16-bit fields of the result. Table II summarizes these instructions.
`
HADD does two parallel 16-bit additions on the left and the right halves of registers ra and rb, placing the two 16-bit results into the left and right halves of register rt.

HSUB does two parallel 16-bit subtractions on the left and right halves of registers ra and rb, placing the two 16-bit results into the left and right halves of register rt.

Both HADD and HSUB perform modulo arithmetic (modulus 2^16), that is, the result wraps around from the largest number back to the smallest number and vice versa. This is the usual mode of operation of twos complement adders when overflow is ignored.
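
The behavior of the modulo forms can be emulated in software to see why they differ from an ordinary 32-bit add. The sketch below models a register as a Python integer; the helper names are assumptions, and the model covers only the arithmetic described above, not the instruction encoding.

```python
MASK16 = 0xFFFF


def halves(word):
    """Split a 32-bit word into its (left, right) 16-bit halves."""
    return (word >> 16) & MASK16, word & MASK16


def pack(left, right):
    """Join two halves into a 32-bit word, keeping 16 bits of each."""
    return ((left & MASK16) << 16) | (right & MASK16)


def hadd(ra, rb):
    """Two independent 16-bit additions, each modulo 2^16."""
    (a1, a2), (b1, b2) = halves(ra), halves(rb)
    return pack(a1 + b1, a2 + b2)


def hsub(ra, rb):
    """Two independent 16-bit subtractions, each modulo 2^16."""
    (a1, a2), (b1, b2) = halves(ra), halves(rb)
    return pack(a1 - b1, a2 - b2)


parallel = hadd(0x0001FFFF, 0x00010001)                  # right half wraps to 0
ordinary = (0x0001FFFF + 0x00010001) & 0xFFFFFFFF        # carry leaks leftward
```

In the parallel form the right half wraps to zero without disturbing the left half (0x00020000), whereas the ordinary add lets the carry spill into the left half (0x00030000).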
`
HADD and HSUB also have two saturation arithmetic options. With the signed saturation option, HADD.ss, both operands and the result are considered signed 16-bit integers. If the result cannot be represented as a signed 16-bit integer, it is clipped to the largest positive value (2^15 - 1) if positive overflow occurs, or it is clipped to the smallest negative value (-2^15) if negative overflow occurs.

With the unsigned saturation option, HADD.us, the first operand (ra) is considered an unsigned 16-bit integer, the second operand (rb) is considered a signed 16-bit integer, and the result (in rt) is considered an unsigned 16-bit integer. If the result cannot be represented as an unsigned 16-bit integer, it is clipped to the largest unsigned value (2^16 - 1) if positive overflow occurs, or it is clipped to the smallest unsigned value (0) if negative overflow occurs.

The signed saturation and unsigned saturation options for parallel halfword subtraction are defined similarly.
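
The two saturation options can be emulated the same way. This is a software model of the rules stated above, with assumed helper names; it shows why the unsigned option suits adding a signed correction to an unsigned value without a separate clipping step.

```python
MASK16 = 0xFFFF


def halves(word):
    """Split a 32-bit word into its (left, right) 16-bit halves."""
    return (word >> 16) & MASK16, word & MASK16


def pack(left, right):
    """Join two halves into a 32-bit word, keeping 16 bits of each."""
    return ((left & MASK16) << 16) | (right & MASK16)


def signed16(v):
    """Interpret a 16-bit pattern as a twos complement signed integer."""
    return v - 0x10000 if v & 0x8000 else v


def hadd_ss(ra, rb):
    """Signed saturation: both operands signed, result clipped to
    the range -2^15 .. 2^15 - 1."""
    return pack(*(
        max(-0x8000, min(0x7FFF, signed16(a) + signed16(b)))
        for a, b in zip(halves(ra), halves(rb))
    ))


def hadd_us(ra, rb):
    """Unsigned saturation: ra unsigned, rb signed, result clipped to
    the range 0 .. 2^16 - 1."""
    return pack(*(
        max(0, min(0xFFFF, a + signed16(b)))
        for a, b in zip(halves(ra), halves(rb))
    ))


signed_result = hadd_ss(0x7FFF8000, 0x0001FFFF)     # both halves overflow
unsigned_result = hadd_us(0xFFF00005, 0x0020FFF6)   # left clips high, right clips low
```

In the first call 32767 + 1 clips to 0x7FFF and -32768 + (-1) clips to 0x8000. In the second, 65520 + 32 clips to 0xFFFF and 5 + (-10) clips to 0.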
`
Table II
PA-RISC Multimedia Instructions in PA 7100LC
ra contains a1; a2
rb contains b1; b2
rt contains t1; t2

Instruction           Parallel Operation

HADD ra,rb,rt         t1 = (a1+b1) mod 2^16;
                      t2 = (a2+b2) mod 2^16;

HADD.ss ra,rb,rt      t1 = IF (a1+b1) > (2^15-1) THEN (2^15-1)
                           ELSEIF (a1+b1) < -2^15 THEN (-2^15)
                           ELSE (a1+b1);
                      t2 = IF (a2+b2) > (2^15-1) THEN (2^15-1)
                           ELSEIF (a2+b2) < -2^15 THEN (-2^15)
                           ELSE (a2+b2);

HADD.us ra,rb,rt      t1 = IF (a1+b1) > (2^16-1) THEN (2^16-1)
                           ELSEIF (a1+b1) < 0 THEN 0
                           ELSE (a1+b1);
                      t2 = IF (a2+b2) > (2^16-1) THEN (2^16-1)
                           ELSEIF (a2+b2) < 0 THEN 0
                           ELSE (a2+b2);

HSUB ra,rb,rt         t1 = (a1-b1) mod 2^16;
                      t2 = (a2-b2) mod 2^16;

HSUB.ss ra,rb,rt      t1 = IF (a1-b1) > (2^15-1) THEN (2^15-1)
                           ELSEIF (a1-b1) < -2^15 THEN (-2^15)
                           ELSE (a1-b1);
                      t2 = IF (a2-b2) > (2^15-1) THEN (2^15-1)
                           ELSEIF (a2-b2) < -2^15 THEN (-2^15)
                           ELSE (a2-b2);

HSUB.us ra,rb,rt      t1 = IF (a1-b1) > (2^16-1) THEN (2^16-1)
                           ELSEIF (a1-b1) < 0 THEN 0
                           ELSE (a1-b1);
                      t2 = IF (a2-b2) > (2^16-1) THEN (2^16-1)
                           ELSEIF (a2-b2) < 0 THEN 0
                           ELSE (a2-b2);

HAVE ra,rb,rt         t1 = (a1+b1)/2;
                      t2 = (a2+b2)/2;

HSLkADD ra,k,rb,rt    t1 = (a1<<k) + b1;
                      t2 = (a2<<k) + b2;
                      (for k = 1, 2, or 3)

HSRkADD ra,k,rb,rt    t1 = (a1>>k) + b1;
                      t2 = (a2>>k) + b2;
                      (for k = 1, 2, or 3)

ss = signed saturation option
us = unsigned saturation option

HAVE, or halfword average, gives the average of each pair of halfwords in ra and rb. It takes the sum of parallel halfwords and does a right shift of one bit before storing each 16-bit result into rt. During the one-bit right shift, the carry is shifted in on the left and unbiased rounding is performed on the least-significant bit on the right. Because the carry is shifted in, no overflow can occur in the HAVE instruction.
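
A software model of the halfword average shows how shifting the carry back in avoids overflow. The article does not spell out the unbiased rounding rule, so the sketch assumes one common scheme, round half to odd, implemented by ORing the dropped bit into the result; the helper names and the treatment of halfwords as unsigned values are also assumptions.

```python
MASK16 = 0xFFFF


def halves(word):
    """Split a 32-bit word into its (left, right) 16-bit halves."""
    return (word >> 16) & MASK16, word & MASK16


def pack(left, right):
    """Join two halves into a 32-bit word, keeping 16 bits of each."""
    return ((left & MASK16) << 16) | (right & MASK16)


def have(ra, rb):
    """Average each pair of unsigned halfwords without overflow."""
    out = []
    for a, b in zip(halves(ra), halves(rb)):
        total = a + b                            # 17-bit sum; bit 16 is the carry
        out.append((total >> 1) | (total & 1))   # carry shifts in; round to odd (assumed)
    return pack(*out)


no_overflow = have(0xFFFF0002, 0xFFFF0004)   # left: 65535 and 65535; right: 2 and 4
rounded = have(0x00020003, 0x00030004)       # left: 2.5 -> 3; right: 3.5 -> 3
```

The left half of the first call averages two maximal values and still fits in 16 bits. The second call shows the assumed rounding: 2.5 rounds up to 3 and 3.5 rounds down to 3, so errors do not accumulate in one direction over repeated averaging.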
`
`HSLkADD, or halfword shift left and add, allows one operand
`to be shifted left by k bits (where k is 1, 2, or 3) before being
`added to the other operand.
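
Shift left and add is what makes multiplication by small constants cheap on packed data. The sketch below emulates the operation and uses it to multiply both halfwords by 5 and by 9 at once. Whether the hardware instruction wraps or saturates on overflow is not stated in this passage, so the modulo 2^16 behavior here, like the helper names, is an assumption.

```python
MASK16 = 0xFFFF


def halves(word):
    """Split a 32-bit word into its (left, right) 16-bit halves."""
    return (word >> 16) & MASK16, word & MASK16


def pack(left, right):
    """Join two halves into a 32-bit word, keeping 16 bits of each."""
    return ((left & MASK16) << 16) | (right & MASK16)


def hslkadd(ra, k, rb):
    """Shift each halfword of ra left by k (1, 2, or 3) and add the
    matching halfword of rb. Overflow wraps modulo 2^16 (assumed)."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2, or 3")
    return pack(*(((a << k) + b) for a, b in zip(halves(ra), halves(rb))))


def times_5(word):
    """5*x = (x << 2) + x on both halfwords in one operation."""
    return hslkadd(word, 2, word)


def times_9(word):
    """9*x = (x << 3) + x on both halfwords in one operation."""
    return hslkadd(word, 3, word)


packed = pack(3, 100)
by_five = times_5(packed)   # halves become 15 and 500
by_nine = times_9(packed)   # halves become 27 and 900
```

Other constants can be built by chaining such steps, which is how a transform's constant multiplies can be replaced by short shift-and-add sequences.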
`
`HSRkADD, or halfword shift right and add, allows one operand
`to be shifted right by k bits (wher
