`
`ALTIVECWP/D
`
`Sam Fuller
`System Architecture & Product Planning Manager,
`Networking & Computing Core Technologies
`
`Motorola Inc.
`Semiconductor Product Sector
`6501 William Cannon Drive West, Austin, Texas 78735
`
`Introduction
`
Over the last 25 years, microprocessors have enjoyed a
continuous increase in performance and an attendant reduction
in price/performance. Current best-of-breed microprocessors
operate at frequencies in excess of 300 MHz and offer
superscalar instruction dispatch, sophisticated branch
prediction techniques and support for high-performance memory
systems, including external second-level cache controllers.
`
As general-purpose microprocessors have become more powerful,
they have been asked to perform increasingly complex tasks.
Even so, the trend of doubling system performance every 1.5
to 2 years has not kept pace with the requirements of the
networking and telecommunications infrastructure industry,
which are driven by several emerging applications and trends.
Examples include the explosive growth of the Internet and the
emergence of new digital communications technologies: digital
cellular phones employing CDMA, TDMA and PCS technologies;
IP-based telephony, fax and multimedia; and wireless
messaging. A general trend in the industry is the use of
programmable processors to implement adaptive filters,
modulators/demodulators and other functions once possible
only in hardware. These trends and applications have created
tremendous opportunities for high-performance, high-bandwidth
processors. These demanding new applications, along with the
continually increasing needs of the computing market, call
for a new approach to maximizing performance, one that
provides our customers with the order-of-magnitude increase
in key application performance they demand.
`
To meet these needs, a new class of microprocessor product is
called for: one that offers, in a single-chip solution, the
highest level of processing performance while expanding the
processor’s capabilities to concurrently address the
high-bandwidth data processing and algorithm-intensive
computations that today are typically handled off-chip by
other devices, such as dedicated hardware, DSP farms or
custom ASICs. Motorola is introducing a new technology that
provides for this convergence in capabilities: AltiVec
technology.

Figure 1. High-level structural overview for PowerPC with
AltiVec technology (blocks shown: Branch Unit; Integer Unit
with GPRs; Floating-Point Unit with FPRs; Vector Unit with
VRs; Memory)
`
`AltiVec technology is Motorola’s high-performance vector
`parallel processing expansion to the PowerPC™ RISC
`processor architecture. Motorola microprocessors offering
`AltiVec technology will represent a new class of product. In
`addition to providing 100% compatibility with the industry-
`standard PowerPC Architecture™, AltiVec technology will
`also provide product designers and customers with a new
`“one part—one code base” approach to product design
`which simplifies design and support while simultaneously
`providing a tremendous jump in performance.
`
`SAMSUNG-1010
`Page 1 of 4
`
`
`
`Motorola’s AltiVec Technology —White Paper
`
Figure 2. Generic presentation of a four-operand, 16-element,
intra-element operation (source registers vA, vB and vC;
destination register vT; one op per element)
`
`AltiVec Technology
`
Motorola’s AltiVec technology expands the current PowerPC
architecture through the addition of a 128-bit vector
execution unit, which operates concurrently with the existing
integer and floating-point units. This new engine provides
for highly parallel operations, allowing the simultaneous
execution of up to 16 operations in a single clock cycle.
`
`AltiVec technology is a short vector parallel architecture.
`Depending on data size, vectors are 4, 8 or 16 elements long.
`This can be contrasted with the long vector architectures of
`supercomputers that were popular in the 1980s. Vector sizes
`for those machines ranged to hundreds of elements. The long
`vector approach of supercomputers, while useful for scien-
`tific calculations, is not optimal for the communications,
`multimedia and other performance-driven applications tar-
`geted by Motorola with AltiVec technology.
`
Figure 3. Sum Across, an inter-element arithmetic operation
(source registers vA and vB; destination register vT)

AltiVec technology operations are performed on multiple data
elements by a single instruction. This is often referred to
as SIMD (single instruction, multiple data) parallel
processing. AltiVec technology offers support for:

• 16-way parallelism for 8-bit signed and unsigned integers
and characters
• 8-way parallelism for 16-bit signed and unsigned integers
• 4-way parallelism for 32-bit signed and unsigned integers
and IEEE floating-point numbers
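
The three degrees of parallelism listed above follow directly
from dividing a 128-bit register by the element width. The
sketch below models that in portable C. The type name vreg128,
its field names and the helper lanes_for are hypothetical and
chosen for illustration only; they are not part of the
AltiVec definition.

```c
#include <stdint.h>

/* Illustrative model of one 128-bit vector register: the same 16 bytes
   viewed at each supported element width. Type and field names are
   hypothetical, not part of the AltiVec definition. */
typedef union {
    uint8_t  b[16];  /* 16 elements of 8 bits  */
    uint16_t h[8];   /*  8 elements of 16 bits */
    uint32_t w[4];   /*  4 elements of 32 bits */
    float    f[4];   /*  4 floating-point elements (assumes 32-bit float) */
} vreg128;

/* Elements handled by one instruction for a given element width in bits. */
unsigned lanes_for(unsigned element_bits)
{
    return 128u / element_bits;
}
```

Calling lanes_for with 8, 16 and 32 gives the 16-way, 8-way
and 4-way figures quoted in the list.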
`
AltiVec technology also includes a separate register file
containing 32 entries, each 128 bits wide. These 128-bit
registers hold the data sources for the AltiVec technology
execution units. The registers are loaded and unloaded
through vector load and vector store instructions, which
transfer the contents of a single 128-bit register from and
to memory.
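
As a rough illustration of moving data one full register at a
time, the following portable C sketch copies memory in 16-byte
units. The block128 type and copy_blocks function are
hypothetical stand-ins: each memcpy of 16 bytes plays the role
of one vector load or vector store, and nothing here is actual
AltiVec code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the contents of one 128-bit register. */
typedef struct { uint8_t b[16]; } block128;

/* Copy nblocks * 16 bytes, one 16-byte block per iteration.
   The source and destination buffers must not overlap. */
void copy_blocks(void *dst, const void *src, size_t nblocks)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    for (size_t i = 0; i < nblocks; i++) {
        block128 r;
        memcpy(&r, s + 16 * i, sizeof r);  /* models a vector load  */
        memcpy(d + 16 * i, &r, sizeof r);  /* models a vector store */
    }
}
```

A loop of this shape touches 16 bytes per load/store pair,
where a 32-bit integer loop would touch 4.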
`
AltiVec technology can be most accurately thought of as a set
of registers and execution units added to the PowerPC
architecture, in a manner analogous to the earlier addition
of floating-point units. Floating-point units were added to
most mainstream microprocessor architectures several years
ago to provide better support for high-precision scientific
calculations. AltiVec technology is being added to the
PowerPC architecture to dramatically accelerate the next
level of performance-driven, high-bandwidth communications
and computing applications.
`
Each AltiVec instruction specifies up to three source
operands and a single destination operand. All operands are
vector registers, with the exception of the load and store
instructions and a few instruction types that provide
operands from immediate fields within the instruction. The
AltiVec technology defines 162 new instructions, which fall
into the following major classes.
`
`1. Intra-Element Arithmetic Operations
Intra-element arithmetic operations perform independent
parallel computations on the elements contained in the source
vector registers and place the results in the corresponding
fields of the destination vector register. Signed and
unsigned integer and floating-point data types are supported
by the intra-element operations, which support both
saturation and modulo arithmetic. A variety of powerful
intra-element operations are defined in the AltiVec
technology: addition, subtraction, multiply and multiply-add.
Additional instructions perform min, max and average, as well
as conversion between floating-point and 32-bit integer
numerical formats.
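
The difference between modulo and saturation arithmetic is
easiest to see on a 16-element addition of unsigned bytes. The
scalar C sketch below emulates what a single intra-element
instruction would do across all 16 elements. The function
names are hypothetical and the loops only model the
semantics; a vector unit would perform all 16 additions at
once.

```c
#include <stdint.h>

#define LANES 16  /* 16 unsigned 8-bit elements per 128-bit register */

/* Modulo add: each 8-bit sum wraps around on overflow. */
void add_u8_modulo(const uint8_t a[LANES], const uint8_t b[LANES],
                   uint8_t t[LANES])
{
    for (int i = 0; i < LANES; i++)
        t[i] = (uint8_t)(a[i] + b[i]);
}

/* Saturating add: each 8-bit sum is clamped to the maximum value 255. */
void add_u8_saturate(const uint8_t a[LANES], const uint8_t b[LANES],
                     uint8_t t[LANES])
{
    for (int i = 0; i < LANES; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        t[i] = (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
    }
}
```

For pixel or audio samples, saturation is usually the wanted
behavior: adding 60 to a brightness of 200 clamps to 255
rather than wrapping to 4.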
`
`2. Intra-Element Non-Arithmetic Operations
Intra-element non-arithmetic operations include various forms
of compare, shift and rotate. The following logical
operations are also supported: AND, OR, NOT, XOR and AND-NOT.
A select instruction is also provided. This instruction
chooses source data from one of two source registers and
transfers that data to the result register. The combination
of compare and select provides a powerful way to mask and
replace data elements across the entire 16-byte field of the
vector registers with very few instructions.
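
The compare-then-select idiom can be sketched in scalar C as
follows. The example clamps every byte of a vector to an upper
limit without any branches. It assumes that compare produces
an all-ones or all-zeros mask per element and that select
picks bits from the second source where the mask is set; the
function names are hypothetical and the loops stand in for
single vector instructions.

```c
#include <stdint.h>

#define LANES 16

/* Compare: mask element is all ones where a[i] > b[i], else all zeros. */
void cmpgt_u8(const uint8_t a[LANES], const uint8_t b[LANES],
              uint8_t mask[LANES])
{
    for (int i = 0; i < LANES; i++)
        mask[i] = (a[i] > b[i]) ? 0xFF : 0x00;
}

/* Select: take bits from b where the mask is 1, from a where it is 0. */
void select_u8(const uint8_t a[LANES], const uint8_t b[LANES],
               const uint8_t mask[LANES], uint8_t t[LANES])
{
    for (int i = 0; i < LANES; i++)
        t[i] = (uint8_t)((a[i] & ~mask[i]) | (b[i] & mask[i]));
}

/* Replace every element of x that exceeds limit with limit. */
void clamp_max_u8(const uint8_t x[LANES], uint8_t limit, uint8_t t[LANES])
{
    uint8_t lim[LANES], mask[LANES];

    for (int i = 0; i < LANES; i++)
        lim[i] = limit;             /* limit replicated into every element */
    cmpgt_u8(x, lim, mask);         /* which elements are too large?       */
    select_u8(x, lim, mask, t);     /* keep x or substitute the limit      */
}
```

On a vector unit the whole clamp would be two instructions
covering 16 elements, with no data-dependent branching.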
`
`3. Inter-Element Arithmetic Operations
A few special inter-element arithmetic operations are
provided in the AltiVec technology: sum of products and sum
across. These operations allow elements within a single
vector register to be summed in combination with a separate
accumulation register. They are valuable for generating dot
products, which are the most common vector operation.
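
A dot product built from these two operations might look like
the scalar C sketch below. It assumes a sum-of-products step
that adds pairs of 16-bit products into four 32-bit partial
sums, followed by one sum-across step at the end. The function
names and the exact pairing of elements are illustrative
assumptions, and overflow handling is omitted (inputs are
assumed small enough that the 32-bit sums do not overflow).

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of products: each of the four 32-bit results accumulates the
   products of two adjacent pairs of 16-bit elements. */
void msum_s16(const int16_t a[8], const int16_t b[8],
              const int32_t acc[4], int32_t t[4])
{
    for (int w = 0; w < 4; w++)
        t[w] = acc[w]
             + (int32_t)a[2 * w]     * b[2 * w]
             + (int32_t)a[2 * w + 1] * b[2 * w + 1];
}

/* Sum across: add the four 32-bit elements to a scalar accumulator. */
int32_t sum_across_s32(const int32_t v[4], int32_t acc)
{
    return acc + v[0] + v[1] + v[2] + v[3];
}

/* Dot product of n 16-bit elements; n must be a multiple of 8. */
int32_t dot_s16(const int16_t *x, const int16_t *y, size_t n)
{
    int32_t partial[4] = {0, 0, 0, 0};

    for (size_t i = 0; i < n; i += 8)
        msum_s16(x + i, y + i, partial, partial);
    return sum_across_s32(partial, 0);
}
```

Keeping four partial sums inside the loop and reducing them
once at the end means the inner loop needs only one
multiply-accumulate step per eight element pairs.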
`
`4. Inter-Element Non-Arithmetic Operations
In addition to the powerful intra-element and inter-element
arithmetic operations, AltiVec technology also defines a
group of very powerful inter-element non-arithmetic
operations. These include wide-field shift operations and
pack and unpack operations, among them a special operation to
handle the 1/5/5/5 pixel format common for 16-bit color
pixels. Merge operations are also provided that can
interleave data at the byte, halfword and word level.
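
A byte-level merge can be pictured with the scalar C sketch
below, which interleaves the first eight bytes of two source
registers. Which half of each register a given merge
instruction reads is an assumption here, and the function name
is hypothetical.

```c
#include <stdint.h>

/* Interleave the first eight bytes of a and b into t:
   a0 b0 a1 b1 ... a7 b7. The output must not alias the inputs. */
void merge_first_bytes(const uint8_t a[16], const uint8_t b[16],
                       uint8_t t[16])
{
    for (int i = 0; i < 8; i++) {
        t[2 * i]     = a[i];
        t[2 * i + 1] = b[i];
    }
}
```

Interleaving is the basic step for jobs such as combining
separate left and right audio channels into one stereo stream.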
`
`Perhaps the most powerful inter-element operation offered
`in the AltiVec technology is the permute operation. The per-
`mute operation is capable of arbitrarily selecting data with
`the granularity of a byte from two 16-byte source registers
`into a single 16-byte destination register.
`
For operations where 8- and 16-bit data items must be
reorganized in memory before or after computations, permute
can save significant time. In many instances a single permute
operation can operate on 16 bytes of data and replace the 4
or 5 operations per byte that a traditional RISC or DSP
instruction set would require.
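
The permute operation can be modeled in a few lines of scalar
C. Following the labeling used in Figure 4, bytes of vA are
numbered 00 to 0F and bytes of vB 10 to 1F, and each byte of
the control vector vC names the source byte to copy. Masking
the control byte to five bits is an assumption of this sketch,
and the function name is hypothetical.

```c
#include <stdint.h>

/* Emulated permute: for each of the 16 destination bytes, the control
   byte c[i] selects one of the 32 source bytes (a then b). */
void permute_bytes(const uint8_t a[16], const uint8_t b[16],
                   const uint8_t c[16], uint8_t t[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned idx = c[i] & 0x1Fu;   /* 0x00-0x0F: a, 0x10-0x1F: b */
        t[i] = (idx < 16) ? a[idx] : b[idx - 16];
    }
}
```

With a control vector of 0F, 0E, ... 00, for example, this
reverses the 16 bytes of vA in a single operation, a task that
otherwise takes a load, shift and store per byte.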
`
The powerful inter-element operations of AltiVec technology
define a microprocessor capable not just of operating on 8-,
16- and 32-bit data elements in parallel but of operating on
data 128 bits (16 bytes) at a time.

Figure 4. The inter-element Permute operation (control vector
vC selects bytes 00 to 0F of vA and 10 to 1F of vB into vT)

Applications of AltiVec Technology

The initial target applications for AltiVec technology
include: IP telephony gateways, multi-channel modems, speech
processing systems, echo cancelers, image and video
processing systems, scientific array processing systems, as
well as network infrastructure such as Internet routers and
virtual private network servers.

In addition to accelerating next-generation applications,
AltiVec technology can, through its wide datapaths and wide
field operations, also accelerate many time-consuming
traditional computing and embedded processing operations such
as memory copies, string compares and page clears.

Figure 5. Typical controller plus DSP system (communication:
interface circuits; control: controller; computation: DSPs;
connected by a bus to memory)

Figure 6. System using multiple PowerPC processors with
AltiVec technology, sharing a common bus bridged to shared
memory (communication: interface circuits such as the
Motorola MPC860 PowerQUICC™ controller; control and
computation: PowerPC processors with AltiVec technology)

Unlike fixed-function solutions, which are most often
implemented as application-specific integrated circuits,
AltiVec technology will offer a programmable solution that
can easily migrate via software upgrades to follow changing
standards and customer requirements. The preferred
programming environment is the C and C++ languages favored by
embedded systems developers. To more easily express the
parallelism presented by AltiVec technology, Motorola has
developed a standardized set of C/C++ language extensions.
These language extensions allow software developers to use
their preferred C/C++ development environment and language
syntax while explicitly taking advantage of the parallel
functional units and other facilities offered by the AltiVec
technology. Motorola is working with leading tools providers
to develop simulators, assemblers, linkers and compilers to
assure full support for the AltiVec technology.
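
To give a feel for what such language extensions look like in
practice, the sketch below adds eight signed 16-bit elements
with saturation. This paper does not list the extension
syntax; the names used in the first branch (the altivec.h
header, the vector keyword, vec_ld, vec_adds, vec_st and the
__ALTIVEC__ macro) are taken from the AltiVec programming
interface as implemented in compilers such as GCC, and should
be read as an assumption about the eventual tooling. The
second branch is a plain C fallback with the same result, so
the function builds on any target.

```c
#include <stdint.h>

#if defined(__ALTIVEC__)
#include <altivec.h>   /* assumed: compiler-provided AltiVec interface */
#endif

/* out[i] = a[i] + b[i] with saturation, for 8 signed 16-bit elements.
   All three arrays are assumed to be 16-byte aligned. */
void add8_s16_sat(const int16_t *a, const int16_t *b, int16_t *out)
{
#if defined(__ALTIVEC__)
    vector signed short va = vec_ld(0, a);   /* one 128-bit load  */
    vector signed short vb = vec_ld(0, b);
    vec_st(vec_adds(va, vb), 0, out);        /* add, then store   */
#else
    for (int i = 0; i < 8; i++) {            /* scalar equivalent */
        int32_t s = (int32_t)a[i] + b[i];
        if (s > INT16_MAX) s = INT16_MAX;
        if (s < INT16_MIN) s = INT16_MIN;
        out[i] = (int16_t)s;
    }
#endif
}
```

The vector branch stays ordinary C apart from the vector type
and three intrinsic calls, which is the point of extending the
language rather than requiring assembly.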
`
`While the initial PowerPC microprocessor utilizing AltiVec
`technology will target very high-performance applications in
`networking and computing, subsequent Motorola proces-
`sors with AltiVec technology could address markets and
`applications in which performance must be balanced with
`power, price and peripheral integration.
`
`A New Design Model
`
The introduction of processors containing AltiVec technology
creates a new model of system design for high-performance
embedded systems. Historically, many high-performance
embedded applications have contained a combination of a
single RISC processor performing the system control function
and one or more DSPs or ASICs performing specialized
computations.
`
The single RISC processor plus multiple DSP system has a
number of disadvantages, including two different
architectures, code bases, hardware types and debug
environments. Additionally, DSPs have not been on the same
performance growth curve as general-purpose processors; for
example, they often require users to switch to newer,
incompatible architectures from generation to generation. As
a result, even minor upgrades in a customer’s product
performance have often required major hardware redesigns,
frequently including a change of DSP or controller
architecture, with the attendant cost and time-to-market
impact.
`
AltiVec technology-based systems can provide more capable
single-architecture systems, often at lower cost, power
budget and physical area than controller plus DSP solutions.
The use of a single high-performance device for controller
and signal processing functions results in quicker time to
market and lower overall engineering cost. A
single-architecture solution simplifies the development task
for both the hardware and the software engineer.
`
`Summary
`
With the introduction of AltiVec technology, Motorola is
demonstrating its commitment to the PowerPC architecture and
to meeting the requirements of next-generation networking,
communications and computing applications. AltiVec technology
will expand the PowerPC microprocessor capability by
providing leading-edge general-purpose processing performance
while concurrently addressing high-bandwidth data processing
and algorithm-intensive computations in a single-chip
solution. This new class of processor will provide an
aggressive performance growth path for embedded and computing
systems designers, while lowering the development barriers
inherent in multiple-architecture designs, thereby reducing
time to market and total system development expense.
`
`©1998 Motorola, Inc. All rights reserved. Printed in the U.S.A. Motorola and the are registered trademarks and AltiVec and the AltiVec logo are trademarks of Motorola, Inc. PowerPC, the PowerPC logo and PowerPC Architecture are trademarks of
`International Business Machines Corporation and used by Motorola, Inc. under license therefrom. This document contains information on a new product under development. Specifications and information herein are subject to change without notice.
`
`
`