`
AMD Athlon™ Processor

Technical Brief
`
`Rev: D
`Publication # 22054
`Issue Date: December 1999
`
`
`
`
`
`
`© 1999 Advanced Micro Devices, Inc. All rights reserved.
`
`The contents of this document are provided in connection with Advanced
`Micro Devices, Inc. (“AMD”) products. AMD makes no representations or
`warranties with respect to the accuracy or completeness of the contents of
`this publication and reserves the right to make changes to specifications and
`product descriptions at any time without notice. No license, whether express,
`implied, arising by estoppel or otherwise, to any intellectual property rights
`is granted by this publication. Except as set forth in AMD’s Standard Terms
`and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims
`any express or implied warranty, relating to its products including, but not
`limited to, the implied warranty of merchantability, fitness for a particular
`purpose, or infringement of any intellectual property right.
`
AMD’s products are not designed, intended, authorized or warranted for use
as components in systems intended for surgical implant into the body, or in
other applications intended to support or sustain life, or in any other
application in which the failure of AMD’s product could create a situation where
personal injury, death, or severe property or environmental damage may occur.
AMD reserves the right to discontinue or make changes to its products at any
time without notice.
`
`Trademarks
`AMD, the AMD logo, AMD Athlon, and combinations thereof, and 3DNow! are trademarks of Advanced Micro
`Devices, Inc.
`
`MMX is a trademark of Intel Corporation.
`
Digital and Alpha are trademarks of Digital Equipment Corporation.
`
`Other product names used in this publication are for identification purposes only and may be trademarks of
`their respective companies.
`
`
`
`
`22054D/0—December 1999
`
`AMD Athlon™ Processor Technical Brief
`
`
`
`Revision History
`
Date            Rev   Description
August 1999     C     Initial public release.
December 1999   D     Added information about AMD's new 0.18-micron process
                      technology to “Process Technology” on page 7.
`
`
`
`
`Introduction
`
`The AMD Athlon™ processor powers the next generation in
`computing platforms, delivering the ultimate performance for
`cutting-edge applications and an unprecedented computing
`experience.
`
`The AMD Athlon™ processor is the first member of a new
`family of seventh-generation AMD processors designed to meet
`the computation-intensive requirements of cutting-edge
`software applications running on high-performance desktop
`systems, workstations, and servers. This technical brief
`describes the features of the AMD Athlon processor’s
`microarchitecture.
`
`The AMD Athlon processor’s microarchitecture is designed to
`support the growing processor and system bandwidth
`requirements of emerging software, graphics, I/O, and memory
`technologies. The AMD Athlon processor's high-speed
`execution core includes multiple x86 instruction decoders, a
`dual-ported 128-Kbyte split level-one (L1) cache, three
`independent integer pipelines, three address calculation
`pipelines, and the x86 industry's first superscalar, fully
`pipelined, out-of-order, three-way floating-point engine. The
`floating-point engine is capable of delivering 2.4 gigaflops
`(Gflops) of single-precision and more than 1 Gflop of
`double-precision floating-point results at 600 MHz for superior
`performance on numerically complex applications.
`
The AMD Athlon processor’s microarchitecture includes:
• The industry's first nine-issue, superpipelined, superscalar
  x86 processor microarchitecture designed for high clock
  frequencies
    • Multiple x86 instruction decoders
    • 72-entry instruction control unit
    • Advanced dynamic branch prediction
    • Three out-of-order, superscalar, fully pipelined
      floating-point execution units, which execute all x87
      (floating-point), MMX™ and 3DNow!™ instructions
    • Three out-of-order, superscalar, pipelined integer units
    • Three out-of-order, superscalar, pipelined address
      calculation units
• Enhanced 3DNow! technology with new instructions to
  enable improved integer math calculations for speech or
  video encoding and improved data movement for internet
  plug-ins and other streaming applications
• High-performance cache architecture featuring an
  integrated 128-Kbyte L1 cache and a programmable,
  high-speed backside L2 cache interface
• 200-MHz AMD Athlon system bus (scalable beyond 400
  MHz) enabling leading-edge system bandwidth for data
  movement-intensive applications
`
`
`AMD Athlon™ Processor Microarchitecture
`
The AMD Athlon processor is based on a seventh-generation
x86 microarchitecture that features a superpipelined,
nine-issue superscalar microarchitecture optimized for high
clock frequency. The AMD Athlon has a large dual-ported
128-Kbyte split-L1 cache (64-Kbyte instruction cache +
64-Kbyte data cache), a two-way, 2048-entry branch prediction
table, multiple parallel x86 instruction decoders, and multiple
integer and floating-point schedulers for independent
superscalar, out-of-order, speculative execution of instructions.
These elements are packed into an aggressive processing
pipeline that includes 10-stage integer and 15-stage
floating-point pipelines, which are illustrated in Figure 1.
`
[Block diagram. Blocks shown: 2-way, 64-Kbyte instruction cache
(24-entry L1 TLB / 256-entry L2 TLB); predecode cache; branch
prediction table; fetch/decode control; 3-way x86 instruction
decoders; instruction control unit (72-entry); integer scheduler
(18-entry) feeding IEU0/AGU0, IEU1/AGU1, and IEU2/AGU2; FPU stack
map / rename, FPU scheduler (36-entry), and FPU register file
(88-entry) feeding FADD (MMX™, 3DNow!™), FMUL (MMX, 3DNow!), and
FSTORE; load/store queue unit; 2-way, 64-Kbyte data cache
(32-entry L1 TLB / 256-entry L2 TLB); L2 cache controller
connected to L2 SRAMs; bus interface unit connected to the
system interface.]

Figure 1. AMD Athlon™ Processor Block Diagram
`
`
`Multiple Decoders
`
`The AMD Athlon processor includes three full x86 instruction
`decoders. These decoders translate x86 instructions into
`fixed-length MacroOPs for higher instruction throughput and
`increased processing power. Instead of executing x86
`instructions, which have lengths of 1 to 15 bytes, the
`AMD Athlon processor executes the fixed-length MacroOPs,
`while maintaining the instruction coding efficiencies found in
`x86 programs.
`
`Instruction Control Unit
`Once MacroOPs are decoded, up to three MacroOPs per cycle
`are dispatched to the instruction control unit (ICU). The ICU is
`a 72-entry MacroOP reorder buffer (ROB) that manages the
`execution and retirement of all MacroOPs, performs register
`renaming for operands, and controls any exception conditions
`and instruction retirement operations. The ICU dispatches the
`MacroOPs to the AMD Athlon processor’s multiple execution
`unit schedulers.
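
The reorder-buffer behavior described above (in-order dispatch, out-of-order completion, in-order retirement) can be illustrated with a minimal Python sketch. This is a toy model, not AMD's implementation: the class and method names are hypothetical, and the retire width of three per cycle is an assumption, since this brief states only the 72-entry capacity and the three-per-cycle dispatch rate.

```python
from collections import deque

ROB_ENTRIES = 72     # from the text: 72-entry MacroOP reorder buffer
DISPATCH_WIDTH = 3   # from the text: up to three MacroOPs per cycle
RETIRE_WIDTH = 3     # assumption: the brief does not state a retire width


class ReorderBuffer:
    """Toy model of a reorder buffer (hypothetical names throughout)."""

    def __init__(self):
        self.entries = deque()  # oldest MacroOP sits at the left end

    def dispatch(self, ops):
        """Accept up to DISPATCH_WIDTH ops in program order; return the count accepted."""
        accepted = 0
        for op in ops[:DISPATCH_WIDTH]:
            if len(self.entries) == ROB_ENTRIES:
                break  # buffer full: the decoders would have to stall
            self.entries.append({"op": op, "done": False})
            accepted += 1
        return accepted

    def complete(self, op):
        """Mark an op as executed; completion may arrive in any order."""
        for entry in self.entries:
            if entry["op"] == op:
                entry["done"] = True
                return
        raise KeyError(op)

    def retire(self):
        """Retire finished ops strictly from the oldest end, in program order."""
        retired = []
        while (self.entries and self.entries[0]["done"]
               and len(retired) < RETIRE_WIDTH):
            retired.append(self.entries.popleft()["op"])
        return retired
```

In this model a younger MacroOP that finishes early stays in the buffer until every older MacroOP has retired, which is what keeps exceptions precise.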
`
`Execution Pipelines
The AMD Athlon processor contains an 18-entry
integer/address generation MacroOP scheduler and a 36-entry
floating-point unit (FPU)/multimedia scheduler. These
schedulers issue MacroOPs to the nine independent execution
pipelines: three for integer calculations, three for address
calculations, and three for execution of MMX, 3DNow!, and x87
floating-point instructions.
`
`The AMD Athlon processor offers the most powerful,
`architecturally advanced floating-point engine ever delivered
`in an x86 microprocessor. The AMD Athlon processor's
`three-issue, superscalar floating-point capability is based on
`three pipelined, out-of-order floating-point execution units,
`each with a one-cycle throughput. These three execution units
`(FMUL, FADD, and FSTORE) execute all x87 (floating-point)
`instructions, MMX instructions, and enhanced 3DNow!
`instructions. Using a data format and single-instruction
`multiple-data (SIMD) operations based on the MMX instruction
`model, the AMD Athlon processor can deliver as many as four
`32-bit, single-precision floating-point results per clock cycle,
`resulting in a peak performance of 2.4 Gflops at 600 MHz.
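
The peak figure quoted above follows directly from the clock rate and the number of results per cycle. A short Python check of that arithmetic is shown below; the function name is hypothetical, and the result is a theoretical peak rather than a measured rate.

```python
def peak_gflops(clock_mhz, results_per_cycle):
    """Theoretical peak in Gflops: clock rate times results per clock cycle."""
    return clock_mhz * 1e6 * results_per_cycle / 1e9


# Figures from the text: 600 MHz, four single-precision results per cycle.
single_precision_peak = peak_gflops(600, 4)
print(f"{single_precision_peak:.1f} Gflops")  # prints "2.4 Gflops"
```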
`
`
`Branch Prediction
`
`The AMD Athlon processor offers sophisticated dynamic
`branch prediction logic to minimize or eliminate the delays due
`to the branch instructions (jumps, calls, returns) common in x86
`software. The processor includes the following:
• Branch prediction table
• Branch target address table
• Return address stack
`
`The AMD Athlon processor implements a two-way, 2048-entry
`branch prediction table. The branch prediction table stores
`prediction information that is used for predicting the direction
`of conditional branches. The branch target address table stores
`target addresses of conditional and unconditional branches.
`The return address stack optimizes CALL/RET instruction pairs
`by storing the return address of each CALL within a nested
`series of subroutines and supplying a return address as the
`predicted target address of the corresponding RET instruction.
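
The return address stack mechanism described above can be sketched in a few lines of Python. This is an illustrative model only: the class name, the method names, and the default depth are hypothetical, and this brief does not state how many entries the hardware stack holds or how it behaves on overflow.

```python
class ReturnAddressStack:
    """Toy return address stack; depth and overflow policy are assumptions."""

    def __init__(self, depth=12):  # hypothetical depth, not from the brief
        self.depth = depth
        self.stack = []

    def on_call(self, return_address):
        """Record the address of the instruction that follows a CALL."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # assumed policy: discard the oldest entry
        self.stack.append(return_address)

    def predict_return(self):
        """Supply the predicted target for a RET, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None
```

Because subroutine calls nest, the most recent CALL pairs with the next RET, so a last-in, first-out structure predicts return targets correctly as long as nesting does not exceed the stack depth.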
`
`Enhanced 3DNow!™ Technology
`
The AMD Athlon processor includes enhanced 3DNow!
technology designed to take 3D multimedia performance to new
heights. The enhanced 3DNow! technology implemented in the
AMD Athlon includes AMD’s original twenty-one 3DNow!
instructions (the industry’s first x86 instruction set to use
superscalar SIMD floating-point techniques to accelerate 3D
performance), plus twenty-four new instructions, which
perform the following functions:
• Twelve instructions that improve multimedia-enhanced
  integer math calculations used in such applications as
  speech recognition and video processing
• Seven instructions that accelerate data movement for more
  detailed graphics and functionality for internet browser
  plug-ins and other streaming applications, enabling a richer
  internet experience
• Five digital signal processing (DSP) instructions that
  enhance the performance of communications applications,
  including soft modems, soft ADSL, MP3, and Dolby Digital
  surround sound processing
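
To make the SIMD floating-point idea concrete, the Python sketch below emulates one packed single-precision add on a 64-bit value holding two 32-bit floats, which is the data layout 3DNow! uses in the MMX registers. The helper names are hypothetical, and the sketch only models the arithmetic; it says nothing about instruction encoding, latency, or exception behavior.

```python
import struct
from typing import Tuple


def pack2(low: float, high: float) -> int:
    """Pack two single-precision floats into one 64-bit integer, low element first."""
    return int.from_bytes(struct.pack("<2f", low, high), "little")


def unpack2(register: int) -> Tuple[float, float]:
    """Split a 64-bit integer back into its two single-precision elements."""
    return struct.unpack("<2f", register.to_bytes(8, "little"))


def packed_add(a: int, b: int) -> int:
    """Add both element pairs in one operation, as a packed SIMD add would."""
    a_low, a_high = unpack2(a)
    b_low, b_high = unpack2(b)
    return pack2(a_low + b_low, a_high + b_high)


result = packed_add(pack2(1.0, 2.0), pack2(0.5, 0.25))
print(unpack2(result))  # prints (1.5, 2.25)
```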
`
`
In enhancing 3DNow! technology, AMD kept the instruction set
design simple, yet powerful. AMD’s plan in designing the new
3DNow! instructions was to provide powerful SIMD
performance while enabling ease of implementation for
software developers. The relatively few instructions of
enhanced 3DNow! technology allow developers to adopt this
technology and optimize their applications quickly.

Cache Architecture
`
The AMD Athlon processor’s high-performance cache
architecture includes an integrated, 64-bit, dual-ported
128-Kbyte split-L1 cache with separate snoop port, multi-level
translation lookaside buffers (TLBs), a scalable L2 cache
controller with a 72-bit (64-bit data + 8-bit ECC) interface to as
much as 8 Mbytes of industry-standard SDR or DDR SRAMs, and
an integrated tag for the most cost-effective 512-Kbyte L2
configurations.
`
The AMD Athlon processor’s integrated L1 cache comprises
two separate 64-Kbyte, two-way set-associative data and
instruction caches. The data cache has eight banks to support
concurrent access by two 64-bit loads or stores. The instruction
cache contains predecode data to assist multiple,
high-performance instruction decoders. The robust bi-level TLB
structure minimizes code and data delays when accessing
physical memory.
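
The data cache organization described above can be explored with a small Python model. The cache size, associativity, and bank count come from the text; the 64-byte line size and the 8-byte bank interleave are assumptions made for illustration, and the conflict rule is a simplification (real hardware may handle some same-bank cases).

```python
CACHE_BYTES = 64 * 1024  # from the text: 64-Kbyte data cache
WAYS = 2                 # from the text: two-way set-associative
BANKS = 8                # from the text: eight banks
LINE_BYTES = 64          # assumption: line size is not stated in this brief
BANK_WIDTH = 8           # assumption: banks interleaved on 64-bit boundaries

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 512 sets under these assumptions


def split_address(address):
    """Return (tag, set index, byte offset) for an address."""
    offset = address % LINE_BYTES
    index = (address // LINE_BYTES) % SETS
    tag = address // (LINE_BYTES * SETS)
    return tag, index, offset


def bank_of(address):
    """Bank selected by an address under the assumed 8-byte interleave."""
    return (address // BANK_WIDTH) % BANKS


def can_issue_together(address_a, address_b):
    """Simplified rule: two accesses proceed in one cycle if they use different banks."""
    return bank_of(address_a) != bank_of(address_b)
```

Under these assumptions, two accesses to adjacent 8-byte words fall in different banks and can be serviced concurrently, while two accesses 64 bytes apart map to the same bank and would conflict.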
`
`The AMD Athlon processor’s L2 cache controller operates at a
`programmable frequency for compatibility with a variety of
`industry-standard SRAMs including DDR. The integrated L2
`cache tag provides a full tag for a 512-Kbyte L2 cache or a
`partial tag for larger L2 caches.
`
`System Bus Interface
`
The 200-MHz AMD Athlon system bus interface, the fastest
bus implementation for x86 platforms, leverages the
high-performance Digital™ Alpha™ EV6 system interface
technology to significantly boost system performance and
provide ample headroom for today's and tomorrow's
applications. The AMD Athlon system bus provides advanced
features, such as source synchronous clocking for high-speed
200-MHz-to-400-MHz operation, point-to-point topology for
peak data bandwidth independent of the number of processors,
packet-based transfers for improved transaction pipelining,
large 64-byte burst data transfers, 8-bit ECC protection of data
and instructions, low-voltage signaling for high-performance,
low-cost motherboard implementations, and the ability to
address more than eight terabytes of physical memory.
`
`The 200-MHz system bus implemented in the AMD Athlon
`processor is capable of delivering a peak data transfer rate of
`1.6 Gbytes per second — twice that of previous processor
`generations. With its source synchronous clocking design, the
`AMD Athlon processor's system bus is scalable to operate
`beyond 400 MHz.
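
The peak transfer rate quoted above is consistent with a 64-bit data path moving one transfer per bus cycle. The Python sketch below reproduces that arithmetic; the 8-byte data width is an assumption (this brief gives the 8-bit ECC width for the bus but not the data width), and the function name is hypothetical.

```python
BUS_DATA_BYTES = 8  # assumption: 64-bit data path, not stated for the bus here


def peak_bandwidth_gbytes(transfer_rate_mhz):
    """Peak bandwidth in Gbytes per second at a given transfer rate."""
    return transfer_rate_mhz * 1e6 * BUS_DATA_BYTES / 1e9


print(f"{peak_bandwidth_gbytes(200):.1f} Gbytes/s")  # prints "1.6 Gbytes/s"
print(f"{peak_bandwidth_gbytes(400):.1f} Gbytes/s")  # prints "3.2 Gbytes/s"
```

The second line shows what the same data width would deliver if the bus were run at 400 MHz, the upper end of the scaling range mentioned in the text.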
`
`Process Technology
`
The AMD Athlon processor is manufactured on AMD's six-layer
metal, 0.25-micron process technology and AMD's new
0.18-micron process technology. In 0.25-micron technology, the
approximately 22-million-transistor AMD Athlon processor has
a die size of 184 mm². In 0.18-micron technology, the
AMD Athlon processor has a die size of 102 mm². The
AMD Athlon processor is included in a cost-effective,
industry-standard module form factor, Slot A, which is
mechanically compatible with the existing Slot 1 infrastructure
and therefore leverages commonly available chassis, power
supply, and thermal solutions.
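
As a rough sanity check on the two die sizes, the Python sketch below compares the reported area ratio with an ideal linear shrink, in which area scales with the square of the feature size. The ideal-shrink model is an assumption used only for comparison; real designs rarely scale perfectly, and the function name is hypothetical.

```python
def ideal_area_ratio(old_feature_um, new_feature_um):
    """Area ratio for a perfect linear shrink: (new / old) squared."""
    return (new_feature_um / old_feature_um) ** 2


ideal = ideal_area_ratio(0.25, 0.18)  # about 0.52
reported = 102 / 184                  # about 0.55, from the die sizes in the text

print(f"ideal shrink: {ideal:.2f}, reported: {reported:.2f}")
```

The reported ratio is slightly larger than the ideal one, which is the expected direction, since not every structure on a die shrinks in proportion to the feature size.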
`
The AMD Athlon processor's seventh-generation
microarchitecture and high-bandwidth system bus enable it to
attain performance levels never before achieved by an x86
processor. The AMD Athlon significantly outperforms
previous-generation x86 processors and delivers the highest
integer, floating-point, and 3D multimedia performance
available for x86 platforms, as measured by industry-standard
benchmarks.
`
`Summary
`
`The AMD Athlon provides industry-leading processing power
`for cutting-edge software applications, including digital
`content creation, digital photo editing, digital video, image
`compression, video encoding for streaming over the internet,
`soft DVD, commercial 3D modeling, workstation-class
`computer-aided design (CAD), commercial desktop publishing,
`and speech recognition.
`
`