`
YARDS: FPGA/MPU Hybrid Architecture for Telecommunication Data Processing

Akihiro Tsutsui, Member, IEEE, and Toshiaki Miyazaki, Member, IEEE
Abstract—This paper presents a novel system architecture that combines tightly coupled field programmable gate arrays (FPGA's) and a microprocessing unit (MPU) that we have developed. This system architecture comprises three main programmable devices which yield high flexibility. These devices are a reduced instruction set computer (RISC)-type MPU with memories, programmable interconnection devices, and FPGA's. This system supports various styles of coupling between the FPGA's and the MPU, which makes several data processing operations more effective. Furthermore, we indicate the most suitable applications for the system. They are telecommunication data processes involving complex protocol operations and network control algorithms. In this paper, two applications of the system for operation, administration, and management (OAM) are given. One is OAM cell processing on an asynchronous transfer mode (ATM) network. The other is a dynamic remote reconfiguration protocol that enables the functions of the transport data processing system to be updated or changed on-line.
Index Terms—Codesign, field programmable gate array (FPGA), microprocessor, telecommunication, yet another redefinable system (YARDS).
I. INTRODUCTION

CONVENTIONAL implementations of telecommunication systems use fixed hardware, since importance was placed on high-speed transport data processing rather than flexibility. However, today's enthusiasm for the inter-networking trend is forcing network systems to support a wide variety of communication protocols, even those that are not yet fully developed. Therefore, future network systems must offer not only high performance but also high flexibility [1].
The asynchronous transfer mode (ATM) technique is one of the solutions to this problem [2]. It is suitable for multimedia data communication because it is based on the concept of the "virtual path (VP)," which can handle several data bandwidths equally and flexibly. In addition, because the minimum transfer unit (the cell) is quite small, hardware implementation is so simple that it is easy to construct high-throughput network systems [3]. With the progress of optical communication technologies, the ATM network has the potential to realize high-performance and flexible telecommunication services.
When providing high-quality multimedia services, it is indispensable to achieve high-performance and flexible network control and management [1]. Up to now, there has been no other choice but to realize these as software components because they are complex and require frequent updating. In
Manuscript received April 4, 1997; revised November 1, 1997.
The authors are with NTT Optical Network Systems Laboratories, Kanagawa 239-0847 Japan.
Publisher Item Identifier S 1063-8210(98)02955-2.
the near future, however, some of the operations might be implemented as hardware to achieve adequate data manipulation rates. Thus, lower layer telecommunication protocols will require more flexible implementation and higher ones will be forced to achieve higher performance. Therefore, an efficient technique for fusing hardware and software is becoming indispensable for building the next-generation network systems. Unfortunately, no existing design technology in the telecommunication field can achieve this fusion.
Given this situation, we considered the field programmable gate array (FPGA) to be the key to future telecommunication systems and have developed an original FPGA especially designed for high-speed telecommunication data processing [4], [5]. Using this device, we also constructed a reconfigurable signal-transport system dedicated to real-time emulation of transport processing circuits [6]. This system is useful in implementing lower layer transport operations. However, higher layer protocols such as the network management applications mentioned above include excessively complex logic operations which are not suitable for FPGA's. Microprocessing units (MPU's) and program logic implemented as software appear indispensable to meet these goals.
There are some reports on hybrid systems which consist of tightly coupled FPGA's and MPU's [7], [8]. In these systems, both FPGA's and MPU's cooperate with each other to execute operations. In general, however, it is difficult to find an actual target application that maximizes the performance of the system, because it is difficult to divide a given problem into two implementation styles: hardware and software. Based on our experience, we discovered an effective application for the tightly coupled system in the field of telecommunications. The lower layer telecommunication protocols are suitable for hardware implementation while the higher layer protocols should be realized as software. In the field, both high throughput to support real-time data transmission and flexibility to support various protocols are required.
Thus, we developed a novel architecture for a system comprising tightly coupled FPGA's and MPU's. This architecture is suitable for implementing flexible and real-time transport data processing operations involving complex protocol operations.
This paper presents the new architecture of the hybrid system and its advantages. General ideas for achieving efficient coupling of software (MPU) and hardware (FPGA) are also mentioned. Moreover, application models of the hybrid system in the telecommunications field and experimental implementations for a few instances are shown.
© 1998 IEEE
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 6, NO. 2, JUNE 1998
Fig. 1. Basic architecture of YARDS. [Figure: the YARDS main card links an MPU card (with DRAM SIMM, interrupt lines, and a VME-Bus I/F on the local bus) to an FPGA array through a programmable switch and four 2-port SRAM's; a programming FPGA and two oscillators (OSC-1, OSC-2) are also on the main card.]
II. RELATED WORK

Some systems consisting of MPU's and FPGA's have been proposed [7], [8]. In those hybrid systems, some particular operations suitable to hardware implementation are performed by specially designed logic circuits on the FPGA's while the MPU works in conjunction with them.
They communicate with each other using the MPU's local bus. Thus, the FPGA's are treated as coprocessors or special peripheral devices of the MPU. Moreover, specially designed FPGA's dedicated to coworking with the MPU have been proposed [9].
Most of the target applications for these systems involve numerical calculation or digital signal processing such as video-CODEC. However, it seems difficult to find an effective application which derives the maximum performance from the system. One reason for this problem is that it is difficult to divide the target application into hardware and software components. Moreover, data transmission between the MPU and FPGA becomes a bottleneck which prevents the system from attaining a high throughput.
III. YARDS AND ANT

A. Overview of YARDS

We developed a new system architecture comprising tightly coupled FPGA's and an MPU, named "yet another redefinable system" (YARDS). Fig. 1 shows the basic architecture of YARDS. The main parts are an MPU, an FPGA array, programmable switching devices, and two-port SRAM's. Fig. 2 shows the system overview of YARDS. It consists of three cards: the main card, the MPU card, and the FPGA card. The main card contains devices for interconnecting the FPGA part and the MPU part of the system. The MPU card consists of a reduced instruction set computer (RISC) MPU with a BIOS-ROM. The FPGA card features a multiple-FPGA array. YARDS also has two external interfaces: a VME-Bus I/F and a direct I/O channel derived from the FPGA card. Using the VME-Bus, this system can communicate with other host computer systems. We utilize it for controlling YARDS or monitoring the system. The direct I/O channel provides direct data communication between other devices and the FPGA card. The front view of YARDS is shown in Fig. 3 and the back is shown in Fig. 4.
B. YARDS Main Card

The main card contains interconnection elements for the FPGA's and the MPU. They comprise field programmable switching devices (I-Cube) that support connections among their pins, such as unidirectional links or a bus. The local-bus signals and a few interrupt pins of the MPU, and most I/O pins of the FPGA's, are connected directly to these switching devices.
The two-port SRAM's on the main card have various uses. These are connected directly to the switching devices. By suitably configuring the connection pattern of the switches, those SRAM's can be used as shared memories or buffers by
Fig. 2. System overview of YARDS. [Figure: the YARDS main card, the MPU card, and the FPGA card.]

Fig. 3. Picture of YARDS (front).

Fig. 4. Picture of YARDS (back).
the FPGA's and the MPU. Thus, using these programmable switching devices and two-port SRAM's, various connection patterns can be established between the FPGA's and the
Fig. 5. FPGA card (using LCA). [Figure: four XC-4010 LCA's in a cascade, a switching device, a 32-bit external I/O connector, and the connector to the main card.]

Fig. 6. Example of a link topology (using LCA).
MPU. The programming and control logic of the system are implemented using an FPGA on the main card. Thus, the system can support several kinds of FPGA's and MPU's by reprogramming this FPGA.
YARDS has three clock generators in addition to the MPU's base clock (20 MHz). One of these provides a fixed clock speed of 19.44 MHz. This is the base clock for a typical telecommunication circuit which handles a 155 Mb/s synchronous digital hierarchy (SDH) interface. The others are programmable clock generators.
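The relationship between the 19.44 MHz base clock and the 155 Mb/s interface is simple byte-level arithmetic: an SDH STM-1 stream is handled byte-serially, so the line rate is the byte-clock frequency times eight. A quick check in C (the exact STM-1 rate of 155.52 Mb/s is standard SDH background, not stated in the text):

```c
#include <assert.h>

/* SDH STM-1: bytes are clocked at 19.44 MHz; each byte is 8 bits,
 * giving the 155.52 Mb/s line rate the paper rounds to 155 Mb/s. */
static double stm1_line_rate_mbps(double byte_clock_mhz) {
    return byte_clock_mhz * 8.0;   /* 19.44 MHz x 8 = 155.52 Mb/s */
}
```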
`
C. FPGA Card and MPU Card

We wanted to try various types of FPGA or MPU devices because YARDS is an experimental system. Thus, the FPGA
Fig. 7. Picture of ANT.
part and the MPU part of the system were designed as separate daughter cards. They can be replaced with cards containing other devices. As for the MPU part, we have little choice owing to the constraint of local bus compatibility. However, as for the FPGA part, there are many candidates. Actually, we have developed two types of FPGA cards which consist of different FPGA devices. We adopted the Xilinx LCA (XC4010) and the ALTERA MAX (MAX-9000). Fig. 5 shows a block diagram of the LCA array card. It consists of four devices (LCA's) that are connected directly to each other.
Their link topology on the card is a cascade. However, most of the I/O pins of the FPGA's can be connected to the programmable switching devices on the main card, and some of them are connected to external I/O connectors. Using the switches, the topology of the links among the FPGA's can be varied as shown in Fig. 6.
Generally speaking, different kinds of FPGA's require different configuration methods. Therefore, the configuration and logic control of an FPGA must be changed for each device. This difference in the logic configuration can be absorbed by the programming control FPGA's (LCA XC4010 and XC3030) mounted on the main card. All of the pins for configuration and control of the FPGA's on the FPGA card are connected to this programming control FPGA via the switches. By reprogramming to achieve a suitable configuration logic circuit on the programming control FPGA, we can handle different kinds of FPGA's.
The MPU card carries the MPU and a BIOS-ROM. We adopted a 32-bit RISC microprocessor (Hyperstone E-1) which has a simple architecture and is easy to use. All local bus signal lines are connected to the main card bus via the connectors. This RISC microprocessor has five different interrupt signals. A few interrupt signal pins are connected to the FPGA card directly; the rest are connected to switching devices on the main card via the connectors.
D. ANT Architecture

Our main target application for YARDS is an intelligent network node system for the ATM network. For this, we designed an ATM Network Termination (ANT) card which acts as the network interface for YARDS. It is designed as a single-board computer system with a 155 Mb/s ATM network interface daughter card. Fig. 7 is a photograph of an ANT card. This card has a direct I/O channel which can be connected to the FPGA's on YARDS. Coupling the card and YARDS yields an intelligent network interface for the single-board computer, as shown in Fig. 8.
Fig. 9 shows the block diagram of ANT. The single-board computer part of ANT is designed as a standard VME-Bus card. It consists of a RISC-type microprocessor (MIPS R3000), memories, and some peripheral devices. A serial communication interface and an Ethernet interface are also implemented on the card. This computer system is managed by the real-time operating system called VxWorks [10] and can stand alone.
The ATM interface daughter card comprises a 155 Mb/s optical interface, an ATM physical layer processor (PHY device), an ATM adaptation layer processor (SAR device), and SRAM memory devices. Compared to ordinary interface cards, our daughter card has a special data channel which can be linked to the direct I/O port of YARDS. This channel is connected to the ATM physical layer processor directly. Thus, raw ATM cells can be sent and received by YARDS.
The logging memory placed between the ANT main card and the ATM interface daughter card provides the communication channel between the PHY device of ANT and the RISC microprocessor. Both can send or receive raw ATM cells via this logging memory. This feature is useful for tapping and monitoring the ATM link connected to ANT and processing its status in YARDS.
Fig. 8. Model of the intelligent network interface. [Figure: the network link terminates at an intelligent network interface combining a processor and an MPU.]

Fig. 9. Block diagram of ANT. [Figure: the ANT main card (controller, TIMER, EPROM, logging memory) and the ATM interface daughter card (optical interface, ATM PHY, SAR) with the direct channel to the YARDS I/F.]

IV. ADVANTAGES

A. Flexible Interconnection Between MPU and FPGA

Most conventional hybrid systems employ a bus architecture [8]. They treat the FPGA's as coprocessors or I/O devices, as shown in Fig. 10. This interconnection style couples these devices tightly. However, there are some problems that prevent harmonious cooperation. For example, immediate communication from an FPGA to the MPU is difficult to implement using this interconnection style. In addition, local bus congestion caused by the communication between an MPU and FPGA's must be considered. Therefore, using only the bus restricts the applications of the system.
YARDS supports three different styles of connection between FPGA's and MPU: a bus, a direct interrupt, and a
Fig. 10. Interconnection using local bus. [Figure: the data stream reaches the FPGA's and the MPU over the shared local bus.]
Fig. 12. Typical implementation style using local bus. [Figure: the FPGA, the MPU, and a 2-port SRAM exchange data over the local bus.]
Fig. 11. New interconnection style. [Figure: the FPGA and the MPU are linked through a switching device by an interrupt signal and two-port SRAM's, in addition to the local bus and main memory.]
two-port SRAM channel. These connection styles can be established using programmable switching devices and are configured easily.
The bus style is the same as the conventional one. The direct interrupt links connect the FPGA and the interrupt signal pins of the MPU, as shown in Fig. 11. This connection style enables the FPGA to interrupt and control the behavior of the MPU. In a conventional FPGA/MPU hybrid system, the main device is the MPU. That is, the main instructions are implemented as software and executed by the MPU, and the FPGA's are considered as subdevices. However, by using these direct interrupt links, the FPGA part is able to perform a leading role in the system. In such a system, the main instructions are implemented as a logic circuit on FPGA's, and the MPU performs a supporting role. The MPU is always ready for an interrupt signal from the FPGA's. When an interrupt signal is sent, a corresponding subroutine is invoked by the MPU. This connection style aids in implementing some transport data processing protocols which have data-driven operations such as frame synchronization and transmission error handling.
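The FPGA-leading style described above can be read as an interrupt dispatch table: the FPGA asserts one of the MPU's interrupt lines, and the MPU merely runs the corresponding subroutine. A minimal software model in C (the five-line count matches the Hyperstone E-1 mentioned earlier; the table and handler names are illustrative, not part of YARDS):

```c
#include <assert.h>

#define NUM_IRQ_LINES 5          /* the Hyperstone E-1 has five interrupt signals */

typedef void (*irq_handler_t)(void);
static irq_handler_t irq_table[NUM_IRQ_LINES];

/* MPU side: register the subroutine for one interrupt line. */
static void irq_attach(int line, irq_handler_t h) {
    if (line >= 0 && line < NUM_IRQ_LINES) irq_table[line] = h;
}

/* FPGA side (simulated): asserting a line simply invokes the handler. */
static void fpga_raise_irq(int line) {
    if (line >= 0 && line < NUM_IRQ_LINES && irq_table[line])
        irq_table[line]();
}

/* Example data-driven handler: count frame-synchronization events. */
static int frame_sync_events = 0;
static void on_frame_sync(void) { frame_sync_events++; }
```

In this model the MPU does no polling at all; control flow originates in the FPGA, which matches the "FPGA performs a leading role" arrangement in the text.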
`
B. Two-Port SRAM Channel

Considering our main target applications, telecommunication data processing operations, it is expected that data communication between the MPU and the FPGA's will occur very frequently and asynchronously. In such a case, cooperation between the MPU and FPGA's will cause local
bus congestion and degrade the performance of both. Using only the bus architecture, the implementation style of our target system would be similar to that shown in Fig. 12. The transport data stream is input to the FPGA directly. The FPGA executes some low layer protocol operations and transfers the results to main memory. The MPU then accesses and reads the results from memory while executing some high-layer protocol operations. When those operations are finished, the FPGA accesses main memory, retrieves the processed data, and reshapes the data as the output transport data stream.
In general, the local bus of the MPU is usually occupied by data transmission among the memories, the peripheral devices, and the MPU itself. Therefore, these repetitive data transmissions among the MPU, the FPGA's, and the memories will block the local bus. Moreover, if bus connections are used, both the MPU and the FPGA's must be synchronized with local bus timing and yield to arbitration protocols. For an FPGA in particular, a bus interface circuit should be incorporated into it; however, this would occupy a considerable area in the device.
The two-port SRAM channel is one of the remarkable features of YARDS. As shown in Fig. 11, the two-port SRAM's on the main card can be configured as channel devices between the FPGA's and the MPU. They perform asynchronous data transfer. This mechanism aids in implementing applications that must work at two or more different base clock speeds. Moreover, by using this communication style, the MPU and the FPGA's can devote themselves to their respective tasks without influencing each other. For example, a multilayer protocol operation which includes both lower and higher layer protocol operations is one good application for this connection style. In such an operation, each protocol operation is executed in sequence on the same datagram. The overhead of a data copy operation from one memory to another is a considerable problem. Using this type of connection, the overhead is eliminated because each device can share the same data in a two-port SRAM and access it independently.
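The zero-copy property can be sketched as both sides holding pointers into the same storage: the FPGA writes a datagram once, and the MPU processes it in place rather than copying it to main memory. A software model in C (the struct layout and the ready-flag protocol are illustrative assumptions, not the actual hardware interface):

```c
#include <assert.h>
#include <string.h>

/* Model of a two-port SRAM region shared by the FPGA and the MPU. */
typedef struct {
    volatile int ready;        /* set by the writer, cleared by the reader */
    unsigned char data[48];    /* one ATM cell payload */
} shared_buf_t;

static shared_buf_t sram;      /* both "ports" see the same storage */

/* FPGA side: deposit a lower-layer result directly into the SRAM. */
static void fpga_write(const unsigned char *payload, int len) {
    memcpy(sram.data, payload, (size_t)len);
    sram.ready = 1;
}

/* MPU side: process the datagram in place; no copy to main memory. */
static int mpu_checksum(void) {
    int sum = 0;
    for (int i = 0; i < 48; i++) sum += sram.data[i];
    sram.ready = 0;
    return sum;
}
```

Because each side runs against its own port of the SRAM, neither transfer touches the MPU's local bus, which is the congestion argument made above.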
`
Fig. 13. Memory configuration. [Figure: through the switching device, an FPGA or the MPU accesses the 2-port SRAM's configured as ordinary storage, or, with an FPGA applying a function (a hash function, etc.) in the access path, as a functional memory.]
In addition, the two-port SRAM devices on the main card are used not only as channel elements between the FPGA's and the MPU but also as a normal memory device, a FIFO, or a STACK. Using the switching devices, various kinds of storage can be built, as shown in Fig. 13. Furthermore, when the MPU accesses the SRAM through the FPGA's, the combination of these devices can be considered as a kind of functional memory. In this case, the FPGA can be configured as a pre- or post-data-access processor. For example, implementing a hash function in the FPGA is useful for a table lookup.
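The functional-memory idea, with the FPGA hashing the key before the SRAM is actually addressed, can be modeled as a pre-access function in front of a plain array. A sketch in C (the hash constant and table size are invented for illustration; the paper does not specify them):

```c
#include <assert.h>

#define TABLE_SIZE 256

/* The SRAM behind the FPGA, used as a lookup table. */
static int sram_table[TABLE_SIZE];

/* FPGA as pre-access processor: hash the key into an SRAM address.
 * A trivial multiplicative hash stands in for the real circuit. */
static unsigned hash(unsigned key) {
    return (key * 2654435761u) % TABLE_SIZE;
}

/* What the MPU sees: a key-addressed "functional memory". */
static void fm_store(unsigned key, int value) { sram_table[hash(key)] = value; }
static int  fm_load(unsigned key)             { return sram_table[hash(key)]; }
```

From the MPU's point of view the hash is free: it costs one memory access, with the address transformation absorbed by the logic in front of the SRAM.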
`
C. Advantages in Telecommunication Field
TABLE I
ACCESS FUNCTIONS FOR YARDS AND ANT

Function: AtmOpenConnection
  Arguments: int aalType   // AAL Type Identifier
             int vpi       // VP Identifier
             int vci       // VC Identifier
             int pti       // Payload Type Identifier
  Behavior:  This function sets up an ATM connection using the specified parameters. When opening a connection on the ATM network, this procedure must be called.

Function: AtmOutput
  Arguments: int vci       // VC Identifier
             char*         // Pointer to the buffer containing bytes to be sent
             int count     // Number of bytes to be sent
  Behavior:  This function sends the data pointed to in the argument field as cells. The buffer can be mapped to memories which can be accessed by the MPU, including the 2-port SRAM.

Function: AtmOutputHookAdd
  Arguments: FUNCPTR atmOutputHook   // Pointer to output hook routine
  Behavior:  This function sets the output hook routine which should be triggered by an interrupt signal from ANT. The interrupt occurs when requested data is sent.

Function: AtmInputHookAdd
  Arguments: FUNCPTR       // Pointer to input hook routine
  Behavior:  This function sets the input hook routine which should be triggered by an interrupt signal from ANT. The interrupt occurs when some data is received and stored in specified memory by ANT.
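Under the functions of Table I, a send path is an open followed by a write, with the output hook wiring ANT's interrupt to a callback. A hedged usage sketch in C: the names and argument lists follow Table I, but the bodies below are stand-ins so the example is self-contained (the real library drives ANT and the 2-port SRAM), and the buffer name and AAL constant are invented:

```c
#include <assert.h>

typedef void (*FUNCPTR)(void);

/* Stand-in state for the sketch; the real implementations program
 * ANT and the switching devices instead of touching these globals. */
static int     conn_open   = 0;
static int     bytes_sent  = 0;
static FUNCPTR output_hook = 0;

/* Signatures as in Table I; bodies are stubs for illustration only. */
static int AtmOpenConnection(int aalType, int vpi, int vci, int pti) {
    (void)aalType; (void)vpi; (void)vci; (void)pti;
    conn_open = 1;
    return 0;
}

static int AtmOutput(int vci, char *buf, int count) {
    (void)vci; (void)buf;
    if (!conn_open) return -1;       /* AtmOpenConnection must come first */
    bytes_sent += count;
    if (output_hook) output_hook();  /* "data sent" interrupt from ANT */
    return count;
}

static void AtmOutputHookAdd(FUNCPTR hook) { output_hook = hook; }

static int  sends_completed = 0;
static void on_sent(void) { sends_completed++; }
```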
`
Fig. 14. Simple data communication model between ANT and YARDS. [Figure: ANT sends data (atmOutput()) through the FPGA into the 2-port SRAM shared buffer; an interrupt then invokes the routine registered on the MPU with atmInputHookAdd(*FUNC).]
Fig. 15. ATM network model. [Figure: clients attached to network elements (cross-connects) by links, with a VP routed across the network.]
In general, for the MPU/FPGA hybrid system, frequent data exchange between the MPU and the FPGA (such as that shown in Fig. 12) hinders the achievement of maximum performance. This is because few applications can overcome the data transmission overhead.
Most MPU's do not have direct I/O ports which handle continuous data streams. This is a weak point when realizing telecommunication data processing systems. On the other hand, FPGA's have many direct I/O ports and are suited to real-time operations on continuous data streams.
Therefore, we employ FPGA's as I/O pre- or post-processing elements for continuous data streams and link the MPU and the FPGA's with two-port shared memory to allow them to access each other independently. Thus, frequent and asynchronous data transfer between these devices can be achieved with little overhead. Consequently, the FPGA's offset the defect of the MPU, and together they achieve efficient performance.
Therefore, it is suitable to implement telecommunication data processing on a hybrid system like YARDS. Moreover, from the system design point of view, it is comparatively easy to find the splitting point between hardware and software implementation, as mentioned before.
In this paper, we mainly focus on the telecommunication field. However, this style of implementation should be useful
not only for telecom applications but also for other real-time and continuous data operations.

Fig. 16. OAM cell processing model. [Figure: user cells and OAM cells flowing in the main data stream.]

Fig. 17. Conventional implementation of OAM cell processing. [Figure: the transport processing unit handles the main data stream (ATM cells, Ethernet packets, etc.) and exchanges OAM information with a local controller (workstation) over Ethernet; the turn-around time is about 100-500 ms.]

V. CONFIGURATION AND API

A. System Configuration

YARDS has great flexibility because almost all of its parts are programmable. For the MPU and the FPGA's, not only their functions but also the physical connections between them can be configured. Thus, the configuration process of the system is so complicated and dangerous that a wrong configuration may destroy the physical devices or data.
There are two programming processes for YARDS; one occurs at system start-up. When the power is turned on or the stand-by state is triggered, the operations needed must be performed automatically. Therefore, this auto-configuration procedure is set in BIOS-ROM. After the MPU is booted up, the initial set-up procedures are automatically performed step by step. In the procedures, the switching devices and the FPGA's are configured first because they have physical level flexibility and errors may destroy the system.
The other programming process, dynamic reconfiguration, may occur during system run-time. To ensure that the run-time programming of YARDS avoids fatal errors, the programming procedures related to the physical configuration can only be handled by software on the MPU. We prepared some useful library functions written in the C language. Using them, users can control the programming sequence of the FPGA card and the switching devices.

B. Application Program Interface of YARDS and ANT

In general, it is hard to design a target application on an FPGA/MPU hybrid system because a hardware/software codesign environment is needed, but the appropriate technology has not matured yet.
YARDS also faces this problem and we do not have a dedicated design environment. Thus, there is no choice but to design the hardware and software parts of the system separately. Fortunately, telecommunication protocol operations, which are our main target, are comparatively easy to split into hardware and software parts, as discussed above. Therefore, we developed an access mechanism and an API between the MPU and the FPGA's, and prepared useful functions implemented as C libraries. For example, simple data communication between ANT and the FPGA's using the two-port SRAM, which is used in an application example discussed in the next section, can be implemented with the functions listed in Table I.
Using those functions, data communication between ANT and YARDS can be described as shown in Fig. 14. In this case, the FPGA's are used as I/O preprocessors for the MPU. For example, when ANT receives data, it copies the data into the two-port SRAM set as a shared buffer through the FPGA's. The data are preprocessed by the FPGA's automatically. ANT or the FPGA's then interrupt YARDS. Thereafter, the data processing routine specified by the "atmInputHookAdd"
`
`
Fig. 18. Function block diagram of OAM cell processing. [Figure: OAM cell selection from the ATM cell stream, with the restoration algorithm executed as a software operation.]
function is invoked by the interrupt signal. Thus, we do not have to consider troublesome synchronization between the FPGA's and the MPU.
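The receive path of Fig. 14 can be sketched the same way: ANT deposits preprocessed data in the shared buffer and its interrupt fires the routine registered with AtmInputHookAdd, so the MPU never polls. A simulation in C (the buffer size, the hook signature, and the simulated-ANT function are illustrative assumptions):

```c
#include <assert.h>
#include <string.h>

typedef void (*FUNCPTR)(void);

static unsigned char shared_buffer[48];   /* 2-port SRAM shared buffer */
static int     rx_len     = 0;
static FUNCPTR input_hook = 0;

/* As in Table I: register the routine triggered by ANT's interrupt. */
static void AtmInputHookAdd(FUNCPTR hook) { input_hook = hook; }

/* Simulated ANT: copy received data through the FPGA's into the
 * shared buffer, then interrupt the MPU. */
static void ant_receive(const unsigned char *cell, int len) {
    memcpy(shared_buffer, cell, (size_t)len);
    rx_len = len;
    if (input_hook) input_hook();
}

/* MPU-side processing routine invoked on the interrupt. */
static int  cells_processed = 0;
static void process_input(void) { cells_processed++; }
```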
`
VI. APPLICATIONS

A. OAM Cell Processing for Network Operation

This paper presents two novel applications of YARDS in the telecommunication field. The first one is OAM processing in the ATM network.
Fig. 15 shows an ATM network model. The VP concept enables us to settle the bandwidth and route of links independently. Owing to this mechanism, we can set up connections freely and construct a flexible and reliable network. This mechanism is the basis of the ATM network. However, because of this flexibility, the control and management of the network system turns out to be a complex problem.
Most typical ATM network control methods use OAM cells. Network elements (NE's) exchange requests, acknowledgments, and information for network control using the cells. OAM cells are extracted from the main data stream at each NE and used for management purposes. If necessary, NE's insert OAM cells into the main data stream to communicate with each other. Many other ATM network management or control protocols also use OAM cells and dedicated VP's/virtual channels (VC's) for information exchange between NE's. Therefore, we selected this application. Another reason is that some of the operations are not completely standardized and require some degree of flexibility [11], [12]. Moreover, improving the performance of these operations is most effective for achieving high-quality telecommunication services [13].
Basic OAM procedures are to pick up OAM cells from the data stream and execute the corresponding operations indicated in the cell, as shown in Fig. 16. These basic operations are: DISCARD (take and ignore the OAM cell), DROP (take the OAM cell and invoke some operations instructed by the cell), MONITOR (observe the OAM cell, invoke some operations instructed by the cell), and THROUGH (the node does not execute any operation indicated by the cell, only forwards the cell to other nodes).
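The four basic operations amount to a per-cell dispatch: DISCARD and DROP remove the cell from the stream, MONITOR and THROUGH forward it, and DROP and MONITOR additionally invoke the instructed operation. A sketch in C (the enum names follow the text; the forwarded/invoked bookkeeping is illustrative):

```c
#include <assert.h>

typedef enum { DISCARD, DROP, MONITOR, THROUGH } oam_op_t;

typedef struct {
    int forwarded;   /* 1 if the cell stays in the main data stream   */
    int invoked;     /* 1 if the instructed operation is executed     */
} oam_result_t;

/* Dispatch one OAM cell according to the four basic operations. */
static oam_result_t oam_dispatch(oam_op_t op) {
    oam_result_t r = {0, 0};
    switch (op) {
    case DISCARD: /* take and ignore the cell */              break;
    case DROP:    /* take the cell, run its operation */      r.invoked = 1; break;
    case MONITOR: /* observe, run operation, pass along */    r.invoked = 1; r.forwarded = 1; break;
    case THROUGH: /* forward only */                          r.forwarded = 1; break;
    }
    return r;
}
```

In the YARDS split described next, this per-cell selection is the kind of data-driven, cell-rate work that belongs on the FPGA side, while the invoked operations belong on the MPU side.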
Fig. 17 shows the conventional implementation style of OAM processing on a network node [14]. The present network node consists of a transport processing unit (hardware part) and
Fig. 19. Implementation model of OAM cell processing on YARDS. [Figure: the main data stream passes through the PHY and cell operation part-1 on the FPGA's; OAM cells flow over a data channel into the 2-port SRAM's, where the data structure is transformed into C data types (pointer-linked structs for easy access); cell operation part-2 and the algorithm part run on the MPU.]
a workstation. The transport processing unit handles the main data stream and performs most low layer protocol operations. The workstation mainly executes the control algorithm. These two modules are connected to a LAN such as Ethernet. When OAM cells are terminated and extracted by the transport processing unit, they are sent to the workstation using the local network. The turn-around time can exceed 100-500 ms. This overhead causes a significant delay in services. For example, path restoration upon network failure can take a few minutes [14]. According to previous research, restoration should take 2 s at most to minimize the damage to multimedia data communication using real-time video or voice [13]. Thus, the bottleneck of the present system is the loose coupling between the hardware and software parts of the system.
Fig. 18 shows the function block diagram of basic OAM cell processing. It comprises three parts. The first is cell operation part-1, which treats the main data stream and processes transport data such as ATM cell termination and generation. The second is cell operation part-2, which treats only OAM c