`
`Chapter 8
`
`Interfacing Processors and Peripherals
`
8.4 Buses: Connecting I/O Devices to Processor and Memory
`
`657
`
`--
`
`the actual data from the disk. The control lines will be used to indicate what
`type of information is contained on the data lines of the bus at each point in the
`transfer. Some buses have two sets of signal lines to separately communicate
`both data and address in a single bus transmission. In either case, the control
lines are used to indicate what the bus contains and to implement the bus protocol. And because the bus is shared, we also need a protocol to decide who
`uses it next; we will discuss this problem shortly.
`Let's consider a typical bus transaction. A bus transaction includes two
`parts: sending the address and receiving or sending the data. Bus transactions
`are typically defined by what they do to memory. A read transaction transfers
data from memory (to either the processor or an I/O device), and a write transaction writes data to the memory. Clearly, this terminology is confusing. To avoid this, we'll try to use the terms input and output, which are always defined from the perspective of the processor: an input operation is inputting data from the device to memory, where the processor can read it, and an output operation is outputting data to a device from memory, where the processor wrote
`it. Figure 8.7 shows the steps in a typical output operation, in which data will
`be read from memory and sent to the device. Figure 8.8 shows the steps in an
`input operation where data is read from the device and written to memory. In
`both figures, the active portions of the bus and memory are shown in color, and
`a read or write is shown by shading the unit, as we did in Chapter 6. In these
`figures, we focus on how data is transferred between the I/O device and
`memory; in section 8.5, we will see how the I/O operation is initiated.
`
`Types of Buses
`Buses are traditionally classified as one of three types: processor-memory buses,
I/O buses, or backplane buses. Processor-memory buses are short, generally high speed, and matched to the memory system so as to maximize memory-processor bandwidth. I/O buses, by contrast, can be lengthy, can have many types of devices connected to them, and often have a wide range in the data bandwidth of the devices connected to them. I/O buses do not typically interface directly to the memory but use either a processor-memory or a backplane
`bus to connect to memory. Backplane buses are designed to allow processors,
`memory, and I/O devices to coexist on a single bus; they balance the
`demands of processor-memory communication with the demands of I/O
device-memory communication. Backplane buses received their name because they were often built into the backplane, an interconnection structure
`within the chassis; processor, memory, and I/O boards would then plug into
`the backplane using the bus for communication.
Processor-memory buses are often design-specific, while both I/O buses and backplane buses are frequently reused in different machines. In fact, backplane and I/O buses are often standard buses that are used by many different
`
`
FIGURE 8.7 The three steps of an output operation. In each step, the active participants in the communication are shown in color, with the right side shaded if the device is doing a read and the left side shaded if the device is doing a write. Notice that the data lines of the bus can carry both an address (as in a) and data (as in c). (a) The first step in an output operation initiates a read from memory. The control lines signal a read request to memory, while the data lines contain the address. (b) During the second step in an output operation, memory is accessing the data. (c) In the third and final step in an output operation, memory transfers the data using the data lines of the bus and signals that the data is available to the I/O device using the control lines. The device stores the data as it appears on the bus.
`
computers manufactured by different companies. By comparison, processor-memory buses are often proprietary, although in many recent machines they may be the backplane bus, and the standard or I/O buses plug into the processor-memory bus. In many recent machines, the distinction among these
`bus types, especially between backplane buses and processor-memory buses,
`may be very minor.
`During the design phase, the designer of a processor-memory bus knows all
`the types of devices that must connect to the bus, while the I/O or backplane
bus designer must design the bus to handle unknown devices that vary in latency and bandwidth characteristics. Normally, an I/O bus presents a fairly simple and low-level interface to a device, requiring minimal additional electronics to interface to the bus. A backplane bus usually requires additional
`logic to interface between the bus and a device or between the backplane bus
`
`INTEL - 1012
`
`
`
`658
`
`Chapter 8
`
`Interfacing Processors and Peripherals
`
`8.4 Buses: Connecting 1/0 Devices to Processor and Memory
`
`659
`
`
`FIGURE 8.8 An input operation takes less active time because the device does not need
`to wait for memory to access data. As in the previous figure, the active participants in each
`step in the communication are shown in color, with the right side shaded if the device is doing a
read and the left side shaded if the device is doing a write. (a) In the first step in an input operation, the control lines indicate a write request for memory, while the data lines contain the address. (b) The second step in an input operation occurs when the memory is ready and signals
`the device, which then transfers the data. Typically, the memory will store the data as it receives
`it. The device need not wait for the store to be completed. In the steps shown, we assume that the
`device had to wait for memory to indicate its readiness, but this will not be true in some systems
`that use buffering or have a fast memory system.
`
and a lower-level I/O bus. A backplane bus offers the cost advantage of a single bus. Figure 8.9 shows a system using a single backplane bus, a system using a processor-memory bus with attached I/O buses, and a system using all three types of buses. Machines with a separate processor-memory bus normally use a bus adapter to connect the I/O bus to the processor-memory bus. Some high-performance, expandable systems use an organization that combines the three buses: the processor-memory bus has one or more bus adapters that interface a standard backplane bus to the processor-memory bus. I/O buses, as well as device controllers, can plug into the backplane bus. The IBM RS/6000 and Silicon Graphics multiprocessors use this type of organization. This organization offers the advantage that the processor-memory bus can be made much faster than a backplane or I/O bus and that the I/O system can be expanded by plugging many I/O controllers or buses into the backplane bus, which will not affect the speed of the processor-memory bus.
`
`Synchronous and Asynchronous Buses
The substantial differences between the circumstances under which a processor-memory bus and an I/O bus or backplane bus are designed lead to two different schemes for communication on the bus: synchronous and
`
`
FIGURE 8.9 Many machines use a single backplane bus for both processor-memory and I/O traffic. Some high-performance machines use a separate processor-memory bus that I/O buses plug into. Some systems make use of all three types of buses, organized in a hierarchy. (a) A single bus used for processor-to-memory communication, as well as communication between I/O devices and memory. The bus used in older PCs has this structure. (b) A separate bus is used for processor-memory traffic. To communicate data between memory and I/O devices, the I/O buses interface to the processor-memory bus, using a bus adapter. The bus adapter provides speed matching between the buses. In many recent PCs, the processor-memory bus is a PCI bus (a backplane bus) that has I/O devices that interface directly as well as an I/O bus that plugs into the PCI bus; the latter is a SCSI bus. (c) A separate bus is used for processor-memory traffic. A small number of backplane buses tap into the processor-memory bus. The processor-memory buses interface to the backplane bus. This is usually done with a single-chip controller, such as a SCSI bus controller. An advantage of this organization is the small number of taps into the high-speed processor-memory bus.
`
`
asynchronous. If a bus is synchronous, it includes a clock in the control lines and a fixed protocol for communicating that is relative to the clock. For example, for a processor-memory bus performing a read from memory, we might
`have a protocol that transmits the address and read command on the first
`clock cycle, using the control lines to indicate the type of request. The memory
`might then be required to respond with the data word on the fifth clock. This
`type of protocol can be implemented easily in a small finite state machine.
`Because the protocol is predetermined and involves little logic, the bus can
`run very fast and the interface logic will be small. Synchronous buses have
`two major disadvantages, however. First, every device on the bus must run at
`the same clock rate. Second, because of clock skew problems, synchronous
`buses cannot be long if they are fast (see Appendix B for a discussion of clock
`skew). Processor-memory buses are often synchronous because the devices
`communicating are close, small in number, and prepared to operate at high
`clock rates.
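To make the fixed protocol concrete, here is a minimal sketch (the function and signal names are ours, not from any real bus standard): the address and read command are driven on the first clock cycle, and the memory is assumed to answer with the data word on the fifth.

```python
def synchronous_read(memory, address, data_cycle=5):
    """Model one fixed-schedule synchronous bus read as a tiny state machine."""
    bus = {"control": None, "data": None}
    for cycle in range(1, data_cycle + 1):
        if cycle == 1:                    # cycle 1: address + read command
            bus["control"], bus["data"] = "READ", address
        elif cycle == data_cycle:         # fixed cycle: memory drives the word
            bus["control"], bus["data"] = "DATA", memory[address]
        else:                             # intervening cycles: memory is busy
            bus["control"], bus["data"] = None, None
    return bus["data"]

print(synchronous_read({0x40: 1234}, 0x40))  # -> 1234
```

Because both sides count cycles against the same clock, no acknowledgment traffic is needed, which is why the interface logic can be so small.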
An asynchronous bus is not clocked. Because it is not clocked, an asynchronous bus can accommodate a wide variety of devices, and the bus can be lengthened without worrying about clock skew or synchronization problems. To coordinate the transmission of data between sender and receiver, an asynchronous bus uses a handshaking protocol. A handshaking protocol consists of a
`series of steps in which the sender and receiver proceed to the next step only
`when both parties agree. The protocol is implemented with an additional set
`of control lines.
A simple example will illustrate how asynchronous buses work. Let's consider a device requesting a word of data from the memory system. Assume
`that there are three control lines:
`
`1. ReadReq: Used to indicate a read request for memory. The address is
`put on the data lines at the same time.
`
2. DataRdy: Used to indicate that the data word is now ready on the data lines. In an output transaction, the memory will assert this signal since it is providing the data. In an input transaction, an I/O device would assert this signal, since it would provide data. In either case, the data is placed on the data lines at the same time.
`
`3. Ack: Used to acknowledge the ReadReq or the DataRdy signal of the
`other party.
`
`In an asynchronous protocol, the control signals ReadReq and DataRdy are
asserted until the other party (the memory or the device) indicates that the control lines have been seen and the data lines have been read; this indication is
`made by asserting the Ack line. This complete process is called handshaking.
`Figure 8.10 shows how such a protocol operates by depicting the steps in the
`communication.
`
`
The steps in the protocol begin immediately after the device signals a request by raising ReadReq and putting the address on the Data lines:

1. When memory sees the ReadReq line, it reads the address from the data bus and raises Ack to indicate it has been seen.
2. I/O device sees the Ack line high and releases the ReadReq and data lines.
3. Memory sees that ReadReq is low and drops the Ack line to acknowledge the ReadReq signal.
4. This step starts when the memory has the data ready. It places the data from the read request on the data lines and raises DataRdy.
5. The I/O device sees DataRdy, reads the data from the bus, and signals that it has the data by raising Ack.
6. The memory sees the Ack signal, drops DataRdy, and releases the data lines.
7. Finally, the I/O device, seeing DataRdy go low, drops the Ack line, which indicates that the transmission is completed.

A new bus transaction can now begin.
`
FIGURE 8.10 The asynchronous handshaking protocol consists of seven steps to read a word from memory and receive it in an I/O device. The signals in color are those asserted by the I/O device, while the memory asserts the signals shown in black. The arrows label the seven steps and the event that triggers each step. The symbol showing two lines (high and low) at the same time on the data lines indicates that the data lines have valid data at this point. (The symbol indicates that the data is valid, but the value is not known.)
`
`An asynchronous bus protocol works like a pair of finite state machines that
`are communicating in such a way that a machine does not proceed until it
`knows that another machine has reached a certain state; thus the two machines
`are coordinated.
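The seven steps can also be traced in code. The sketch below (wire and function names are ours) performs one read; each numbered comment corresponds to a step of Figure 8.10, and each step advances only because the previous signal change was observed, which is exactly the coordination between the two machines described above.

```python
def handshake_read(memory, address):
    """Trace the seven-step asynchronous handshake for one memory read."""
    wires = {"ReadReq": 0, "DataRdy": 0, "Ack": 0, "Data": None}
    # The device starts by raising ReadReq and driving the address.
    wires["ReadReq"], wires["Data"] = 1, address
    # 1. Memory sees ReadReq, reads the address, raises Ack.
    latched, wires["Ack"] = wires["Data"], 1
    # 2. The device sees Ack high and releases ReadReq and the data lines.
    wires["ReadReq"], wires["Data"] = 0, None
    # 3. Memory sees ReadReq low and drops Ack.
    wires["Ack"] = 0
    # 4. When the data is ready, memory drives it and raises DataRdy.
    wires["Data"], wires["DataRdy"] = memory[latched], 1
    # 5. The device sees DataRdy, reads the data, and raises Ack.
    value, wires["Ack"] = wires["Data"], 1
    # 6. Memory sees Ack, drops DataRdy, and releases the data lines.
    wires["DataRdy"], wires["Data"] = 0, None
    # 7. The device sees DataRdy low and drops Ack; the bus is free.
    wires["Ack"] = 0
    return value

print(handshake_read({0x10: 99}, 0x10))  # -> 99
```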
The handshaking protocol does not solve all the problems of communicating between a sender and receiver that have different clocks. An additional problem arises when we sample an asynchronous signal (such as ReadReq). This problem, called a synchronization failure, can lead to unpredictable behavior; it can be overcome with devices called synchronizers, which are described in Appendix B.
`
`
FSM Control for I/O
`
`Example
`
Show how the control for an output transaction to an I/O device from memory (as in Figure 8.7) can be implemented as a pair of finite state machines.
`
`Answer
`
`Figure 8.11 shows the two finite state machine controllers that implement
`the handshaking protocol of Figure 8.10.
`
If a synchronous bus can be used, it is usually faster than an asynchronous bus because of the overhead required to perform the handshaking. An example demonstrates this.
`
`Example
`
`Performance Analysis of Synchronous versus Asynchronous Buses
`
`We want to compare the maximum bandwidth for a synchronous and an
`asynchronous bus. The synchronous bus has a clock cycle time of 50 ns,
and each bus transmission takes 1 clock cycle. The asynchronous bus requires 40 ns per handshake. The data portion of both buses is 32 bits wide.
`Find the bandwidth for each bus when performing one-word reads from a
`200-ns memory.
`
`Answer
`
`First, the synchronous bus, which has 50-ns bus cycles. The steps and times
`required for the synchronous bus are as follows:
`
`1. Send the address to memory: 50 ns
`
`2. Read the memory: 200 ns
`
`3. Send the data to the device: 50 ns
Thus, the total time is 300 ns. This yields a maximum bus bandwidth of 4 bytes every 300 ns, or

4 bytes / 300 ns = 4 MB / 0.3 seconds = 13.3 MB/second
`
At first glance, it might appear that the asynchronous bus will be much slower, since it will take seven steps, each at least 40 ns, and the step corresponding to the memory access will take 200 ns. If we look carefully at Figure 8.10, we realize that several of the steps can be overlapped with the memory access time. In particular, the memory receives the address at the end of step 1 and does not need to put the data on the bus until the beginning of step 5; steps 2, 3, and 4 can overlap with the memory access time.
`This leads to the following timing:
`
`Step 1: 40 ns
`
`Steps 2, 3, 4: maximum (3 x 40 ns, 200 ns) = 200 ns
`Steps 5, 6, 7: 3 x 40 ns = 120 ns
Thus, the total time to perform the transfer is 360 ns, and the maximum bandwidth is

4 bytes / 360 ns = 4 MB / 0.36 seconds = 11.1 MB/second
`
`Accordingly, the synchronous bus is only about 20% faster. Of course, to
`sustain these rates, the device and memory system on the asynchronous
`bus will need to be fairly fast to accomplish each handshaking step in
`40 ns.
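The arithmetic in this answer can be checked with a short script (the function names are ours; the 50-ns cycle, 40-ns handshake step, and 200-ns memory access come from the example):

```python
def sync_transfer_ns(cycle=50, mem=200):
    # address cycle + memory access + data cycle
    return cycle + mem + cycle

def async_transfer_ns(step=40, mem=200):
    # step 1, then steps 2-4 overlapped with the memory access, then steps 5-7
    return step + max(3 * step, mem) + 3 * step

for name, t in (("synchronous", sync_transfer_ns()),
                ("asynchronous", async_transfer_ns())):
    # 4 bytes per transfer; bytes/ns * 1000 gives MB/second
    print(f"{name}: {t} ns per word, {4 / t * 1000:.1f} MB/second")
# synchronous: 300 ns per word, 13.3 MB/second
# asynchronous: 360 ns per word, 11.1 MB/second
```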
`
`Even though a synchronous bus may be faster, the choice between a
`synchronous and an asynchronous bus has implications not only for data
bandwidth but also for an I/O system's capacity in terms of physical distance and the number of devices that can be connected to the bus. Asynchronous buses scale better with technology changes and can support a wider variety of device response speeds. It is for these reasons that I/O buses are often asynchronous, despite the increased overhead.
`
`Increasing the Bus Bandwidth
Although much of the bandwidth of a bus is decided by the choice of a synchronous or asynchronous protocol and the timing characteristics of the bus,
`several other factors affect the bandwidth that can be attained by a single
`transfer. The most important of these are the following:
`
1. Data bus width: By increasing the width of the data bus, transfers of multiple words require fewer bus cycles.
`
`2. Separate versus multiplexed address and data lines: Our example in
`Figure 8.8 used the same wires for address and data; including separate
`lines for addresses will make the performance of writes faster because
`the address and data can be transmitted in one bus cycle.
`
`
`
`Performance Analysis of Two Bus Schemes
`
`Example
`
`Suppose we have a system with the following characteristics:
`
`1. A memory and bus system supporting block access of 4 to 16 32-bit
`words.
`
`2. A 64-bit synchronous bus clocked at 200 MHz, with each 64-bit
`transfer taking 1 clock cycle, and 1 clock cycle required to send an
`address to memory.
`
`3. Two clock cycles needed between each bus operation. (Assume the
`bus is idle before an access.)
`
4. A memory access time for the first four words of 200 ns; each additional set of four words can be read in 20 ns. Assume that a bus
`transfer of the most recently read data and a read of the next four
`words can be overlapped.
`
`Find the sustained bandwidth and the latency for a read of 256 words for
`transfers that use 4-word blocks and for transfers that use 16-word blocks.
`Also compute the effective number of bus transactions per second for each
case. Recall that a single bus transaction consists of an address transmission followed by data.
`
`Answer
`
`For the 4-word block transfers, each block takes
`
`1. 1 clock cycle that is required to send the address to memory
`
`2.
`
`200 ns
`5 ns/cycle 40 clock cycles to read memory
`
`3. 2 clock cycles to send the data from the memory
`
`4. 2 idle clock cycles between this transfer and the next
`
This is a total of 45 cycles, and 256/4 = 64 transactions are needed, so the entire transfer takes 45 × 64 = 2880 clock cycles. Thus the latency is 2880 cycles × 5 ns/cycle = 14,400 ns. The number of bus transactions per second is

64 transactions × (1 second / 14,400 ns) = 4.44M transactions/second
`
The bus bandwidth is

(256 × 4) bytes × (1 second / 14,400 ns) = 71.11 MB/sec
`
FIGURE 8.11 These finite state machines implement the control for the handshaking protocol illustrated in Figure 8.10. The numbers in each state correspond to the steps shown in Figure 8.10. The first state of the I/O device (upper-left corner) starts the protocol when a new I/O request is generated, just as in Figure 8.10. Each state in the finite state machine effectively records the state of both the device and memory. This is how they stay synchronized during the transaction. After completing a transaction, the I/O side can stay in the last state until a new request needs to be processed.
`
3. Block transfers: Allowing the bus to transfer multiple words in back-to-back bus cycles without sending an address or releasing the bus will reduce the time needed to transfer a large block.
`
`Each of these design alternatives will increase the bus performance for a
`single bus transfer. The cost of implementing one of these enhancements is one
`or more of the following: more bus lines, increased complexity, or increased
`response time for requests that may need to wait while a long block transfer
`occurs.
`
`
`For the 16-word block transfers, the first block requires
`
`1. 1 clock cycle to send an address to memory
`2. 200 ns or 40 cycles to read the first four words in memory
`
`3. 2 cycles to send the data of the block, during which time the read of
`the four words in the next block is started
`
`4. 2 idle cycles between transfers and during which the read of the
`next block is completed
`Each of the three remaining 16-word blocks requires repeating only the
`last two steps.
`
Thus, the total number of cycles for each 16-word block is 1 + 40 + 4 × (2 + 2) = 57 cycles, and 256/16 = 16 transactions are needed, so the entire transfer takes 57 × 16 = 912 cycles. Thus the latency is 912 cycles × 5 ns/cycle = 4560 ns, which is roughly one-third of the latency for the case with 4-word blocks. The number of bus transactions per second with 16-word blocks is

16 transactions × (1 second / 4560 ns) = 3.51M transactions/second

which is lower than the case with 4-word blocks because each transaction takes longer (57 versus 45 cycles).
`
The bus bandwidth with 16-word blocks is

(256 × 4) bytes × (1 second / 4560 ns) = 224.56 MB/second

which is 3.16 times higher than for the 4-word blocks. The advantage of using larger block transfers is clear.
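The cycle counts in this answer follow a simple pattern that can be computed for any block size (a sketch; the function name is ours, and the constants come from the example):

```python
def transfer_stats(block_words, total_words=256, cycle_ns=5):
    """Cycles, latency, and bandwidth for block reads on the example bus."""
    groups = block_words // 4            # 4-word groups, 2 bus cycles each
    # Address cycle + 40-cycle first read; each group's (send + idle) pair
    # overlaps the 20-ns read of the next group, so later reads are hidden.
    cycles_per_block = 1 + 40 + groups * (2 + 2)
    transactions = total_words // block_words
    total_cycles = transactions * cycles_per_block
    latency_ns = total_cycles * cycle_ns
    mb_per_sec = total_words * 4 / latency_ns * 1000   # bytes/ns * 1000
    return total_cycles, latency_ns, mb_per_sec

for block in (4, 16):
    cycles, latency, bw = transfer_stats(block)
    print(f"{block}-word blocks: {cycles} cycles, {latency} ns, {bw:.2f} MB/sec")
# 4-word blocks: 2880 cycles, 14400 ns, 71.11 MB/sec
# 16-word blocks: 912 cycles, 4560 ns, 224.56 MB/sec
```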
`
Elaboration: Another method for increasing the effective bus bandwidth when multiple parties want to communicate on the bus is to release the bus when it is not being used for transmitting information. Consider the example of a memory read that we examined in Figure 8.10. What happens to the bus while the memory access is occurring? In this simple protocol, the device and memory continue to hold the bus during the memory access time when no actual transfer is taking place. An alternative protocol, which releases the bus, would operate like this:
1. The device signals the memory and transmits the request and address.
2. After the memory acknowledges the request, both the memory and device release all control lines.
3. The memory access occurs, and the bus is free for other uses during this period.
`
4. The memory signals the device on the bus to indicate that the data is available.
`
5. The device receives the data via the bus and signals that it has the data, so the memory system can release the bus.
`
`For the synchronous bus with 16-word transfers in the example above, such a scheme
`would occupy the bus for only 272 of the 912 cycles required for the complete bus
`transaction.
This type of protocol is called a split transaction protocol. The advantage of such a protocol is that, by freeing the bus during the time data is not being transmitted, the protocol allows another requestor to use the bus. This can improve the effective bus bandwidth for the entire system, if the memory is sophisticated enough to handle multiple overlapping transactions.
`With a split transaction, however, the time to complete one transfer is probably
`increased because the bus must be acquired twice. Split transaction protocols are also
`more expensive to implement, primarily because of the need to keep track of the other
`party in a communication. In a split transaction protocol, the memory system must con(cid:173)
`tact the requestor to initiate the reply portion of the bus transaction, so the identity of
`the requestor must be transmitted and retained by the memory system.
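The "272 of the 912 cycles" figure quoted above follows directly from the 16-word accounting in the last example: only the address cycle and the four send-plus-idle pairs per transaction occupy the bus, while the 40-cycle memory wait could be released. A quick check:

```python
transactions = 16                  # 256 words moved in 16-word blocks
busy_per_tx = 1 + 4 * (2 + 2)      # address cycle + four (send + idle) pairs
total_per_tx = busy_per_tx + 40    # plus the 40-cycle memory access wait
print(transactions * busy_per_tx, "of", transactions * total_per_tx)
# -> 272 of 912
```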
`
`Obtaining Access to the Bus
`Now that we have reviewed some of the many design options for buses, we
`can deal with one of the most important issues in bus design: How is the bus
`reserved by a device that wishes to use it to communicate? We touched on this
question in several of the above discussions, and it is crucial in designing large I/O systems that allow I/O to occur without the processor's continuous and low-level involvement.
`Why is a scheme needed for controlling bus access? Without any control,
`multiple devices desiring to communicate could each try to assert the control
and data lines for different transfers! Just as chaos reigns in a classroom when everyone tries to talk at once, multiple devices trying to use the bus simultaneously would result in confusion.
`Chaos is avoided by introducing one or more bus masters into the system. A
bus master controls access to the bus: it must initiate and control all bus requests. The processor must be able to initiate a bus request for memory and thus is always a bus master. The memory is usually a slave, since it will respond to read and write requests but never generate its own requests.
`The simplest system possible has a single bus master: the processor. Having
`a single bus master is similar to what normally happens in a classroom-all
`communication requires the permission of the instructor. In a single-master
system, all bus requests must be controlled by the processor. The steps involved in a bus transaction with a single-master bus are shown in Figure 8.12. The major drawback of this approach is that the processor must be involved in every bus transaction. A single sector read from a disk may require the processor to get involved hundreds to thousands of times, depending on the size of each transfer. Because devices have become faster and capable of transferring
`
`
`Bus Arbitration
`
Deciding which bus master gets to use the bus next is called bus arbitration. There are a wide variety of schemes for bus arbitration; these may involve special hardware or extremely sophisticated bus protocols. In a bus arbitration scheme, a device (or the processor) wanting to use the bus signals a bus request and is later granted the bus. After a grant, the device can use the bus, later signaling to the arbiter that the bus is no longer required. The arbiter can then grant the bus to another device. Most multiple-master buses have a set of bus lines for performing requests and grants. A bus release line is also needed if each device does not have its own request line. Sometimes the signals used for bus arbitration have physically separate lines, while in other systems the data lines of the bus are used for this function (though this prevents overlapping of arbitration with transfer).
`Arbitration schemes usually try to balance two factors in choosing which
device to grant the bus. First, each device has a bus priority, and the highest-priority device should be serviced first. Second, we would prefer that any device, even one with low priority, never be completely locked out from the bus. This property, called fairness, ensures that every device that wants to use the bus is guaranteed to get it eventually. In addition to these factors, more sophisticated schemes aim at reducing the time needed to arbitrate for the bus.
`Because arbitration time is overhead, which increases the bus access time, it
`should be reduced and overlapped with bus transfers whenever possible.
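As one illustration of balancing these two factors (this sketch is ours, not a scheme from the text), an arbiter can grant by fixed priority but rotate each winner to the lowest priority, so that no requester is locked out forever:

```python
class RotatingArbiter:
    """Grant the highest-priority requester, then demote it for fairness."""

    def __init__(self, n_devices):
        self.order = list(range(n_devices))  # front of list = highest priority

    def grant(self, requests):
        """requests: set of device ids currently asserting a request line."""
        for device in self.order:
            if device in requests:
                self.order.remove(device)    # winner drops to lowest priority
                self.order.append(device)
                return device
        return None                          # no requests: bus stays idle

arb = RotatingArbiter(3)
print(arb.grant({0, 2}))  # -> 0 (highest priority wins the first grant)
print(arb.grant({0, 2}))  # -> 2 (device 0 now has the lowest priority)
```

This gives priority service in the short run while guaranteeing that every persistent requester is eventually granted the bus.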
`Bus arbitration schemes can be divided into four broad classes:
• Daisy chain arbitration: In this scheme, the bus grant line is run through the devices from highest priority to lowest (the priorities are determined by the position on the bus). A high-priority device that desires bus access simply intercepts the bus grant signal, not allowing a lower-priority device to see the signal. Figure 8.13 shows how a daisy chain bus is organized. The advantage of a daisy chain bus is simplicity; the disadvantages are that it cannot assure fairness (a low-priority request may be locked out indefinitely) and the use of the daisy chain grant signal also limits the bus speed.
• Centralized, parallel arbitration: These schemes use multiple request lines, and the devices independently request the bus. A centralized arbiter chooses from among the devices requesting bus access and notifies the selected device that it is now bus master. The disadvantage of this scheme is that it requires a central arbiter, which may become the bottleneck for bus usage. PCI, a standard backplane bus, uses a central arbitration scheme.
`
`
FIGURE 8.12 The initial steps in a bus transaction with a single master (the processor). A set of bus request lines is used by the device to communicate with the processor, which then initiates the bus cycle on behalf of the requesting device. The active lines and units are shown in color in each step. Shading is used to indicate the source of a read (memory) or destination of a write (the disk). After step c, the bus cycle continues like a normal read transaction, as in Figure 8.7. (a) First, the device generates a bus request to indicate to the processor that the device wants to use the bus. (b) The processor responds and generates appropriate bus control signals. For example, if the device wants to perform output from memory, the processor asserts the read request lines to memory. (c) The processor also notifies the device that its bus request is being processed; as a result, the device knows it can use the bus and places the address for the request on the bus.
`
at much higher bandwidths, involving the processor in every bus transaction has become less and less attractive.
The alternative scheme is to have multiple bus masters, each of which can initiate a transfer. If we want to allow several people in a classroom to talk without the instructor having to recognize each one, we must have a protocol for deciding who gets to talk next. Similarly, with multiple bus masters, we must provide a mechanism for arbitrating access to the bus so that it is used in a cooperative rather than a chaotic way.
`
`
`--
`
`Highest priority
`
`Lowest priority
`
`Device 1
`
`Device 2
`
`Device n
`
`Grant
`
`Grant
`
`Bus
`arbiter
`
`Grant
`
`Release
`
`Request
`
`FIGURE 8.13 A daisy chain bus uses a bus grant line that chains through each device
`from highest to lowest priority. If the device has requested bus access, it uses the grant line to
`determine access has been given to it. Because the grant line is passed on only if a device does not
`want access, priority is built into the scheme. The name "daisy chain" arises from the structure of
the grant line that chains from device to device. The detailed protocol used by a daisy chain is described in the elaboration below.
`
• Distributed arbitration by self-selection: These schemes also use multiple request lines, but the devices requesting bus access determine who will be granted access. Each device wanting bus access places a code indicating its identity on the bus. By examining the bus, the devices can determine the highest-priority device that has ma