`Introduction to Real-Time Data Processing
`When the Real Time Manager wants to determine if there is enough processing time still
`available to install a new task, it uses a simple algorithm to decide which of the two
`available values, the GPB estimate or the GPB actual value, it should use for each
`module in its calculations. This selection is based on the state of the UseActualGPB flag
`in each module header.
`For smooth algorithms, this flag is always set. The selection algorithm is this: if the
`flag is set and the actual value is nonzero, use the GPB actual value as the current value;
`otherwise, use the GPB estimate. This algorithm is designed to give the most accurate
`accounting of the available GPB at any given time. The estimated value is used only
`until the module has had a chance to run at least once. After that, the actual value is
`used, whether it is smaller or larger than the estimate. This is how the GPB system
`automatically adapts to different CPU configurations.
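The selection rule above can be sketched in C. This is only an illustration: the `ModuleGPB` structure, its field names, and the `gpb_task_fits` helper are assumptions made for the example, not the Real Time Manager's actual data layout or interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-module GPB record; field names are illustrative only. */
typedef struct {
    uint32_t gpb_estimate;   /* estimate supplied with the module */
    uint32_t gpb_actual;     /* measured value; 0 until the module has run */
    bool     use_actual_gpb; /* stands in for the UseActualGPB flag */
} ModuleGPB;

/* Value used for one module: the actual value if the flag is set and
   the actual value is nonzero, otherwise the estimate. */
uint32_t gpb_current_value(const ModuleGPB *m)
{
    assert(m != NULL);
    if (m->use_actual_gpb && m->gpb_actual != 0)
        return m->gpb_actual;
    return m->gpb_estimate;
}

/* True if a new task needing `needed` cycles still fits in a frame of
   `frame_cycles` cycles alongside the modules already installed. */
bool gpb_task_fits(const ModuleGPB *mods, size_t n,
                   uint32_t frame_cycles, uint32_t needed)
{
    uint64_t used = 0;
    for (size_t i = 0; i < n; i++)
        used += gpb_current_value(&mods[i]);
    return used + needed <= frame_cycles;
}
```

A module that has never run reports an actual value of zero, so its estimate is what counts toward the total until the first measurement arrives.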
`GPB for Lumpy Algorithms
`The simple approach to GPB used for smooth algorithms does not work for lumpy
`algorithms. A somewhat different approach is required for this case. First, it is necessary
`to separate lumpy algorithms into two different classes: smart lumpy algorithms and
`dumb lumpy algorithms. A smart lumpy algorithm can determine when it is
`executing the code that results in its maximum GPB use. A dumb lumpy algorithm
`cannot determine when this is the case.
`An example of a smart lumpy algorithm is a multirate modem. There are various stages
`of the modem, including initialization, setup, and data transfer. The maximum GPB use
`is usually taken by one of the steady-state data processing programs. When this
`algorithm is reached, the DSP program calls the GPBSetUseActual routine.
`An example of a dumb lumpy algorithm is a Huffman decoder. This decoder takes
`longer to decode some bit streams than others, and there is no way to tell beforehand
`how long it will take. In fact, the processing time can grow without limit in the case of
`random noise input.
`Two different mechanisms handle these two cases. For smart lumpy algorithms, the DSP
`program knows where the maximum GPB usage is in the code, and so is required to set
`the UseActualGPB flag with the GPBSetUseActual routine. The DSP operating
`system does not actually set the flag until the GPB calculations for this module are
`completed. This forces the Real Time Manager to continue using the estimated value
`until after the peak use frame has occurred. After that, the actual value correctly reflects
`the processing needed by this module on this hardware configuration. The DSP
`operating system continues to use the peak detection algorithm for computing the actual
`value, so future peaks may slightly increase the actual value because of variations in I/O
`and bus utilization.
`For dumb lumpy algorithms, the DSP program can check on the available processing
`time left in the real-time frame, and shut down the process if an overrun is about to
`happen or has already happened.
`There are two macro calls to the DSP operating system that support the dumb lumpy
`algorithm. The GPBExpectedCycles macro returns the expected processing time;
`the GPBElapsedCycles macro returns the amount of processing time used so far. If
`the amount used so far is getting close to the expected time, the module must execute
`its processing termination procedure. This procedure should end the processing in
`whatever manner is appropriate for this algorithm. If the processing duration has
`already exceeded the expected time, the UseActualGPB flag should be set, and the
`processing termination procedure should be followed.
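The check described above might look like the following C sketch. The `expected` and `elapsed` parameters stand in for the values returned by the GPBExpectedCycles and GPBElapsedCycles macros; the `margin` parameter, the `GPBAction` type, and the `gpb_check` function are assumptions introduced for illustration and are not part of the DSP operating system.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    GPB_CONTINUE,   /* enough time left, keep processing */
    GPB_TERMINATE,  /* close to the limit: run the termination procedure */
    GPB_OVERRUN     /* limit exceeded: set UseActualGPB, then terminate */
} GPBAction;

/* Decide what a dumb lumpy module should do next.
   margin is a hypothetical safety margin (in cycles) chosen by the
   module author to define "getting close" to the expected time. */
GPBAction gpb_check(uint32_t expected, uint32_t elapsed, uint32_t margin)
{
    assert(margin <= expected);
    if (elapsed > expected)
        return GPB_OVERRUN;
    if (elapsed >= expected - margin)
        return GPB_TERMINATE;
    return GPB_CONTINUE;
}
```

A module would call such a check at convenient points in its inner loop, often enough that the margin covers the longest stretch of work between two checks.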
`If the dumb lumpy algorithm exceeds its GPB estimate, it may cause a frame overrun. If
`this happens, the offending real-time task that includes this module is set inactive by the
`DSP operating system, and the application is notified by an interrupt. This process is
`described in “Frame Overruns,” later in this chapter.
`Dumb lumpy algorithms are tricky to program correctly. If at all possible, such
`algorithms should not be done in real time, but in timeshare, where length of execution
`is not a vital factor.
`Fast Execution Versus Real-Time Execution
`A task executes faster as a real-time task than as a timeshare task only if the real-time
`task list is using most of the processing bandwidth of the DSP. In many cases, running in
`the timeshare list will yield more processing time. By carefully analyzing which
`applications need real-time processing and which need “run as fast as you can go”
`processing, you can decide which tasks should go into the timeshare list. Candidates for
`timesharing normally include tasks such as lossless compression of disk files, graphics
`animation, and video decompression. All such tasks should use as much DSP bandwidth
`as possible, because the more they run the sooner they finish. Such tasks must not be
`confused with real-time tasks, which require a specific amount of data be processed
`during a specific time period.
`Processor Allocation for Timeshare Tasks
`Timeshare processing is considerably different from real-time processing. A timeshare
`task often has no way to determine how much processing time it will have in a given
`frame. It is even possible to load the real-time task list so that no timeshare task
`execution is possible. Bear in mind that it takes a significant amount of processing time
`to load and unload a timeshare task. If there is not sufficient time to perform both
`operations, the task will not execute during that frame.
`Two numbers can help an application determine if it is worth installing a timeshare task
`in a given DSP task list. The two numbers are
`• Average timeshare available (ATA). This is effectively the average sleep time that the
`DSP is getting per frame. It represents actual unused DSP processing, averaged over
`several frames.
`• Average timeshare used (ATU). This number is effectively the average amount of
`timeshare being consumed by timeshare tasks that are already installed.
`Adding the two numbers above yields the average total timeshare (ATT) available.
`Figure 3-16 diagrams this concept.
`Note: The application calculates the ATA and ATU using the maximum
`number of cycles for the processor, the number of real-time cycles
`allocated, the number of real-time cycles used during the last frame,
`and the number of timeshare cycles used during the last frame.
`Figure 3-16
`Timeshare capacity figures
`[Figure: a 10 ms frame divided into real-time, timeshare (ATU), and sleep (ATA) segments, with the allocated real-time GPB marked.]
`As shown in Figure 3-16, the ATT value is not necessarily the difference between the
`frame processing capacity and the GPB allocated to real-time tasks. It is often the case
`that real-time tasks are inactive, or not running at full processing bandwidth. This makes
`additional timeshare processing available.
`The averaging process is used to calculate the timeshare processing numbers because
`they will usually fluctuate with time. The numbers are provided to allow an application
`to determine if installing a timeshare task is effective at any given time.
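As a rough illustration of how these figures relate, the C sketch below averages per-frame measurements into ATA, ATU, and ATT. The exact formula is not given in the text; the sketch assumes that sleep time in a frame is the processor's maximum cycles minus the real-time and timeshare cycles used. The `FrameSample` and `TimeshareStats` types and the `timeshare_stats` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame measurements. */
typedef struct {
    uint32_t realtime_used;   /* real-time cycles used in the frame */
    uint32_t timeshare_used;  /* timeshare cycles used in the frame */
} FrameSample;

typedef struct {
    uint32_t ata;  /* average timeshare available (sleep time) */
    uint32_t atu;  /* average timeshare used */
    uint32_t att;  /* average total timeshare = ATA + ATU */
} TimeshareStats;

/* Average n frames of measurements; max_cycles is the number of
   processor cycles in one frame. Sleep is assumed to be whatever
   neither real-time nor timeshare work consumed. */
TimeshareStats timeshare_stats(const FrameSample *f, size_t n,
                               uint32_t max_cycles)
{
    assert(f != NULL && n > 0);
    uint64_t sleep_sum = 0, used_sum = 0;
    for (size_t i = 0; i < n; i++) {
        assert((uint64_t)f[i].realtime_used + f[i].timeshare_used
               <= max_cycles);
        sleep_sum += max_cycles - f[i].realtime_used - f[i].timeshare_used;
        used_sum  += f[i].timeshare_used;
    }
    TimeshareStats s;
    s.ata = (uint32_t)(sleep_sum / n);
    s.atu = (uint32_t)(used_sum / n);
    s.att = s.ata + s.atu;
    return s;
}
```

Note that the allocated real-time GPB does not appear in the sum: only cycles actually used reduce the timeshare total, which is why ATT can exceed the frame capacity minus the allocation.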
`Once a timeshare task is installed, it is recommended that the application check the value
`of the ATT every so often to make sure that it is still getting service from its timeshare
`task. Alternatively, the timeshare task itself can report to the application on its activity
`level. The Real Time Manager does not warn the application when timeshare tasks are
`not being executed.
`Frame Overruns
`When several tasks have been installed on the DSP, or if one large and lumpy task is
`installed, and if the estimated GPB requirements are not accurate, it is possible for the
`DSP to still be processing data when the next frame interrupt is received. This results in a
`frame overrun. There are three categories of frame overrun:
`• Category one: the DSP acknowledges the current frame interrupt after the next
`interrupt is received, but before a second interrupt. The DSP misses only one interrupt.
`• Category two: the DSP acknowledges the frame interrupt after two interrupts have
`been received, but before a third interrupt. The DSP misses two interrupts.
`• Category three: the DSP has not acknowledged the frame interrupt for five successive
`interrupts. The DSP misses five or more interrupts.
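The three categories can be summarized as a small classification function. This C sketch is illustrative only; the enum and function names are assumptions. The text does not say how three or four missed interrupts are treated, so the sketch reports that range as unspecified rather than guessing.

```c
#include <assert.h>

typedef enum {
    OVERRUN_NONE,
    OVERRUN_CATEGORY_ONE,    /* one interrupt missed */
    OVERRUN_CATEGORY_TWO,    /* two interrupts missed */
    OVERRUN_UNSPECIFIED,     /* three or four missed: not covered by the text */
    OVERRUN_CATEGORY_THREE   /* five or more interrupts missed */
} OverrunCategory;

/* Map a count of missed frame interrupts to an overrun category. */
OverrunCategory classify_overrun(int missed)
{
    assert(missed >= 0);
    if (missed == 0) return OVERRUN_NONE;
    if (missed == 1) return OVERRUN_CATEGORY_ONE;
    if (missed == 2) return OVERRUN_CATEGORY_TWO;
    if (missed >= 5) return OVERRUN_CATEGORY_THREE;
    return OVERRUN_UNSPECIFIED;
}
```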
`Category One Frame Overrun
`The DSP operating system detects a frame overrun if the interrupt line is asserted before
`it has been acknowledged. When a category one frame overrun occurs, the DSP operating
`system attempts to recover during the next frame. Since the DSP operating system
`cannot tell whether one or more frames have passed, it assumes only one frame has been skipped.
`To recover, the DSP operating system checks all modules for their current GPB usage. The
`task containing the module with the worst overage is set inactive, and the application is
`notified. All other clients (such as Toolbox routines) are notified that the DSP has skipped
`a frame.
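The search for the worst offender can be sketched as follows. The `ModuleUsage` record and `worst_overage` function are hypothetical; the sketch assumes that "overage" means measured GPB use minus the value budgeted for the module, which the text does not define precisely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-module accounting for the frame that overran. */
typedef struct {
    uint32_t gpb_allowed;  /* value budgeted for the module */
    uint32_t gpb_used;     /* usage measured in the overrun frame */
} ModuleUsage;

/* Return the index of the module with the largest overage, or -1 if
   no module exceeded its budget. */
int worst_overage(const ModuleUsage *m, size_t n)
{
    assert(m != NULL || n == 0);
    int worst = -1;
    uint32_t worst_over = 0;
    for (size_t i = 0; i < n; i++) {
        if (m[i].gpb_used > m[i].gpb_allowed) {
            uint32_t over = m[i].gpb_used - m[i].gpb_allowed;
            if (over > worst_over) {
                worst_over = over;
                worst = (int)i;
            }
        }
    }
    return worst;
}
```

The task that owns the returned module is the one that would be set inactive.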
`When an application receives a task inactive message from the Real Time Manager, it
`should deallocate the offending module. This will update the DSP Prefs file with the
`correct GPB value for that module. The application can then reallocate the module and
`attempt to reinstall the task. When an application receives a skipped frame message, it
`can do anything from ignoring it to removing and reinstalling the task.
`Category Two Frame Overrun
`Since the DSP cannot determine how many frames have passed, the external interrupt
`logic must detect a category two frame overrun. To recover, the interrupt logic sends a
`hardware interrupt to the main processor and the Real Time Manager executes its DSP
`overrun recovery code. The Real Time Manager checks with the DSP operating system
`for the offending module and sets it inactive. If the DSP operating system cannot identify
`the worst-case module, then the Real Time Manager will determine which module is at
`fault. The Real Time Manager then issues the DSP a reset command and the application
`that installed the offending module is notified that the task is inactive. All other clients
`(such as Toolbox routines) are notified that the DSP has been restarted.
`The application that receives the task inactive message should respond in the same way
`as for a category one overrun. When an application receives the DSP restart message, it
`should check the task’s memory for possibly corrupted data or code. The recommended
`response is to remove, rebuild, and reinstall the task.
`Category Three Frame Overrun
`In the event that the DSP does not respond to interrupts by the sixth frame, the frame
`overrun logic will issue a hardware reset to the DSP and I/O subsystems. In this case
`it is assumed that both the DSP and the main processor have crashed. It is important
`that the DSP and I/O subsystems be reset to prevent possible problems in the output
`subsystems (for example, a fixed sound on the speaker or the telecom system
`left off-hook).
`Recovery from a category three frame overrun is impossible. All clients, including
`application and Toolbox routines, must start over and install their tasks from
`the beginning.
`Data Structures
`As explained in “Real-Time Processing Architecture,” earlier in this chapter, it is
`important to distinguish DSP modules from DSP tasks:
`• DSP modules are the basic building blocks of digital signal processing software. They
`always include DSP code. They also usually include some data, input and output
`buffers, and parameter blocks. There are an infinite number of combinations possible,
`depending on the desired function.
`• A DSP task is made up of one or more DSP modules. The purpose of this grouping
`is to place together, in the appropriate order and with the appropriate I/O buffer
`connections, all of the DSP modules needed to complete a particular job. A DSP task
`will frequently contain only one DSP module.
`The DSP module is provided to the Macintosh program as a resource, and is loaded into
`a DSP task using the Real Time Manager. A DSP task is constructed by the Macintosh
`application using a series of calls to the Real Time Manager. These calls create the task
`structure, load and connect modules in the desired arrangement, allocate the required
`memory, and install the completed task into the DSP task list. The reason for combining
`modules into tasks is to ensure that the combined function is always executed as a set.
`A good example of a task is one that plays compressed speech that was recorded via the
`telecom subsystem. The data is recorded via the subband decoder at 8 kHz sample rate
`and compressed before being stored on a disk drive. To play the data over the speaker,
`it must be decompressed back to 8 kHz samples, and then the sample rate must be
`converted to 24 kHz data to match the sample rate of the speaker system. A diagram
`of this example is shown in Figure 3-17.
`This task is executed by following the chain of modules from left to right. The task is
`activated or deactivated as a single unit. It is also installed and removed from the DSP
`task list as a unit.
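`Figure 3-17
`Task with two modules
`[Figure: a task linked between the previous task and the next task; its two modules, one of them a 1-to-3 sample rate converter, are shown with their sections.]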
`Sections Defined
`The internal structure of the DSP module is compartmentalized into code and data
`blocks. It is this design of the DSP module that gives the real-time data processing
`architecture its real power and flexibility. Each module is made up of a header and one
`or more sections, as shown in Figure 3-18.
`Figure 3-18
`The module data structure
`The header contains information about the entire module, such as its name, GPB
`information, and control flags. Also included in the header is a count of the number of
`sections in the module. This allows the module data structure to be of variable length.
`Each section also has a name, flags, and data-type fields. In addition, each section has
`pointers to two containers. It is the containers that actually hold the data or code for the
`section. The sections are the building blocks of the module. A section can point to code,
`data tables, variables, buffers, parameters, work space, or any other resource needed to
`provide the desired function. The only requirement is that the first section must always
`point to code. A simplified diagram of a section is shown in Figure 3-19.
`Figure 3-19
`The section data structure
`[Figure: a section with pointers to its primary container and its secondary container.]
`Note: The section does not contain the actual code or data used by the DSP
`chip. Rather, it is a data structure that contains pointers to the code or
`data block. The DSP operating system uses the section structure and
`flags to cache the actual code or data block as required.
`The Section Control Flags and Data Types are used to control caching and manage
`buffers. The connection data is also used for buffer management internally to the Real
`Time Manager. These operations are discussed in “Buffer Connections Between
`Modules,” later in this chapter.
`The two containers are called the primary container and the secondary container. A primary
`container is always required. The secondary container is optional. The primary container
`is usually allocated in the cache, but can also be in local memory. The secondary
`container is usually allocated in local memory, but in special cases can be allocated in the
`cache. Allocated memory for the containers must be in either local or cache memory.
`The visible caching system moves data from the secondary container to the primary
`container, which is usually moving the contents from local memory to cache memory.
`This is called a cache load. The visible caching system also moves data from the primary
`container to the secondary container, which is usually moving the contents from cache
`memory to local memory. This is called a cache save.
`In cases where no caching is required, only one container is needed. The primary
`container in this case is located in local memory if it contains fixed data or parameters
`for communication between the main processor application and the module, or in cache
`memory if it is simply work space.
`The section concept was developed to facilitate creating modules with generic functions
`that can be used in many different applications. It also forms the basis of the plug-and-
`play module architecture, where input and output data streams can be interconnected
`between off-the-shelf modules to create new functions. In addition, it supports several
`different execution models and is easily adapted to future hardware advances, such as
`significantly larger cache memories and hardware instruction caches.
`With the AutoCache caching model (discussed in “Visible Caching,” earlier in this
`chapter), the section data is moved from the secondary to the primary container before
`the module runs, if the Load flag is set. Likewise, the section data is moved from the
`primary to the secondary location after the module runs if the Save flag is set. During
`execution of an AutoCache module, the primary and secondary pointers never change.
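The AutoCache sequence (load flagged sections, run the module, save flagged sections) can be sketched in C as shown below. Everything here is a simplified model: the `Section` layout, the flag bit values, and the function names are assumptions for illustration, and ordinary `memcpy` calls stand in for the DSP operating system's cache moves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits for a section. */
enum { SECTION_LOAD = 1 << 0, SECTION_SAVE = 1 << 1 };

/* Simplified section: it holds pointers to containers, not the data. */
typedef struct {
    unsigned flags;
    size_t   size;       /* bytes in each container */
    void    *primary;    /* always required; usually in cache memory */
    void    *secondary;  /* optional; usually in local memory */
} Section;

/* Cache load: secondary -> primary, only if the Load flag is set. */
void section_cache_load(const Section *s)
{
    assert(s != NULL && s->primary != NULL);
    if ((s->flags & SECTION_LOAD) && s->secondary != NULL)
        memcpy(s->primary, s->secondary, s->size);
}

/* Cache save: primary -> secondary, only if the Save flag is set. */
void section_cache_save(const Section *s)
{
    assert(s != NULL && s->primary != NULL);
    if ((s->flags & SECTION_SAVE) && s->secondary != NULL)
        memcpy(s->secondary, s->primary, s->size);
}

/* AutoCache: load before the module runs, save after it finishes.
   The container pointers never change during the run. */
void autocache_run(Section *sections, size_t count,
                   void (*run)(Section *, size_t))
{
    assert(run != NULL);
    for (size_t i = 0; i < count; i++)
        section_cache_load(&sections[i]);
    run(sections, count);
    for (size_t i = 0; i < count; i++)
        section_cache_save(&sections[i]);
}

/* Demo module: increments an int state variable held in section 0. */
void demo_module(Section *sections, size_t count)
{
    assert(count >= 1);
    int *state = sections[0].primary;
    (*state)++;
}
```

A state-variable section would carry both flags so that its value survives from frame to frame, while a code section would carry only the Load flag, since unmodified code never needs to be copied back.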
`With the DemandCache caching model, two-container sections are used much the same
`way as they are used in AutoCache. The only difference occurs when a section is pushed
`or popped.
`When a section is pushed it changes from a one-container to a two-container section.
`Data is moved from the secondary to the primary location if the Load flag is set. When a
`section is popped it changes from a two-container to a one-container section. Again, data
`is moved from the primary to the secondary container if the Save flag is set.
`Sections and Caching
`The actual operation, with either AutoCache or DemandCache, loads code or data by
`section into the cache prior to its use, and then saves data back from the cache when
`completed. The section data structure contains flags, pointers, and other information to
`support these functions.
`For every section there are two possible containers (buffers): the primary container and
`the secondary container. The caching function moves data between the secondary and
`primary containers prior to module execution, and moves data between the primary and
`secondary containers after module execution. Only the minimum required moves are
`made. For example, it is only necessary to move code into the cache from the secondary
`container. It is not necessary to move it back, assuming the code is not self-modifying.
`A diagram of a sample AutoCache module, including its primary and secondary
`containers, is shown in Figure 3-20. This example shows five sections in the module: the
`program (code) section, state variables, a data table, an input buffer, and an output
`buffer. The first three sections have two containers each, while the last two have only a
`primary container.
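`Figure 3-20
`Dual-container AutoCache example
`[Figure: the Sound PRB module with five sections; the program, variables, and table sections each have a primary and a secondary container, while the input buffer and output buffer sections have only a primary container.]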
`In the example, the code, variables, and table sections are loaded into the cache before
`the code section is executed. After execution completes, only the variables are saved back
`to local memory. It is important to recognize that the input and output buffers are not
`moved, but exist in the cache. This buffer mechanism is described in “Buffer Connections
`Between Modules” and “Buffer Connections Between Tasks,” later in this chapter.
`Note: This discussion of the caching system is primarily applicable to
`AutoCache. More detailed information about AutoCache and
`DemandCache, including the differences between them, is presented in
`“Execution Models,” later in this chapter.
`Container Memory Allocation
`The structure of modules and sections requires several different blocks of memory. The
`example shown in Figure 3-20 uses nine different blocks: the module itself, five primary
`containers, and three secondary containers. The module and the secondary containers
`are in local RAM, and the primary containers are in the cache.
`Substantial memory management and allocation effort is needed to support this type of
`data structure. Fortunately for the programmer, the work is done automatically by the
`DSP operating system. The allocation and memory management is done in two phases.
`When the client loads the module into memory from a resource file, the Real Time
`Manager allocates all the required blocks in local memory to hold the structure. In the
`example shown in Figure 3-20, the allocation includes the module itself and three
`secondary containers. The containers are then loaded with data from the resource file.
`This completes the first phase of memory allocation.
`The application must also specify the I/O connections for the module, a process covered
`in “Buffer Connections Between Modules,” later in this chapter. Once all of this is done,
`the Real Time Manager calls one of its routines to take care of cache allocation; this is the
`second phase of allocation. The task is now ready to install. For DemandCache,
`additional allocation is performed by the DSP operating system at run time.
`There are many factors that the Real Time Manager must take into consideration when
`placing section containers in the cache. First, it must be aware of any reserved memory
`in the cache. This includes areas for the DSP operating system as well as buffers. Next, it
`must be aware of the bank configuration of the cache. For some DSP implementations, it
`is important to locate different sections in different banks to ensure highest performance
`operation. This is not true for the DSP3210, but it was for the DSP32C and will be true for
`future versions of the DSP3200 family.
`It is important to properly mark the sections for bank preference to ensure correct
`placement for all future DSP3200 processors. This takes the form of Bank A and Bank B
`preference flags. If both are set, this indicates that any bank will do. If neither is set, it
`indicates the section should be located outside of the cache. In the example above, the
`program, variables, and table sections (primary containers) are located in Bank A. The
`I/O sections (primary containers) are located in Bank B. The architectural concept
`behind this bank organization is explained in the AT&T DSP3210 manual.
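The meaning of the two bank preference flags can be written out as a small decoding function. The bit values, the `Placement` enum, and the `bank_placement` function in this C sketch are assumptions for illustration; only the four outcomes come from the text.

```c
#include <assert.h>

/* Hypothetical bit values for the two bank preference flags. */
enum { BANK_A_PREF = 1 << 0, BANK_B_PREF = 1 << 1 };

typedef enum {
    PLACE_OUTSIDE_CACHE,  /* neither flag set */
    PLACE_BANK_A,         /* only Bank A preferred */
    PLACE_BANK_B,         /* only Bank B preferred */
    PLACE_ANY_BANK        /* both flags set: any bank will do */
} Placement;

/* Decode a section's bank preference flags into a placement request. */
Placement bank_placement(unsigned flags)
{
    /* Reject bits other than the two preference flags. */
    assert((flags & ~(unsigned)(BANK_A_PREF | BANK_B_PREF)) == 0);
    switch (flags) {
    case BANK_A_PREF | BANK_B_PREF: return PLACE_ANY_BANK;
    case BANK_A_PREF:               return PLACE_BANK_A;
    case BANK_B_PREF:               return PLACE_BANK_B;
    default:                        return PLACE_OUTSIDE_CACHE;
    }
}
```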
`Other allocation decisions are related to the connections between module I/O buffers.
`The Real Time Manager attempts to arrange the sections in the cache in such a way as to
`eliminate as much buffer movement as possible. If a buffer can be set and left in one
`place without being moved between modules or tasks, it reduces the overhead for
`maintaining the buffer.
`A Complete Software Example
`Figure 3-21 diagrams a typical structure of digital signal processing software with
`sections, modules, and tasks. It shows a dual task list (real-time and timeshare) and adds
`multiple DSPs and DSP client controls.
`Figure 3-21 shows two DSP devices (two separate DSP subsystems), where the structure
`detail is shown only for the first device. The first device might be a DSP located on the
`main logic board and the second device might be a DSP located on either a PDS or
`NuBus card. In machines not having a DSP on the main logic board, both DSPs would be
`located on accessory cards.
`For each device, there can be a number of clients. A DSP client is either a system Toolbox
`routine or an application that wishes to use a DSP. An application cannot use a DSP
`without first signing in as a client. The client must sign in to each device that it intends
`to use.
`Each device has two task lists. The primary one is for real-time task execution; it is
`executed once and only once in each frame. The Real Time Manager ensures that the
`clients do not install too much work in this list, so that the entire list can always be
`executed by the end of the frame.
`The second list is the timeshare task list. It is executed using any time left over in each
`frame after all real-time tasks have been run. The DSP operating system will repeatedly
`execute timeshare tasks until it either runs out of time (the next frame begins) or until it
`makes it through the list once without finding anything to do. If the DSP operating
`system does not find an active task prior to the frame ending, the DSP is put into sleep
`mode until the start of the next frame.
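The timeshare scheduling rule can be modeled with a short simulation. This C sketch is a simplification under stated assumptions: each task has a fixed per-run cost, a task goes inactive after a set number of runs, and the frame is treated as over as soon as the next active task no longer fits in the remaining cycles. The `TimeshareTask` type and `run_timeshare` function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated timeshare task. */
typedef struct {
    bool     active;
    uint32_t cost;       /* cycles consumed per execution */
    uint32_t runs_left;  /* task goes inactive after this many runs */
} TimeshareTask;

/* Run the timeshare list with the cycles left after real-time work.
   Returns the cycles the DSP spends asleep: the remainder if a full
   pass finds nothing to do, or 0 if the frame ends with work pending. */
uint32_t run_timeshare(TimeshareTask *tasks, size_t n, uint32_t cycles_left)
{
    assert(tasks != NULL || n == 0);
    for (;;) {
        bool ran_any = false;
        for (size_t i = 0; i < n; i++) {
            if (!tasks[i].active)
                continue;
            assert(tasks[i].runs_left > 0);
            if (tasks[i].cost > cycles_left)
                return 0;              /* out of time: next frame begins */
            cycles_left -= tasks[i].cost;
            if (--tasks[i].runs_left == 0)
                tasks[i].active = false;
            ran_any = true;
        }
        if (!ran_any)
            return cycles_left;        /* nothing to do: sleep */
    }
}
```

The outer loop is what distinguishes the timeshare list from the real-time list, which is walked exactly once per frame.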
`Data Buffering
`In digital signal processing it is often desirable to connect input and output buffers from
`several different algorithms, using signal flow techniques. There are routines in the Real
`Time Manager that accomplish this. The programmer needs to specify the number and
`format of these buffers (for example, input or output buffers, 32-bit floating-point
`format, other formats). The buffers can be connected at run time to similar buffers in
`other, separately designed, algorithms. The application makes calls to the Real Time
`Manager to specify which connections are desired. The Real Time Manager must attempt
`to connect these buffers in an efficient manner to minimize the loss of DSP time used in
`moving buffers around.
`Figure 3-21
`Data structure overview
`[Figure: DSP globals, a DSP device, and the device's real-time task list, whose tasks are each made up of DSP modules.]
`FIFO Buffers
`First-in, first-out (FIFO) buffers are used to buffer data between processors or processes.
`Essentially, a FIFO is an asynchronous buffer. In the sound player example (see “Software
`Architecture,” earlier in this chapter), FIFOs are used as buffers between the main
`processor application and the DSP system for music, speech, and sound-effect data.
`Likewise, a FIFO is used between the DSP and the speaker I/O port, as shown in
`Figure 3-22.
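`Figure 3-22
`Example of FIFO buffers
`[Figure: FIFOs carry music (CD-XA), subband speech, and sound-effect data to the DSP, where sample rate converters (including a 22.254-to-24 kHz converter) feed an audio mixer; the 24 kHz mixer output goes through a FIFO to the speaker.]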
`Figure 3-22 shows how FIFOs can be used in a typical application. The speaker FIFO is
`required because the DSP must keep one frame ahead of the audio serial DMA port. The
`data FIFOs are necessary because of the slow response time of the disk drive and main
`processor application. Typically, a buffer in the range of 20 KB to 40 KB is used to buffer
`the disk to the DSP, depending on the data rate. The disk fills the buffer, and the DSP
`removes a block every frame. When the FIFO is half empty, the DSP operating system,
`which handles the FIFO for the DSP module, sends a message to the main processor
`application. This message tells the main processor application to refill the FIFO from
`the disk.
`FIFOs are also used to buffer output from the DSP to a main processor application, in
`sound recorders, for example. They work exactly like the FIFOs described above, except
`in the opposite direction.
`Another use for FIFOs is to handle data that is not synchronized to the frame rate. For
`example, if data is produced at a rate of 22,254.54 samples per second, the amount of
`data per frame is either 222 or 223 samples (at 100 frames per second). Using a FIFO
`allows the processes that are filling and emptying the buffer to read or write exactly the
`amount of data they need. One prime characteristic of any FIFO is its status. It can be
`empty or full, half empty or half full, or it can be overrun (following an attempt to put
`more data into the FIFO than it can contain). An overrun happens if the data consumer
`cannot keep up with the data producer. It is important to make the FIFO large enough to
`prevent this from occurring or provide a mechanism in the application to halt data
`Note: Large FIFOs are usually placed in main memory, because
`local memory is limited and the data rate is usually small.
`Small FIFOs can be located in DSP local memory.
`FIFOs can also be underrun. This happens when the data receiver is not able to read as
`much data from the FIFO as it needs to produce one frame’s worth of data. The FIFO
`routines help by automatically doing a zero fill of the unused buffer. For sound, either in
`DSP floating-point format, 8-bit integer packed format, or 16-bit integer packed format, a
`zero fill is equivalent to silence. For those functions that require it, the actual amount of
`data retrieved is reported.
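The zero-fill behavior on underrun can be illustrated with a minimal ring buffer. This C sketch is not the DSP operating system's FIFO code: the `Fifo` structure and the `fifo_write` and `fifo_read` functions are hypothetical and model only the behavior described above, with the buffer counted in longwords.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal ring-buffer FIFO of longwords (illustrative layout). */
typedef struct {
    uint32_t *data;      /* FIFO storage */
    size_t    capacity;  /* size of the storage in longwords */
    size_t    read;      /* index of the oldest longword */
    size_t    count;     /* longwords currently stored */
} Fifo;

/* Write up to n longwords. Returns the number actually written;
   a result smaller than n means the FIFO was full. */
size_t fifo_write(Fifo *f, const uint32_t *src, size_t n)
{
    assert(f != NULL && f->capacity > 0);
    size_t written = 0;
    while (written < n && f->count < f->capacity) {
        f->data[(f->read + f->count) % f->capacity] = src[written++];
        f->count++;
    }
    return written;
}

/* Read n longwords into dst. On underrun the rest of dst is filled
   with zeros (silence for sound data). Returns the amount of real
   data retrieved. */
size_t fifo_read(Fifo *f, uint32_t *dst, size_t n)
{
    assert(f != NULL && f->capacity > 0);
    size_t got = 0;
    while (got < n && f->count > 0) {
        dst[got++] = f->data[f->read];
        f->read = (f->read + 1) % f->capacity;
        f->count--;
    }
    for (size_t i = got; i < n; i++)
        dst[i] = 0;
    return got;
}
```

The caller always receives a full frame's worth of data, and the return value tells functions that care how much of it was real.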
`FIFOs are accessed by making DSPFIFORead and DSPFIFOWrite calls to the DSP
`operating system. The DSP operating system is responsible for handling status
`conditions, such as empty or full, half-empty or half-full, and overrun or underrun. The
`DSP operating system is also responsible for updating the FIFO pointers, and sending
`messages to the client as required. Typical messages include FIFO Empty (DSP is reading
`from the FIFO) and FIFO Full (DSP is writing to the FIFO). In order for the DSP
`operating system to manage this, the FIFO has a header block called DSPFIFO. This data
`structure is shown in Figure 3-23.
`Figure 3-23
`The FIFO and its data header
`FIFO address and size
`Read pointer
`Write pointer
`Each FIFO requires two separate blocks of memory: the DSPFIFO structure located in
`local memory and the FIFO itself located in either local memory or main memory.
`Usually, large FIFOs are placed in main memory by the client, to conserve the limited
`local memory space.
`You must write to a FIFO to add data to it and you must read from a FIFO to look
`at the data in it. Hence you need two separate move operations for each datum: a
`DSPFIFOWrite and a DSPFIFORead. Usually, two different processors or processes
`are responsible for the two operations. For example, the application playing the
`sound writes to the FIFO, while the sound player task reads from the FIFO.
`Real-time data processing FIFOs can read from or write to the DSP side only in
`longwords. This restriction is necessary because of the real-time cost of reading bytes
`and reordering them. However, the Real Time Manager supports byte reads and writes
`to FIFOs from the main processor side. It is also important to note that the DSP operating
`system masks the lower two bits of the main processor write pointer (for DSP FIFO
`reads) before using the value to determine the amount of data available in the FIFO.
`Thus, if the main processor writes six bytes to the FIFO, the DSP will process only four
`of them. If the main processor writes another three bytes, the DSP will process four
`more bytes, and so on. This forces all FIFO read/write operations from the DSP to
`use longwords.
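The effect of masking the lower two bits of the write pointer can be checked with a few lines of C. This sketch ignores buffer wraparound and treats the pointers as byte offsets from the start of the FIFO; the `dsp_visible_bytes` function is a hypothetical helper, not a system routine.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the DSP can consume, given the main processor's write offset
   and the DSP's own (longword-aligned) read offset. The lower two
   bits of the write offset are masked off, so only whole longwords
   count. Wraparound is not modeled. */
uint32_t dsp_visible_bytes(uint32_t write_offset, uint32_t read_offset)
{
    uint32_t masked = write_offset & ~(uint32_t)3;
    assert(read_offset % 4 == 0 && read_offset <= masked);
    return masked - read_offset;
}
```

With six bytes written, the DSP sees four; after three more bytes arrive (nine in total) and the DSP has consumed the first four, it sees another four, matching the example in the text.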
`While the FIFO algorithm is ideal for many buffering operations, it requires the DSP
`operating system to manage the DSPFIFO structure and its flags and pointers and also
`requires dual data movements. These operations make it inefficient for many common
`buffering operations. It was this realization that resulted in the creation of a new type of
`buffer, called an AIAO buffer, described in the next section.
`AIAO Buffers
`AIAO stands for all-in/all-out, a naming convention derived from FIFO. AIAO buffers
`transfer data from one module to another during a given frame. The buffer is transient
`and acts like a data bucket between modules; the first module fills the b

