`(19) World Intellectual Property Organization
`International Bureau
`(43) International Publication Date
`31 December 2003 (31.12.2003)
`(10) International Publication Number
`WO 2004/001615 A1
`(51) International Patent Classification7:
`G06F 13/10
`(21) International Application Number:
`PCT/SE2002/001225
`(22) International Filing Date:
`19 June 2002 (19.06.2002)
`(25) Filing Language:
`(26) Publication Language:
`(71) Applicant (for all designated States except US): TELE-
`S-126 25 Stockholm (SE).
`(72) Inventor; and
`(75) Inventor/Applicant (for US only): ANDJELIC, Mario
`[SE/SE]; Kransbindarvägen 41, S-126 36 Hägersten
`(74) Agents: HEDMAN, Anders Aros Patent AB et al.; Box
`1544, S-751 45 Uppsala (SE).
`(81) Designated States (national): AE, AG, AL, AM, AT, AU,
`AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU,
`CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH,
`GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC,
`LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW,
`MX, MZ, NO, NZ, OM, PH, PL, PT, RO, RU, SD, SE, SG,
`SI, SK, SL, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ,
`VN, YU, ZA, ZM, ZW.
`(84) Designated States (regional): ARIPO patent (GH, GM,
`KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZM, ZW),
`Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
`European patent (AT, BE, CH, CY, DE, DK, ES, FI, FR,
`GB, GR, IE, IT, LU, MC, NL, PT, SE, TR), OAPI patent
`(BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR,
`NE, SN, TD, TG).
`Declaration under Rule 4.17:
`of inventorship (Rule 4.17(iv)) for US only
`with international search report
`For two-letter codes and other abbreviations, refer to the "Guidance
`Notes on Codes and Abbreviations" appearing at the beginning
`of each regular issue of the PCT Gazette.
`(57) Abstract: The invention proposes a network device driver architecture with functionality distributed between kernel space and
`user space. The overall network device driver comprises a kernel-space device driver (10) and user-space device driver functionality
`(20). The kernel-space device driver (10) is adapted for enabling access to the user-space device driver functionality (20) via a
`kernel-space-user-space interface (15). The user-space device driver functionality (20) is adapted for enabling direct access between
`user space and the NIC (30) via a user-space-NIC interface (25), and also adapted for interconnecting the kernel-space-user-space
`interface (15) and the user-space-NIC interface (25) to provide integrated kernel-space access and user-space access to the NIC (30).
`The user-space device driver functionality (20) provides direct, zero-copy user-space access to the NIC, whereas information to be
`transferred between kernel space and the NIC will be "tunneled" through user space by combined use of the kernel-space device
`driver (10), the user-space device driver functionality (20) and the two associated interfaces (15, 25).

`The present invention generally relates to a network device driver architecture for
`efficient and flexible access to a network interface controller (NIC).
`Computer software can generally be divided into two types, operating system software
`and application software. The operating system (OS) can be viewed as a resource
`manager that makes the computer's resources such as processors, memory, input/output
`(I/O) devices and communication devices available to the users. It also provides the base
`functionality upon which application software can be written and executed. Important
`operating system functions include sharing hardware among users, preventing users from
`interfering with each other, resource scheduling, organizing data for secure and rapid
`access, and supporting I/O functions and network communications.
`The central part of the OS is commonly referred to as the kernel. The kernel is normally
`only a portion of the code of what is commonly thought of as the entire OS, but it is one
`of the most intensively used portions of the code. The kernel defines the so-called
`user space, in which the application software runs, and provides services to user applications,
`including memory management, allocating processing resources, and responding to
`system calls from user applications or processes. Other important kernel functions
`include interrupt handling, process management and synchronization, as well as I/O
`management including network communications.
`Since many different hardware devices can be connected to the computer system, some
`of the I/O functionality is typically implemented as common functionality that is device
`independent. Device related functionality is then allocated within so-called device
`drivers. This means that a user application that needs to access a particular hardware
`device, such as a network communication device, makes a system call to the OS, which
`in turn invokes the device driver associated with the hardware device.
`A Network Interface Controller (NIC) is a hardware device that is commonly connected
`to computer systems for providing network communication capabilities, such as Ethernet
`or ATM communication. NIC controllers usually implement lower-level protocols, such
`as layer 1 (PHY) and layer 2 (MAC, LLC) protocols, whereas higher level protocols (e.g.
`the TCP/IP protocol suite) traditionally are allocated in the OS, running in kernel mode.
`Moreover, clusters, for example, usually have proprietary protocols running on top of
`Ethernet because TCP/IP (Transmission Control Protocol/Internet Protocol) is not
`very well suited for cluster computing in System Area Networks (SANs). These
`proprietary protocols are generally also running in kernel mode.
`However, centralized in-kernel protocol processing prevents user applications from
`realizing the potential raw performance offered by the underlying high-speed networks.
`The performance problem is mainly caused by message copying between user space and
`kernel space, polluted cache, interrupts and non-optimized code. The intensive message
`copying creates a large overhead, especially for short messages, and constitutes the main
`reason for high processor load and low throughput of network subsystems with standard
`operating systems.
`This problem has become more pronounced with the advent of high-performance
`network communication technologies such as Gigabit Ethernet, ATM and Infiniband.
`The main challenge in putting such high-performance communication technologies into
`use lies primarily in building systems that can efficiently interface these network media
`and sustain high bandwidth all the way between two network communicating

`This has led the computer industry to develop network device drivers that support NIC
`access directly from user space, avoiding message copying between user space and
`kernel space. The most commonly known example of this type of user-space network
`access architecture is the Virtual Interface Architecture (VIA) developed by Intel
`Corporation, Microsoft Corporation and Compaq Computer Corporation. The Virtual
`Interface Architecture (VIA) is an industry standard for System Area Networks that
`supports direct, zero-copy user-space access to the NIC. The VIA Architecture was
`designed to eliminate message copying, per-message interrupts and other kernel
`overhead that have made traditional networked applications become performance
`bottlenecks in the past. As described, e.g. in the specification Intel Virtual Interface (VI)
`Architecture Developer's Guide, September 9, 1998 and the International Patent
`Application WO 00/41358, the VIA Architecture avoids intermediate data copies and by-passes
`the operating system kernel to achieve low latency, high bandwidth
`communication. The VIA model includes a VI consumer and a VI provider. The VI
`consumer typically includes a user application and an operating system communication
`facility and a VI user agent. The VI provider typically includes the combination of a VI
`NIC and a VI kernel agent. The Virtual Interface (VI) is a direct interface between a VI
`NIC and a user application or process. The VI allows the NIC to directly access the user
`application's memory for data transfer operations between the application and the
`network. The VI generally comprises a send queue and a receive queue, each of which
`can be mapped directly to user address space, thus giving direct user-space access to the
`network level and by-passing the operating system kernel.
`The technical report DART - A Low Overhead ATM Network Interface Chip, TR-96-18,
`July 1996 discloses an ATM NIC designed for high bandwidth, low overhead
`communication, by providing direct, protected application access to/from the network.
`The main drawback of the VIA architecture (and similar architectures) is that it requires
`special VIA-enabled NIC controllers, and cannot run on off-the-shelf NIC controllers
`such as ordinary Ethernet NIC controllers. Since a lot of functionality for network
`communication relies on kernel-level protocols such as TCP/IP, both a VIA-enabled NIC
`and an ordinary Ethernet (TCP/IP) NIC are required with the VIA architecture. The VIA
`architecture is thus not optimized for implementation into existing systems, but generally
`requires hardware re-design of existing systems, adding an extra NIC and/or NIC port to
`the system. Re-designing a circuit board, including design, testing, product handling,
`maintenance, spare parts, etc. may easily lead to extra costs in the order of millions of
`dollars.
`The present invention overcomes these and other drawbacks of the prior art.
`It is a general object of the present invention to provide efficient and flexible access to a
`network interface controller (NIC), eliminating the CPU as the bottleneck in the
`communication chain.
`It is also an object of the invention to provide an improved and cost-optimized network
`device driver architecture. In particular, it is beneficial if the network device driver
`architecture is suitable for implementation and integration into existing systems.
`Yet another object of the invention is to provide a robust and flexible network device
`driver that is not NIC dependent and works with any off-the-shelf NIC hardware.
`These and other objects are met by the invention as defined by the accompanying patent
`claims.
`The general idea of the invention is to provide an efficient, flexible and cost-effective
`network device driver architecture by means of integrated kernel-space access and
`user-space access to the NIC, preferably over the same NIC port. This is accomplished by
`enabling direct user-space access to the NIC, in similarity to user-space network
`access architectures, and most importantly enabling user-space tunneled access
`between kernel-space and the NIC.
`From an architectural point of view, the novel network device driver architecture
`normally comprises a kernel-space device driver as well as user-space device driver
`functionality. The kernel-space device driver is adapted for enabling access between
`kernel space and user space via a kernel-space-user-space interface. The user-space
`device driver functionality is adapted for enabling direct access between user space
`and said NIC via a user-space-NIC interface. This user-space device driver
`functionality is also adapted for interconnecting the kernel-space-user-space interface
`and the user-space-NIC interface to enable integrated kernel-space access and
`user-space access to the NIC. In this way, efficient user-space access to the NIC is obtained,
`while at the same time kernel-level protocols are allowed to run over the same NIC.
`Preferably, the kernel-space device driver has two different operational modes. In the
`first mode, the kernel-space device driver is operable for directly accessing the NIC
`via a kernel-space-NIC interface. In the second mode, also referred to as user-space
`tunneled access mode, the kernel-space device driver is operable for accessing the NIC
`via the user-space device driver functionality.
`Advantageously, the user-space device driver functionality is configured for execution
`in application context of a user application, for example implemented as user library
`functionality. For robustness and security, when the user-space tunneled access mode
`is activated, the operating system orders the kernel-space device driver to switch back to
`the first operational mode if the user application crashes. As a second line of defense, or
`as an alternative, the kernel-space device driver may optionally be provided with a
`watchdog that switches back to the first operational mode if there has been no call from
`the user-space device driver functionality for a predetermined period of time.

`In a preferred implementation, the kernel-space device driver has two basic building
`blocks, the network device driver core and a kernel space agent. The network device
`driver core is preferably based on a standard network device driver, for example
`obtained from a commercial vendor, with additional functionality for making the
`device driver work in both default mode as well as the user-space tunneled access
`mode of the invention. In default mode, the network device driver core operates as an
`ordinary network device driver, directly accessing the NIC. In user-space tunneled
`access mode, the driver core routes outgoing data to the kernel agent and receives
`incoming data from the kernel agent. The kernel agent manages the
`kernel-space-user-space interface, and supports transfer of information to/from the user-space device
`driver functionality. The kernel agent generally comprises functionality common to
`different types of NIC controllers, thus allowing easy adaptation of standard network
`device drivers for a particular NIC to the novel network device driver architecture
`supporting user-space tunneled access between kernel space and the NIC.
`In conclusion, the invention allows simultaneous user-space and kernel-space access to
`the network layer over the same NIC port, thus leading to a reduction of the number of
`required NIC ports and eliminating the need for hardware re-design. By running on top
`of the same NIC, smaller footprint/cost and better network utilization can be achieved.
`The novel network device driver architecture is well suited for applications that need
`high performance network communication as well as functionality relying on
`kernel-level protocols. Examples of such applications can be found in embedded environments,
`communication systems and so forth.
`It should be understood that the expressions "NIC access" and "access to the NIC"
`include both sending information to and receiving information from the network level.
`Other benefits of the novel network device driver architecture include:
`- Reduced hardware space and power dissipation, which is especially important for
`  embedded type of systems;
`- Less cabling;
`- Reduced number of ports required on the associated communication switches,
`  thus allowing the use of smaller and cheaper switches; and
`- Efficient use of bandwidth in the network.
`Further advantages offered by the present invention will be appreciated upon reading of
`the below description of the embodiments of the invention.
`The invention, together with further objects and advantages thereof, will be best
`understood by reference to the following description taken together with the
`accompanying drawings, in which:
`Fig. 1 is a schematic general block diagram of a network device driver architecture
`according to a preferred embodiment of the invention;
`Fig. 2 illustrates integrated user-space access and kernel-space access to the NIC
`supported by zero-copy message transfer within the network device driver according to
`the invention;
`Fig. 3 is a schematic block diagram illustrating a preferred realization of the network
`device driver architecture according to the invention;
`Fig. 4 is a schematic flow diagram of a method for network access according to a
`preferred embodiment of the invention;
`Figs. 5-10 are simplified views illustrating different traffic cases in the distributed
`network device driver architecture of Fig. 3; and
`Fig. 11 illustrates a particular example of an overall system implementation.
`Throughout the drawings, the same reference characters will be used for corresponding
`or similar elements.
`Fig. 1 is a schematic general block diagram of a network device driver architecture
`according to a preferred embodiment of the invention. The network device driver
`architecture is illustrated in its system environment, including user space, kernel space as
`well as network space.
`The invention proposes a network device driver architecture in which a fraction of the
`standard device driver functionality is distributed to user space providing direct NIC
`communication, and the kernel-space device driver has additional functionality for NIC
`access via user space. The network device driver functionality is thus distributed
`between kernel space and user space, and the overall network device driver comprises
`a kernel-space device driver 10 and user-space device driver functionality 20. The
`kernel-space device driver 10 is adapted for enabling access to the user-space device
`driver functionality 20 via a kernel-space-user-space interface 15. The user-space
`device driver functionality 20 is adapted for enabling direct access between user space
`and the NIC 30 via a user-space-NIC interface 25, and also adapted for interconnecting
`the kernel-space-user-space interface 15 and the user-space-NIC interface 25 to
`provide integrated kernel-space access and user-space access to the NIC 30. The user-
`space device driver functionality 20 provides direct, zero-copy user-space access to the
`NIC, whereas information to be transferred between kernel space and the NIC will be
`"tunneled" through user space by combined use of the kernel-space device driver 10,
`the user-space device driver functionality 20 and the two associated interfaces 15, 25.

`In this way, efficient user-space access to the NIC 30 is obtained, while at the same time
`kernel-level protocols 45 are allowed to run over the same NIC. The network device
`driver architecture of the invention supports usage of a dedicated NIC port for user-space
`traffic to/from a user application 40, but also supports efficient sharing of the same port
`for both kernel-level protocols and user-level protocols. The possibility of sharing the
`same NIC port generally opens up for cost-optimized solutions. Another important
`benefit of sharing the same NIC port is the possibility to integrate the novel device driver
`architecture into existing systems without hardware modifications. Thus, system
`re-design may be avoided, leading to cost savings in the order of several million dollars.
`Preferably, the kernel-space device driver 10 has two different operational modes. In
`the first mode, the kernel-space device driver 10 operates as a standard network device
`driver directly accessing the NIC 30 via a kernel-space-NIC interface 35. In the second
`mode, also referred to as user-space tunneled access mode, the kernel-space device
`driver 10 is operable for accessing the NIC 30 by means of the user-space tunneling
`mechanism described above.
`Advantageously, the user-space device driver functionality 20 is configured for
`execution in application context of a user application 40, for example implemented as
`user library functionality. It is important that the kernel-level protocols 45 are not
`stalled in the case of a user application crash or deadlock. In user-space tunneled
`access mode, the operating system orders the kernel-space device driver 10 to switch
`back to the first operational mode if the user application crashes. The kernel-space device
`driver 10 now accesses the same NIC port as the user application did before it crashed.
`As a second line of defense, or as an alternative, the kernel-space device driver 10 may
`be provided with an optional software watchdog 12 that switches back to the first
`operational mode if there is no call from the user-space device driver functionality 20 for
`a predetermined period of time. Alternatively, a counter-based hardware watchdog can
`be connected to the network device driver architecture.
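The watchdog behaviour described above may be illustrated by the following minimal sketch. It is a simplified model under stated assumptions and not the implementation of the invention: the class name `KernelDriverWatchdog`, the mode constants, the method names and the injectable `clock` parameter are all hypothetical and introduced only for illustration.

```python
import time

# Hypothetical labels for the two operational modes of the kernel-space driver.
DEFAULT_MODE = "default"      # first mode: direct kernel-space access to the NIC
TUNNELED_MODE = "tunneled"    # second mode: access tunneled through user space


class KernelDriverWatchdog:
    """Toy model of a software watchdog inside the kernel-space driver."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s   # the "predetermined period of time"
        self.clock = clock           # injectable so the model can be tested
        self.mode = DEFAULT_MODE
        self.last_call = clock()

    def enter_tunneled_mode(self):
        """Activate user-space tunneled access and start the timeout window."""
        self.mode = TUNNELED_MODE
        self.last_call = self.clock()

    def note_user_call(self):
        """Record a call from the user-space device driver functionality."""
        self.last_call = self.clock()

    def check(self):
        """Fall back to the first mode if no call arrived within the timeout."""
        silent_for = self.clock() - self.last_call
        if self.mode == TUNNELED_MODE and silent_for > self.timeout_s:
            self.mode = DEFAULT_MODE
        return self.mode
```

In such a model, `check()` would be invoked periodically, for example from a timer. The purpose of the fallback is that kernel-level protocols keep access to the NIC port even if the user application stops calling.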

`In a preferred embodiment of the invention, all of the communication interfaces 15, 25
`and 35 within the novel network device driver architecture support zero-copy transfer of
`information. For a better understanding of the invention, an example of integrated
`user-space access and kernel-space access to the NIC supported by zero-copy message
`transfer within the network device driver will now be described with reference to Fig. 2.
`Each of the interfaces 15, 25 and 35 is preferably based on a shared memory structure,
`for example in the form of buffer queues. Each interface is normally associated with a
`send queue (KTX; TX; NTX) and a receive queue (KRX; RX; NRX). The buffer queues
`are typically adapted for holding pointer information, and accessed by writing to the
`tail and reading from the head. The pointer information points to the real data such as a
`message stored in common memory.
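A buffer queue of the kind described above may be sketched as a fixed-size ring that holds pointers only. The sketch is illustrative and makes its own assumptions: the class name `PointerQueue`, its methods and the fixed capacity are hypothetical, and plain integers stand in for memory addresses.

```python
class PointerQueue:
    """Fixed-size FIFO ring holding pointers to messages, not the messages."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._head = 0    # index of the next slot to read
        self._tail = 0    # index of the next slot to write
        self._count = 0   # number of occupied slots

    def put(self, pointer):
        """Producer side: write at the tail. Returns False when the ring is full."""
        if self._count == len(self._slots):
            return False
        self._slots[self._tail] = pointer
        self._tail = (self._tail + 1) % len(self._slots)
        self._count += 1
        return True

    def get(self):
        """Consumer side: read from the head. Returns None when the ring is empty."""
        if self._count == 0:
            return None
        pointer = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return pointer
```

Since each slot carries only a pointer, the cost of passing a message through the queue does not depend on the size of the message.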
`The information transfer will now be described in the outbound direction from user
`application to NIC, both for user-level protocols as well as for kernel-level protocols. It is
`apparent that the information transfer is similar in the inbound direction.
`In the case of a user-space terminated protocol, a message MSG-1 to be sent from a user
`application 40 to the NIC 30 is stored in common system memory 50 or any other
`memory that can be accessed by the involved system components. A pointer P-1 that
`points (dashed line) to the corresponding memory position in system memory 50 is
`delivered to the user-space device driver functionality 20 together with a request for NIC
`access. The user-space device driver functionality 20 puts the pointer into the TX queue
`(located in user address space) of the user-space-NIC interface 25. The NIC 30
`subsequently consumes the message by reading the pointer from the TX queue and
`performing a direct memory access (DMA) from the corresponding position in the
`system memory 50 to fetch the message.
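The user-space send path just described may be modelled as follows. This is a sketch under assumptions and not driver code: a dictionary stands in for system memory 50, a `deque` stands in for the TX queue of interface 25, a dictionary lookup stands in for the DMA fetch, and all function names are hypothetical.

```python
from collections import deque

system_memory = {}   # address -> message; stands in for system memory 50
tx_queue = deque()   # stands in for the TX queue of user-space-NIC interface 25


def app_store_message(address, payload):
    """User application 40 stores the message and obtains a pointer to it."""
    system_memory[address] = payload
    return address   # plays the role of pointer P-1


def user_driver_send(pointer):
    """User-space driver functionality 20 queues the pointer, never the payload."""
    tx_queue.append(pointer)


def nic_consume():
    """NIC 30 reads the pointer at the head and fetches the message it points to."""
    if not tx_queue:
        return None
    pointer = tx_queue.popleft()
    return system_memory[pointer]   # stands in for the DMA read
```

In this model the object returned by `nic_consume()` is the very object that the application stored, which mirrors the zero-copy property: only the pointer travels through the queue.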
`In the case of a user application 40 in need of a kernel-level protocol, the user application
`makes a corresponding system call, and the message to be transferred to the NIC 30 is
`copied into kernel-space and handled by the invoked kernel-space protocol 45. Once the
`message MSG-2 is in kernel-space, there will generally be no more message copying.
`Instead, the kernel-level protocol 45 delivers a pointer P-2 that points (dashed line) to the
`memory position of the message in system memory 50 to the kernel-space device driver
`10, which inserts the pointer into the KTX queue of the kernel-space-user-space interface
`15. The user-space device driver functionality 20 polls the KTX queue and moves the
`pointer to the TX queue of the user-space-NIC interface 25. Once the pointer has moved
`to the head of the queue, the NIC 30 will read the pointer and fetch the corresponding
`message through a DMA access to system memory 50.
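The tunneled path for kernel-level traffic may be illustrated by the following toy model. It is a sketch under assumptions only: the class `TunnelModel` and its method names are hypothetical, `deque` objects stand in for the KTX and TX queues, and a dictionary stands in for system memory 50.

```python
from collections import deque


class TunnelModel:
    """Toy model of kernel-to-NIC traffic tunneled through user space."""

    def __init__(self):
        self.memory = {}     # address -> message; stands in for system memory 50
        self.ktx = deque()   # KTX queue of kernel-space-user-space interface 15
        self.tx = deque()    # TX queue of user-space-NIC interface 25

    def kernel_driver_send(self, address, message):
        """Kernel side: the message already resides in memory; queue its pointer."""
        self.memory[address] = message
        self.ktx.append(address)

    def user_driver_poll(self):
        """User-space driver: move every pending pointer from KTX to TX."""
        moved = 0
        while self.ktx:
            self.tx.append(self.ktx.popleft())
            moved += 1
        return moved

    def nic_fetch(self):
        """NIC: read the pointer at the head of TX and fetch the message."""
        if not self.tx:
            return None
        return self.memory[self.tx.popleft()]
```

In this model nothing reaches the NIC until `user_driver_poll()` has run, which reflects that the user-space device driver functionality is the only path to the NIC while tunneled access is active.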
`Preferably, all buffer queues are allocated in kernel address space by the kernel-space
`device driver. The queues are mapped to the address space of the user-space device
`driver functionality. To make the queues visible to the NIC, they are first mapped to the
`NIC bus address space and the obtained addresses are then written to the specific NIC
`By working with message pointers, instead of complete messages, there will be no actual
`message copying.
`Fig. 3 is a schematic block diagram illustrating a preferred realization of the network
`device driver architecture according to the invention. The kernel-space device driver 10
`preferably has two basic building blocks, a network device driver core (NDD core) 14
`and a kernel space agent 16. Together with the user-space device driver functionality
`20, the NDD core 14 and the kernel agent 16 generally define the overall network
`device driver architecture.
`User-space messages are exchanged between user space and NIC without kernel
`involvement, and since the user-space device driver functionality typically works in
`polling mode, there will be no per message interrupts. Messages originating from
`kernel-level users are tunneled between the NDD core 14 and the NIC 30, via the
`kernel agent 16, the user-space device driver functionality 20 and the associated
`interfaces 15, 25.
`Most operating systems such as Tru64, Linux, Windows and OSE support some form
`of device driver framework, which comprises a set of rules, interfaces and guidelines
`on how to develop device drivers. These frameworks are well documented and OS
`vendors often supply tools for generating device driver templates, thus saving valuable
`design time and effort for developing new device drivers. The network device driver
`core 14 as well as the kernel agent 16 are generally implemented according to a
`suitable device driver framework.
`The network device driver core 14 is preferably based on a standard network device
`driver, for example obtained from a commercial vendor, with additional functionality
`for making the device driver work in both default mode as well as the user-space
`tunneled access mode of the invention. Source code for the design base network device
`driver can usually be obtained from the device driver vendor, or by using freely
`available source code (Linux, NetBSD and FreeBSD for example). The design base
`adaptation for allowing user-space tunneling can typically be realized by adding about
`50 lines of code (approximately 1% of the design base code) to the design base device driver. It is
`also possible to design the NDD core 14 in-house by using any of the available tools
`for generating device drivers.
`In default mode, the NDD core 14 operates as an ordinary network device driver,
`directly accessing the NIC.
`In user-space tunneled access mode, the NDD core 14 routes outgoing data to the
`kernel agent 16 and receives incoming data from the kernel agent. The NDD core or
`the user-space device driver functionality preferably also masks interrupts related to
`message processing since the user-space device driver functionality 20 normally works
`in polling mode.
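The routing performed by the NDD core in the two modes may be sketched as follows. The sketch is a simplified model under assumptions: the class `NddCore`, its attributes and its methods are hypothetical, two `deque` objects stand in for the direct path to the NIC and the hand-off to the kernel agent 16, and interrupt masking is reduced to a flag.

```python
from collections import deque


class NddCore:
    """Toy model of the NDD core choosing a transmit path per operational mode."""

    def __init__(self):
        self.tunneled = False
        self.to_nic = deque()            # stand-in for the direct path to the NIC
        self.to_kernel_agent = deque()   # stand-in for the hand-off to kernel agent 16
        self.interrupts_masked = False   # stand-in for masking message interrupts

    def set_tunneled(self, enabled):
        """Switch mode; message interrupts are masked while tunneled access is on."""
        self.tunneled = enabled
        self.interrupts_masked = enabled

    def transmit(self, pointer):
        """Route an outgoing message pointer according to the current mode."""
        target = self.to_kernel_agent if self.tunneled else self.to_nic
        target.append(pointer)
```

Keeping the mode decision in a single place is consistent with the observation that only a small amount of code needs to be added to a design base driver.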

`Conveniently, the kernel agent 16 performs some initialization procedures, allocates
`contiguous memory, implements the kernel-space-user-space interface 15 as well as
`the interface to/from the NDD core 14, and maps contiguous memory and memory
`mapped configuration and state registers (CSR) to the address space of the user-space
`device driver functionality 20. The kernel agent 16 supports transfer of messages
`between the NDD core 14 and the user-space device driver functionality 20 via the
`kernel-space-user-space interface 15. Since the FIFO queues KTX, KRX of the
`kernel-space-user-space interface are allocated in kernel address space and mapped to user
`address space, no message copying is required between the kernel agent 16 and the
`user-space device driver functionality 20. The kernel agent module is generally not
`dependent on the particular NIC used by the system, and can transparently and
`simultaneously support different types of NIC controllers, including Fast Ethernet,
`Gigabit Ethernet and ATM NIC controllers.
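The effect of mapping one queue area into two address spaces may be illustrated, within a single process, by the following sketch. It rests on several assumptions: an anonymous `mmap` region stands in for the contiguous memory allocated by the kernel agent, two `memoryview` objects stand in for the kernel-side and user-side mappings, and the slot layout (one 64-bit little-endian pointer per slot) and the function names are hypothetical. In a real driver the mapping would be established through the device driver framework of the OS.

```python
import mmap
import struct

SLOT = struct.Struct("<Q")   # assumed layout: one 64-bit pointer per slot
QUEUE_BYTES = 4096           # assumed size of the queue area

# One region, allocated once; anonymous memory starts out zero-filled.
region = mmap.mmap(-1, QUEUE_BYTES)

# Two views of the same bytes; neither view holds a copy of the region.
kernel_view = memoryview(region)   # stands in for the kernel agent's mapping
user_view = memoryview(region)     # stands in for the user-space mapping


def kernel_agent_write(slot, pointer):
    """Kernel agent side: store a pointer in the given queue slot."""
    SLOT.pack_into(kernel_view, slot * SLOT.size, pointer)


def user_driver_read(slot):
    """User-space side: read the pointer stored in the given queue slot."""
    return SLOT.unpack_from(user_view, slot * SLOT.size)[0]
```

A value written through one view is immediately visible through the other, without any transfer between them, which corresponds to the statement that no message copying is required across the interface.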
`The kernel agent 16 may also be adapted for monitoring the status of any process
`using the user-space device driver functionality 20. This makes it possible for the
`kernel agent to order the NDD core 14 to switch back to default mode in the case of a
`user process failure.
`In a typical case, the kernel agent 16 may be realized by approximately 200 lines of
`new code together with about 300 lines of standard device driver framework code.
`As mentioned above, the user-space device driver functionality 20 is a small part of
`the overall device driver functionality, and preferably implemented as user library
`functionality executing in user space. It normally works in polling mode and supports
`direct exchange of messages between user-space and NIC. Typically, the user-space
`device driver functionality may be realized by approximately 200 lines of code.
`The interface between the kernel-level protocols 45 such as TCP/IP and DLI (Data Link
`Interface) on one hand and the NDD core 14 on the other hand is conveniently an
`existing network device driver API (Application Programming Interface) supplied with
`the OS.
`The interface between the NDD core 14 and the kernel agent 16 is normally an API that
`supports sending/receiving messages over a specific NIC.
`The interface 15 between the kernel agent 16 and the user-space device driver
`functionality 20 is preferably realized as a standard file interface, supporting user-space
`device driver functionality requests for opening a connection towards the kernel agent and
`mapping of contiguous buffer memory and memory mapped CSR from the kernel agent
`to application context. If desired, it may also support the watchdog functionality
`implemented in the kernel agent as well as NIC status notification from the kernel agent
`16 to the user-space device driver functionality 20. Message transfer between the kernel
`agent 16 and the user-space device driver functionality 20 is realized by means of a
`shared memory structure, as previously described.
`The interface between the user application 40 and the user-space device driver
`functionality 20 is normally an API that supports sending/receiving messages directly
`between the user address space and the NIC 30, in combination with the FIFO-queue
`based interface 25 between the user-space device driver functionality 20 and the NIC 30.
`This interface can be realized as a standard VI interface.
`Fig. 4 is a flow diagram of a method for network access according to a preferred
`embodiment of the invention. In step S1, direct access between user space and the NIC is
`provided via a user-space-NIC interface. In step S2, which relates to the default operation
`mode, direct access between kernel space and the NIC may be provided via a
`kernel-space-NIC interface. In user-space tunneled access mode, access between

