(12) United States Patent
     Svensson et al.

(10) Patent No.: US 7,356,680 B2
(45) Date of Patent: Apr. 8, 2008

(54) METHOD OF LOADING INFORMATION INTO A SLAVE PROCESSOR IN A MULTI-PROCESSOR SYSTEM USING AN OPERATING-SYSTEM-FRIENDLY BOOT LOADER

(75) Inventors: Mats Svensson, Lund (SE); Peter Aulin, Malmö (SE); Niclas Bauer, Malmö (SE); Michael Rosenberg, Södra Sandby (SE)

(73) Assignee: Telefonaktiebolaget L M Ericsson (publ), Stockholm (SE)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 482 days.

(21) Appl. No.: 11/040,798
(22) Filed: Jan. 22, 2005

(65) Prior Publication Data
     US 2006/0168435 A1    Jul. 27, 2006

(51) Int. Cl.
     G06F 9/00 (2006.01)
(52) U.S. Cl.: 713/1; 713/2; 713/100; 709/208; 709/212; 709/213
(58) Field of Classification Search: 713/1, 713/2, 100; 709/208, 212, 213
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

4,943,911 A      7/1990   Kopp et al.
5,068,780 A     11/1991   Bruckert et al.
5,155,833 A *   10/1992   Cullison et al.    713/2
5,347,514 A *    9/1994   Davis et al.       370/429
5,652,886 A      7/1997   Tulpule et al.
5,754,863 A *    5/1998   Reuter
5,799,186 A      8/1998   Compton
5,835,784 A     11/1998   Gillespie et al.
5,944,820 A      8/1999   Beelitz
6,012,142 A *    1/2000   Dokic et al.       713/2
6,058,474 A      5/2000   Baltz et al.
6,490,722 B1    12/2002   Barton et al.
6,601,167 B1     7/2003   Gibson et al.
6,684,397 B1     1/2004   Byer et al.
6,691,216 B2     2/2004   Kelly et al.
(Continued)

FOREIGN PATENT DOCUMENTS

EP   1460539 A2   9/2004
(Continued)

OTHER PUBLICATIONS

PCT International Search Report, mailed Aug. 8, 2006, in connection with International Application No. PCT/EP2006/000351.
(Continued)

Primary Examiner: Thomas Lee
Assistant Examiner: Malcolm D Cribbs
(74) Attorney, Agent, or Firm: Potomac Patent Group PLLC

(57) ABSTRACT

A conventional bootloader can conflict with the operating system (OS) of a multi-processor system. An OS-friendly bootloader and methods are described that integrate an OS with a bootloader in any system in which a host processor and a client processor have a communication mechanism that requires the OS for the mechanism to work and the client has two memory systems: one visible to both host and client and one visible only to the client.

23 Claims, 2 Drawing Sheets

[Representative drawing: FIG. 1, system 100]

U.S. PATENT DOCUMENTS (continued)

6,760,785 B1        7/2004   Powderly et al.
6,810,478 B1       10/2004   Anand et al.
2002/0059560 A1 *   5/2002   Philips
2002/0138156 A1     9/2002   Wong et al.
2003/0126424 A1     7/2003   Horanzy et al.
2004/0059906 A1     3/2004   Park et al.
2004/0088697 A1     5/2004   Schwartz et al.
2004/0215952 A1    10/2004   Oguma

FOREIGN PATENT DOCUMENTS (continued)

WO   01/27753 A2   4/2001

OTHER PUBLICATIONS (continued)

PCT Written Opinion, mailed Aug. 8, 2006, in connection with International Application No. PCT/EP2006/00351.
Hyde, J., "How to Make Pentium Pros Cooperate", BYTE, McGraw-Hill, Inc., St. Peterborough, US, vol. 21, No. 4, Apr. 1996, pp. 177-178, XP000586039.
Winderweedle, B. et al., "TMS320VC5470/5471 Bootloader Application Report", Texas Instruments, Dallas, TX, SPRA376, Jun. 2002.
"OMAP5910 Dual-Core Processor DSP Subsystems Reference Guide", Texas Instruments, Dallas, TX, SPRU672, Oct. 2003.

* cited by examiner

[Sheet 1 of 2]

[FIG. 1: block diagram of multi-processor system 100. Labels: Non-Volatile memory 106; ARM; DSP; DSP CPU 104; DSP Int. SARAM & DARAM 108, containing the Intermediate Storage Area; DSP XRAM 110.]

[FIG. 3: organization of the Intermediate Storage Area. Labels: Transfer Block; Header; Length; Dest. Addr.]

[Sheet 2 of 2]

[FIG. 2: flowchart of the OS-friendly bootloader. Steps:
202 Reset and Hold Slave Processor
204 Push Info to Slave Processor
206 Push Complete? (Yes: continue)
208 Boot Slave Processor
210 Start OS in Slave Processor
212 Reserve Intermed. Storage Area
214 Send Message to Host Processor
216 Push Info to ISA
218 Send Message to Slave Processor
220 Copy ISA to "Invisible" Memory
222 Send Message to Host Processor
224 Complete? (No: return to 216; Yes: continue)
226 Release Blocks; Load Complete]

METHOD OF LOADING INFORMATION INTO A SLAVE PROCESSOR IN A MULTI-PROCESSOR SYSTEM USING AN OPERATING-SYSTEM-FRIENDLY BOOT LOADER

BACKGROUND

This invention relates to initialization of electronic systems having multiple programmable processors.

The process of starting, or booting up, an electronic system having a programmable processor connected to one or more memory devices for storing program instructions, or code, and data is not as simple as it might seem at first glance. An important part of the reason for this is the need for the processor to begin operation in a well-defined state.

The traditional ways of loading program code and data to a bare system are either by "pushing" the code and data into the system's random-access memory (RAM) directly or by using a bootloader. The bootloader, which is sometimes called a boot loader or a bootstrap loader, is a set of instructions (i.e., program code, sometimes called "boot code") that can be either "pushed" into the system's RAM or loaded into the RAM from a non-volatile memory, such as read-only memory (ROM). In its execution by the processor, the bootloader then "drags" in the rest of the code and data and starts the system.

Examples of prior mechanisms for starting processor systems, including bootloaders, are U.S. Pat. No. 5,652,886 to Tulpule et al. and U.S. Pat. No. 6,490,722 to Barton et al. and U.S. Patent Application Publication No. US 2002/0138156 A1 to Wong et al. Barton et al., for example, describes a two-stage bootloader in which the second stage finds, verifies, and loads the operating system. In Wong et al., a multiprocessor system uses a master processor coupled to a ROM to transfer boot code to slave processors, with memory controllers in the slave processors denying memory access requests until the boot code has been transferred to their RAMs.

As indicated by Barton et al. and Wong et al., for example, starting up a multi-processor system, which can be generally considered as having a master or host processor, i.e., the system that orders the boot, and one or more slave or client processors, i.e., the system to be booted, is even more complicated than starting up a single-processor system.

Advantages of the "push" method are that it requires no code to execute in the slave during boot and that the only synchronization required is to hold the slave in a reset state and release it when loading is finished. Nevertheless, the "push" method works only when the memory or memories of the slave are visible to the host. This visibility can be implemented in several ways. For example, a memory may be visible on the address and data busses of both the host and the slave processors or direct memory access (DMA) transfers may be allowed from the host's memory or memories to the slave's memory or memories.

When the slave's memory to be loaded is invisible to the host, the "push" method cannot be used. In that situation, some form of bootloading must be used. As noted above, the bootloader technique requires either that boot code can be pushed onto the slave (which in this case is not possible) or that the slave can load code from a non-volatile memory. The bootloader then initiates a transfer of code from the host to the slave and finishes loading the memory.

Multi-processor systems in which some or all of a slave's memory is not visible to a host are possible. In such systems, it can be advantageous to take advantage of well-established software frameworks for loading and inter-processor communication, which render traditional bootloaders undesirable. Moreover, a bootloader can conflict with the operating system, which can be said to want to have control over the entire system and all of the memory.

Among the problems faced when integrating a bootloader with an operating system (OS) are ensuring that code that is not yet loaded is not executed, efficiently loading code to a memory or memories invisible to the host, and synchronizing with the host the loading and booting of the slave(s). Moreover, it is necessary to determine which portions of the system must be loaded to memories visible to both host and slave processors and how the binary image to be loaded should be arranged for the bootloader to work together with the OS. Another issue that can be important is the integration of the bootloader and the OS, as an already established framework for communication between host and slave then can be used during loading. Such a framework typically would include one or more primitives for communication that rely on OS features.

SUMMARY

This invention provides, in one aspect, a method of loading program code into a slave processor in a multi-processor system that includes a master processor and the slave processor. The method includes the steps of resetting the slave processor and holding the slave processor in a reset state; pushing information into a first memory that is accessible by the master and slave processors; booting the slave processor; starting an operating system in the slave processor, including blocking scheduling of processes having program code located in a second memory that is accessible by the slave processor and inaccessible by the master processor; reserving an intermediate storage area in the first memory; sending to the master processor information about a location and size of the intermediate storage area reserved; based on the sent information, loading the intermediate storage area with information to be loaded into the second memory; sending a first message to the slave processor that indicates the intermediate storage area has been loaded and whether loading is finished or more information is to be loaded; copying information in the intermediate storage area to the second memory; and sending a second message to the master processor that indicates that information in the intermediate storage area has been copied.

In another aspect of the invention, a multi-processor system includes a host processor, at least one client processor, a first random-access memory accessible by the host and client processors, a second random-access memory accessible by the client processor and not accessible by the host processor, and a bootloader. The first memory includes an intermediate storage area, and the bootloader includes a host part and a client part. The host part is loadable into the first random-access memory and has a first stage and a second stage. The first stage resets and holds the client processor in a reset state and pushes information into the first random-access memory. The second stage is initiated by the client part, loads the intermediate storage area with information to be loaded to the second random-access memory, and sends to the client part a first message indicating the intermediate storage area is loaded. The client part is loadable into the first random-access memory, starts an operating system including an idle process and initially blocking scheduling of all processes having program code located in the second random-access memory, copies information loaded into the intermediate storage area to the second random-access memory, and sends to the host part a second message indicating information has been copied.

In another aspect of the invention, a computer-readable medium contains a computer program for loading information into a slave processor in a multi-processor system that includes a master processor and the slave processor. The computer program performs the steps of resetting the slave processor and holding the slave processor in a reset state; pushing information into a first memory that is accessible by the master and slave processors; booting the slave processor; starting an operating system in the slave processor, including blocking scheduling of processes having program code located in a second memory that is accessible by the slave processor and inaccessible by the master processor; reserving an intermediate storage area in the first memory; sending to the master processor information about a location and size of the intermediate storage area reserved; based on the sent information, loading the intermediate storage area with information to be loaded into the second memory; sending a first message to the slave processor that indicates the intermediate storage area has been loaded and whether loading is finished or more information is to be loaded; copying information in the intermediate storage area to the second memory; and sending a second message to the master processor that indicates that information in the intermediate storage area has been copied.

BRIEF DESCRIPTION OF THE DRAWINGS

The various features, objects, and advantages of this invention will be understood by reading this description in conjunction with the drawings, in which:
FIG. 1 depicts a multi-processor system;
FIG. 2 is a flowchart of an OS-friendly bootloader; and
FIG. 3 depicts an example of an organization of an intermediate storage area.

DETAILED DESCRIPTION

As noted above, a conventional bootloader can conflict with the operating system of a multi-processor system. This application describes an OS-friendly bootloader and methods that meet the challenge of integrating an OS with a bootloader in systems in which the host and a client have a communication mechanism that requires the OS for the mechanism to work and the client has two memory systems: one visible to both host and client and one visible only to the client.

FIG. 1 depicts such a multi-processor system 100 that includes a host processor 102 and a client processor 104. It will be appreciated that although FIG. 1 shows one client processor 104, more can be provided. It will also be appreciated that the host and client processors may be any programmable electronic processors. In the example depicted in FIG. 1, the processor 102 is shown as the central processing unit (CPU) of an advanced RISC machine (ARM), and the processor 104 is shown as the CPU of a digital signal processor (DSP) device. The dashed line in FIG. 1 depicts the hardware boundary between the host and slave devices, in this example, the ARM and the DSP, and also a non-volatile memory 106. The memory 106 may be a ROM, a flash memory, or other type of non-volatile memory device.

Most commercially available DSP devices include on-chip memories, and as indicated in FIG. 1, the DSP includes "internal" single-access RAM (SARAM) and dual-access RAM (DARAM) 108, as well as an "external" RAM (XRAM) 110. An intermediate storage area, indicated by the dashed line, is defined within the memory 108 as described in more detail below. The arrows in FIG. 1 indicate access paths, e.g., busses and DMA paths, between the CPUs and the memories. The ARM host CPU 102 can access the non-volatile memory 106 and the SARAM and DARAM 108 of the DSP, but not the DSP's XRAM 110, and the DSP slave CPU 104 can access all of the RAMs 108, 110.

The SARAM and DARAM 108 can be loaded from the non-volatile memory 106 by the trivial "push" method. When code needs to be loaded to the XRAM 110 during boot, however, a bootloader solution is required because the XRAM 110 is invisible to, i.e., not accessible by, the CPU 102 and so boot code cannot be pushed to the XRAM 110.

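The access rules described for FIG. 1 can be written down as a small lookup table. The following Python sketch is only an illustration of that rule; the names `ACCESS`, `can_push`, `loading_method`, and the string labels are hypothetical and do not come from the described system.

```python
# Hypothetical model of the access paths in FIG. 1 (labels are illustrative).
ACCESS = {
    "host_cpu_102": {"nonvolatile_106", "saram_daram_108"},
    "slave_cpu_104": {"saram_daram_108", "xram_110"},
}


def can_push(memory: str) -> bool:
    """The host can 'push' only into memory that it can access itself."""
    return memory in ACCESS["host_cpu_102"]


def loading_method(memory: str) -> str:
    """Pick the loading method for a target memory of the slave."""
    return "push" if can_push(memory) else "bootloader"
```

Under these assumptions, `loading_method("saram_daram_108")` selects the push method, while `loading_method("xram_110")` selects the bootloader.
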
As described in more detail below and in connection with the flow chart of FIG. 2, an OS-friendly bootloader advantageously has a host part and a client part that is loaded into a memory or memories visible to both the master and the slave (e.g., SARAM and DARAM 108).

The host part of the OS-friendly bootloader may be considered as including two stages or modes of operation. The first stage resets and holds the slave 104 in the reset state (Step 202) and pushes information (program instructions and/or data) (Step 204) in the usual way from the non-volatile memory 106 into the commonly visible memories 108. The information pushed into these memories is mainly the bootloader, the OS, and any necessary start-up code for the OS. It should be appreciated that an application or applications or parts thereof may also be pushed into these memories at start-up and may start executing during the loading of the "external" memory 110. When this "push" is finished (Step 206), the slave 104 is allowed to boot (Step 208) and to start up the OS (e.g., it is released from the reset state) and its normal communication mechanisms (Step 210). The host part then awaits a message from the slave, which initiates operation of its second stage as described in more detail below.

The slave part of the OS-friendly bootloader that is loaded ("pushed" by the host part's first stage) into the commonly visible memories 108 starts the operating system, carrying out the following operations (Step 210). First, interrupt handlers are created. The code for the interrupt handlers must be located in the memory that is already loaded because an interrupt may occur at any time. Second, data structures (e.g., process control blocks and stacks) of common processes, i.e., processes that run in both the host and the slave, are created. It should be understood that since these common processes have not yet executed, their code may be loaded at a later time and may very well be located in "external" memory visible only to the slave, e.g., XRAM 110. Third, the system idle process is created. The code for the idle process must be located in the memory that is already loaded because the idle process is the process selected to run by the OS if there is nothing useful to do. Fourth, the scheduling of at least all processes residing in, i.e., having program code or data located in, the "external" memory 110 is blocked. Execution of processes residing in the "internal" memory can thus advantageously start or continue in parallel with the loading of the "external" memory as noted above. It is also possible to stop scheduling all processes except the idle process, but this is not necessary. Making this blocking the last thing done before the OS scheduler switches on ensures that the code in these processes will not run when the scheduler releases. Finally, the OS scheduler is released, which allows the OS to start executing code and scheduling processes. It will be understood that since at least all external-memory-process scheduling was blocked, all that the OS can now do is schedule interrupts and the idle process.

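The ordering of the start-up operations above, in particular blocking external-memory processes before the scheduler is released, can be illustrated with a toy model. This is a minimal sketch in Python, not an implementation of any real OS: `Process`, `ToyScheduler`, and `start_slave_os` are hypothetical names, and interrupt-handler creation is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    memory: str            # "internal" (already loaded) or "external" (not yet)
    blocked: bool = False


@dataclass
class ToyScheduler:
    processes: list = field(default_factory=list)
    released: bool = False

    def runnable(self):
        """Names of processes the scheduler may pick once it is released."""
        if not self.released:
            return []
        return [p.name for p in self.processes if not p.blocked]


def start_slave_os(common_processes):
    sched = ToyScheduler()
    # Creating process data structures is reduced here to registering the
    # processes; their code may not be loaded yet.
    sched.processes.extend(common_processes)
    # Create the idle process, whose code is in already-loaded memory.
    sched.processes.append(Process("idle", "internal"))
    # Block everything residing in the not-yet-loaded external memory.
    for p in sched.processes:
        if p.memory == "external":
            p.blocked = True
    # Only now release the scheduler.
    sched.released = True
    return sched
```

With one external and one internal process registered, only the internal process and the idle process are runnable after start-up, which mirrors the parallel execution of internal-memory processes described above.
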
At this point, the slave 104 is partly up and running. The slave part of the OS-friendly bootloader has been loaded, and the slave's idle process is executing. The slave's OS can schedule and execute code in response to interrupts and can schedule the idle process and any unblocked processes having code residing in internal memory. OS mechanisms for which all code and data accesses are in memory that has already been loaded (SARAM and DARAM 108, in this example) are available, including the usual communication mechanisms. These OS communication mechanisms, being high-level abstractions of DMA, shared memory, and structured registers, are more capable than simple semaphores and enable the host processor to communicate efficiently with a processor (the slave) that has not completely started, which is to say a processor that is executing mainly only the OS, interrupt services, and processes residing in "internal" RAM.

The idle process reserves a block of memory in the slave's heap of memory that is located in the memory visible to the host, such as "internal" memory 108 (Step 212). As described in more detail below, this reserved block of memory is used for intermediate storage of information (code and/or data) to be transferred to the slave-private memory, i.e., the memory that is invisible to the host, such as "external" XRAM 110. The slave's idle process advantageously uses the established communication mechanisms to send to the host (Step 214) information about the address and size or length of the intermediate storage area reserved in the previous step. After sending the information, which may be contained in one or more suitable messages, the slave blocks, awaiting a message from the host. While "blocked", the slave does not conduct any further loading activities until it receives the host's response.

It will be understood that whether the slave's OS acts on an interrupt at this stage depends on the nature of the interrupt. Since many OS mechanisms (like those used to communicate with the host, for example) rely on interrupts, and it cannot be known in advance when an interrupt will occur, all interrupt code must have been loaded into "internal" memory. In that respect, interrupts are served during the second stage of the bootloading. Nevertheless, if an interrupt is to trigger a chain of events such as processes starting to do some data processing and the code or data for those processes are located or will be located in "external" memory, the interrupt is blocked and the interrupt service puts the request in the "in-queue" of that process so that the request will be served after booting has finished and that process can execute.

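The deferral of requests aimed at still-blocked processes can be pictured with a per-process queue. The Python sketch below is a hedged illustration only; `ToyProcess`, `interrupt_service`, `drain`, and `release_after_boot` are hypothetical names, and a real interrupt service would of course not be ordinary function calls.

```python
from collections import deque


class ToyProcess:
    def __init__(self, name, blocked):
        self.name = name
        self.blocked = blocked
        self.in_queue = deque()   # requests waiting for this process
        self.handled = []         # requests the process has served


def drain(process):
    """Serve every queued request in arrival order."""
    while process.in_queue:
        process.handled.append(process.in_queue.popleft())


def interrupt_service(process, request):
    """Always queue the request; serve it at once only if unblocked."""
    process.in_queue.append(request)
    if not process.blocked:
        drain(process)


def release_after_boot(process):
    """Called when loading completes: unblock and serve the backlog."""
    process.blocked = False
    drain(process)
```

Requests that arrive while the process is blocked stay in its in-queue and are served, in order, once `release_after_boot` is called.
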
On receipt of the slave's information, the second stage of the host bootloader fills the intermediate storage area with information (code and/or data) to be loaded into the slave's invisible memory (Step 216). Code and data are pushed to the intermediate storage area in the usual way because this area is memory that both processors can access, but the push is activated through the OS communication mechanisms.

The host now sends a message to the slave (Step 218) that indicates the intermediate storage area has been loaded and whether loading is finished or more code and/or data is available. This is the message the slave is waiting for. The host in turn now blocks, awaiting a message from the slave.

The slave copies the contents of the intermediate storage area to appropriate locations in its slave-private memory (Step 220), thereby implementing its actual loading. The slave then sends a message to the host (Step 222) that indicates that the slave has copied the contents of the intermediate storage area.

If there is more code and/or data to load (Step 224), this cycle of copying and messaging (Steps 216-224) can be repeated as many times as required. When the loading is finished, i.e., when no more information needs to be copied to the slave, the slave releases the blocking of processes that were blocked earlier, thereby allowing scheduling of code in its slave-private memory (Step 226). Loading is now complete.

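The fill, notify, copy, acknowledge cycle of Steps 212-226 can be simulated in a single thread. The sketch below is a simplification under stated assumptions: messages are plain dictionaries with hypothetical field names, the slave-private memory is a growing byte buffer, destination addresses are ignored, and `isa_size` is assumed to be positive.

```python
def load_external_memory(image: bytes, isa_size: int) -> bytes:
    """Simulate Steps 212-226; returns the resulting slave-private memory."""
    isa = bytearray(isa_size)          # Step 212: slave reserves the area
    slave_private = bytearray()        # stands in for XRAM 110
    # Step 214: slave tells the host how big the reserved area is.
    msg_to_host = {"isa_size": isa_size}
    offset = 0
    while True:
        # Step 216: host fills the area with the next piece of the image.
        chunk = image[offset:offset + msg_to_host["isa_size"]]
        isa[:len(chunk)] = chunk
        offset += len(chunk)
        # Step 218: host reports what was loaded and whether it is done.
        msg_to_slave = {"length": len(chunk), "finished": offset >= len(image)}
        # Step 220: slave copies the area into its private memory.
        slave_private += isa[:msg_to_slave["length"]]
        # Step 222: slave acknowledges; a real host would unblock here.
        msg_to_host = {"isa_size": isa_size, "copied": True}
        # Step 224: repeat until the host has signalled completion.
        if msg_to_slave["finished"]:
            break
    # Step 226: the slave would now unblock its external-memory processes.
    return bytes(slave_private)
```

In this simulation a 10-byte image and a 4-byte area take three cycles (4, 4, and 2 bytes), and the private memory ends up identical to the image.
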
As described above, the host fills the intermediate storage area in the memory 108 with code and data that the slave further copies to end destinations in the slave-private memory 110. Perhaps the simplest way of doing this is to precede all code and data in the intermediate storage area with a tag that contains the destination address and length of the block to be loaded. FIG. 3 depicts one example of such an organization of the intermediate storage area. A block of code and/or data to be transferred into the intermediate storage area includes a header that indicates the length of the block and where it is to be loaded in the slave memory, i.e., the destination address. As indicated by the dashed lines in FIG. 3, several such blocks may be concatenated in the intermediate storage area.

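A header of this kind can be sketched with Python's standard `struct` module. The field widths and byte order below (two unsigned 32-bit integers, network byte order) are assumptions chosen for illustration; the description does not specify a layout. The function names are likewise hypothetical, and destination addresses are treated as offsets into a byte buffer.

```python
import struct

# Assumed layout: destination address, then payload length (4 bytes each).
HEADER = struct.Struct("!II")


def pack_block(dest_addr: int, payload: bytes) -> bytes:
    """Host side: prefix a payload with its header."""
    return HEADER.pack(dest_addr, len(payload)) + payload


def unpack_blocks(area: bytes):
    """Yield (dest_addr, payload) for each block concatenated in the area."""
    pos = 0
    while pos < len(area):
        dest_addr, length = HEADER.unpack_from(area, pos)
        pos += HEADER.size
        yield dest_addr, area[pos:pos + length]
        pos += length


def copy_to_private(area: bytes, private: bytearray) -> None:
    """Slave side: place each payload at its destination address."""
    for dest_addr, payload in unpack_blocks(area):
        private[dest_addr:dest_addr + len(payload)] = payload
```

Here `area` is assumed to contain exactly the bytes that were written; a partially filled area would additionally need the used length, for example from the host's message.
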
The information (code and data) to be loaded can be arranged in many ways in the intermediate storage area and memories. Often the information is arranged as blocks of consecutive information that are to be loaded to different addresses, and thus an arbitrarily chosen size of the intermediate storage area may not match the sizes of all such blocks. Still, it should be understood that the system will operate more efficiently when the intermediate storage area is always filled. This means that if the blocks to be loaded are smaller than this area, a transfer of several (smaller) blocks should be done at the same time. This also means that a block should be split if it is larger than the remaining part of the intermediate storage area, and one part transferred to the intermediate storage area with the remaining part transferred in the next block. Moreover, if a block is several times larger than the intermediate storage area, it may have to be split more than once. All of this splitting and concatenation is done in the host part of the OS-friendly bootloader in ways that are well known to computer scientists. From the point of view of data communications engineers, the host part of the OS-friendly bootloader is thus a kind of "transport layer".

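One possible way to do this splitting and concatenation is a greedy packer that fills each transfer as far as it can. The sketch below is one such approach under assumptions, not the method of the description: `HEADER_SIZE` of 8 bytes and the function name `plan_transfers` are hypothetical, and a split piece is assumed to land directly after the previous piece of the same block.

```python
HEADER_SIZE = 8   # assumed: 4-byte destination address + 4-byte length


def plan_transfers(blocks, isa_size):
    """Split and concatenate (dest_addr, payload) blocks into transfers.

    Returns a list of transfers; each transfer is a list of
    (dest_addr, piece) pairs whose headers and payloads fit in isa_size.
    """
    if isa_size <= HEADER_SIZE:
        raise ValueError("intermediate storage area too small for a header")
    transfers, current, free = [], [], isa_size
    for dest_addr, payload in blocks:
        while payload:
            room = free - HEADER_SIZE
            if room <= 0:                      # no space for another piece
                transfers.append(current)
                current, free = [], isa_size
                continue
            piece, payload = payload[:room], payload[room:]
            current.append((dest_addr, piece))
            dest_addr += len(piece)            # the rest lands right after
            free -= HEADER_SIZE + len(piece)
    if current:
        transfers.append(current)
    return transfers
```

With a 32-byte area, a 4-byte block and a 20-byte block become two transfers: the first carries the small block plus the first 12 bytes of the large one, and the second carries the remaining 8 bytes.
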
The artisan will understand the benefit of this splitting and concatenation of information into transfer blocks. Some kind of communication mechanism is required to perform the actual transfers of information between memories, and whatever the mechanism used, fewer large transfers are typically preferable to more small transfers. A kept-full intermediate storage area can make the most efficient use of the available bandwidth by advantageously minimizing overhead on the communications channel. Each message requires some amount of administration and administrative information, and so fewer messages means less overhead.

A good example of the benefit of block splitting and concatenation is DMA as the communication mechanism. DMA typically requires some setup overhead (i.e., it takes some time to set up), but then DMA is very efficient once it has been started because transfers can be carried out in minimal CPU cycles. In order to gain the greatest benefit from the use of DMA, the largest DMA transfer permitted by the hardware should be done every time. Thus, it is currently believed to be advantageous to set the size of the intermediate storage area to the maximum DMA block size.

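The effect of a fixed setup cost per transfer can be shown with simple arithmetic. The cost model and the numbers in the sketch below are invented for illustration (arbitrary time units, hypothetical function name) and are not measurements of any DMA hardware.

```python
import math


def transfer_time(total_bytes, transfer_size, setup_cost, cost_per_byte):
    """Total time when every transfer pays a fixed setup cost."""
    transfers = math.ceil(total_bytes / transfer_size)
    return transfers * setup_cost + total_bytes * cost_per_byte
```

Under this model, moving 65536 bytes with a setup cost of 100 units and 1 unit per byte takes 91136 units in 256-byte transfers but 67136 units in 4096-byte transfers, because the setup cost is paid 256 times in the first case and only 16 times in the second.
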
The host part of the OS-friendly bootloader should "know" when to leave its first stage (loading information by pushing it into memory) and to enter its second stage (loading information through one or more communication mechanisms). After all, the host cannot push information into memory that is invisible to it. Although the slave sends a message to the host part when it has reached the idle process, this may not be enough for the host part to tell the slave to start executing. This transition from pushing to bootloading will be seen as a change from the paradigm of passive loading (i.e., no code executing in the slave) to the paradigm of active loading (i.e., a partly alive, executing slave).

One way for the host part to know when to change stages is to tag the code and data to be loaded with information on what memory it shall be loaded to. For example, information intended for the invisible memory could include a tag or tags that indicate the information is to be loaded to the invisible memory. The absence of such a tag could indicate that the information is to be loaded to the visible memory, although it will be appreciated that a tag explicitly indicating that the information is to be loaded to the visible memory could also be used. This enables the host to do two passes over the information and load only the information required in each pass. In the first pass, things that go into the internal memory would be found and loaded, and in the second pass, things that go into the external memory would be found and loaded.

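The two-pass, tag-based approach can be sketched as two loops over the same list of sections. In this hypothetical Python illustration, each section is a dictionary, the tag is a key named `invisible` whose absence means visible memory, and `push` and `bootload` are caller-supplied callables standing in for the two loading stages.

```python
def two_pass_load(sections, push, bootload):
    """sections: list of dicts with 'data' and an optional 'invisible' tag."""
    # Pass 1 (push stage): sections without the tag go to visible memory.
    for s in sections:
        if not s.get("invisible", False):
            push(s["data"])
    # Pass 2 (bootloader stage): tagged sections are loaded via the slave.
    for s in sections:
        if s.get("invisible", False):
            bootload(s["data"])
```

The order of sections in the image does not matter for this approach, since each pass skips whatever belongs to the other stage.
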
Another way, which currently appears to be simpler, is to arrange the slave-private memory such that all of it resides above (or below) a predetermined address. The information to be transferred is then sorted accordingly, with all sections of code and data to be loaded to the slave-private memory put at the end (or the beginning) of the sorted image. Then, all the host part of the OS-friendly bootloader has to do is to enter its second stage when an address larger (or smaller) than the predetermined (boundary) address is encountered.

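The boundary-address approach reduces the stage decision to a single comparison on a sorted image. The sketch below assumes the "above" variant, represents sections as hypothetical `(dest_addr, data)` pairs, and uses an invented function name.

```python
def split_at_boundary(sections, boundary):
    """Sort (dest_addr, data) sections and split them at the boundary.

    Slave-private memory is assumed to lie above `boundary`.
    Returns (push_stage_sections, bootloader_stage_sections).
    """
    ordered = sorted(sections, key=lambda s: s[0])
    for i, (addr, _) in enumerate(ordered):
        if addr > boundary:           # first private address: change stage
            return ordered[:i], ordered[i:]
    return ordered, []
```

Once the first address above the boundary is seen, every later section in the sorted image also belongs to the second stage, so no further checks or tags are needed.
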
In order to save memory or increase code integrity and platform security on the host side, information to be loaded to the slave can also be pre-processed in several different ways. For example, the information may be compressed according to a suitable algorithm, thereby reducing the size of the memory needed for it on the host side. For another example, the information may be encrypted, thereby strengthening platform security, as a potential hacker will not be able to disassemble the information easily. It is currently believed that encryption is valuable if the information to be loaded to the slave is stored in the internal file system of the host, where the information is available (at least in theory) to anyone.

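The compression case can be illustrated with Python's standard `zlib` module. The description does not name an algorithm or say where decompression takes place, so both the choice of zlib and the assumption that the host decompresses before filling the intermediate storage area are illustrative only; the function names are hypothetical.

```python
import zlib


def store_on_host(image: bytes) -> bytes:
    """Compress the slave image before storing it on the host side."""
    return zlib.compress(image, 9)


def prepare_for_loading(stored: bytes) -> bytes:
    """Recover the original image before it is transferred to the slave."""
    return zlib.decompress(stored)
```

For an image with repetitive content the stored form is smaller than the original, and decompression restores it byte for byte.
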
From this description, it will be understood that OS mechanisms are available to the slave part of the OS-friendly bootloader that is executed by the slave processor and that the slave can reuse existing OS-dependent code required for communication. Moreover, the OS-friendly bootloader uses loading resources (e.g., DMA) efficiently, with the host part automatically deciding when to switch from a first stage, or push mode, to a second stage, or bootloader mode.

It is expected that this invention can be implemented in a wide variety of environments, including for example mobile communication devices. Newer ones of such devices can employ the OS-friendly bootloader described here to boot their DSPs, which may be provided to handle multimedia tasks, in cooperation with their main-processor software systems.

The OS-friendly bootloader described here takes the operating system into account and actually executes on an operating system. The host is fully running when the slave is booted or rebooted. This bootloader does not require the host processor to be in a certain state in order to start up the slave processor. Indeed, the startup of the slave processor can be carried out any time during the execution of the host processor software. The OS-friendly bootloader does not need a special executable file that is run in the slave processor while information is being loaded to it from the host processor and the host-inaccessible RAM. One executable is linked to all of the slave processor's memories. The slave is booted before all code is loaded, but code that is linked to host-inaccessible memory is not run until it is loaded with the help of code that is linked to the slave processor's host-accessible memory.

It will therefore be understood that the OS-friendly bootloader described here also makes it possible to change software executing in the slave processor and to start slave execution of an application software before it is completely loaded. One or more application processes can be chosen for "pushing" with the bootloader into the slave processor's host-accessible memory, and those processes will start executing at the same point in time as the slave processor's host-inaccessible memory begins to be loaded.

This capability can be important in many devices and many use cases. In a mobile telephone, for example, such use cases include making a call, receiving a call, compressing/decompressing speech, playing music files, etc. With the OS-friendly bootloader described here, one can load and execute new software in the slave processor virtually anytime the host processor is running.

It will be appreciated that procedures described above are carried out repetitively as necessary. To facilitate understanding, many aspects of the invention are described in terms of sequences of actions that can be performed by, for example, elements of a programmable computer system. It will be recognized that various actions could be performed by specialized circuits (e.g., discrete logic gates interconnected to perform a specialized function or application-specific integrated circuits), by program instructions executed by one or more processors, or by a combination of both.

Moreover, the invention described here can additionally be considered to be embodied entirely within any form of computer-readable storage medium having stored therein an appropriate set of instructions for use by or in connection with an instruction-execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch instructions from a medium and execute the instructions. As used here, a "computer-readable medium" can be any means that can contain, store, communicate, propagate,