`
(12) United States Patent
Gupta et al.

(10) Patent No.: US 8,838,949 B2
(45) Date of Patent: Sep. 16, 2014

(54) DIRECT SCATTER LOADING OF EXECUTABLE SOFTWARE IMAGE FROM A PRIMARY PROCESSOR TO ONE OR MORE SECONDARY PROCESSOR IN A MULTI-PROCESSOR SYSTEM

(75) Inventors: Nitin Gupta, San Diego, CA (US); Daniel H. Kim, San Diego, CA (US); Igor Malamant, San Diego, CA (US); Steve Haehnichen, San Diego, CA (US)

(73) Assignee: QUALCOMM Incorporated, San Diego, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 362 days.

(21) Appl. No.: 13/052,516

(22) Filed: Mar. 21, 2011

(65) Prior Publication Data
US 2012/0072710 A1    Mar. 22, 2012

Related U.S. Application Data

(60) Provisional application No. 61/324,035, filed on Apr. 14, 2010, provisional application No. 61/316,369, filed on Mar. 22, 2010, provisional application No. 61/324,122, filed on Apr. 14, 2010, provisional application No. 61/325,519, filed on Apr. 19, 2010.

(51) Int. Cl.
G06F 15/177    (2006.01)
G06F 9/445     (2006.01)
G06F 9/44      (2006.01)

(52) U.S. Cl.
CPC .............. G06F 15/177 (2013.01); G06F 9/445 (2013.01); G06F 9/4405 (2013.01)
USPC .............. 713/2; 713/1; 713/100; 712/E9.003; 712/30

(58) Field of Classification Search
CPC ...... G06F 9/4405; G06F 9/445; G06F 15/177
USPC ....................... 713/1, 2, 100; 712/E9.003, 30
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,978,589 A     11/1999   Yoon
6,079,017 A     6/2000    Han et al.
7,447,846 B2    11/2008   Yeh

(Continued)

FOREIGN PATENT DOCUMENTS

EP   2034416 A1     3/2009
JP   S63233460 A    9/1988

(Continued)

OTHER PUBLICATIONS

International Search Report and Written Opinion - PCT/US2011/029484 - ISA/EPO - May 30, 2011.

Primary Examiner - M Elamin
(74) Attorney, Agent, or Firm - Peter Michael Kamarchik; Nicholas J. Pauley; Joseph Agusta
`
`(57)
`
`ABSTRACT
`
`In a multi-processor system, an executable software image
`including an image header and a segmented data image is
`scatter loaded from a first processor to a second processor.
`The image header contains the target locations for the data
`image segments to be scatter loaded into memory of the
`second processor. Once the image header has been processed,
`the data segments may be directly loaded into the memory of
`the second processor without further CPU involvement from
`the second processor.
`
`23 Claims, 5 Drawing Sheets
`
(56) References Cited (continued)

U.S. PATENT DOCUMENTS

7,765,391 B2        7/2010   Uemura et al.
2002/0138156 A1     9/2002   Wong et al.
2009/0204751 A1     8/2009   Kushita
2010/0077130 A1     3/2010   Kwon
2011/0035575 A1*    2/2011   Kwon ............................... 713/2
2012/0089814 A1     4/2012   Gupta et al.

FOREIGN PATENT DOCUMENTS

JP   H08161283 A        6/1996
JP   H09244902 A        9/1997
JP   2000020492 A       1/2000
JP   2004086447 A       3/2004
JP   2004252990 A       9/2004
JP   2005122759 A       5/2005
JP   2007157150 A       6/2007
KR   20070097538 A      10/2007
WO   WO2006077068 A2    7/2006
WO   2008001671 A1      1/2008
WO   2011119648 A1      9/2011
JP   H06195310 A        7/1994

* cited by examiner
`
`
`
[Sheet 1 of 5: FIG. 1 (drawing not reproduced; the rotated labels did not survive text extraction)]
[Sheet 2 of 5: FIG. 2 (drawing not reproduced; the rotated labels did not survive text extraction)]
[Sheet 3 of 5: FIG. 3, "Zero Copy Transport Flow" (drawing not reproduced). Legible labels: PRIMARY PROCESSOR, with Non-volatile Memory (Image Header; Segment 1 to Segment 5 Data), System Memory, and a Hardware Transport Mechanism (i.e., USB Host); Physical Data Pipe (i.e., HS-USB cable); SECONDARY PROCESSOR, with a Hardware Transport Mechanism, a Hardware Buffer (Image Header; Partial Data Segment), a controller, and System Memory holding the Image Header and Data Segments, with Segment 4 marked as still transferring.]
[Sheet 4 of 5: FIG. 4, flowchart. The boxes read, in order:]

Receive image header for executable software image for secondary processor
that is stored in memory coupled to the primary processor

Process image header to determine at least one location within system memory
to which the secondary processor is coupled to store the at least one data
segment

Receive the at least one data segment

Load the at least one data segment directly to the determined at least one
location within the system memory

FIG. 4

[Sheet 5 of 5: FIG. 5, reference numeral 500 (drawing not reproduced)]

DIRECT SCATTER LOADING OF EXECUTABLE SOFTWARE IMAGE FROM A PRIMARY PROCESSOR TO ONE OR MORE SECONDARY PROCESSOR IN A MULTI-PROCESSOR SYSTEM

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. provisional patent application No. 61/316,369 filed Mar. 22, 2010, in the names of MALAMANT et al., U.S. provisional patent application No. 61/324,035 filed Apr. 14, 2010, in the names of GUPTA et al., U.S. provisional patent application No. 61/324,122 filed Apr. 14, 2010, in the names of GUPTA et al., and U.S. provisional patent application No. 61/325,519 filed Apr. 19, 2010, in the names of GUPTA et al., the disclosures of which are expressly incorporated herein by reference in their entireties.

TECHNICAL FIELD

The following description relates generally to multi-processor systems, and more specifically to multi-processor systems in which a primary processor is coupled to a non-volatile memory storing executable software image(s) of one or more other processors (referred to herein as “secondary” processors) in the system which are each coupled to a dedicated volatile memory, wherein the executable software images are efficiently communicated from the primary processor to the secondary processor(s) in a segmented format (e.g., using a direct scatter load process).

BACKGROUND

Processors execute software code to perform operations. Processors may require some software code, commonly referred to as boot code, to be executed for booting up. In a multi-processor system, each processor may require respective boot code for booting up. As an example, in a smartphone device that includes an application processor and a modem processor, each of the processors may have respective boot code for booting up.

A problem exists on a significant number of devices (such as smart phones) that incorporate multiple processors (e.g., a standalone application processor chip integrated with a separate modem processor chip). A flash/non-volatile memory component may be used for each of the processors, because each processor has non-volatile memory (e.g., persistent storage) of executable images and file systems. For instance, a processor’s boot code may be stored to the processor’s respective non-volatile memory (e.g., Flash memory, read-only memory (ROM), etc.), and upon power-up the boot code software is loaded for execution by the processor from its respective non-volatile memory. Thus, in this type of architecture the executable software, such as a processor’s boot code, is not required to be loaded to the processor from another processor in the system.

Adding dedicated non-volatile memory to each processor, however, occupies more circuit board space, thereby increasing the circuit board size. Some designs may use a combined chip for Random Access Memory (RAM) and Flash memory (where RAM and Flash devices are stacked as one package to reduce size) to reduce board size. While multi-chip package solutions do reduce the needed circuit board footprint to some extent, they may increase costs.

In some multi-processor systems, software may be required to be loaded to one processor from another processor. For example, suppose a first processor in a multi-processor system is responsible for storing to its non-volatile memory boot code for one or more other processors in the system; wherein upon power-up the first processor is tasked with loading the respective boot code to the other processor(s), as opposed to such boot code residing in non-volatile memory of the other processor(s). In this type of system, the software (e.g., boot image) is downloaded from the first processor to the other processor(s) (e.g., to volatile memory of the other processor(s)), and thereafter the receiving processor(s) boots with the downloaded image.

Often, the software image to be loaded is a binary multi-segmented image. For instance, the software image may include a header followed by multiple segments of code. When software images are loaded from an external device (e.g., from another processor) onto a target device (e.g., a target processor) there may be an intermediate step where the binary multi-segmented image is transferred into the system memory and then later transferred into target locations by the boot loader.

In a system in which the software image is loaded onto a target “secondary” processor from a first “primary” processor, one way of performing such loading is to allocate a temporary buffer into which each packet is received, and each packet would have an associated packet header information along with the payload. The payload in this case would be the actual image data. From the temporary buffer, some of the processing may be done over the payload, and then the payload would get copied over to the final destination. The temporary buffer would be some place in system memory, such as in internal random-access-memory (RAM) or double data rate (DDR) memory, for example.

Thus, where an intermediate buffer is used, the data being downloaded from a primary processor to a secondary processor is copied into the intermediate buffer. In this way, the buffer is used to receive part of the image data from the primary processor, and from the buffer the image data may be scattered into the memory (e.g., volatile memory) of the secondary processor.

The primary processor and its non-volatile memory that stores the boot image for a secondary processor may be implemented on a different chip than a chip on which the secondary processor is implemented. Thus, in order to transfer the data from the primary processor’s non-volatile memory to the secondary processor (e.g., to the secondary processor’s volatile memory), a packet-based communication may be employed, wherein a packet header is included in each packet communicated to the secondary processor. The packets are stored in an intermediate buffer, and some processing of the received packets is then required for that data to be stored where it needs to go (e.g., within the secondary processor’s volatile memory).

SUMMARY

A multi-processor system is offered. The system includes a secondary processor having a system memory and a hardware buffer for receiving at least a portion of an executable software image. The secondary processor includes a scatter loader controller for loading the executable software image directly from the hardware buffer to the system memory. The system also includes a primary processor coupled with a memory. The memory stores the executable software image for the secondary processor. The system further includes an interface communicatively coupling the primary processor and the secondary processor via which the executable software image is received by the secondary processor.
`A method is also offered. The method includes receiving at
a secondary processor, from a primary processor via an inter-chip
communication bus, an image header for an executable
`software image for the secondary processor that is stored in
`memory coupled to the primary processor. The executable
`software image includes the image header and at least one
`data segment. The method also includes processing, by the
`secondary processor, the image header to determine at least
`one location within system memory to which the secondary
`processor is coupled to store the at least one data segment.
The method also includes receiving at the secondary processor,
from the primary processor via the inter-chip communication bus,
the at least one data segment. Still further, the
`method includes loading, by the secondary processor, the at
`least one data segment directly to the determined at least one
`location within the system memory.
`An apparatus is offered. The apparatus includes means for
`receiving at a secondary processor, from a primary processor
via an inter-chip communication bus, an image header for an
`executable software image for the secondary processor that is
`stored in memory coupled to the primary processor. The
`executable software image includes the image header and at
`least one data segment. The apparatus also includes means for
`processing, by the secondary processor, the image header to
`determine at least one location within system memory to
`which the secondary processor is coupled to store the at least
`one data segment. The apparatus further includes means for
receiving at the secondary processor, from the primary processor
via the inter-chip communication bus, the at least one
`data segment. Still further, the apparatus includes means for
`loading, by the secondary processor, the at least one data
`segment directly to the determined at least one location within
`the system memory.
`A multi-processor system is offered. The system includes a
`primary processor coupled with a first non-volatile memory.
`The first non-volatile memory is coupled exclusively to the
`primary processor and stores a file system for the primary
`processor and executable images for the primary processor
`and secondary processor. The system also includes a second-
`ary processor coupled with a second non-volatile memory.
`The second non-volatile memory is coupled exclusively to
`the secondary processor and stores configuration parameters
and file system for the secondary processor. The system further
includes an interface communicatively coupling the primary
processor and the secondary processor via which an
`executable software image is received by the secondary pro-
`cessor.
`
`A multi-processor system is offered. The system includes a
`primary processor coupled with a first non-volatile memory.
`The first non-volatile memory is coupled exclusively to the
`primary processor and stores executable images and file sys-
`tems for the primary and secondary processors. The system
`also includes a secondary processor. The system further
`includes an interface communicatively coupling the primary
processor and the secondary processor via which an executable
software image is received by the secondary processor.
A method is offered. The method includes sending, from a
memory coupled to a primary processor, an executable software
image for a secondary processor. The executable software
image is sent via an interface communicatively coupling
the primary processor and secondary processor. The method
also includes receiving, at the secondary processor, the
executable software image. The method further includes
executing, at the secondary processor, the executable software
image.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`For a more complete understanding of the present teach-
`ings, reference is now made to the following description taken
`in conjunction with the accompanying drawings.
`FIG. 1 is an illustration of an exemplary device within
which aspects of the present disclosure may be implemented.
`FIG. 2 is an illustration of an exemplary device within
which aspects of the present disclosure may be implemented.
`FIG. 3 is an illustration of an operational flow for an exem-
`plary loading process for loading an executable image from a
`primary processor to a secondary processor according to one
`aspect of the present disclosure.
`FIG. 4 is a flowchart illustrating a scatter loading method
`according to one aspect of the present disclosure.
`FIG. 5 is a block diagram showing an exemplary wireless
`communication system in which an embodiment of the dis-
`closure may be advantageously employed.
`
`DETAILED DESCRIPTION
`
`The word “exemplary” is used herein to mean “serving as
`an example, instance, or illustration.” Any aspect described
`herein as “exemplary” is not necessarily to be construed as
`preferred or advantageous over other aspects.
`Certain aspects disclosed herein concern multi-processor
`systems where one primary processor is connected to a non-
`volatile memory storing executable images of one or more
`other processors (referred to herein as “secondary” proces-
`sors) in the system. In such a multi-processor system each of
`the secondary processors may be connected to a dedicated
`volatile memory used for storing executable images, run-time
`data, and optionally a file system mirror.
`Executable images are often stored in a segmented format
`where each segment can be loaded into a different memory
`region. Target memory locations of executable segments may
`or may not be contiguous with respect to each other. One
`example of a multi-segmented image format is Executable
`and Linking Format (ELF) which allows an executable image
`to be broken into multiple segments and each one of these
`segments may be loaded into different system memory loca-
`tions.
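
As a rough illustration of the segmented format described above, the following minimal sketch models an image header as a table of segments, each with its own target address. The byte layout, the `Segment` fields, and the helper names are assumptions made for illustration only; this is not ELF and not a layout taken from the present disclosure.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout (not ELF, not from the patent): a little-endian
# segment count followed by one (offset, size, dest_addr) triple per segment.
COUNT_FMT = "<I"
ENTRY_FMT = "<III"


@dataclass(frozen=True)
class Segment:
    offset: int     # where the segment's bytes start inside the image file
    size: int       # number of bytes in the segment
    dest_addr: int  # target address in the secondary processor's memory


def build_header(segments):
    """Serialize the segment table into header bytes."""
    out = struct.pack(COUNT_FMT, len(segments))
    for seg in segments:
        out += struct.pack(ENTRY_FMT, seg.offset, seg.size, seg.dest_addr)
    return out


def parse_header(header):
    """Recover the segment table from header bytes."""
    (count,) = struct.unpack_from(COUNT_FMT, header, 0)
    pos = struct.calcsize(COUNT_FMT)
    segments = []
    for _ in range(count):
        offset, size, dest = struct.unpack_from(ENTRY_FMT, header, pos)
        segments.append(Segment(offset, size, dest))
        pos += struct.calcsize(ENTRY_FMT)
    return segments


# Two segments whose target addresses are not contiguous.
table = [Segment(offset=28, size=16, dest_addr=0x1000),
         Segment(offset=44, size=8, dest_addr=0x8000)]
parsed = parse_header(build_header(table))
```

In this sketch the two target addresses are far apart, which is the situation that makes a scatter step necessary at load time.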
`
`In one exemplary aspect a direct scatter load technique is
`disclosed for loading a segmented image from a primary
`processor’s non-volatile memory to a secondary processor’s
`volatile memory. As discussed further below, the direct scat-
`ter load technique avoids use of a temporary buffer. For
`instance, in one aspect, rather than employing a packet-based
communication in which the image is communicated via
`packets that each include a respective header, the raw image
`data is loaded from the primary processor to the secondary
`processor. In another aspect, headers are used which include
`information used to determine the target location information
`for the data.
`
Exemplary Multi-Processor Architecture with Centralized
Non-Volatile Memory, with Reduced Localized Non-Volatile
Memory for File System
`FIG. 1 illustrates a block diagram of a first multi-processor
`architecture 102 in which a primary processor (application
processor 104) hosts a primary (large) non-volatile memory
`106 (e.g., NAND flash memory) while a second processor
`(e.g., modem processor 110) has a secondary (reduced or
`minimal) non-volatile memory 114 (e.g., NOR flash
`memory).
`In the communication device architecture 102, the appli-
`cation processor 104 is coupled to a primary non-volatile
`memory 106 and an application processor volatile memory
`108 (e.g., random access memory). The modem processor
`110 is coupled to a secondary non-volatile memory 114 and a
`modem processor volatile memory 112. An inter-processor
`communication bus 134 allows communications between the
`application processor 104 and the modem processor 110.
`A modem executable image 120 for the modem processor
`110 may be stored in the application processor (AP) non-
`volatile memory 106 together with the AP executable image
`118 and the AP file system 116. The application processor
`104 may load its AP executable image 118 into the applica-
`tion processor volatile memory 108 and store it as AP execut-
`able image 122. The application processor volatile memory
`108 may also serve to store AP run-time data 124.
`The modem processor 110 has the dedicated secondary
(reduced or minimal) non-volatile memory 114 (e.g., NOR
`flash) for its file system 128 storage. This secondary (reduced
`or minimal) non-volatile memory 114 is smaller and lower
`cost than a flash device capable of storing both the run-time
`modem executable images 120 and the file system 128.
Upon system power-up, the modem processor 110
`executes its primary boot loader (PBL) from the hardware
`boot ROM 126 (small read-only on-chip memory). The
`modem PBL may be adapted to download the modem
`executables 120 from the application processor 104. That is,
`the modem executable image 120 (initially stored in the pri-
`mary non-volatile memory 106) is requested by the modem
`processor 110 from the application processor 104. The appli-
`cation processor 104 retrieves the modem executable image
`120 and provides it to the modem processor 110 via an inter-
`processor communication bus 134 (e.g., inter-chip commu-
`nication bus). The modem processor 110 stores the modem
`executable image 132 directly into the modem processor
`RAM (Random Access Memory) 112 to the final destination
`without copying the data into a temporary buffer in the
`modem processor RAM 112. The inter-processor communi-
`cation bus 134 may be, for example, a HSIC bus (USB-based
`High Speed Inter-Chip), an HSI bus (MIPI High Speed Syn-
`chronous Interface), a SDIO bus (Secure Digital I/O inter-
`face), a UART bus (Universal Asynchronous Receiver/Trans-
`mitter), an SPI bus (Serial Peripheral Interface), an I2C bus
`(Inter-Integrated Circuit), or any other hardware interface
`suitable for inter-chip communication available on both the
`modem processor 110 and the application processor 104.
`Once the modem executable image 120 is downloaded into
`the modem processor RAM 112 and authenticated, it is main-
`tained as a modem executable image 132. Additionally, the
`modem processor volatile memory 112 may also store
`modem run-time data 130. The modem Boot ROM code 126
`may then jump into that modem executable image 132 and
`start executing the main modem program from the modem
`processor RAM 112. Any persistent (non-volatile) data, such
`as radio frequency (RF) calibration and system parameters,
`may be stored on the modem file system 128 using the sec-
`ondary (reduced or minimal) non-volatile memory 114
`attached to the modem processor 110.
Exemplary Multi-Processor Architecture with Centralized
Non-Volatile Memory, with No Localized Non-Volatile
Memory for File Systems
`FIG. 2 illustrates a block diagram of a second multi-pro-
`cessor architecture 202 in which a primary processor (appli-
`cation processor 204) hosts a primary (large) non-volatile
memory 206 (e.g., NAND flash memory). The primary non-volatile
memory 206 may store a modem-executable image
`214 and/or a modem file system 220 for the secondary pro-
`cessor (modem processor 210). The secondary processor
`(modem processor 210) may be configured to request the
`modem-executable image 214 and/or modem file system 220
`from the primary processor 204. The primary processor 204
`then retrieves the requested modem-executable image 214
`and/or modem file system 220 from the non-volatile memory
`206 and provides it to the secondary processor 210 via an
`inter-processor communication bus 234.
`In this architecture 202, the application processor 204 is
`coupled to the non-volatile memory 206 and an application
processor volatile memory 208 (e.g., random access
memory). The modem processor 210 is coupled to a modem
`processor volatile memory 212 but does not have its own
`non-volatile memory. The modem processor volatile memory
`212 stores a file system mirror 228, a modem executable
image 236, and modem run-time data 230. The inter-processor
communication bus 234 allows communications between
the application processor 204 and modem processor 210.
`All the executable images 214 and file system 220 for the
`modem processor 210 may be stored in the non-volatile
`memory 206 together with the AP executable image 218 and
`the AP file system 216. The application processor 204 may
load its AP executable image 218 into the application processor
volatile memory 208 and store it as AP executable image
`222. The application processor volatile memory 208 may also
`serve to store AP run-time data 224. The modem file system
`may be encrypted with a modem processor’s private key for
privacy protection and prevention of subscriber identity cloning.
`Upon system power-up, the modem Boot ROM code 226
`downloads both the modem executable image 214 and the
`modem file system 220 from the application processor 204
`into the modem processor volatile memory 212. During nor-
`mal operation, any read accesses to the modem file system
`228 are serviced from the modem processor volatile memory
`212. Any write accesses are performed in the modem proces-
`sor volatile memory 212 as well. In addition, there may be a
`background process running on the modem processor 210
`and the application processor 204 to synchronize the contents
`of the File System 228 in modem processor volatile memory
`212 with the modem file system 220 stored on the non-
`volatile memory 206.
`The primary and secondary processors may periodically
`synchronize the file system in the volatile memory for the
`secondary processor with the corresponding file system in the
`primary non-volatile memory. The first write to the modem
`file system 228 may start a timer (for example, a ten minute
timer) in the modem processor 210. While this timer is running,
all writes to the file system 228 are coalesced into the
`modem processor volatile memory 212. Upon expiration of
`the timer, the modem processor 210 copies the file system
`image 228 from volatile memory 212, encrypts it, and alerts
`the application processor 204 that new data is available. The
`application processor 204 reads the encrypted copy and
`writes it to the non-volatile memory 206 into the modem file
`system 220. The application processor 204 then signals the
`modem processor 210 that the write operation is complete. If
`a synchronization operation fails, a present version of the
`modem file system may be used. Synchronization may occur
`periodically (for example, every ninety seconds) or after a
`certain time following a write operation by the modem to its
`file system. To prevent corruption from circumstances such as
`sudden power removal, two copies of the modem file system
`220 may be stored.
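
The timer-driven synchronization described above can be sketched roughly as follows. This is a minimal model under stated assumptions: the class name `MirrorSync`, the 16-byte mirror, the explicit `now` timestamps, and the `encrypt` and `store` callables are all hypothetical stand-ins, and the byte-reversing lambda in the usage lines is a placeholder, not encryption.

```python
class MirrorSync:
    """Coalesces writes to a RAM file system mirror and flushes on a timer."""

    def __init__(self, delay_s, encrypt, store):
        self.delay_s = delay_s       # coalescing window in seconds
        self.encrypt = encrypt       # caller-supplied callable: bytes -> bytes
        self.store = store           # stands in for the primary processor's write
        self.mirror = bytearray(16)  # file system mirror held in volatile memory
        self.deadline = None         # None means no flush is pending

    def write(self, offset, data, now):
        """Apply a write to the mirror; only the first write starts the timer."""
        if offset < 0 or offset + len(data) > len(self.mirror):
            raise ValueError("write outside the mirror")
        self.mirror[offset:offset + len(data)] = data
        if self.deadline is None:
            self.deadline = now + self.delay_s

    def tick(self, now):
        """Flush once the timer has expired; returns True if a flush ran."""
        if self.deadline is None or now < self.deadline:
            return False
        # Snapshot the mirror, transform it, and hand it to the primary side.
        self.store(self.encrypt(bytes(self.mirror)))
        self.deadline = None
        return True


flushed = []
sync = MirrorSync(delay_s=600, encrypt=lambda b: b[::-1], store=flushed.append)
sync.write(0, b"AB", now=0)     # first write starts the ten-minute timer
sync.write(2, b"CD", now=100)   # coalesced; the timer is not restarted
early = sync.tick(now=599)      # timer still running, nothing is sent
late = sync.tick(now=600)       # timer expired, both writes go out in one flush
```

Passing the time in explicitly keeps the sketch deterministic; a real implementation would presumably use a hardware or operating system timer, and the primary side might alternate between two stored copies as the paragraph above suggests.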
`The modem processor 210 may also initiate a “flush”
`operation of the file system mirror 228 to the application
`processor’s non-volatile memory 206. This may occur for a
`number of reasons, including phone power-off, as well as
`sending an acknowledgement message to the network to indi-
`cate acceptance and storage of incoming SMS messages.
`
`
`File system read operations on the modem processor 210
`are serviced from the modem processor volatile memory 212,
`which reflects the current state of the modem file system.
`Because read operations are more frequent than write opera-
`tions, and write operations tend to occur in “bursts” of activ-
`ity, the overall system load and power consumption may be
`reduced.
`
`The application processor 204, modem processor 210, and
boot loader have specific measures in place to ensure that
`there is always at least one complete file system image avail-
`able in the non-volatile memory 206 at all times. This pro-
`vides immunity to power-loss or surprise-reset scenarios.
Application of the concepts disclosed herein is not limited
to the exemplary systems shown above but may likewise
be employed with various other multi-processor systems.
Zero Copy Transport Flow
`Aspects of the present disclosure provide techniques for
`efficiently loading the executable software images from the
`primary processor’s non-volatile memory to the secondary
processor’s volatile memory. As mentioned above, traditional
`loading processes require an intermediate step where the
`binary multi-segmented image is buffered (e.g., transferred
`into the system memory) and then later scattered into target
`locations (e.g., by a boot loader). Aspects of the present
`disclosure provide techniques that alleviate the intermediate
`step of buffering required in traditional loading processes.
`Thus, aspects of the present disclosure avoid extra memory
copy operations, thereby improving performance (e.g.,
`reducing the time required to boot secondary processors in a
`multi-processor system).
`As discussed further below, one exemplary aspect of the
`present disclosure employs a direct scatter load technique for
`loading the executable software images from the primary
`processor’s non-volatile memory to the secondary proces-
`sor’s volatile memory. Certain aspects of the present disclo-
`sure also enable concurrent image transfers with post-transfer
`data processing, such as authentication, which may further
improve efficiency, as discussed further below.
In one aspect, the host primary processor does not process
or extract any information from the actual image data; it simply
sends the image data as “raw” data to the target, without
any packet header attached to the packet. Because the target
secondary processor initiates the data transfer request, it
knows exactly how much data to receive. This enables the
host to send data without a packet header, and the target to
directly receive and store the data. In that aspect, the target
requests data from the host as needed. The first data item it
requests is the image header for a given image transfer. Once
the target has processed the image header, it knows the location
and size of each data segment in the image. The image
header also specifies the destination address of the image in
target memory. With this information, the target can request
data from the host for each segment, and directly transfer the
data to the appropriate location in target memory. The hardware
controller for the inter-chip communication bus on the
application processor may add its own low-level protocol
headers, which would be processed and stripped by the
modem processor. These low-level headers may be transparent
to the software running on both processors.
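
The target-driven transfer described above can be modeled in a few lines. The following is a minimal sketch under stated assumptions: the `Host` class, the `scatter_load` function, and the header layout (a segment count followed by offset, size, and destination triples) are hypothetical, and a `bytearray` stands in for the secondary processor's system memory. Python still copies bytes internally, so the sketch shows the control flow (header first, then one exact-size request per segment, written straight to its final address) rather than true zero-copy hardware behavior.

```python
import struct

COUNT_FMT = "<I"    # hypothetical: number of segments
ENTRY_FMT = "<III"  # hypothetical: (offset, size, dest_addr) per segment


class Host:
    """Stands in for the primary processor: returns raw bytes on request."""

    def __init__(self, image: bytes):
        self.image = image

    def read(self, offset: int, size: int) -> bytes:
        # No packet header is added; the requester already knows the size.
        return self.image[offset:offset + size]


def scatter_load(host: Host, memory: bytearray) -> int:
    """Target-side loop: fetch the header, then place each segment directly.

    Returns the number of segments loaded.
    """
    count_size = struct.calcsize(COUNT_FMT)
    entry_size = struct.calcsize(ENTRY_FMT)
    (count,) = struct.unpack(COUNT_FMT, host.read(0, count_size))
    table = host.read(count_size, count * entry_size)
    view = memoryview(memory)
    for i in range(count):
        offset, size, dest = struct.unpack_from(ENTRY_FMT, table, i * entry_size)
        # The target asked for exactly `size` bytes, so the reply needs no
        # framing and is written to its final location with no staging buffer.
        view[dest:dest + size] = host.read(offset, size)
    return count


# Build a two-segment image: 28-byte header, then the segment payloads.
seg_a, seg_b = b"BOOTCODE", b"DATA"
header = struct.pack(COUNT_FMT, 2)
header += struct.pack(ENTRY_FMT, 28, len(seg_a), 0x10)
header += struct.pack(ENTRY_FMT, 36, len(seg_b), 0x40)
image = header + seg_a + seg_b

ram = bytearray(0x80)  # models the secondary processor's system memory
loaded = scatter_load(Host(image), ram)
```

Because each request names its own size, a short or oversized reply makes the slice assignment fail rather than silently corrupting neighboring memory, which is one reason the sketch writes through a `memoryview`.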
`In one aspect of the present disclosure, the loading process
`is divided into two stages, as illustrated in the exemplary flow
`shown in FIG. 3. FIG. 3 shows a block diagram of a primary
`processor 301 (which may be the application processors 104
`or 204 of FIG. 1 or 2 with their non-volatile memory 106 or
`206) and a secondary processor 302 (which may be the
`modem processor 110 or 210 of FIG. 1 or 2 with their volatile
`memory 112 or 212). In FIG. 3, an exemplary software image
`for secondary processor 302 is stored to non-volatile memory
`of the primary processor 301. As shown in this example, the
`exemplary software image 303 is a multi-segment image that
`includes an image header portion and multiple data segments
`(shown as data segments 1-5 in this example). The primary
`processor 301 and secondary processor 302 may be located
`on different physical silicon chips (i.e. on a different chip
`package) or may be located on the same package.
`In the first stage of the exemplary loading process of FIG.
`3, the image header information is transferred to the second-
`ary processor 302. The primary processor 301 retrieves the
`data image segments, beginning with the image header, from
`non-volatile memory of the primary processor 306. The pri-
`mary processor 301 parses the image header to load indi-
`vidual image segments from non-volatile memory of the pri-
`mary processor 306 to system memory of the primary
`processor 307. The image header includes information used
`to identify where the modem image executable data is to be
`eventually placed into the system memory of the secondary
`processor 305. The header information is used by the second-
`ary processor 302 to program the scatter loader/direct
`memory access controller 304 receive address when receiv-
`ing the actual executable data. Data segments are then sent
`from system memo