`
`by
`Christopher James Kellogg
`
`Submitted to the Department of Electrical Engineering and Computer Science
`in partial fulfillment of the requirements for the degrees of
`
`Bachelor of Science
`and
`Master of Science in Computer Science
`
`at the
`
`MASSACHUSETTS INSTITUTE OF TECHNOLOGY
`
`May 1993
`
`@ Christopher James Kellogg, MCMXCIII. All rights reserved.
`
`The author hereby grants to MIT permission to reproduce and to distribute copies
`of this thesis document in whole or in part, and to grant others the right to do so.
`
`th0
`
`Au
`
`Signature redacted
`r ................ D~~~;~~~~~ ~f· El~~~;i~-~. E~~i~~;Jn; ~~~;-/&i~~~~
`/ I / I
`rn
`, 1993
`Signature redacted
`
`.
`
`Certified by .................................. -.-. .,,,. •... •v •••••• - ••••••.•.•.........
`Alex P. Pentland
`Associate Professor of Media Arts and Sciences
`_ Thesis Supervi§or
`Signature redacted
`Certified by ...................................... ·r ·. · · · · · ... 7· · -~· · · · · · · · · ..... ·, -· r
`Bruce E. Flinchbaugh
`Manager of Image Understandin~ra~h a~ruments
`/7 / // / ~s/supervisor
`- >-:? /J
`
`Signature redacted
`Accepted by ........... ~;.·;;/·')' ·07·. ·r· ·tr .. ·c~~~l;~ii 'i.· ~i~~r"i~
`
`Ch7on, ~rtmental Committee on Graduate Students
`
`MASSACHUSETTS INSTITUTE
`OF TFr.HNOLOGY
`rJUL 09 1993
`LIBRARIES
`ARCHIVE~
`
`
`
`Visual Memory
`
`by
`
`Christopher James Kellogg
`
`Submitted to the Department of Electrical Engineering and Computer Science
`on April 21, 1993, in partial fulfillment of the
`requirements for the degrees of
`Bachelor of Science
`and
`Master of Science in Computer Science
`
`Abstract
`
`Visual memory supports computer vision applications by efficiently storing and re(cid:173)
`trieving spatiotemporal information. It is a unique combination of databases, spatial
`representation and indexing, and temporal representation and indexing. This the(cid:173)
`sis designs a visual memory architecture that meets the requirements of a number
`of computer vision applications. It also presents an implementation of part of this
`design in support of a scene monitoring prototype.
`
`Thesis Supervisor: Alex P. Pentland
`Title: Associate Professor of Media Arts and Sciences
`
`Thesis Supervisor: Bruce E. Flinchbaugh
`Title: Manager of Image Understanding Branch at Texas Instruments
`
`
`
`Acknowledgements
`
`My primary thanks goes to my two thesis supervisors, Bruce Flinchbaugh at Texas
`
`Instruments and Sandy Pentland at MIT. Bruce pointed me to the visual memory
`
`project that he was starting and guided my research at Texas Instruments. Sandy
`
`provided useful feedback throughout the research stage. They were both very helpful
`
`in critiquing the thesis document.
`
`I'd also like to thank the other people at Texas Instruments who helped me with
`
`this project. Steve Ford and Tom Bannon were especially helpful in developing the
`
`visual memory design. In addition, I don't think I would have survived the bugs in
`
`PC++ without Steve's expertise. Tom Bannon and Tom O'Donnell provided a nice
`
`tracking system with which to test the visual memory prototype.
`
`Finally, I'd like to thank my family, Fred, Jeannette, and Mark Kellogg, my fiancee
`
`Christine Bailey, and my brothers at Phi Kappa Sigma for their support throughout
`
`my MIT career.
`
`3
`
`
`
`Contents
`
`1 Introduction
`
`1.1 Needs for Visual Memory.
`1.2 Goals .. ..........
`
`2 Background
`2.1 Database Research ......
`2.1.1 DARPA Open OODB
`
`2.1.2 POSTGRES.
`
`2.2 Spatial Research
`
`2.2.1 CODGER
`
`2.2.2 Core Knowledge System
`
`2.2.3
`
`ISR . . . . . . . . . . . .
`
`2.2.4
`Image Understanding Environments .
`2.2.5 PROBE ....
`2.2.6 Spatial Indices
`
`2.3 Tern poral Research
`2.3.1 TQuel ...
`2.3.2 Temporal Sequences
`
`2.3.3 Temporal Sets .
`
`2.3.4 Relative Time .
`
`2.3.5 Temporal Indices
`
`4
`
`9
`
`9
`
`10
`
`11
`
`11
`
`11
`
`12
`
`13
`
`13
`
`13
`
`14
`
`14
`
`14
`
`15
`
`15
`
`15
`
`16
`
`16
`
`17
`
`17
`
`
`
`3 Design
`
`3.1 Requirements and Considerations
`
`3.1.1 Database Considerations
`
`3.1.2 Spatial and Temporal Considerations
`
`3.1.3 Performance Considerations
`3.2 Design Overview ....
`3.3 Spatial Representations .
`
`3.3.1 Core Spatial Classes
`
`3.3.2 Relative Spatial Specification
`
`3.3.3 Uncertain Spatial Specification
`3.4 Temporal Representations ..
`3.4.1 Core Temporal Classes
`
`3.4.2 Relative Temporal Specification
`
`3.4.3 Uncertain Temporal Specification
`
`3.5 Spatiotemporal Representations
`
`3.6 Object Storage
`
`3.6.1
`
`Identity
`
`3.6.2 Storage Mechanism
`
`3.6.3 Time
`
`3.7 Queries . . .
`
`3.7.1 Query Mechanism .
`
`3.7.2 Spatial Queries
`
`3.7.3 Tern poral Queries
`
`3.7.4 Spatiotemporal Queries .
`
`3.8
`
`Indices . . . . . . .
`
`3.8.1 Mechanism
`
`3.8.2 Spatial Indices
`
`3.8.3 Temporal Indices
`
`3.8.4 Spatiotemporal Indices
`
`5
`
`18
`
`19
`
`19
`
`20
`
`20
`
`22
`
`24
`
`24
`
`29
`
`31
`
`36
`
`36
`
`40
`
`41
`
`45
`
`50
`
`50
`
`51
`
`52
`
`53
`
`53
`
`54
`
`57
`
`59
`
`64
`
`64
`
`65
`
`66
`
`66
`
`
`
`4 Implementation
`4.1 Database . . . ..........
`4.2 Spatiotemporal Representations
`
`4.3
`
`Indices . . . . . . .
`
`4.3.1 Mechanism
`
`4.3.2 Spatial Indices
`
`4.3.3 Temporal Indices
`
`4.4 Queries.
`
`4.5
`
`Input ..
`
`4.6 Graphical Query Interface
`
`5 Performance
`
`5.1 Spatiotemporal Object Storage and Retrieval .
`Index Comparison . ...............
`
`5.2
`
`6 Conclusion
`
`68
`
`68
`
`71
`
`71
`
`71
`
`72
`
`75
`
`77
`
`77
`
`78
`
`81
`
`82
`
`83
`
`88
`
`6
`
`
`
`List of Figures
`
`3-1 Spatial objects
`
`3-2 Discrete point set
`
`3-3 Abstract point set .
`
`3-4 Coordinate systems
`
`3-5 Relative spatial objects
`
`3-6 Breaking a relative spatial specification, part 1
`
`3-7 Breaking a relative spatial specification, part 2 .
`
`3-8 Uncertain edges . .
`
`3-9 Uncertain location
`
`3-10 Conflicting information
`
`3-11 Temporal element ...
`
`3-12 Overlapping temporal elements
`
`3-13 Temporal resolution in favor of version A
`
`3-14 Temporal resolution in favor of version B
`
`3-15 Relative temporal specification .
`
`3-16 Probabilistic temporal interval .
`
`3-17 Overlapping probabilistic temporal intervals
`
`3-18 Probabilistic conjunction by minimization
`
`3-19 Probabilistic disjunction by maximization
`
`3-20 Discrete spatiotemporal information .
`
`3-21 Interpolated spatiotemporal state
`
`3-22 Point set trajectory . . . . . .
`
`3-23 Coordinate system trajectory
`
`7
`
`25
`
`26
`
`27
`
`28
`
`30
`
`32
`
`32
`
`33
`
`34
`
`35
`
`38
`
`38
`
`39
`
`39
`
`40
`
`42
`
`43
`
`43
`
`44
`
`46
`
`46
`
`47
`
`48
`
`
`
`55
`
`58
`
`60
`
`61
`
`62
`
`69
`
`73
`
`74
`
`74
`
`76
`
`76
`
`78
`
`79
`
`80
`
`82
`
`84
`
`85
`
`86
`
`87
`
`3-24 Spatial queries ..
`
`3-25 Temporal queries
`
`3-26 States of a spatiotemporal object
`
`3-27 Joint spatial and temporal queries .
`
`3-28 Spatiotemporal queries . . .
`
`4-1 Scene monitoring prototype
`4-2 Fixed grid spatial index ..
`4-3 Segmented space for bucket PR quadtree
`
`4-4 Data structure for bucket PR quadtree
`
`4-5 Temporal segment tree
`
`4-6 Temporal B+ tree . . .
`
`4-7 Graphical query interface viewing region
`
`4-8 Specification of query times and classes
`
`4-9 Graphical query results . . . . . . . .
`
`5-1 Spatiotemporal update performance .
`
`5-2 Spatial update performance
`
`5-3 Temporal update performance
`5-4 Spatial query performance ..
`5-5 Temporal query performance .
`
`8
`
`
`
`Chapter 1
`
`Introduction
`
`Visual memory supports computer vision applications by efficiently storing and re(cid:173)
`
`trieving spatiotemporal information. It is a unique combination of databases, spatial
`
`representation and indexing, and temporal representation and indexing. Visual mem(cid:173)
`
`ory provides representational flexibility and high-performance information access to
`
`meet the requirements of a variety of computer vision applications.
`
`1.1 Needs for Visual Memory
`
`Applications use spatiotemporal data in many different ways and place many different
`
`demands on a visual memory. Studying possible uses helps to clarify the concept of
`
`a visual memory and to identify the functionality it provides.
`
`Visual memory could serve as the repository for static information, such as ob(cid:173)
`
`ject descriptions, maps, and environment models, that applications reference during
`
`execution. For example, a vehicle navigator could store maps and images to help it
`
`later recognize its location. A large amount of such information could be established
`
`prior to application execution, and the visual memory would subsequently provide an
`
`application with efficient access to desired pieces of information.
`
`An application could store dynamic information in the visual memory. For ex(cid:173)
`
`ample, a vehicle navigator's input systems could maintain in the visual memory a
`
`description of the vehicle's local environment, updating it as the vehicle moved. The
`
`9
`
`
`
`visual memory could provide the navigator's planning processes with information
`
`about the vehicle's latest state and could analyze its progress to help determine a
`
`course of action. The high performance of the visual memory allows it to handle the
`
`frequent updates and queries needed by such dynamic, real-time systems.
`
`Visual memory could manipulate spatiotemporal information about objects and
`
`collections of objects too large to fit into volatile memory. For example, a computer(cid:173)
`
`aided design and modeling system could use the visual memory in building up a large
`
`design layout and simulating its execution over time; a photo interpretation system
`
`could similarly construct in the visual memory a complex representation of a scene.
`
`The visual memory would retrieve into main memory only a manageable part of a
`
`large representation at a time.
`
`Visual memory could act as the interface between inputs and applications m a
`
`computer vision system. For example, computer vision algorithms for a security
`
`system could analyze data provided by various cameras and store information in the
`
`visual memory. Applications could then retrieve this data to track objects, watch for
`
`suspicious events, and respond to user queries. The visual memory would coordinate
`
`the information from its inputs and eliminate the need for full connectivity between
`
`inputs and applications.
`
`Finally, visual memory could serve as a means for data transfer. A computer
`
`vision application could store spatiotemporal information in the visual memory for
`
`other applications to retrieve at any time in the future. To run comparative studies,
`
`different algorithms could use common data stored in the visual memory.
`
`1.2 Goals
`
`This thesis explores visual memory design and implementation. The primary goal
`
`of the thesis is to design a visual memory architecture that meets the requirements
`
`of various computer vision applications. A secondary goal is to implement a visual
`
`memory prototype to support a real-time scene monitoring prototype.
`
`10
`
`
`
`Chapter 2
`
`Background
`
`Visual memory builds on research in database design, spatial representation and
`
`indexing, and temporal representation and indexing. While there has been significant
`
`research in each of these areas, no previous project has combined them in this manner.
`
`The visual memory design uses knowledge gained from research projects in all these
`
`areas. This chapter summarizes and discusses some especially relevant projects.
`
`2.1 Database Research
`
`Visual memory must address concerns that a great deal of database research has
`
`already investigated. It must provide everything from information storage techniques
`
`to concurrency control for multiple inputs and outputs. Visual memory should build
`
`on the results of research into these topics. Presented here are two databases that
`
`address a number of the issues important to visual memory and that could be the
`
`basis for a visual memory system.
`
`2.1.1 DARPA Open OODB
`
`The DARPA Open Object-Oriented Database (Open OODB) project at Texas In(cid:173)
`
`struments outlines an extensible architecture that allows " ... tailoring database func(cid:173)
`
`tionality for particular applications in the framework of an incrementally improvable
`
`system .... " [25] The architecture meets functional requirements such as an object
`
`11
`
`
`
`data model and concurrent access, along with "meta requirements" including open(cid:173)
`
`ness and reusability. The open architecture lets separate modules handle extensions
`
`to the basic storage mechanism. These extensions cover standard database issues
`
`such as transactions, versions, and queries.
`
`The Open OODB architecture is very suitable for visual memory. The object(cid:173)
`
`oriented model can flexibly and intuitively represent the information used by computer
`
`vision applications. Following the Open OODB architecture, visual memory could
`
`avoid confronting standard database issues by letting other modules support those
`
`features. Instead, visual memory would consist only of those extensions necessary to
`
`support efficient manipulation of spatiotemporal information. If new features were
`
`needed, extra modules could easily be added to the architecture.
`
`2.1.2 POSTGRES
`
`The POSTGRES database [23] expands the relational database model to meet the
`
`needs of complex applications. Because it builds on traditional relational databases, it
`
`provides a number of standard features, such as transactions, a query language, and
`
`recovery processing. In addition, it allows applications to specify new data types,
`
`operators, and access methods. POSTGRES supports active databases and rules,
`
`letting applications set up daemons in the database that react to changes in the data.
`
`A versioning mechanism keeps track of old data and works with the query language
`
`to let applications retrieve this information. Finally, the POSTGRES storage server
`
`can "vacuum" old data onto archival media.
`
`POSTGRES supplies many features useful to a visual memory, such as transac(cid:173)
`
`tions, queries, and application-defined access methods. However, the relational model
`
`might not be sufficiently expressive to meet the representational needs of complex
`
`computer vision applications. In addition, the POSTGRES design does not support
`
`application-specific extensions to the database, so it would be hard for the visual
`
`memory to expand to meet future requirements.
`
`12
`
`
`
`2.2 Spatial Research
`
`There are many ways to describe spatial objects and to handle their storage and
`
`retrieval. Visual memory must consider how well different spatial models meet the
`
`representational needs of computer vision applications and how efficiently information
`
`in these models can be stored and retrieved.
`
`2.2.1 CODGER
`
`Researchers at Carnegie Mellon University developed the CODGER (COmmunica(cid:173)
`
`tions Database with GEometric Reasoning) "whiteboard" database and communica(cid:173)
`
`tion system to support the autonomous NAVLAB vehicle [20]. CODGER stores data
`
`to be communicated among the various modules that control vehicle navigation. It
`
`represents this information as tokens consisting of attributes and values.
`
`CODGER uses a fairly simple spatial model. Token attributes represent basic
`
`spatial information such as position and object extent. The tokens support some
`
`standard geometric operations like area calculation. A query mechanism can answer
`
`some spatial queries like the proximity query "Return the tokens with location within
`
`5 units of ( 45,32)." CODGER does not provide an indexing mechanism, and spatial
`
`operations and queries are performed in memory.
`
`2.2.2 Core Knowledge System
`
`The Core Knowledge System (CKS) [24], developed at SRI International, stores in(cid:173)
`
`formation for a robot. Like CODGER, it encodes this information as attribute-value
`
`tokens. CKS introduces special support for the uncertainty that results from incon(cid:173)
`
`sistent or incomplete information provided to the database. Its query mechanism
`
`includes keywords such as apparently and possibly to discern multiple opinions. Since
`
`spatial information is often imprecise, this support for uncertainty would be very use(cid:173)
`
`ful in a visual memory context. However, CKS does not provide any special spatial
`
`operations or query constructs.
`
`13
`
`
`
`2.2.3
`
`ISR
`
`The ISR project at the University of Massachusetts at Amherst [3] defines a spatial
`
`representation (the Intermediate Symbolic Representation) and a management system
`
`for accessing data represented this way. The intermediate symbolic representation
`
`includes tokens for basic spatial objects such as lines, regions, and sets of parallel
`
`lines, but not for higher-level spatial objects such as people and vehicles. The data
`
`management system manipulates these tokens in an efficient manner. Applications
`
`built with ISR perform classification and in-memory spatial indexing.
`
`2.2.4
`
`Image Understanding Environments
`
`The Image Understanding Environments (IUE) program [16] specifies a spatial rep(cid:173)
`
`resentation to meet the needs of a wide variety of computer vision applications. An
`
`IUE spatial object is defined by a set of points; this point set can be concrete ( a list
`
`of all the points) or abstract ( an equation defining the points in the object). IUE
`
`spatial objects are manipulated through set operations - complex objects can be con(cid:173)
`
`structed through conjunction and disjunction of point sets. In addition to its point
`
`set, each spatial object also defines a bounding box, a centroid, and other attributes
`
`for different, and perhaps more efficient, methods of spatial manipulation. The IUE
`
`specification only briefly discusses data transfer and does not provide database sup(cid:173)
`
`port for storage and retrieval of spatial information.
`
`2.2.5 PROBE
`
`The PROBE database [15], developed at the Computer Corporation of America,
`
`extends an object-oriented database management system to meet the requirements
`
`of a variety of computer vision applications.
`
`It implements a number of spatial
`
`data types and supports operations on sets of points. It outlines a query language
`
`with some support for spatial queries. To provide more efficient spatial access, it
`
`also provides what the authors call approzimate geometry, a limited form of spatial
`
`indexing.
`
`14
`
`
`
`2.2.6 Spatial Indices
`
`A large number of spatial data structures can provide efficient access to spatial in(cid:173)
`
`formation. Samet [18] describes a number of these, including quadtrees, hash tables,
`
`grid files, range trees, and R trees. Each index is specialized for specific storage and
`
`retrieval characteristics; visual memory would benefit from including a number of
`
`different indices to efficiently manipulate data for different applications.
`
`2.3 Temporal Research
`
`Databases manipulate two different types of time: transaction time, specifying when
`
`updates for events are stored in the database, and valid time, specifying when events
`
`actually happen. Rollback databases implement transaction time, historical databases
`
`implement valid time, and temporal databases implement both. Sometimes historical
`
`and rollback databases are informally called temporal databases to indicate their con(cid:173)
`
`cern with time. Since the computer vision applications discussed in the Introduction
`
`are concerned with the times at which events happen, visual memory should be a
`
`historical database.
`
`A number of different historical and temporal databases represent and store tem(cid:173)
`
`poral information. Each addresses a different set of concerns, and some designs suit
`
`visual memory better than others. The following research projects address many of
`
`the issues that visual memory must consider.
`
`2.3.1 TQuel
`
`The temporal database TQuel [21] is a temporal extension to a relational database.
`
`TQuel associates with each database record the slots valid-from and valid-to, defining
`
`an interval during which the record is valid. For example, the Employees relation
`
`might have three records for Frank, one valid from O to 1/1/93, another valid from
`
`1/1/93 to 5/7 /93, and a third valid from 5/7 /93 to oo. If Frank were changed on
`
`8/7 /93, then the third record's valid-to slot would be changed to 8/7 /93, and a new
`
`15
`
`
`
`record valid from 8/7 /93 to oo would be added.
`
`TQuel extends the query language Quel [12] to support temporal access of records.
`
`A temporal query specifies an interval of interest; the database retrieves any record
`
`whose valid interval overlaps that interval. A query can also ask for records before,
`
`after, or as of a given moment. TQuel provides operators such as overlaps and eztend
`
`to form complex query intervals.
`
`2.3.2 Temporal Sequences
`
`The temporal database outlined in [19] models object state changes with temporal
`
`sequences. A temporal sequence can be discrete or continuous; for example, sales
`
`per month could be modeled as a discrete temporal sequence, while the voltage in
`
`alternating current could be modeled as a continuous temporal sequence. A temporal
`
`sequence is always represented by a set of state snapshots; interpolating functions
`
`estimate continuous sequences. Characteristics such as granularity and regularity of
`
`state snapshots define each temporal sequence. Functions including selection, ag(cid:173)
`
`gregation, and accumulation operate on sets of time sequences. The database also
`
`includes a powerful SQL-like [1] query language for retrieving temporal sequences.
`
`2.3.3 Temporal Sets
`
`Researchers at the University of Houston proposed some temporal additions [8] to
`
`the Extended Entity-Relationship Model. The basic temporal representation in this
`
`temporal model is a finite union of time intervals; for example, a particular state
`
`could be valid during the set of time intervals { [50,60), [90,230), [231,239) }. The
`
`database stores with each object a temporal element denoting its valid time. Basing
`
`temporal representation on sets of intervals preserves closure under set operations
`
`and provides a standard means for manipulating temporal information and querying
`
`the database.
`
`This model was later augmented to better represent temporal uncertainty [13].
`
`The extended model preserves the definition of a temporal element but modifies the
`
`16
`
`
`
`definition of a temporal interval. Each endpoint in an interval specifies a valid time
`
`method that returns an ordered set of time points. The endpoint belongs to this set,
`
`but in order to allow for uncertainty, it is not explicitly specified. The model also
`
`modifies the standard set operations to manipulate uncertain temporal elements.
`
`2.3.4 Relative Time
`
`Some applications, such as computer-aided design systems, know how events are
`
`ordered but not the actual times of the events. Chaudhuri [5) proposes a temporal
`
`model to handle these cases. This model represents time as a graph rather than as
`
`a time line. Events are ordered with binary relations like before and simultaneously.
`
`These relations must obey properties such as transitivity and antisymmetry so that
`
`the database can navigate through a graph and infer additional relationships. The
`
`model supports temporal queries about event relations; for example, a query could
`
`ask for a lower time bound on an event or for common ancestors of two events. This
`
`capability could be useful in a visual memory to support efficient handling of temporal
`
`information for some applications.
`
`2.3.5 Temporal Indices
`
`Much of the spatial indexing research also applies to temporal indexing. For example,
`
`interval trees can store intervals in space or in time. To handle more complex, spe(cid:173)
`
`cialized temporal representations, however, requires additional research. Some of the
`
`databases described above provide their own temporal indices; [22) references many
`
`other systems with temporal indices.
`
`17
`
`
`
`Chapter 3
`
`Design
`
`This chapter presents a design for a visual memory system. It examines reqmre(cid:173)
`
`ments and considerations that the design must take into account. It discusses key
`
`visual memory topics such as representation and indexing of spatial, temporal and
`
`spatiotemporal information. This chapter outlines a concrete, implementable system;
`
`the next chapter presents the prototype implementation of this design.
`
`18
`
`
`
`3.1 Requirements and Considerations
`
`The design of a visual memory must address a number of concerns. Some of these
`
`come from anticipated uses of the visual memory, while others are common themes
`
`in spatial, temporal, and database research. This section covers a number of these
`
`requirements and considerations.
`
`3.1.1 Database Considerations
`
`One database issue relevant to visual memory is how to represent and store informa(cid:173)
`
`tion. There are several standard models, including the relational model, the entity(cid:173)
`
`relationship model, and the object-oriented model. The visual memory should use an
`
`object-oriented model to meet the broad representational requirements of a variety of
`
`applications. An object-oriented approach is intuitive and highly extensible, allowing
`
`applications to define new, complex objects at any time.
`
`Another important consideration is concurrency control. The visual memory must
`
`be able to handle multiple, dynamic inputs and outputs. For example, in a scene(cid:173)
`
`monitoring system, many different cameras could update the visual memory simulta(cid:173)
`
`neously. The visual memory must ensure data consistency.
`
`Much database research involves well-defined program interfaces, including ex(cid:173)
`
`plicit storage mechanisms and query algebras. Applications using the visual memory
`
`do not need to know how it achieves its results, but they should know what results to
`
`expect. For example, performance-enhancing measures such as indexing and caching
`
`do not affect the objects returned by a query and can be added without affecting the
`
`query algebra.
`
`Recoverability is another database issue important to some visual memory ap(cid:173)
`
`plications. The visual memory must work to guarantee that, even in the case of a
`
`system crash, it does not lose stored information. In addition, it must be able to
`
`remove inconsistencies resulting from system failure during information storage.
`
`19
`
`
`
`3.1.2 Spatial and Temporal Considerations
`
`The purpose of visual memory is to store information about the history of a visual
`
`environment. Visual memory is not just a generic database -
`
`it must have spa(cid:173)
`
`tiotemporal concerns at the heart of its design.
`
`A visual memory must provide representational flexibility. Rather than forcing
`
`one spatiotemporal representation on all applications, the visual memory should be
`
`tailorable to an application's needs. Applications can trade off between representa(cid:173)
`
`tional power and performance.
`
`A visual memory must handle dynamic objects. Some computer vision applica(cid:173)
`
`tions need to update spatial information in response to changes in the environment.
`
`The visual memory must define spatiotemporal representations to effectively handle
`
`such changes. It must provide a versioning mechanism to store and retrieve different
`
`state snapshots of objects.
`
`A visual memory must provide a flexible, expressive query mechanism with exten(cid:173)
`
`sive spatiotemporal support. This query mechanism should support a wide variety of
`
`spatiotemporal queries. For example, a security system might ask the visual memory
`
`to retrace a person's path over the past five minutes, a vehicle navigator might ask
`
`it to watch for objects entering the field of view, and a CAD system might ask for
`
`simulation results for everything electrically connected to a specific chip. The visual
`
`memory should let applications conveniently express such queries.
`
`3.1.3 Performance Considerations
`
`High performance is one of the key requirements for a visual memory. Some visual
`
`memory applications, such as a vehicle navigator, need to store and retrieve infor(cid:173)
`
`mation very quickly. Many spatial and temporal models in the literature are very
`
`expressive but do not provide the necessary information throughput. A visual mem(cid:173)
`
`ory must be both expressive and fast enough to meet the demands of its applications.
`
`Indexing can help a visual memory achieve high performance by quickly identifying
`
`objects satisfying given constraints. Visual memory indices should be conservative,
`
`20
`
`
`
`never mistakenly omitting objects that satisfy a query. In this manner, indices can
`
`improve query performance but are guaranteed to not affect the results.
`
`A visual memory must provide a variety of indices to meet the needs of different
`
`applications. For example, a real-time scene monitoring system could set up an index
`
`to track the centroids of moving objects, while a photo interpretation system could
`
`index the areas covered by objects. A visual memory indexing mechanism should be
`
`extensible, handling additional application-defined indices.
`
`A visual memory must let applications control which objects are indexed. For
`
`example, an application could establish one index on all objects, another index on
`
`everything in the current session, and yet another index only on certain objects of
`
`interest. This would prevent the visual memory from wasting time and space updating
`
`unimportant indexing information.
`
`Caching and look-ahead techniques can increase the performance of a visual mem(cid:173)
`
`ory. Caching improves storage performance by not requiring the visual memory to
`
`wait for information to be written to disk. Both caching and look-ahead improve
`
`retrieval performance by reducing the number of disk accesses.
`
`Visual memory performance can be increased by letting applications tailor the
`
`visual memory to their specific requirements. For example, some applications can
`
`afford to lose a small amount of data, so they could eliminate recoverability infor(cid:173)
`
`mation. Other applications could optimize specific storage and retrieval cases; for
`
`example, a vehicle navigator could optimize its real-time performance by sacrificing
`
`some historical performance.
`
`21
`
`
`
`3.2 Design Overview
`
`The visual memory design consists of a set of extensions to an open database architec(cid:173)
`
`ture like DARPA Open OODB [25]. An open architecture allows the visual memory
`
`to add spatiotemporal customizations to the database. The visual memory can take
`
`full advantage of other modules implementing features such as concurrency control,
`
`caching, and versioning, without having to handle these capabilities directly.
`
`The visual memory design follows the object-oriented model discussed in the previ(cid:173)
`
`ous section. A class hierarchy defines representations for spatiotemporal information.
`
`Abstract superclasses define the interfaces for manipulating spatiotemporal informa(cid:173)
`
`tion, and their subclasses extend the definitions to represent more specific types of
`
`objects. This document denotes classes in italics; for example, SpatialObject is the
`
`class representing spatial objects. A concrete member of this class is referred to as
`
`"a SpatialObject instance" or informally just as "a spatial object."
`
`The visual memory design specifies a number of classes for representing spatiotem(cid:173)
`
`poral information. These classes provide methods through which computer vision
`
`applications and the visual memory can manipulate them. For example, the spatial
`
`class Square could include a method to return its area, the temporal class Temporal(cid:173)
`
`Interval could have a method to determine its duration, and the spatiotemporal class
`
`Person could implement a method plotting its space-time trajectory. Applications
`
`can design their own classes inheriting from these classes and extending them to meet
`
`additional needs.
`
`The visual memory design extends the database's storage mechanism. It provides
`
`a mechanism for object identity and maintains a history for each object. Each version
`
`of an object specifies when it was valid, and the visual memory can manipulate
`
`versions based on valid time. The design lets applications customize the database
`
`storage server based on characteristics of the data they typically store.
`
`The visual memory design extends the database's query mechanism to provide
`
`spatiotemporal support. The additional spatiotemporal constructs allow computer
`
`vision applications to :flexibly and expressively specify objects of interest.
`
`22
`
`
`
`To achieve suitable query performance, the visual memory provides spatiotempo(cid:173)
`
`ral indices that can efficiently identify objects satisfying query conditions. A visual
`
`memory index is an object that maintains information about other objects, allowing
`
`it to efficiently indicate those objects that meet certain constraints. For example, a
`
`visual memory spatial index might store object