`Jean-Marie Normand, Myriam Servières, Guillaume Moreau
`
`To cite this version:
Jean-Marie Normand, Myriam Servières, Guillaume Moreau. A new typology of augmented reality applications. AH '12 Proceedings of the 3rd Augmented Human International Conference, Mar 2012, Megève, France. 10.1145/2160125.2160143. hal-01521375
`
`HAL Id: hal-01521375
`https://hal.archives-ouvertes.fr/hal-01521375
`Submitted on 24 Nov 2017
`
`
`
`
`
`A new typology of augmented reality applications
`
`Jean-Marie Normand
`LUNAM Université, École
`Centrale Nantes
CERMA UMR 1563, BP 92101
`44321 Nantes Cedex 3,
`France
`jean-marie.normand@ec-nantes.fr
`
`Myriam Servières
`LUNAM Université, École
`Centrale Nantes
CERMA UMR 1563, BP 92101
`44321 Nantes Cedex 3,
`France
`myriam.servieres@ec-nantes.fr
`
`Guillaume Moreau
`LUNAM Université, École
`Centrale Nantes
CERMA UMR 1563, BP 92101
`44321 Nantes Cedex 3,
`France
`guillaume.moreau@ec-nantes.fr
`
`ABSTRACT
`In recent years Augmented Reality (AR) has become more
`and more popular, especially since the availability of mo-
`bile devices, such as smartphones or tablets, brought AR
`into our everyday life. Although the AR community has
`not yet agreed on a formal definition of AR, some work fo-
`cused on proposing classifications of existing AR methods
or applications. Such applications cover a wide variety of technologies, devices and goals; consequently, existing taxonomies rely on multiple classification criteria that try to take into account the diversity of AR applications. In this paper
`we review existing taxonomies of augmented reality appli-
`cations and we propose our own, which is based on (1) the
`number of degrees of freedom required by the application,
`as well as on (2) the visualization mode used, (3) the tem-
`poral base of the displayed content and (4) the rendering
`modalities used in the application. Our taxonomy covers
`location-based services as well as more traditional vision-
`based AR applications. Although AR is mainly based on the
`visual sense, other rendering modalities are also covered by
`the same degree-of-freedom criterion in our classification.
`
`Categories and Subject Descriptors
`H.5.1 [Information Interfaces and Presentation]: Mul-
`timedia Information Systems—Artificial, augmented, and vir-
`tual realities
`
`General Terms
`Theory
`
`Keywords
`Augmented Reality, Taxonomy, Multi-modality, Degrees of
`Freedom
`
`1.
`
`INTRODUCTION
`
`Permission to make digital or hard copies of all or part of this work for
`personal or classroom use is granted without fee provided that copies are
`not made or distributed for profit or commercial advantage and that copies
`bear this notice and the full citation on the first page. To copy otherwise, to
`republish, to post on servers or to redistribute to lists, requires prior specific
`permission and/or a fee.
`AH’12, March 8–9, 2012, Megève, France
`Copyright 2012 ACM 978-1-4503-1077-2/12/03 ...$10.00.
`
`1:
`Figure
`from [20].
`
`The Reality-Virtuality continuum
`
Unlike Virtual Reality (VR), which only focuses on displaying and interacting with virtual environments, Augmented Reality (AR) aims at interweaving reality with a virtual world. Indeed, although AR is based on techniques developed in VR [1], the display and interaction of an AR application have a degree of interdependence with the real world. The main challenge of AR consists in introducing artificial, computer generated objects at locations specified in real-world coordinates. This requires determining the location of the AR interface in the real world (and not only the user position with respect to the interface, as in VR) and including artificial objects in the field of view of the observer. Beyond the technological challenge of this collocation problem (also called registration by Azuma [1]), the reproduction of virtual objects, their fidelity and their consistency with the real world are still open research questions.
Milgram et al. [19, 20] defined the well-known “Reality-Virtuality continuum”, cf. Fig. 1, where “Reality” and “Virtual Reality” (each lying at one end of the continuum) surround “Mixed Reality” (MR), a subclass of VR technologies that involve the merging of real and virtual worlds. Mixed Reality itself is decomposed into “Augmented Reality” (AR) and “Augmented Virtuality” (AV). The main difference is that AR implies being immersed in reality and handling or interacting with some virtual “objects”, while AV implies being primarily immersed in a virtual world augmented by reality, where the user mainly manipulates virtual objects. Nevertheless, the boundary between the two remains tenuous and depends on applications and uses.
As stated in [11], “augmenting” reality is meaningless in itself. However, the term makes sense as soon as we refocus on the human being and on their perception of the world. Reality itself cannot be augmented, but our perception of it can. We will nevertheless keep the term “Augmented Reality”, even if we understand it as an “augmented perception of reality”.
In the remainder of this paper, we will give an overview of existing AR taxonomies and discuss their specificities and limitations. Then, we will propose our own taxonomy, based on four criteria: (1) the number of degrees of freedom required for the tracking, (2) the visualization mode, i.e. the augmentation type used by the application, (3) the temporal base of the displayed content and (4) the rendering modalities used by the AR application. Before drawing a conclusion, we will discuss the benefits and limitations of our approach and use our typology to classify existing applications.
`
`2. BACKGROUND
Even though a clear definition of augmented reality has not been agreed upon by the community, stating whether or not an application uses some kind of augmented reality is relatively easy. What remains more difficult is to classify the different AR approaches or applications into a meaningful taxonomy.
Existing taxonomies differ in the criteria they use to classify applications; we chose to divide them into:
`• technique-centered,
`• user-centered,
`• information-centered,
`• interaction-centered.
`
`Each category has its characteristics, benefits and draw-
`backs, which we will present in the following.
`2.1 Technique-centered taxonomies
`In [19, 20] the authors propose a technical taxonomy of
`Mixed Reality techniques by distinguishing the types of vi-
`sual displays used. They propose three main criteria for
`the classification: Extent of World Knowledge (EWK), Re-
`production Fidelity (RF) and Extent of Presence Metaphor
`(EPM). EWK represents the amount of information that
`a MR system knows about the environment (for example
`about where to look for interesting information in the image
`– a region of interest for tracking – or what the system should
be looking for – the 3D model of an object). The RF criterion represents the quality with which the virtual environment (in the case of AV) or objects (in the case of AR) are displayed, ranging from wireframe objects on a monoscopic display to real-time, high-fidelity, photo-realistic 3D objects. Finally, the EPM criterion evaluates the extent to which the user feels present, i.e. how much presence the user experiences, within the scene. As a consequence, EPM is minimal when the display used is monoscopic and maximal with high-end head-mounted displays (HMD) that can display real-time 3D graphics and offer see-through capabilities.
In [18], the Reality-Virtuality continuum and some of the elements presented in [19] lay the groundwork for a global taxonomy of mixed reality display integration. The classification is based on three axes: the reality-virtuality continuum, the centricity of the type of display used (egocentric or exocentric) and the congruency of the control-display mapping. The idea behind the last criterion is that, depending on the means provided and the circumstances, a user can effect changes in the observed scene either congruently with, or, to varying degrees, incongruently with respect to the form, position and orientation of the device(s) provided. Intuitively, a highly congruent control-display relationship corresponds to a natural, or intuitive, control scheme, whereas an incongruent relationship compels the user to perform a number of mental transformations in order to use it.
Based on the proposal of a general architecture for an augmented reality system presented in [30], Braz and Pereira [4] developed a web-based platform called TARCAST, which aimed at listing and characterizing AR systems. The six classification criteria (i.e. the six so-called classical subsystems of an AR system) used in TARCAST are: the Real World Manipulator subsystem, the Real World Acquisition subsystem, the Tracking subsystem, the Virtual Model Generator subsystem, the Mixing Realities subsystem and, finally, the Display subsystem. Each criterion is composed of a number of features that allow different AR systems to be distinguished. TARCAST uses an XML-like syntax to describe each feature of each subsystem of an AR system and offered a web interface that allowed users to browse the list of all AR systems included in TARCAST; registered users could also insert new TARCAST characterizations via a specific web-based interface. However, TARCAST does not propose actual criteria but offers a long list of features for each system, and is hence not really discriminative. Additionally, TARCAST does not seem to be maintained anymore.
`The technique-centered taxonomies presented here do not
`take into account any of the mobile AR techniques com-
`monly used nowadays. Milgram’s work was innovative at
`the time it was published but the authors could not predict
how mobile AR would arise. Besides, we believe that presence can hardly serve as a common discriminative criterion, as it does not refer to the same concept in the virtual and real worlds.
`
`2.2 User-centered taxonomies
`Lindeman and Noma [14] propose to classify AR applica-
`tions based on where the mixing of the real world and the
`computer-generated stimuli takes place. They integrate not
only the visual sense but all the others as well, since their “axis of mixing location” is a continuum that ranges from the physical environment to the human brain. They describe two pathways followed by a real world stimulus on its way to the user: a direct one and a mediated one. In the direct case, a real world stimulus travels through (a) the real environment before reaching (b) a sensory subsystem where it is translated into (c) nerve impulses and finally transmitted to (d) the brain.
`the brain. In the case of AR applications, some computer
`graphics elements can be inserted into this path in order to
`combine the real world and the computer generated elements
`into one AR stimulus on its way to the brain. The authors
`refer to the different places (a) through (d) where computer
generated elements can be inserted as “mixing points”. In
`the mediated case, the real world stimulus travels through
`the environment, but instead of being sensed by the user, it
`is captured by a sensing device (e.g. camera, microphone,
`etc.). Then, the stimulus might be post-processed before
`being merged with computer generated elements and then
`displayed to the user at one of the mixing points through
`appropriate hardware (depending on the sense being stim-
`ulated). The authors state that the insertion of computer
`generated elements should happen as early as possible in the
`pathway (i.e. at the (a) mixing point) in order to take ad-
`vantage of the human sensory system which process the real
`
`Niantic's Exhibit No. 1010
`Page 003
`
`
`
`world stimulus. Based on the location of the mixing points
`in the process of a stimulus, the authors build their classifi-
`cation for each sense based on a set of existing techniques.
`Wang and Dunston [31] propose an AR taxonomy based
on the groupware concept. They define groupware as: computer-based systems that support groups of people engaged in
`a common task (or goal) and that provide an interface to
`a shared environment. The goal of groupware is to assist
`a team of individuals in communicating, collaborating and
`coordinating their activities. Based on generic groupware
`concepts, they isolated three main factors for classifying AR
`systems for construction use: mobility, number of users and
`space.
`Hugues et al. [11] propose a functional taxonomy for AR
`environments based on the nature of the augmented per-
`ception of reality offered by the applications and on the
`artificiality of the environment. The authors divide aug-
`mented perception into five sub-functionalities: augmented
`documentation, reality with augmented perception or un-
`derstanding, perceptual association of the real and virtual,
`behavioural association of the real and virtual, substitution
`of the real by the virtual or vice versa. The functionality
`to create an artificial environment is subdivided into three
`main sub-functionalities: imagine the reality as it could be
`in the future, imagine the reality as it was in the past and
`finally, imagine an impossible reality.
`While the first axis of the taxonomy proposed by Hugues
`et al. covers most of the goals of AR applications, the sec-
`ond axis based on the creation of an artificial environment
`is less convincing since it does not take into account any
`alteration of the “present” reality, e.g. applications such as
Sixth Sense [21] or OmniTouch [10]. Moreover, their taxonomy is limited to vision-based approaches and does not
`handle other modalities. The groupware taxonomy of Wang
`and Dunston only takes into account collaborative AR and
`limits itself to construction-based AR applications. Finally,
`Lindeman and Noma propose an interesting taxonomy based
`on the integration of the virtual stimuli within multi-modal
`AR applications. Nevertheless, their proposal might not be
`discriminative enough, since very different methods like mo-
`bile see-through AR can be classified in the same category
`as a projector-based AR application. Furthermore, it only
`deals with each sense individually and does not offer any
`insight on how to merge them together.
`
`2.3 Information-centered taxonomies
`In [27], Suomela and Lehikoinen propose a taxonomy for
`visualizing location-based information, i.e. digital data which
has a real-world location (e.g. GPS coordinates), that would help developers choose the correct approach when designing an application. Their classification is based on two main
`factors that affect the visualization of location-based data:
`the environment model used (ranging from 0D to 3D) and
`the viewpoint used (first person or third person perspec-
`tive to visualize the data). Based on these two criteria, the
`authors define a model-view number MV(X,Y) that corre-
`sponds to a combination of the environment model (X) and
`the perspective (Y) used. Each MV(X,Y) class offers dif-
ferent benefits and drawbacks, and the authors suggest choosing a class depending on the final application targeted and the hardware or sensors available on the targeted devices.
In [29], Tönnis and Plecher divide the presentation space
`used in AR applications based on six classes of presentation
`
`principles: temporality (i.e. continuous or discrete presen-
`tation of information in an AR application), dimensional-
`ity (2D, 2.5D or 3D information presentation), registration,
`frame of reference, referencing (distinction between objects
`that are directly shown, information about the existence
`of concealed objects, often using indirect visualization, and
`guiding references to objects outside the field of view that
`might be visible if the user looks towards that direction) and
`mounting (differentiates where a virtual object or informa-
`tion is displayed in the real world, e.g. objects can be hand-
`mounted, head-mounted, connected to another real object
or lying in the world, etc.). In this work-in-progress, the authors use nearly 40 publications taken from recent ISMAR conferences in order to test their taxonomy based on those six presentation classes.
Suomela and Lehikoinen propose a taxonomy that can only be applied to location-based applications, and is thus oriented towards mobile AR. Moreover, they do not tackle multi-modal mobile AR applications. Nevertheless, we found the degrees-of-freedom approach interesting and decided to generalize it in our own proposed taxonomy. Tönnis and Plecher propose an interesting, complete taxonomy, but they do not deal with the multi-modality that can be used in AR applications, and some of the criteria presented are somewhat vague (e.g. the mounting criterion).
`
`2.4 Interaction-centered taxonomies
Mackay [15] proposed a taxonomy which is based neither on the technology used, nor on the functionalities, nor on the application domain. The criterion used to classify AR approaches is rather simple: the target of the augmentation.
Three main possibilities are listed in the paper: augment the user, where the user wears or carries a device to obtain information about physical objects; augment the physical object, where the object is changed by embedding input, output or computational devices on or within it; and augment the environment surrounding the user and the object. In the latter case, neither the user nor the object is directly affected; instead, independent devices provide and collect information from the surrounding environment, displaying information onto objects and capturing information about the user's interactions with them.
`This taxonomy is not very discriminative. For example,
`one can notice that every single mobile AR technique falls
`into the first category, while the last category regroups only
projection-based methods. As with most of the taxonomies
`presented here, this work does not tackle the multi-modality
`issue.
`In [7], Dubois et al. propose a framework for classifying
`AR systems and use Computer Aided Medical Intervention
`(CAMI) systems in order to illustrate their classification.
`Their approach, called OPAC, is based on four components:
`the System, the Object of augmentation, the Person (the
user) and the Adapters (input or output devices), and distinguishes between two “main” tasks of the user, depending on whether the task has to be performed in the real world (i.e. in AR) or in the virtual world (i.e. in AV). Based on
`this distinction and on Milgram and Kishino’s [19] Reality-
`Virtuality continuum, the authors propose two different con-
`tinua ranging respectively from Reality to Virtuality (R→V)
`and vice versa (V→R) where, along the V→R axis, they po-
`sition different interaction principles proposed by Fishkin et
`al. [8].
`
`
`
`
`In [6], Dubois et al. propose an extension, called ASUR,
`of their previous work, where the OPAC components are
`slightly modified into Adapters, System, User and Real ob-
ject, where input and output adapters are more clearly
`distinguished in the link they create between the System
`and the real world (composed of the User and the Real Ob-
`ject). In this paper, the authors define relationships between
`the four components that aim at helping the developers of
`such systems to reflect upon the combination of the real and
`virtual worlds as well as the boundaries between those two
`worlds, while designing mixed reality applications.
`The OPAC and ASUR methods presented by Dubois et
al. aim at reasoning about Mixed Reality systems; thus, strictly speaking, they do not classify AR methods. Indeed, the
`components and relationships presented in their work help
model AR and AV systems, rather than characterize different methods and classify them into categories.
`
`3. PROPOSAL
We now propose our own taxonomy, based on four axes:
`• the first axis is based on the number of degrees of free-
`dom of the tracking required by the application and
`the tracking accuracy that is required. Frequency and
`latency of tracking can also be taken into account.
`• the second axis represents the augmentation type, i.e.
`whether it consists in augmenting the whole world or
`whether it is linked to the user (including an artefact
`for some kind of mediation).
`• the third axis is application-based and covers the tem-
`poral base of the content displayed by the application.
• the fourth axis covers other rendering modalities that go beyond visual augmented reality. This axis remains rather limited today, but it can be taken into account by the same degrees-of-freedom system. As a consequence, this last axis should be considered, for now, as an optional sub-axis of the classification.
`3.1 Tracking
The main originality of our taxonomy resides in this first classification axis, namely the tracking degrees of freedom. By this term we do not mean vision-based tracking in the classical computer vision sense (e.g. marker tracking or feature tracking) but rather tracking in a broader sense. In our taxonomy, tracking is instantiated according to the application's requirements: in location-based applications, tracking can be seen as user-tracking, where the important information is the position and orientation of the user in the world, whereas in a classical vision-based application tracking can indeed be seen as the tracking of a marker. Hence, we focus on the number of degrees of freedom required for localizing the “interaction device” – which could be either the user or the camera, tablet, smartphone, etc., depending on the application – with respect to the environment.
On this first axis, we sort applications by the number of degrees of freedom they require and, where applicable, by their spatio-temporal accuracy requirements. Looking at current applications, they can be divided into four classes (a short illustrative sketch follows the list):
`
1. 0D applications: although it is questionable whether this kind of application can be considered as AR, we find in this class applications that detect a marker (such as a QR-code [5]) and display additional information about this marker. For this category of application, the displayed information has no relation to the real-world position and orientation of the marker. A typical example would be detecting a QR code on an advertisement, which then opens the manufacturer's web page on your mobile device. Tracking accuracy requirements are very limited, since only a correct marker detection in one frame is needed; indeed, once detected, the marker is not tracked in the following frames. As a consequence of this lack of tracking, latency and update rates are not issues.
`
2. 2D applications: this is the class of so-called location-based services, i.e. applications that provide information about a given location, such as nearby restaurants, etc. Tracking accuracy is generally decametric and the tracking method is often an embedded GPS (altitude information is not used; update rates are around 1 Hz). A typical example of a 2D application is a Google Maps-like application [9], which only uses a 2D map in order to help the user find his way in a city.
`
3. 2D+θ applications: this class also covers location-based services, but ones that include orientation information, which makes it possible to show a relative direction to the user. All navigation systems are based on this principle. Note that a GPS alone cannot provide an orientation at a static position; orientation can be computed from differences between successive positions or can be given by an embedded magnetic compass, as in modern smartphones. Required accuracy is most often metric, with update rates typically ranging from 1 to 10 Hz. A typical example of a 2D+θ application is the Metro Paris application [24], which helps you locate nearby metro stations and other points of interest (restaurants, bars, etc.).
`
4. 6D applications: this last class covers what is traditionally called augmented reality by computer vision scientists, who usually work on tracking technologies. Several types of sensors can be used individually or together (optical cameras, depth cameras, inertial sensors, etc.). Various precision classes exist depending on application types (e.g. marker-based vs. markerless) and on the working volume size (e.g. indoors vs. outdoors), and accuracy is relative to this size. Update rates are much more critical here: a minimum refresh rate would be around 10 Hz, and it can go up to 100 Hz. At this point, continuous tracking must be distinguished from initial localization, for which fewer works exist [3, 26].
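To illustrate this first axis, the following sketch (in Python; all names and figures are ours and purely hypothetical, not part of the formal proposal) encodes the four tracking classes together with the indicative accuracy and update-rate requirements described above:

```python
from enum import Enum

class TrackingDoF(Enum):
    """The four tracking classes of the first axis."""
    DOF_0D = "0D"              # one-shot marker detection, no tracking
    DOF_2D = "2D"              # location-based services (GPS position)
    DOF_2D_THETA = "2D+theta"  # position plus orientation (compass)
    DOF_6D = "6D"              # full pose, typically vision-based

# Indicative requirements distilled from the text above; the figures are
# orders of magnitude, not normative values.
REQUIREMENTS = {
    TrackingDoF.DOF_0D: ("single-frame detection", (0, 0)),
    TrackingDoF.DOF_2D: ("decametric", (1, 1)),
    TrackingDoF.DOF_2D_THETA: ("metric", (1, 10)),
    TrackingDoF.DOF_6D: ("relative to working volume", (10, 100)),
}

def rate_is_sufficient(dof: TrackingDoF, sensor_rate_hz: float) -> bool:
    """Check whether a sensor update rate reaches the minimum of a class."""
    min_hz, _ = REQUIREMENTS[dof][1]
    return sensor_rate_hz >= min_hz
```

For instance, rate_is_sufficient(TrackingDoF.DOF_6D, 30) returns True, since 30 Hz exceeds the 10 Hz minimum suggested above for 6D applications.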
`
We believe this axis to be very important because it offers a high discriminative power in terms of application types: tracking is a very important feature of most AR applications, and we consider that it can determine different classes of applications. Indeed, the tracking degrees of freedom presented above allowed us to distinguish between generic types of AR applications, such as location-based ones, which group a whole set of applications sharing common requirements. Moreover, to the best of our knowledge, this is the first time this classification criterion is proposed for a taxonomy.
`3.2 Augmentation type
`For the second axis of our taxonomy, which represents the
`use of augmentation, we distinguish between two possibili-
`ties: (a) “mediated augmentation” and (b) “direct augmen-
`tation”.
`The first one is dedicated to (active) observation appli-
`cations. It includes two main categories depending on the
`device used for the mediated observation of the environment:
• Optical see-through (OST) applications: these are mostly found in head-up displays (HUDs), which mostly belong to the 2D+θ class (for HUDs fixed to a vehicle) or to the 6D class, where optical information is projected onto the lenses of see-through glasses (or worn HUDs). These applications so far remain lab prototypes (centimetric accuracy) or can be found in the military (fighter pilots' helmet-mounted displays), where they are used to display the relative position and speed of opponents as well as some navigational aid.
• Video see-through (VST) applications, where a device equipped with a back-located camera (such as a tablet or a smartphone) films the real environment and the video is reproduced on its display, augmented with artificial, computer generated images. These applications are often called magic windows or “video see-through” [19]. Another metaphor, called magic mirror, is a specific case of the magic window where the camera and the screen point in the same direction (e.g. a front-located camera on a smartphone).
`
`The “direct augmentation” application type, also called
`Spatially Augmented Reality (SAR) [2, 25] consists in adding
`information to the real world, not simply adding informa-
`tion between the observer’s eye and the real world. This
`is achieved by using projectors that display the computer
`generated artificial images directly on top of the real world
objects. These applications have a better potential for collaborative multi-user work (even if some occlusion problems might appear when a user stands in front of one of the projectors): it is easier for users to interact with real-world objects, since the visualization of the augmentation does not require the user to wear or use any additional device. SAR applications are often large-scale applications
`where the projectors usually do not move, but they can also
`be highly mobile applications such as Sixth Sense [21] or
`OmniTouch [10].
`3.3 Temporal base
Our third axis is based on Hugues' work [11] and is more application-based, as it deals with the temporal base of the content displayed in the application. We distinguish between the following classes (a short illustrative encoding follows the list):
`• < t0 applications that represent past situations such
`as archaeological applications,
`• t0 applications devoted to augmenting the world with
`present information,
`• > t0 applications that are dedicated to foreseeing the
`future state of a given location (e.g. a future building
`inserted in its environment),
`
• ∞ applications that present purely imaginary content.
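The second and third axes are plain categorical choices. A hypothetical encoding (the enum and member names are ours, for illustration only) could be:

```python
from enum import Enum

class AugmentationType(Enum):
    """Second axis: how the augmentation reaches the user."""
    OST = "optical see-through"   # mediated: head-up displays, glasses
    VST = "video see-through"     # mediated: magic window / magic mirror
    SAR = "spatially augmented"   # direct: projector-based augmentation

class TemporalBase(Enum):
    """Third axis: temporal base of the displayed content."""
    PAST = "<t0"       # e.g. archaeological reconstructions
    PRESENT = "t0"     # augmenting the world with present information
    FUTURE = ">t0"     # e.g. a future building in its environment
    IMAGINARY = "inf"  # purely imaginary content
```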
`
`3.4 Rendering modalities
`The last sub-axis of our taxonomy refers to the modali-
`ties involved in AR applications. Although the visual sense
`is by far the most important when talking about AR, some
`work has been carried out in order to mix the real world and
`computer graphics images across multiple modalities [14].
While the addition of sound to AR applications seems quite straightforward and common, it is much more unusual to see AR applications that provide real 3D sound. Haptic feedback integration for augmented reality is also relatively common, especially for medical or training-based applications, although for mobile AR it is difficult to give the user better haptic feedback than that provided by a vibrator (e.g. on a mobile phone). Olfactory and gustatory
`senses are much more rarely used in AR applications [22].
`Nevertheless, we believe that multi-modality should be
`taken into account in a typology of AR-based applications,
`and that our degrees-of-freedom approach provides for the
integration of multiple modalities. Indeed, for sound, we stipulate that a simple monophonic sound, such as a signal, represents 0D sound, stereophonic sound accounts for 1D (azimuth) and binaural sound corresponds to location-based sound (distance and azimuth). Hence, our degrees-of-freedom based classification can take the audio modality into account. Nonetheless, it has to be noted that in the presence of moving sound-generating objects, or a moving user, real-time 3D audio feedback becomes very complex.
As for the haptic modality, we propose a similar approach. A simple vibration (e.g. provided by a mobile phone vibrator) is a 0D stimulus, while the use of specific devices could account for higher dimensions of the haptic modality. For example, the use of a PHANTOM [16] device would account for a 3D haptic modality (since the basic PHANTOM provides 3 degrees of freedom of haptic feedback).
Concerning the olfactory and gustatory modalities, we assume that a non-directional stimulus (or at least a stimulus whose origin cannot be determined, such as an ambient smell) is also 0D. As gustatory senses are only contact-based, we do not extend our typology further for them. If a smell direction can be identified, it is only in azimuth, and we call it 1D. Other sensors available in the human body (thermal sensors of the skin, for example) could also be classified this way. Since, at the moment, it is technically impossible to directly stimulate proprioceptive sensors, they remain absent from our classification.
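A minimal sketch of how these rendering modalities could be folded into the same degrees-of-freedom scheme is given below; the table and function are ours, purely illustrative, and do not correspond to any existing API:

```python
# Hypothetical mapping of (modality, device) pairs to degrees of freedom,
# following the scheme proposed in this section.
MODALITY_DOF = {
    ("audio", "mono"): 0,             # simple signal, no localization
    ("audio", "stereo"): 1,           # azimuth only
    ("audio", "binaural"): 2,         # azimuth and distance
    ("haptic", "vibrator"): 0,        # mobile phone vibration
    ("haptic", "phantom"): 3,         # PHANTOM-style force feedback
    ("olfactory", "ambient"): 0,      # smell with no identifiable origin
    ("olfactory", "directional"): 1,  # azimuth only
}

def modality_dof(modality: str, device: str) -> int:
    """Return the degrees of freedom of a rendering modality; 0 if unknown."""
    return MODALITY_DOF.get((modality, device), 0)
```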
As mentioned before, the integration of real multi-modal user feedback requires extra devices that presently prevent it from being used in most mobile AR applications. This is why we recommend using the rendering modalities criterion as a sub-axis of the taxonomy. This criterion could nevertheless be needed in future applications, and we believe it is worth keeping in mind.
Collaborative AR has not yet been extensively tackled in the literature; some work exists on multi-user AR, but so far mono-user AR is much more investigated. Mobile collaborative AR raises some interesting problems in terms of registration, update, synchronization and user interfaces, in particular regarding the current state of the application for users who might join late.
`
`
`
`
3.5 Classifying AR applications
In this section we illustrate our proposal by creating a 3D representation of some representative AR applications within our taxonomy axes, cf. Fig. 2. In order to be able to create such a representation, we decided not to take the multi-modal axis into account. As mentioned before, although multi-modality currently remains anecdotal in AR applications, we believe it may become more widely used in the future and that this axis remains valid. But for the sake of simplicity of representation, we decided to focus only on the first three axes of our proposal, namely: tracking degrees of freedom, augmentation type and temporality.
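To make this representation concrete, the following sketch (ours; the example placements are hypothetical readings of applications mentioned earlier, with the paper's Fig. 2 remaining authoritative) encodes each application as a point in the discrete 3D space spanned by the three retained axes:

```python
from dataclasses import dataclass

# Discrete units of the three retained axes, as described in the text.
TRACKING = ("0D", "2D", "2D+theta", "6D")
AUGMENTATION = ("OST", "VST", "SAR")
TEMPORAL = ("<t0", "t0", ">t0", "inf")

@dataclass(frozen=True)
class ARApplication:
    name: str
    tracking: str      # one of TRACKING
    augmentation: str  # one of AUGMENTATION
    temporal: str      # one of TEMPORAL

    def position(self) -> tuple:
        """Coordinates of the application in the discrete 3D space."""
        return (TRACKING.index(self.tracking),
                AUGMENTATION.index(self.augmentation),
                TEMPORAL.index(self.temporal))

# Hypothetical placements of applications discussed in this paper.
examples = [
    ARApplication("Metro Paris [24]", "2D+theta", "VST", "t0"),
    ARApplication("archaeological reconstruction", "6D", "VST", "<t0"),
    ARApplication("future building preview", "6D", "VST", ">t0"),
]
for app in examples:
    print(app.name, app.position())
```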
`Corresponding to our previous descriptions, those axes
`have respectively four (0D, 2D, 2D+θ, 6D), three (OST,
`VST, SAR) and four units (< t0, t0, > t0, ∞). Each applica-
`tion classified in our taxonomy is r