
# Visual Coherence for Augmented Reality

https://doi.org/10.23947/2687-1653-2023-23-2-180-190

### Abstract

**Introduction.** The 2020s have been marked by the emergence of a new generation of computer simulators that use augmented reality. One promising advantage of augmented reality technology is the ability to safely simulate hazardous real-world situations. A prerequisite for realizing this advantage is the visual coherence of augmented reality scenes: virtual objects must be indistinguishable from real ones. All IT leaders consider augmented reality the next “big wave”; thus, visual coherence is becoming a key issue for IT in general. However, it is in aerospace applications that visual coherence has already acquired practical significance. An example is Boeing's development of an augmented reality flight simulator, which began in 2022. Visual coherence is a complex problem, one aspect of which is providing the correct overall coloration of virtual objects in an augmented reality scene. The objective of the research was to develop a new method of such tinting.

**Materials and Methods.** The developed method (called spectral transplantation) uses two-dimensional spectral image transformations.

**Results.** A spectral transplantation technology is proposed that provides direct transfer of color, brightness, and contrast characteristics from the real background to virtual objects. An algorithm for automatic selection of the optimal type of spectral transformation has been developed.

**Discussion and Conclusion.** Being a fully automatic process without recording lighting conditions, spectral transplantation solves a number of complex problems of visual coherence. Spectral transplantation can be a valuable addition to other methods of providing visual coherence.

#### For citations:

Gorbunov A.L.
Visual Coherence for Augmented Reality. *Advanced Engineering Research (Rostov-on-Don)*. 2023;23(2):180-190.
https://doi.org/10.23947/2687-1653-2023-23-2-180-190

**Introduction.** Modern simulators imply, practically by default, the use of virtual reality (VR). The advantages of this approach are well known, so we will not dwell on them; instead we note a number of significant and, more importantly, insurmountable disadvantages that stem from the very nature of VR technology. VR is a digital, discrete technology, while the real world is continuous. Therefore, modeling the real world in VR inevitably involves errors, which reduces the efficiency of training. For training systems, however, an even more serious drawback is that human decisions are largely based on the subconscious consideration of numerous details of the real picture of the world. This process is fundamentally impossible to reproduce using purely computer technologies (e.g., VR) for two reasons. First, we still do not know (and are unlikely ever to know) how the human brain works; the latest speculations on the topic of artificial intelligence only confirm this. Second, the details of the real world taken into account when making decisions are almost infinite in number, arise randomly, and differ in nature (visual, acoustic, tactile, and so on).

The emergence of augmented reality (AR) training systems in the 2020s has eased this problem. Examples are Boeing's development of an augmented reality pilot simulator based on the well-known R6 ATARS project, which began in the fall of 2022, a similar project launched by the British company BAE Systems, and the air traffic control training simulator described in this article. In AR, all the information wealth of the world around us is presented explicitly and does not require modeling. However, to realize the advantages of AR that arise from the parallel presence of real and virtual objects in scenes, the problem of visual coherence (VC) must be solved: virtual objects must be indistinguishable from real ones. This article proposes a method for solving the VC problem within a project to develop a training system for air traffic controllers.

AR is a derivative form of VR. AR retains all the features of VR, but, in addition, as a hybrid technology, it has significant advantages arising from the parallel coexistence of virtual and real objects, which attracts the attention of developers to VC. Moreover, studies [1] show that among the negative psychophysiological consequences of using augmented reality devices, optical discomfort dominates, which occurs due to the difference in perception of real and virtual objects in the same scene due to the absence of VC. IT industry leaders see AR as the next “big wave” of revolutionary changes in digital electronics. Therefore, the VC problem is becoming a key one for IT as a whole, and these leaders show a growing interest in methods of solving it [2]. However, the problem of visual coherence has already acquired practical significance in aerospace applications. The authors encountered a VC problem when developing a training system for air traffic controllers: the rapid increase in the intensity of air traffic at airports caused an increase in the frequency of collisions of aircraft with other aircraft and airfield transport during ground maneuvering (>50 cases worldwide in 2018 before the outbreak of the pandemic). Air traffic controllers working on airport towers are not always ready to respond adequately to such emergency situations, which requires additional training. The most effective form of such training involves presenting the dispatcher with a situation of hazardous proximity of objects on the airfield, which is impossible with real objects, but can be absolutely safely implemented in augmented reality scenes. In our application, emergency situations were safely simulated using AR at a real airfield, while the virtual aircraft used should be indistinguishable from real ones.

An exhaustive overview of the known VC methods can be found in [3]. According to the author's classification, all VC methods can be divided into two main classes: with the measurement of lighting parameters, and with the assessment of lighting conditions. In the first case, a mandatory procedure is a preliminary measurement of illumination conditions, carried out with the help of special equipment. This procedure is a long and labor-intensive process. It seems to be impossible if a pre-obtained image or video of the real world is used. In the second case, the complexity of reconstructing the lighting pattern from images causes assumptions and limitations, which makes the results ambiguous. Therefore, despite the impressive results obtained by researchers using the methods mentioned in the review [3], the VC level is still often insufficient, specifically, in AR scenes with real natural landscapes under ambient lighting conditions, which are typical for aviation applications. As the review of publications below shows, there is a shortage of research of this kind.

This work was aimed at developing a universal and automatic method that provides direct transfer of color, brightness and contrast characteristics from a real background to virtual objects without the digital 3D modeling required in existing VC approaches. The method is based on the mathematical apparatus of two-dimensional spectral transformations; we call it “spectral transplantation”.

The key results of this study are:

- basic scheme for the spectral transplantation method, which provides a direct transfer of color, brightness and contrast characteristics from the real background to virtual objects. The method involves replacing a part of the spectrum of the image of the virtual world with the same part of the spectrum of the image of the real world, followed by an inverse transformation of the spectrum with the transplanted part;
- algorithm for automatic selection of the optimal type of spectral transformation for use in spectral transplantation.

It is important to note that VC depends on many factors: lighting, shadows, color tone, mutual reflections, surface texture, optical aberrations, convergence, accommodation, etc. Accordingly, various AR visualization techniques were used. In our case, VC is provided only for the factors of general illumination and coloring of virtual objects in AR. This is one of the VC challenges, especially for outdoor scenes. Therefore, spectral transplantation should be used in combination with other VC methods to achieve full VC.

The list of sources in [3] includes 175 items; that review covers almost all approaches to achieving VC (with the exception of the most recent ones, based on neural networks, which are discussed below). Therefore, here we briefly describe only some characteristic examples that correspond to the basic classes mentioned above.

*Measurement of lighting conditions*

Using a light probe with diffuse bands between mirror spherical quadrants, P. Debevec and others [4] demonstrated how the full dynamic color range of a scene could be reconstructed from a single exposure. Based on the image obtained with the probe, the intensity of several light sources could be estimated by solving a simple linear system of equations. The results were used to render a virtual diffuse sphere.

A. Alhakamy and M. Tuceryan [5] estimated the direction of incident light (direct illumination) of a real scene using computer vision techniques with a 360° camera attached to an AR device. The system simulated the light reflected from surfaces when rendering virtual objects. Then, the shadow parameters for each virtual object were determined.

*Assessment of lighting conditions*

S.B. Knorr and D. Kurz [6] proposed a scheme for assessing lighting conditions in the real world based on a photo of a human face. The method was based on training a face model on a database of faces with known lighting. The authors then reconstructed the most plausible real-world lighting conditions for the captured face in a spherical harmonics basis.

We should mention work [7], which described a combination of measurement and evaluation of illumination. The authors measured the reflective properties of real objects using depth maps and color images of a rotating object on a turntable using an RGB-D camera. The shape of the object was reconstructed through integrating images of the depth of the object obtained from different viewpoints. The reflectivity of an object was determined by evaluating the parameters of the reflection model from reconstructed images of shape and color.

The closest analogues of the proposed method are approaches that, like spectral transplantation, do not involve preliminary measurements of lighting and simulation of lighting conditions, scene geometry, surface reflection, and also provide for automatic processing.

Among such analogues, there are methods of color transfer from image to image. Paper [8] presented a method for automatic transferring color statistics (averages and standard deviations) from the reference image to the target image. Additional parameters were used to avoid manual processing, which was required to determine the features of color transmission in cases where images had a strong difference in the color palette. These additional parameters combined the variances of the reference and target images. The authors of the article claimed that, although manual modification of these parameters was extremely rare, it was nevertheless sometimes necessary. In addition, the statistical nature of the method raised questions about the type and scope of statistics. Also, the ability of the method to process certain types of images (containing shiny objects, shadows) was not obvious.
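To make the contrast with spectral transplantation concrete, the statistic matching that underlies approaches such as [8] can be sketched as follows. This is only an illustration of the basic idea under stated assumptions: means and standard deviations are matched per channel directly in RGB with NumPy, the function name `transfer_statistics` is hypothetical, and neither the color space nor the additional variance-combining parameters of [8] are reproduced.

```python
import numpy as np


def transfer_statistics(reference: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Give each channel of `target` the mean and standard deviation of `reference`.

    Both inputs are H x W x 3 arrays; the result is a float array (not clipped).
    """
    ref = reference.astype(float)
    tgt = target.astype(float)
    axes = (0, 1)  # statistics over all pixels, separately for each channel
    ref_mean, ref_std = ref.mean(axis=axes), ref.std(axis=axes)
    tgt_mean, tgt_std = tgt.mean(axis=axes), tgt.std(axis=axes)
    # Guard against flat target channels (zero standard deviation).
    scale = ref_std / np.where(tgt_std > 0, tgt_std, 1.0)
    return (tgt - tgt_mean) * scale + ref_mean
```

Such a transfer matches only two numbers per channel, whereas spectral transplantation carries over a whole block of low-frequency spectral coefficients.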

Xuezhong Xiao and Lizhuang Ma [9] presented an algorithm to solve the problem of color transmission reliability in terms of scene details and colors. The authors considered the preservation of the color gradient as a necessary condition for the authenticity of the scene. They formulated the problem of color transfer as an optimization problem and solved it in two stages — histogram matching and gradient-preserving optimization. A metric was proposed for an objective assessment of the efficiency of color transfer algorithms based on examples.

The advantages of the developed method, in comparison to [8][9] and their numerous analogues, are its versatility, fully automatic nature, and the ability to transfer not only color, but also all the main characteristics of the image using one simple procedure.

The proposed method uses two-dimensional spectral transformations. Various types of images are optimally described by different types of spectral transformations (“optimally” in the sense of matching visual perception for real and virtual objects). The Discrete Fourier Transform, Discrete Cosine Transform, Hadamard Transform, S-Transform, and Karhunen-Loeve Transform have been actively used in digital image processing since the advent of digital television.

**Materials and Methods.** The scheme of the spectral transplantation method (the version using the Fourier transform [10]) is shown in Figure 1. Frames of the real world (world frame, WF) and the virtual world (virtual frame, VF) are used as input data (Fig. 2). This is natural for “video” AR, in which the real world is observed through a video camera. For “transparent” AR, in which the real world is observed through transparent glasses, real images are captured by cameras mounted on the AR glasses.

Fig. 1. Scheme of the spectral transplantation method. A version using the Fourier transform

Fig. 2. Real world (WF) and virtual world (VF) video frames: (*a*) WF, airport, cloudy weather; (*b*) WF, airport, sunny weather; (*c*) VF, virtual airplane. WF are small fragments (<25%) of images published on websites sydneyairport.com.au and 6sqft.com.

The goal of this method is to transfer the main characteristics of the image from WF to VF. The scheme of the method is very simple, although the operations have a large computational volume. The method is implemented in five stages (Fig. 1):

1) Selection of color (RGB) channels for WF — WFr, WFg, WFb and for VF — VFr, VFg, VFb. The RGB model is used because of its generality and correlation between channels, which is specific to this model.

2) Calculation of the two-dimensional direct Fourier transform (DFT) of each channel: DFT(WFr), DFT(WFg), DFT(WFb), DFT(VFr), DFT(VFg), DFT(VFb). The DFT formula is given below:

$$X_c(k,l)=\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}x_c(m,n)\,e^{-2\pi i\left(\frac{km}{M}+\frac{ln}{N}\right)} \tag{1}$$

where $c = R, G, B$ is the index of the red, green and blue color channels of the image; $M$, $N$ are the numbers of rows and columns of the pixel matrix of the transformed image; $k$, $l$ are the spatial frequency arguments; $x_c(m,n)$ is the pixel value with spatial coordinates $(m,n)$ in channel $c$; $X_c(k,l)$ are complex numbers.

3) This is a key stage. Transplantation of low-frequency part (LFP) is carried out between pairs of WF and VF spectra for each of the red, green and blue channels. This means that VF LFP is replaced by the corresponding WF LFP. The idea of spectral transplantation is based on the following property of DFT: the general character of the image (i.e., color hue, brightness, contrast) depends on the spatial frequencies contained in LFP (including the constant component) of its two-dimensional spectrum.

Thus, by transplanting the WF LFP into the VF spectrum, we transfer the main characteristics of the image from WF to VF. For this, it is more convenient to use the centered form of a two-dimensional spectrum, where the constant component is located in the center of the matrix of spectral coefficients and the low-frequency components are arranged symmetrically around it. In a centered spectrum, the LFP is the central part of the DFT matrix and has the size $M_l \times N_l$ ($M_l < M$, $N_l < N$). If $M_l = N_l$, the square LFP matrix can be denoted LFP(012..F), where 0 is the constant component and F is the number of the largest spatial frequency in the LFP matrix.

The size of the LFP for transplantation depends on the size of the transformed image (this size determines the spectral resolution) and on the amount of image characteristics that should be borrowed from WF. At this stage of the research, the size of the LFP is determined empirically. For example, the best visual results for 512×512-pixel images were obtained using LFP(012345).

4) Restoring the RGB channels for VF using a two-dimensional reverse Fourier transform (RFT). At this stage, the characteristics of WF and VF are mixed: the resulting RGB channels of the VF image carry the main color, brightness and contrast characteristics of WF together with the characteristics inherited from the original VF.

5) Restoring the corrected VF color through merging the RGB channels obtained at the previous stage, cutting out virtual objects and building an AR scene by superimposing the cut virtual objects on WF.
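The five stages above can be sketched with NumPy as follows. This is a minimal illustration, not the author's implementation. It assumes that LFP(012..F) corresponds to the central (2F+1)×(2F+1) block of the centered spectrum, that WF and VF have the same size, and that frames are 8-bit RGB arrays; the function names `transplant_channel` and `spectral_transplant` are hypothetical.

```python
import numpy as np


def transplant_channel(wf: np.ndarray, vf: np.ndarray, f: int) -> np.ndarray:
    """Replace the LFP(0..f) block of the VF spectrum with that of WF (one channel)."""
    if wf.shape != vf.shape:
        raise ValueError("WF and VF channels must have the same shape")
    if not 0 <= f < min(wf.shape) // 2:
        raise ValueError("f must be non-negative and below half the smaller image side")
    # Stage 2: forward 2-D transforms, shifted so the constant component is central.
    wf_spec = np.fft.fftshift(np.fft.fft2(wf))
    vf_spec = np.fft.fftshift(np.fft.fft2(vf))
    # Stage 3: copy the central (2f+1) x (2f+1) block from WF into VF.
    cy, cx = wf.shape[0] // 2, wf.shape[1] // 2
    block = (slice(cy - f, cy + f + 1), slice(cx - f, cx + f + 1))
    vf_spec[block] = wf_spec[block]
    # Stage 4: inverse transform. The block is symmetric about the centre,
    # so the imaginary part of the result is rounding noise only.
    return np.fft.ifft2(np.fft.ifftshift(vf_spec)).real


def spectral_transplant(wf_rgb: np.ndarray, vf_rgb: np.ndarray, f: int = 5) -> np.ndarray:
    """Apply the transplantation to each channel of H x W x 3 uint8 frames."""
    channels = [
        transplant_channel(wf_rgb[..., c].astype(float), vf_rgb[..., c].astype(float), f)
        for c in range(3)  # Stage 1: split into R, G, B
    ]
    # Stage 5 (merging only): recombine the channels into an 8-bit image.
    return np.clip(np.rint(np.stack(channels, axis=-1)), 0, 255).astype(np.uint8)
```

Cutting out the virtual objects and superimposing them on WF additionally requires an object mask for VF, which is outside the scope of this sketch.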

Obviously, if this method is used to process a WF video stream, there is no need to calculate the DFT and, accordingly, the LFP for every frame of the real world, since the main characteristics of the image change only when the recorded scene changes radically. Such changes can be easily detected by jumps in the average pixel value. At these moments, the spectral transformation for the LFP must be recalculated.
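A possible way to detect such jumps is sketched below. The function name `recalculation_points` and the threshold value are hypothetical; the source does not specify how large a jump in the average pixel value should trigger recalculation.

```python
from typing import Iterable, List


def recalculation_points(frame_means: Iterable[float], jump: float = 20.0) -> List[int]:
    """Return indices of frames at which the WF LFP should be recomputed.

    frame_means: average pixel value of each real-world frame, in order.
    jump: assumed threshold on the change in mean since the last recomputation.
    """
    points: List[int] = []
    reference = None  # mean of the frame whose LFP is currently cached
    for index, mean in enumerate(frame_means):
        if reference is None or abs(mean - reference) > jump:
            points.append(index)
            reference = mean
    return points
```

For example, for the frame means `[100, 102, 99, 160, 158, 161, 90]` and the default threshold, the function returns `[0, 3, 6]`: the first frame and the two frames where the mean jumps by more than 20.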

Since various types of images are optimally described by different types of spectral transformations mentioned above, it is reasonable to develop an automatic algorithm for selecting the optimal type of transformation for use in spectral transplantation.

We propose to estimate the difference between the visual perception of VF and WF by the RMS distance *Δ* between the LFP power spectra of their images (for all color channels):

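$$\Delta_c=\sqrt{\frac{1}{M_l N_l}\sum_{(k,l)\in \mathrm{LFP}}\left[P_{V,c}(k,l)-P_{W,c}(k,l)\right]^2},\quad c=R,G,B \tag{2}$$

where $P_V$ and $P_W$ are the two-dimensional power spectra of VF and WF, respectively. For example, in the case of the Fourier transform, the formula for $P_W$ has the form:

$$P_{W,c}(k,l)=\left|X_{W,c}(k,l)\right|^2=\operatorname{Re}^2 X_{W,c}(k,l)+\operatorname{Im}^2 X_{W,c}(k,l) \tag{3}$$

where $X_{W,c}(k,l)$ is the spectrum (1) of channel $c$ of WF.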
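For the Fourier case, the RMS distance between the LFP power spectra can be computed roughly as in the sketch below. It assumes that the LFP is the central (2F+1)×(2F+1) block of the centered spectrum and that the squared differences are averaged over that block; the names `lfp_power` and `rms_distance` are hypothetical, and other transforms would need their own power spectrum definition.

```python
import numpy as np


def lfp_power(channel: np.ndarray, f: int) -> np.ndarray:
    """Power spectrum |X(k, l)|^2 restricted to the central LFP(0..f) block."""
    spec = np.fft.fftshift(np.fft.fft2(channel.astype(float)))
    cy, cx = channel.shape[0] // 2, channel.shape[1] // 2
    block = spec[cy - f:cy + f + 1, cx - f:cx + f + 1]
    return np.abs(block) ** 2


def rms_distance(vf_rgb: np.ndarray, wf_rgb: np.ndarray, f: int) -> np.ndarray:
    """Return the vector (delta_R, delta_G, delta_B) for H x W x 3 frames."""
    deltas = []
    for c in range(3):
        diff = lfp_power(vf_rgb[..., c], f) - lfp_power(wf_rgb[..., c], f)
        deltas.append(np.sqrt(np.mean(diff ** 2)))
    return np.array(deltas)
```

The resulting three-component vector is what the selection procedure described next operates on, after normalization.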

We propose to determine the optimal type of spectral transformation by the proximity of the vectors *Δ* and the mean vector calculated by the criterion of the minimum sum of squares of the distances between the mean vector and the vectors *Δ* for all transformations under consideration.

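Let $\Delta_j = (\Delta_{jR}, \Delta_{jG}, \Delta_{jB})$ be the normalized vector of the distance between the VF and WF LFP spectra for transformation $j$. Let $\Delta_a = (\Delta_{aR}, \Delta_{aG}, \Delta_{aB})$ be the mean vector, and $D_j$ the distance between $\Delta_j$ and $\Delta_a$. Then the sum $S$ of squared distances from the vectors $\Delta_j$ of all transformations under consideration to the mean vector is equal to:

$$S=\sum_j D_j^2=\sum_j\left[(\Delta_{jR}-\Delta_{aR})^2+(\Delta_{jG}-\Delta_{aG})^2+(\Delta_{jB}-\Delta_{aB})^2\right] \tag{4}$$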

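The coordinates $\Delta_{aR}, \Delta_{aG}, \Delta_{aB}$ of the mean vector are found by setting the partial derivatives of $S$ to zero, i.e., as the solution to the system of equations:

$$\frac{\partial S}{\partial \Delta_{aR}}=0,\quad \frac{\partial S}{\partial \Delta_{aG}}=0,\quad \frac{\partial S}{\partial \Delta_{aB}}=0 \tag{5}$$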

The selection of the optimal type of spectral transformation is determined by the proximity condition:

$$j_{opt}=\arg\min_j D_j \tag{6}$$
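The selection rule can be illustrated with a short sketch. Minimizing a sum of squared Euclidean distances to a single point yields the coordinate-wise arithmetic mean, so the mean vector is computed directly. The function name `select_transform` and the transform labels used as dictionary keys are hypothetical, and the input vectors are assumed to be already normalized.

```python
import math
from typing import Dict, Tuple

Vector = Tuple[float, float, float]


def select_transform(deltas: Dict[str, Vector]) -> Tuple[str, Vector]:
    """Pick the transform whose distance vector lies closest to the mean vector."""
    if not deltas:
        raise ValueError("at least one transform is required")
    n = len(deltas)
    # The least-squares mean vector is the arithmetic mean of each coordinate.
    mean = tuple(sum(v[i] for v in deltas.values()) / n for i in range(3))
    # Proximity condition: smallest Euclidean distance to the mean vector.
    best = min(deltas, key=lambda name: math.dist(deltas[name], mean))
    return best, mean
```

With made-up vectors such as `{"fourier": (0.5, 0.5, 0.5), "cosine": (0.1, 0.1, 0.1), "hadamard": (0.9, 0.9, 0.9)}`, the mean vector coincides with the first entry, so `"fourier"` is selected.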

Another obvious criterion for selecting the optimal type of transformation is the length of the vectors *Δ*. However, the extremes of such a criterion may be related to the ability or inability of certain transformations to correctly detect the difference between certain types of WF and VF. Therefore, we consider the use of the mean vector as a more reliable method of selection.

Similar to the DFT calculation for WF, the optimal type of transformation is selected only once at the beginning of spectral transplantation, unless WF is changed dramatically.

**Research Results.** The proposed method was tested using the Fourier transform without selecting the optimal type of transformation. WF (real airport scene) and VF (virtual airplane model) were 512×512-pixel images with 24-bit color. Two different conditions were investigated:

1) WF — photo of the airport in cloudy weather (Fig. 2 *a*);

2) WF — photo of the airport in sunny weather (Fig. 2 *b*).

In both cases, VF contained a 3D model of the aircraft shown in Figure 2 *c*. LFP(0), LFP(01), LFP(012), LFP(0123), LFP(01234) and LFP(012345) transplants were tested. Some of the test results are shown in Figure 3. The best visual results were obtained using LFP(012345). In Figure 3, the images after spectral transplantation are intentionally shown without other VC effects (shadows, lighting, etc.) to demonstrate the pure results of this method.

Fig. 3. AR-scene: (*a*) composed of WF and VF without LFP transplantation; (*b*) after LFP(0123) transplantation; (*c*) after LFP(012345) transplantation

The upper and lower rows in Figure 3 correspond to the opposite conditions for WF: light and dark WF with different shades. Experiments with any other WF will not add significantly new information since they will have conditions between those already presented in Figure 3.

Numerical simulation was carried out to demonstrate the mechanism of spectral transplantation. Figure 4 shows Fourier transplantation using a small (8×8) pixel matrix representing one of the color channels of WF and VF. Such a small matrix makes it possible to illustrate the transplantation procedure clearly. In this example, the WF matrix can be associated with an image with a vertical gradient fill, and the VF matrix with an image with a horizontal gradient fill. Another difference between WF and VF is the range of pixel values: 8–15 for WF (“lighter image”, 8 is a constant component) and 0–7 for VF (“darker image”). LFP(01') transplantation is shown, where 1' means part of the first spatial frequency component (used because of the very low resolution of the 8×8 matrix). The 3D form of the VF matrix after transplantation indicates the transfer of properties from WF to VF: the edge of the surface has risen, and the first pixel has received the value of the constant WF component. This example demonstrates how, as a result of spectral transplantation, VF starts to acquire a vertical gradient and a constant component.
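A comparable 8×8 experiment can be reproduced with a few lines of NumPy. The sketch below is an approximation of the setup described above, not a reproduction of Figure 4: the exact matrices are assumed (a vertical ramp 8..15 for WF, a horizontal ramp 0..7 for VF), and the whole first-frequency ring, LFP(01), is transplanted instead of the partial component LFP(01').

```python
import numpy as np

rows, cols = np.indices((8, 8))
wf = 8.0 + rows          # assumed WF: vertical gradient, values 8..15
vf = cols.astype(float)  # assumed VF: horizontal gradient, values 0..7

wf_spec = np.fft.fftshift(np.fft.fft2(wf))
vf_spec = np.fft.fftshift(np.fft.fft2(vf))

# Transplant the constant component and the first-frequency ring: the 3 x 3
# block around the centre (index 4 on an 8-point shifted axis).
vf_spec[3:6, 3:6] = wf_spec[3:6, 3:6]
result = np.fft.ifft2(np.fft.ifftshift(vf_spec)).real

# The mean of VF (3.5) is replaced by the mean of WF (11.5), and the row
# means of the result are no longer equal, i.e. a vertical trend appears.
print(vf.mean(), wf.mean(), round(float(result.mean()), 6))
print(np.round(result.mean(axis=1), 2))
```

Because this WF has no horizontal variation, the first horizontal harmonic of VF is overwritten with zeros in this variant, while its higher horizontal harmonics are retained.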

Fig. 4. Numerical simulation of spectral transplantation for 8×8-pixel matrices

Spectral transplantation provides several options for changing the parameters of this procedure: changing LFP size; selecting individual components of the spectrum for transplantation; using different transplant coefficients for various components to be transplanted.
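One way to read “transplant coefficients” is as blending weights between the VF and WF spectral components inside the LFP. The sketch below follows that reading, which is an assumption rather than something the source specifies; the function name `blended_transplant` and the parameter `alpha` are hypothetical.

```python
import numpy as np


def blended_transplant(wf: np.ndarray, vf: np.ndarray, f: int, alpha=1.0) -> np.ndarray:
    """Mix the LFP(0..f) blocks of one channel.

    alpha = 1 is a full transplant and alpha = 0 leaves VF unchanged. alpha may
    also be a (2f+1) x (2f+1) array of per-component coefficients; it should be
    symmetric about its centre, otherwise the discarded imaginary part is not
    negligible.
    """
    wf_spec = np.fft.fftshift(np.fft.fft2(wf))
    vf_spec = np.fft.fftshift(np.fft.fft2(vf))
    cy, cx = vf.shape[0] // 2, vf.shape[1] // 2
    block = (slice(cy - f, cy + f + 1), slice(cx - f, cx + f + 1))
    weights = np.broadcast_to(np.asarray(alpha, dtype=float), vf_spec[block].shape)
    vf_spec[block] = (1.0 - weights) * vf_spec[block] + weights * wf_spec[block]
    return np.fft.ifft2(np.fft.ifftshift(vf_spec)).real
```

Setting individual entries of `alpha` to zero corresponds to selecting only some components of the spectrum for transplantation.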

Figure 5 shows the effect of transplantation with different parameters for various types of virtual objects: virtual aircraft models that differ in surface texture, markings, and gloss. Figures 5 *a*, *b* and *c* depict a virtual airplane with complex textures, text symbols and reflections of virtual light sources. Figures 5 *d*, *e* and *f* show a virtual airplane with simple contrasting colors. Parts *a* and *d* contain virtual objects without transplantation; *b* and *e* contain virtual objects after LFP(0123) transplantation; *c* and *f* contain virtual objects after LFP(012345) transplantation. Virtual objects are intentionally shown without other VC effects (shadows, lighting, etc.) to demonstrate the pure results of the method.

Fig. 5. Scenes with cloudy WF: (*a*, *d*) AR-scenes composed of WF and VF without LFP transplantation; (*b*, *e*) AR-scenes composed after LFP(0123) transplantation; (*c*, *f*) AR-scenes composed after LFP(012345) transplantation

It is important to emphasize that the presented figures illustrate the possibilities of tuning the proposed method, and not the final result, since it requires tuning to specific WF. The demonstration of well-executed, but incomplete results, as is often practiced in VC works, does not seem correct to us.

**Discussion and Conclusions.** The key complicating factor for the described method, presented in Figure 1, is the high computational costs. The most promising way to solve this problem is to directly convert WF LFP parameters into VF rendering parameters. This eliminates the cumbersome procedures of three DFT and three RFT calculations at the second and fourth processing steps, and requires only three WF DFT calculations, once for each section of the WF flow without significantly changing the basic characteristics of WF. This approach provides processing the VF flow in real time.

Another problem is selecting the optimal LFP size. As the range of spatial frequencies used increases, they begin to carry information about the WF contents. Therefore, the size of the LFP must be limited to avoid the hybrid image effect [11]. The optimal selection is complicated because it depends on both the LFP size and the nature of the image. Recent advances in deep learning suggest that a new approach to visual coherence through spectral transplantation could be the use of generative adversarial networks (GANs) to transfer realistic lighting information from the source image to the target image in the same way that GANs transfer image style. In particular, it would be interesting to compare the performance of GANs on data sets consisting either of RGB images or of images represented in the frequency domain using the DFT. We believe that the latter approach will help to select the optimal LFP. GANs are already widely used in VC studies [2], as are neural networks in general [12].

In further research related to the topic of this paper, the following issues will be considered:

- automatic determination of the optimal LFP size for transplantation with a given volume of characteristics borrowed from WF;
- automatic detection of the exact moments when a new WF LFP must be calculated for transplantation when processing WF video in real time (as mentioned above, this must be done if the basic characteristics of WF change radically);
- using the same approach in reverse (from virtual to real) to apply virtual lighting to real scenes (how virtual lighting affects the environment).

As a fully automatic process that requires no measurement of illumination, the proposed spectral transplantation method solves a number of complex VC problems, for example, how best to align the color, brightness, and contrast characteristics of the real and virtual components of AR scenes. All these tasks are solved through one simple procedure without modeling lighting conditions, AR-scene geometry or the BRDF, which eliminates the inevitable modeling errors. The proposed method can be a valuable addition to other VC tools.

## References

1. Hughes CL, Fidopiastis C, Stanney KM, et al. The Psychometrics of Cybersickness in Augmented Reality. Frontiers in Virtual Reality. 2020;1:602954. https://doi.org/10.3389/frvir.2020.602954

2. Somanath G, Kurz D. HDR Environment Map Estimation for Real-Time Augmented Reality. Cupertino, CA: Apple Inc.; 2020. URL: https://arxiv.org/pdf/2011.10687.pdf (accessed: 17.06.2022).

3. Kronander J, Banterle F, Gardner A, et al. Photorealistic Rendering of Mixed Reality Scenes. Computer Graphics Forum. 2015;34(2):643–665. https://doi.org/10.1111/cgf.12591 (accessed: 17.06.2022).

4. Debevec P, Graham P, Busch J, et al. A Single-Shot Light Probe. SIGGRAPH ’12: ACM SIGGRAPH 2012 Talks. 2012;10:1–19. URL: https://vgl.ict.usc.edu/Research/SSLP/A_Single_Shot_Light_Probe-SIGGRAPH2012.pdf (accessed: 17.06.2022).

5. Alhakamy A, Tuceryan M. CubeMap360: Interactive Global Illumination for Augmented Reality in Dynamic Environment. In: Proc. IEEE SoutheastCon. Huntsville, AL: IEEE; 2019. https://doi.org/10.1109/SoutheastCon42311.2019.9020588

6. Knorr SB, Kurz D. Real-Time Illumination Estimation from Faces for Coherent Rendering. In: Proc. IEEE Int. Symposium on Mixed and Augmented Reality (ISMAR). Munich: IEEE; 2014. P. 113–122. https://doi.org/10.1109/ISMAR.2014.6948483

7. Seiji Tsunezaki, Ryota Nomura, Takashi Komuro, et al. Reproducing Material Appearance of Real Objects using Mobile Augmented Reality. In: Proc. 2018 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct). Munich: IEEE; 2018. P. 196–197. https://doi.org/10.1109/ISMAR-Adjunct.2018.00065

8. Reinhard E, Akyuz AO, Colbert M, et al. Real-Time Color Blending of Rendered and Captured Video. In: Proc. Interservice/Industry Training, Simulation and Education Conference (I/ITSEC). Orlando, Florida: National Training and Simulation Association; 2004. P. 1–9. URL: https://user.ceng.metu.edu.tr/~akyuz/files/blend.pdf (accessed: 17.06.2022).

9. Xuezhong Xiao, Lizhuang Ma. Gradient-Preserving Color Transfer. Computer Graphics Forum. 2009;28(7):1879–1886. https://doi.org/10.1111/j.1467-8659.2009.01566.x

10. Gorbunov AL, et al. Sposob formirovaniya izobrazheniya dopolnennoi real'nosti, obespechivayushchii sovpadenie vizual'nykh kharakteristik real'nykh i virtual'nykh ob"ektov. RF Patent No. 2667602. 2019. (In Russ.)

11. Oliva A, Torralba AJ, Schyns PhG. Hybrid Images. ACM Transactions on Graphics. 2006;25(3):527–532. https://doi.org/10.1145/1179352.1141919

12. Kán P, Kaufmann H. DeepLight: Light Source Estimation for Augmented Reality Using Deep Learning. The Visual Computer. 2019;35:873–883. https://doi.org/10.1007/s00371-019-01666-x

### About the Author

**A. L. Gorbunov**

Russian Federation

Andrey L. Gorbunov, Cand.Sci. (Eng.), associate professor, Air Traffic Control Department; Director-General

20, Kronshtadtskii blvd, Moscow, 125993, RF

5, Zagoryevskaya St., Moscow, 115372, RF

### Review

**Supporting agencies:** The author would like to thank Alessandro Terenzi, Inglobe Technologies Srl, Ceccano, Italy, for his support in software development.
