Perceptual and Crossmodal Rendering
The scientific achievements of REVES for the objective of perceptual and crossmodal rendering were obtained mainly in the context of the CROSSMOD EU project, which we coordinated from 2005 to 2008. They fall into three main categories: unimodal graphics perceptual rendering results, crossmodal perceptual results, and sound rendering algorithmic improvements developed as part of the overall perceptual rendering effort.
Unimodal Graphics Perceptual Rendering. We first investigated different perceptual models, such as the threshold map. Using a fast GPU implementation from the University of Erlangen, we developed an algorithm based on “masking layers” to select perceptually-driven levels of detail for complex objects [DBDLSV07]. We demonstrated that this approach improves performance while maintaining a perceptually equivalent level of quality, a result verified in a perceptual study performed with guidance from our neuroscience collaborators.
Crossmodal Perceptual Rendering. The first problem concerned sound rendering of complex environments. Our goal was to determine whether visuals affect the perceived quality of sound when using our clustering algorithm [TGD04] for sound rendering of complex scenes. We performed a psychophysical study to investigate this question; it showed that quality is best preserved by placing more clusters inside the view frustum. We subsequently developed an algorithm that steers the clustering accordingly [DBDLSV07].
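As a minimal sketch of the idea (not the algorithm of [TGD04] or [DBDLSV07]), one can bias a weighted k-means so that sources inside a hypothetical view frustum weigh more, pulling more cluster centroids into the visible region; all names and the weighting factor below are illustrative assumptions:

```python
import numpy as np

def in_frustum(p, half_angle_deg=30.0):
    """Crude frustum test: camera at origin looking down +z (hypothetical setup)."""
    r = np.linalg.norm(p[:2])
    return p[2] > 0 and r <= p[2] * np.tan(np.radians(half_angle_deg))

def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Plain weighted k-means; heavier points attract centroids."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            m = labels == c
            if m.any():
                w = weights[m][:, None]
                centroids[c] = (w * points[m]).sum(0) / w.sum()
    return centroids, labels

rng = np.random.default_rng(1)
sources = rng.uniform(-10, 10, size=(200, 3))
sources[:, 2] = np.abs(sources[:, 2])            # keep sources in front of the camera
# Up-weight sources that are also visible (the factor 4 is a free parameter).
weights = np.array([4.0 if in_frustum(p) else 1.0 for p in sources])
clusters, labels = weighted_kmeans(sources, weights, k=12)
print(sum(in_frustum(c) for c in clusters), "of 12 clusters lie in the frustum")
```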
Collisions and explosions naturally generate large numbers of sound sources and are inherently audio-visual phenomena. We developed a novel algorithm which computes impact sounds in the frequency domain, exploiting the natural spectral sparseness of modal synthesis [BDTVJ08]. To improve performance, we exploited the perceptual finding that short delays of a sound with respect to the corresponding visual impact go unnoticed. This allowed us to defer sound processing, smoothing out the computational load over time.
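For background, a minimal modal-synthesis sketch (our illustration, not the implementation of [BDTVJ08]): an impact excites a sparse set of exponentially damped sinusoids, so its spectrum is concentrated in a handful of bins, which is the sparseness the frequency-domain method exploits. The mode frequencies, dampings and gains below are made-up values:

```python
import numpy as np

def impact_sound(freqs, dampings, gains, sr=44100, dur=1.0):
    """Modal synthesis: an impact excites a sparse set of damped sinusoids.
    Here we simply sum the modes in the time domain."""
    t = np.arange(int(sr * dur)) / sr
    s = sum(g * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
            for f, d, g in zip(freqs, dampings, gains))
    return s / np.max(np.abs(s))

# Hypothetical modes for a small struck object (not measured data).
sig = impact_sound(freqs=[523.0, 1310.0, 2480.0],
                   dampings=[6.0, 9.0, 14.0],
                   gains=[1.0, 0.5, 0.25])
spectrum = np.abs(np.fft.rfft(sig))
print("energy is concentrated in", (spectrum > 0.01 * spectrum.max()).sum(),
      "of", len(spectrum), "frequency bins")
```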
We performed a perceptual study with the goal of understanding whether audio can influence the perception of visual material quality and vice versa. Our study demonstrated that higher-quality audio can improve the perceived similarity of a given material to a (hidden) reference [BSVD10]. To our knowledge, this is the first time a bimodal audio-visual perceptual effect has been demonstrated for material perception. The results were applied in a crossmodal audio-visual algorithm [GBWAD09] which optimizes the choice of audio and material rendering quality based on the findings of [BSVD10].
One of the applications of CROSSMOD was the use of audiovisual rendering to study the effect of virtual environments on emotion. A virtual environment was developed for subjects with cynophobia, which allowed us to study the influence of audio and visual effects on emotional response [VZBSWND08]. We are currently following up on this work in the context of the ARC NIEVE collaborative project.
Sound Rendering. In the context of the solutions developed in CROSSMOD, we developed several “pure” sound-related algorithms, in particular with the goal of rapidly simulating head-related transfer functions (HRTFs).
We first developed an efficient approximation to first-order sound scattering from complex surfaces [TDLD07] based on the Helmholtz-Kirchhoff integral. The approach leverages programmable graphics hardware (similarly to reflective shadow maps) to efficiently sample an acoustic surface scattering integral on complex 3D meshes. For first-order scattering in the far field, the approach gives results that compare favorably with boundary element techniques. We applied this technique to model the scattering of a 3D model of the head and torso of a given subject in order to individualize 3D audio rendering over headphones [DPTAS08], and used it to develop a technique that fits a proxy head, ear and torso geometry to a set of photographs of a subject and computes individualized binauralization filters.
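A schematic of the kind of computation involved, under strong simplifying assumptions (our own discretization, not the GPU algorithm of [TDLD07]): first-order scattering is approximated by summing, over surface samples, the incident spherical wave re-radiated toward the receiver:

```python
import numpy as np

def scattered_pressure(samples, normals, areas, src, rcv, k):
    """Kirchhoff-style first-order scattering as a discrete sum over
    surface samples: each sample re-radiates the incident spherical
    wave toward the receiver, weighted by incident obliquity."""
    p = 0j
    for x, n, dA in zip(samples, normals, areas):
        r1, r2 = x - src, rcv - x
        d1, d2 = np.linalg.norm(r1), np.linalg.norm(r2)
        cos_in = max(0.0, -np.dot(n, r1) / d1)   # incidence cosine at the sample
        p += cos_in * np.exp(1j * k * (d1 + d2)) / (d1 * d2) * dA
    return p

# A small flat plate sampled on a grid (hypothetical geometry).
g = np.linspace(-0.5, 0.5, 20)
xs, ys = np.meshgrid(g, g)
samples = np.stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)], axis=1)
normals = np.tile([0.0, 0.0, 1.0], (len(samples), 1))
areas = np.full(len(samples), (g[1] - g[0]) ** 2)
k = 2 * np.pi * 1000 / 343.0                      # wavenumber for 1 kHz in air
print(abs(scattered_pressure(samples, normals, areas,
                             np.array([0, 0, 2.0]), np.array([0.5, 0, 2.0]), k)))
```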
Finally, we developed a novel approach to rendering reverberation effects in acoustically coupled environments comprising multiple rooms [STC08]. We precompute “form-factor” propagation filters between a point in each room and the connecting openings/portals. Propagation paths are determined topologically by traversing the graph defined by rooms and portals, and the filters are convolved along each propagation path. Contributions from all paths are summed to obtain the final response. The approach can model dynamic environments with opening or closing portals.
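A minimal sketch of the path-and-filter machinery, assuming a toy three-room graph and made-up 8-tap filters (the real filters of [STC08] are precomputed acoustically):

```python
import numpy as np

def room_paths(graph, src, rcv, max_len=4):
    """Enumerate room sequences from src to rcv through portals (plain DFS)."""
    stack = [[src]]
    while stack:
        path = stack.pop()
        if path[-1] == rcv:
            yield path
        if len(path) < max_len:
            stack.extend(path + [n] for n in graph[path[-1]])

def path_response(path, filters):
    """Convolve the precomputed per-hop propagation filters along one path."""
    h = np.array([1.0])
    for a, b in zip(path, path[1:]):
        h = np.convolve(h, filters[(a, b)])
    return h

# Toy corridor A-B-C with hypothetical 8-tap propagation filters per hop.
rng = np.random.default_rng(0)
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
filters = {e: rng.standard_normal(8) * np.exp(-np.arange(8))
           for e in [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")]}
responses = [path_response(p, filters) for p in room_paths(graph, "A", "C")]
out = np.zeros(max(len(r) for r in responses))
for r in responses:                               # sum all path contributions
    out[:len(r)] += r
print(len(responses), "paths, impulse response of", len(out), "taps")
```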
Figure 1: On the left, a screenshot from a complex scene where impact sounds are computed using [BDTVJ08]. On the right, a screenshot of the perceptual experiment of [BSVD10].
Other perception-based work. We performed perceptual evaluations in two other projects since the end of CROSSMOD. In the context of the hair reflectance parameter extraction project [BPVDD09], we evaluated the success of our metric for selecting appropriate feature vectors in our data set using a perceptual experiment. Similarly, in our recent work on non-photorealistic rendering [BLVLDT10], we performed a perceptual evaluation of different NPR techniques to determine the relative strengths and weaknesses of each approach.
Combining Data-Driven and Simulation Approaches
In this objective we worked on data-driven methods using images for graphics and recordings for audio. Our work can be grouped into relighting and reflectance estimation, sound re-rendering and grain-based synthesis, and texture synthesis solutions. We also introduced a new axis of research within this objective: interactive solutions for content creation.
Relighting and reflectance estimation. We developed a solution for relighting tree canopy photographs. The input is simply a set of photographs taken at a single time of day. We use information from the images to fit an approximate single-scattering volumetric rendering model. By combining this model with an analytical sun/sky model, we can relight the tree photographs at any other target time of day. We also developed a solution to estimate hair reflectance parameters from photographs [BPVDD09]. To do this, we use a learning approach based on synthetic renderings of an analytical model to find the appropriate parameters for an interactive variant of the Marschner hair reflectance model.
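A schematic of the single-scattering approximation such a model builds on (not the paper's fitted model): sunlight attenuates into the volume, scatters once, then attenuates toward the viewer; the densities below are random stand-ins for a reconstructed canopy:

```python
import numpy as np

def relight_slice(density, sigma_t=1.5, sigma_s=1.0, sun_intensity=1.0):
    """Single scattering through a 2D density slice: the sun shines from
    the top, the viewer looks along -x, and each cell scatters once."""
    dy = np.cumsum(density, axis=0) - density       # optical depth from the sun
    dx = np.cumsum(density, axis=1) - density       # optical depth to the viewer
    scatter = sigma_s * density * sun_intensity * np.exp(-sigma_t * dy)
    return (scatter * np.exp(-sigma_t * dx)).sum(axis=1)

rng = np.random.default_rng(2)
canopy = rng.uniform(0, 0.2, size=(32, 32))         # hypothetical densities
print(relight_slice(canopy)[:4])                    # radiance per image row
```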
Fig. 2: Our solution relights tree canopies using only a single input photograph, which contains the lighting condition (top left image). The first row shows ground-truth sparse time-lapse images of a tree; the second row shows the results of our relighting algorithm; the last row shows the “volumetric” reconstruction of the tree.
Sound re-rendering and grain-based synthesis. We developed acoustical scene modeling from recordings, inspired by computer vision techniques. Using a set of non-coincident microphones, we developed an approach that extracts information about the positions of the different sound sources present in the recording. The original recordings can then be dynamically warped to re-render the acoustical scene from viewpoints not originally captured. Sources can also be moved, and occlusion by virtual walls can be simulated [GTL07, GT07].
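One standard building block for such source localization is time-delay estimation between microphone pairs by cross-correlation; the sketch below is our illustration of that building block only, not the method of [GTL07, GT07]:

```python
import numpy as np

def tdoa(sig_a, sig_b, sr):
    """Estimate the inter-microphone delay (time difference of arrival)
    by locating the peak of the cross-correlation."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = corr.argmax() - (len(sig_b) - 1)
    return lag / sr

sr = 8000
rng = np.random.default_rng(3)
src = rng.standard_normal(sr // 4)            # 0.25 s of noise as a test source
delay_samples = 37                            # hypothetical geometric delay
mic_a = src
mic_b = np.concatenate([np.zeros(delay_samples), src[:-delay_samples]])
print(f"estimated delay: {tdoa(mic_b, mic_a, sr) * 1000:.2f} ms")
```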
Synthesizing contact sounds is a hard and important problem. While simulation-based methods produce good results, they often lack the richness of real sounds. We developed a “sound texture” approach to the synthesis of rolling and scraping sounds from recordings; it allows us to re-target example contact sounds to different simulations. Our method extracts short audio elements, or “grains”, from the original recordings. The grains can then be re-triggered appropriately based on contact information reported by an interactive physics engine. The original recordings can also be easily time-stretched to match a new simulation scenario, and timbre and temporal events can be decoupled [PTF08, PTF09, PFDK09].
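A minimal sketch of the grain extraction and re-triggering idea, with a synthetic stand-in for a recording and hypothetical contact events (the actual analysis of [PTF08, PTF09] is considerably more careful):

```python
import numpy as np

def extract_grains(recording, sr, grain_ms=30, hop_ms=10, thresh=0.05):
    """Cut short 'grains' out of a contact-sound recording wherever the
    local RMS energy exceeds a threshold."""
    g, h = int(sr * grain_ms / 1000), int(sr * hop_ms / 1000)
    grains = []
    for start in range(0, len(recording) - g, h):
        chunk = recording[start:start + g]
        if np.sqrt(np.mean(chunk ** 2)) > thresh:
            grains.append(chunk * np.hanning(g))   # windowed grain
    return grains

def retrigger(grains, events, sr, dur_s):
    """Re-trigger grains at contact times/strengths reported by a
    (hypothetical) physics engine."""
    out = np.zeros(int(sr * dur_s))
    rng = np.random.default_rng(0)
    for t, strength in events:
        grain = grains[rng.integers(len(grains))]
        i = int(t * sr)
        out[i:i + len(grain)] += strength * grain[:len(out) - i]
    return out

sr = 22050
rec = np.random.default_rng(4).standard_normal(sr) * \
      np.exp(-np.linspace(0, 8, sr))               # stand-in for a recording
grains = extract_grains(rec, sr)
events = [(0.1, 1.0), (0.35, 0.6), (0.5, 0.8)]     # (time in s, strength)
sound = retrigger(grains, events, sr, dur_s=1.0)
print(len(grains), "grains, output peak", np.abs(sound).max().round(3))
```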
Texture synthesis and rendering. As discussed in the objectives set out four years ago, we considered texture synthesis and procedural techniques to be a very promising challenge in the domain of data-driven methods.
A typical way to give an appearance to a surface is to define colors in a volume surrounding the object. The volume is then sampled at every point of the surface to determine its color. We developed an algorithm to automatically generate such color volumes from a set of 2D example images. Our algorithm has the unique ability to restrict computation to a neighborhood of the surface while enforcing spatial determinism [DLTD08]. In the work “Texturing from Photographs” [ELS08], we extract the input example from any surface within an arbitrary photograph. This introduces several challenges: only parts of the photograph are covered with the texture of interest, perspective and scene geometry introduce distortions, and the texture is non-uniformly sampled during the capture process.
In recent work we developed a new texture synthesis algorithm targeted at architectural textures. While most existing algorithms support only stochastic textures, ours is able to synthesize new images from highly structured examples such as images of architectural elements (facades, windows, doors, etc.). In addition, our approach is designed so that results are compactly encoded and quickly accessed from the graphics processor during rendering. Thus, when used as textures, our synthesis results require little memory compared to the equivalent images they represent [LHL10] (see Fig. 3).
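To make the encoding idea concrete, here is a toy version, assuming vertical strips and a greedy walk rather than the paper's actual graph construction: the synthesized image is never stored, only the path of strip indices, and decoding is a cheap lookup (done in the pixel shader in [LHL10]):

```python
import numpy as np

def strip_graph(image, strip_w):
    """Cost of placing strip j after strip i = mismatch between i's right
    edge and j's left edge."""
    strips = [image[:, x:x + strip_w]
              for x in range(0, image.shape[1] - strip_w + 1, strip_w)]
    n = len(strips)
    cost = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            cost[i, j] = np.abs(strips[i][:, -1] - strips[j][:, 0]).mean()
    return strips, cost

def synthesize_path(cost, length, start=0):
    """Greedy walk through the strip graph; the path alone encodes the result."""
    path = [start]
    for _ in range(length - 1):
        c = cost[path[-1]].copy()
        c[path[-1]] = np.inf                      # avoid trivial repetition
        path.append(int(c.argmin()))
    return path

rng = np.random.default_rng(5)
source = rng.uniform(0, 1, size=(64, 64)).astype(np.float32)
strips, cost = strip_graph(source, strip_w=8)
path = synthesize_path(cost, length=16)
result = np.concatenate([strips[i] for i in path], axis=1)  # decode on demand
print("path", path, "-> image", result.shape)
```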
Figure 3: (a) From a source image, our synthesizer generates new textures fitting surfaces of any size. (b) The user has precise control over the results. (c) Our algorithm synthesizes textures by reordering strips of the source image. Each result is a path in a graph describing the space of synthesizable images; only the path needs to be stored in memory, and decoding is performed during display, in the pixel shader.
We have also used texture synthesis to perform rendering based on a single image, combining an image-analogies approach with a reprojection scheme [BVLD10]. From a single segmented image, we can move around the scene at near-interactive rates. While the results are still preliminary, this direction is very promising.
Interactive solutions for content creation. A novel axis in this objective is the development of interactive solutions for content creation. We have developed a solution for interactive modeling of textured architectural models and for texture assignment.
Our interactive modeling approach [CLDD09] allows the user to manipulate a small number of vertices while imposing a set of constraints (e.g., preserving angles during manipulation). We also analyse the texture of the model, allowing texture detail to be preserved during stretching. A sparse system solver provides interactive performance. In recent work we developed an algorithm to help modelers texture large virtual environments [CLS10]. Modelers typically select a texture manually from a database of materials for each and every surface of a virtual environment. Our algorithm automatically propagates user input throughout the environment as textures are being applied: after textures are chosen for only a small subset of the surfaces, the entire scene is textured.
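As an illustration of the kind of sparse least-squares system involved, here is a minimal 1D analogue, assuming only difference-preservation ("keep local shape") terms plus weighted handle constraints; the system of [CLDD09] is of course richer:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

# Deform a 1D chain of vertices: preserve local differences in a
# least-squares sense while two handle vertices are pinned.
n = 20
x0 = np.linspace(0.0, 1.0, n)                    # rest positions
rows, cols, vals, rhs = [], [], [], []
eq = 0
for i in range(n - 1):                           # smoothness: keep x[i+1]-x[i]
    rows += [eq, eq]; cols += [i + 1, i]; vals += [1.0, -1.0]
    rhs.append(x0[i + 1] - x0[i]); eq += 1
for i, target in [(0, 0.0), (n - 1, 1.6)]:       # weighted handle constraints
    rows.append(eq); cols.append(i); vals.append(10.0)
    rhs.append(10.0 * target); eq += 1
A = sparse.coo_matrix((vals, (rows, cols)), shape=(eq, n)).tocsr()
x = lsqr(A, np.array(rhs))[0]                    # sparse solve, interactive rates
print("stretched chain endpoints:", x[0].round(3), x[-1].round(3))
```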
Fig. 4: Our solution allows the creation of novel 3D models (shown on the right) by combining and reshaping existing ones, shown here on the left. Textures are also appropriately reshaped in conjunction with the geometry.
Plausible to High-Quality Continuum for Rendering
Our goals for this objective were to work on algorithms ranging from high-quality realistic rendering solutions to fast but plausible approximations. We developed solutions for lighting, sound and texture within this objective. Effective data structures are key to this objective, and we developed two new solutions in this context.
Interactive Global Illumination. As proposed in the objectives for the previous period, we developed a novel solution for interactive global illumination [DSDD07] (see Fig. 5). To achieve this goal, we reformulated the rendering equation by removing visibility from the kernel. This led us to define a new quantity, which we called “antiradiance”, that is propagated in parallel to radiance. We demonstrated that the new formulation solves the same equation, and that the iterative scheme we proposed converges to the correct solution for radiance.
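Schematically, and with notation that is ours rather than necessarily that of [DSDD07], the reformulation can be read as a coupled system. Write the rendering equation as $L = E + \mathcal{K}\,\mathcal{G}\,L$, with $\mathcal{G}$ the transport operator including visibility and $\mathcal{K}$ the local reflection operator. Replacing $\mathcal{G}$ by the unoccluded transport $\mathcal{U}$ over-counts blocked light; the over-count is carried by the antiradiance $A$:
$$L = E + \mathcal{K}\,\mathcal{U}\,(L - A), \qquad A = \mathcal{J}\,\mathcal{U}\,(L - A),$$
where $\mathcal{J}$ passes incident light straight through a surface so that it later cancels transport that should have been blocked. Iterating this system propagates radiance and antiradiance in parallel and converges to the radiance of the original equation.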
Figure 5: Left, a screenshot of the antiradiance solution [DSDD07] with four bounces of global illumination, running at 9 frames/sec. Right, examples of textures produced by Gabor noise [LLDD09].
Sound rendering. We developed a novel solution for progressive rendering of impact sounds, based on a new hierarchical data structure for solid objects [PFDK09]. The hierarchical nature of the spatial data structure allows us to progressively improve the quality of the impact sound approximation, since the modal analysis is performed on simplified tetrahedral approximations of the original surface. To our knowledge, this is the first progressive approach to impact sound synthesis.
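A tiny sketch of the progressive selection this enables, with hypothetical precomputed mode sets per tetrahedral level (not the data structure of [PFDK09]):

```python
import numpy as np

def pick_level(mode_sets, budget_modes):
    """Progressive refinement: choose the finest precomputed level whose
    mode count fits the real-time budget."""
    affordable = [m for m in mode_sets if len(m) <= budget_modes]
    return max(affordable, key=len) if affordable else mode_sets[0]

# Hypothetical modal frequencies from coarse-to-fine tetrahedral levels.
levels = [np.array([500.0, 1200.0]),
          np.array([480.0, 1150.0, 2300.0, 3100.0]),
          np.array([470.0, 1130.0, 2250.0, 3050.0, 4400.0, 5200.0, 6100.0])]
print("modes used under a budget of 5:", pick_level(levels, 5))
```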
As a follow-up to the work in CROSSMOD, we developed a complete pipeline for 3D audio synthesis which incorporates all our scalable sound rendering algorithms [GBWAD09], including our frequency-domain impact sound solution. Initially this effort was part of our attempt to spin off our audio technologies; however, after a one-year market survey, we concluded that the commercial potential of the audio technologies developed did not justify the investment for a full-fledged startup. Nonetheless, an INRIA-funded “young engineer” was hired to create a high-quality audio library (APF, the audio processing framework), which is now available to the group.
Procedural texture methods. Noise is an essential tool for texturing and modeling. We developed a new noise function, called Gabor noise, based on sparse convolution and the Gabor kernel [LLDD09]. Gabor noise offers a unique combination of properties not found in other noise functions: accurate spectral control with intuitive parameters for easy texture design, setup-free surface noise without surface parametrization for easy application on surfaces, and analytical anisotropic filtering for high-quality rendering. By varying the number of impulses used, we achieve a quality/speed tradeoff, and Gabor noise can be used as an interactive tool for designing noise-based procedural textures. We have extended this approach to non-photorealistic rendering, which poses challenging questions about the interplay between 2D and 3D representations [BLVLDT10].
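The Gabor kernel itself is given in [LLDD09]; the sparse-convolution evaluation below is a minimal, unoptimized sketch (the real implementation evaluates impulses per grid cell on the GPU, and the parameter values here are arbitrary):

```python
import numpy as np

def gabor_kernel(dx, dy, K=1.0, a=0.05, F0=0.0625, omega0=np.pi / 4):
    """Gabor kernel: a Gaussian envelope times an oriented cosine."""
    env = K * np.exp(-np.pi * a * a * (dx * dx + dy * dy))
    return env * np.cos(2 * np.pi * F0 * (dx * np.cos(omega0) + dy * np.sin(omega0)))

def gabor_noise(w, h, impulses_per_cell=8, cell=32, seed=0):
    """Sparse convolution: sum randomly weighted Gabor kernels placed at
    random impulse positions."""
    rng = np.random.default_rng(seed)
    ys, xs = np.mgrid[0:h, 0:w]
    img = np.zeros((h, w))
    n = impulses_per_cell * (w // cell) * (h // cell)
    px, py = rng.uniform(0, w, n), rng.uniform(0, h, n)
    wts = rng.uniform(-1, 1, n)
    for x0, y0, wt in zip(px, py, wts):
        img += wt * gabor_kernel(xs - x0, ys - y0)
    return img

noise = gabor_noise(64, 64)
print("noise range:", noise.min().round(3), noise.max().round(3))
```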
Data Structures. Texture mapping with atlases suffers from several drawbacks: wasted memory, seams, and uniform resolution. We proposed new data structures to overcome these limitations [LD07]. The Tile-Tree stores square texture tiles in the leaves of an octree surrounding the surface. At rendering time the surface is projected onto the tiles, and the color is retrieved by a simple 2D texture fetch into a tile map. Our method is simple to implement and requires neither long pre-processing times nor any modification of the textured geometry. We also proposed a new, highly compressed adaptive multiresolution hierarchy to store spatially coherent graphics data. Our data structure provides the very efficient random access required by rendering algorithms by using a compact primal tree structure [LH07].
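A toy stand-in for the fetch path, assuming a uniform grid of leaf cells instead of an adaptive octree and random tile contents; it only illustrates "project along the dominant normal axis, then one 2D fetch":

```python
import numpy as np

class TileTreeSketch:
    """Toy stand-in for the Tile-Tree [LD07]: a uniform grid of leaf cells
    over the unit cube, each holding a small 2D tile."""
    def __init__(self, res=4, tile=8, seed=0):
        self.res, self.tile = res, tile
        self.tiles = {}                       # (cell, axis) -> 2D color tile
        self.rng = np.random.default_rng(seed)

    def fetch(self, p, n):
        """Project point p (in [0,1)^3, normal n) onto its cell's tile along
        the dominant normal axis, then do a single 2D texel fetch."""
        cell = tuple(np.clip((p * self.res).astype(int), 0, self.res - 1))
        axis = int(np.argmax(np.abs(n)))      # dominant projection axis
        key = (cell, axis)
        if key not in self.tiles:             # random colors stand in for content
            self.tiles[key] = self.rng.uniform(0, 1, (self.tile, self.tile))
        uv = np.delete(p * self.res - np.floor(p * self.res), axis)
        i, j = (uv * (self.tile - 1)).astype(int)
        return self.tiles[key][i, j]

tt = TileTreeSketch()
print(tt.fetch(np.array([0.3, 0.7, 0.2]), np.array([0.1, 0.9, 0.2])))
```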