1
00:00:00,150 --> 00:00:04,600
High resolution surface reconstruction from overlapping multiple-views

2
00:00:04,650 --> 00:00:08,500
by Nader Salman and Mariette Yvinec from the Geometrica project-team at INRIA.

3
00:00:09,250 --> 00:00:19,680
Three-dimensional reconstruction of real world scenes from sequences of overlapping images is a topic of much interest in Computer Vision, Computer Graphics and Computational Geometry.

4
00:00:19,750 --> 00:00:23,450
Its application areas range from preservation of historical heritage

5
00:00:23,550 --> 00:00:30,000
to virtual reality walkthroughs, through computer animation, movie production and e-commerce.

6
00:00:31,150 --> 00:00:38,550
The challenge is to recover simplicial models of complex scenes from a set of images taken from various viewpoints around the scene.

7
00:00:38,600 --> 00:00:44,000
The models sought after are either surface meshes or 3D meshes of objects in the scene.

8
00:00:44,850 --> 00:00:49,200
The first step of the process consists of using a stereo algorithm

9
00:00:49,300 --> 00:00:51,400
to extract a set of 3D points from the images.

10
00:00:51,450 --> 00:00:53,100
To speed up the extraction phase,

11
00:00:53,200 --> 00:00:56,300
the GYROVIZ project produces sequences of located images,

12
00:00:56,400 --> 00:01:01,000
describing for each image the position of the camera as well as the shooting parameters.

13
00:01:01,150 --> 00:01:16,500
Most matchmoving software packages require highly coherent image sequences and generate sparse point clouds.

14
00:01:16,700 --> 00:01:32,000
The software we use, from [ENPC/Certis-CSTB/Mod-EVE], requires located frames and generates quasi-dense point clouds.

15
00:01:32,150 --> 00:01:36,350
In both cases the output of the image processing step is a set of tracks,

16
00:01:36,450 --> 00:01:41,500
where each track is a 3D point associated with a list of images in which the point is visible.

17
00:01:42,150 --> 00:01:46,150
The set of images and tracks constitutes the input of our reconstruction algorithm.

18
00:01:46,900 --> 00:01:48,350
The extracted tracks are not perfect:

19
00:01:48,400 --> 00:01:54,500
they include redundancies, outliers and noise.

20
00:01:54,900 --> 00:02:02,000
First, multiple tracks may correspond to the same 3D feature point and must be merged.

21
00:02:02,850 --> 00:02:06,850
As tracks come from characteristic points detected in the images,

22
00:02:06,950 --> 00:02:10,500
their 3D locations should be densely spread over the surface of the scene objects.

23
00:02:10,600 --> 00:02:16,500
Thus, we detect outliers by computing, for each track, the average squared distance to its k nearest neighbors.

24
00:02:17,000 --> 00:02:20,750
Another criterion is based upon the camera positions.

25
00:02:20,800 --> 00:02:24,150
For each track P we compute the smallest cone with apex at P

26
00:02:24,200 --> 00:02:26,800
that contains all cameras C that observed point P.

27
00:02:26,900 --> 00:02:31,150
Tracks with small cone aperture angles are discarded.

28
00:02:32,000 --> 00:02:35,600
The final preprocessing step is straightforward:

29
00:02:35,650 --> 00:02:47,150
it smooths the remaining tracks using a jet-fitting and reprojection algorithm.

30
00:02:47,700 --> 00:02:52,800
At this stage the goal is to extract, from the preprocessed tracks and the input images,

31
00:02:52,900 --> 00:02:57,100
a set of higher-level primitives conforming to the surface of the objects in the scene.
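As an illustration of the k-nearest-neighbor outlier test above, here is a minimal Python sketch; it is not the authors' implementation. It assumes the tracks are given as an N x 3 NumPy array, and the neighbor count k and the keep fraction are illustrative parameters only.

    import numpy as np
    from scipy.spatial import cKDTree

    def filter_outliers(points, k=12, keep_fraction=0.95):
        """Discard tracks whose average squared distance to their
        k nearest neighbors is unusually large (illustrative sketch)."""
        tree = cKDTree(points)
        # query k+1 neighbors because the closest "neighbor" is the point itself
        dists, _ = tree.query(points, k=k + 1)
        avg_sq = (dists[:, 1:] ** 2).mean(axis=1)
        # keep the keep_fraction of tracks with the smallest score
        threshold = np.quantile(avg_sq, keep_fraction)
        return points[avg_sq <= threshold]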
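The cone criterion can be sketched in the same spirit. Computing the exact smallest enclosing cone takes more care; the approximation below, which is this sketch's own simplification rather than the method in the video, uses the maximum pairwise angle between viewing directions as the cone aperture.

    import numpy as np

    def aperture_angle(track, cameras):
        """Approximate the aperture of the smallest cone with apex at
        `track` containing all observing `cameras` by the maximum
        pairwise angle between the viewing directions (radians)."""
        dirs = cameras - track                       # vectors from apex to cameras
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        cosines = np.clip(dirs @ dirs.T, -1.0, 1.0)  # pairwise dot products
        return np.arccos(cosines).max()

    # A track seen from nearly the same spot has a tiny aperture: suspect.
    track = np.array([0.0, 0.0, 0.0])
    cams = np.array([[1.0, 0.0, 2.0], [1.1, 0.1, 2.0], [0.9, -0.1, 2.1]])
    print(aperture_angle(track, cams))  # small angle -> discard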
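Jet fitting means fitting a low-degree polynomial surface (a "jet") to each point's neighborhood and reprojecting the point onto it; CGAL offers such a smoother in its point set processing package (jet_smooth_point_set). The following is a simplified degree-2 variant with illustrative parameters: it builds a local frame by PCA and fits a quadric height field by least squares.

    import numpy as np
    from scipy.spatial import cKDTree

    def jet_smooth(points, k=16):
        """Smooth each point by fitting a degree-2 height field
        z = f(x, y) over its k-neighborhood in a local PCA frame,
        then reprojecting the point onto the fitted surface."""
        tree = cKDTree(points)
        _, idx = tree.query(points, k=k)
        out = np.empty_like(points)
        for i, nbrs in enumerate(idx):
            nb = points[nbrs]
            c = nb.mean(axis=0)
            # local frame: the last right singular vector ~ surface normal
            _, _, vt = np.linalg.svd(nb - c)
            local = (nb - c) @ vt.T
            x, y, z = local[:, 0], local[:, 1], local[:, 2]
            A = np.column_stack([np.ones_like(x), x, y, x*x, x*y, y*y])
            coef, *_ = np.linalg.lstsq(A, z, rcond=None)
            px, py, _ = (points[i] - c) @ vt.T
            pz = coef @ np.array([1.0, px, py, px*px, px*py, py*py])
            out[i] = c + np.array([px, py, pz]) @ vt
        return out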
32
00:02:57,200 --> 00:03:03,150
The final reconstruction sought after must respect the sharp creases of the scene.

33
00:03:03,950 --> 00:03:10,100
To detect those edges we apply to the input images a simple contour extraction algorithm

34
00:03:10,150 --> 00:03:12,800
based on Frei-Chen edge convolution masks.

35
00:03:12,950 --> 00:03:15,850
We will use the detected contours to create in each image

36
00:03:15,950 --> 00:03:17,150
a constrained Delaunay triangulation.

37
00:03:18,000 --> 00:03:20,600
Let us first define the constraints.

38
00:03:20,700 --> 00:03:24,500
In each image, we consider the projections of the tracks observed in that image,

39
00:03:24,550 --> 00:03:26,850
and classify these tracks

40
00:03:26,950 --> 00:03:29,700
according to their proximity to the extracted contours.

41
00:03:29,900 --> 00:03:33,550
We now consider the complete graph over all the contour tracks

42
00:03:33,650 --> 00:03:47,150
and select the edges of this graph which lie over the detected contours.

43
00:03:49,150 --> 00:03:54,900
At this stage the selected edges may not form a valid input for a constrained 2D triangulation.

44
00:03:55,050 --> 00:03:56,500
To avoid collinear segments,

45
00:03:56,600 --> 00:04:00,400
when two selected edges form a small angle we eliminate the longer one.

46
00:04:00,600 --> 00:04:01,950
Then, to avoid intersections,

47
00:04:02,000 --> 00:04:05,300
when two selected edges intersect we eliminate the longer one.

48
00:04:05,400 --> 00:04:09,000
The remaining edges are used as input segments to

49
00:04:09,050 --> 00:04:11,350
a 2D constrained Delaunay triangulation

50
00:04:11,400 --> 00:04:15,150
with all projected tracks as vertices.

51
00:04:16,500 --> 00:04:17,900
The triangles of this triangulation are then

52
00:04:17,950 --> 00:04:21,050
lifted into 3D space using the 3D coordinates

53
00:04:21,100 --> 00:04:25,500
computed by the stereo algorithm.

54
00:04:26,000 --> 00:04:29,000
This process is repeated for each image.

55
00:04:29,650 --> 00:04:33,550
The output is a 3D triangle soup in which the vast majority of the triangles correspond to

56
00:04:33,600 --> 00:04:35,900
actual surface regions in the scene.

57
00:04:36,000 --> 00:04:38,550
However, at occlusion boundaries the present scheme may produce

58
00:04:38,600 --> 00:04:41,350
triangles which connect foreground to background surfaces.

59
00:04:41,650 --> 00:04:42,950
Two filtering steps are applied to the created triangle soup to

60
00:04:43,050 --> 00:04:44,700
further enforce visibility consistency

61
00:04:44,750 --> 00:04:47,750
and to remove erroneous triangles.

62
00:04:47,800 --> 00:04:50,800
If a triangle belongs to the final surface then

63
00:04:50,850 --> 00:04:53,300
it should not be intersected by any sight segment

64
00:04:53,350 --> 00:05:05,550
joining a track's 3D position to one of the cameras observing this track.

65
00:05:05,900 --> 00:05:08,000
We now filter the big triangles,

66
00:05:08,100 --> 00:05:11,200
those whose longest edge is more than a given fraction of the bounding box.

67
00:05:11,300 --> 00:05:14,550
Among these triangles we distinguish anamorphic and well-shaped triangles.

68
00:05:14,650 --> 00:05:15,900
Anamorphic triangles

69
00:05:15,950 --> 00:05:18,200
are likely to lie in free space and should be discarded.
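Frei-Chen edge detection projects each 3x3 neighborhood onto an orthonormal basis of nine masks, four of which span the edge subspace; a pixel is classified as an edge when most of its neighborhood energy falls into that subspace. A minimal sketch, assuming a grayscale image as a 2D float array and an illustrative threshold:

    import numpy as np
    from scipy.ndimage import convolve

    s2 = np.sqrt(2.0)
    # The four masks spanning the Frei-Chen edge subspace (of nine total).
    EDGE_MASKS = [np.array(m) / (2.0 * s2) for m in (
        [[ 1,  s2,  1], [  0, 0,   0], [ -1, -s2, -1]],   # horizontal edge
        [[ 1,   0, -1], [ s2, 0, -s2], [  1,   0, -1]],   # vertical edge
        [[ 0,  -1, s2], [  1, 0,  -1], [-s2,   1,  0]],   # ripple
        [[s2,  -1,  0], [ -1, 0,   1], [  0,   1, -s2]],  # ripple
    )]

    def frei_chen_edges(img, threshold=0.3):
        """Boolean edge map: pixels whose 3x3 neighborhood energy
        lies mostly in the Frei-Chen edge subspace."""
        img = img.astype(float)
        edge_energy = sum(convolve(img, m) ** 2 for m in EDGE_MASKS)
        total_energy = convolve(img ** 2, np.ones((3, 3)))  # window energy
        ratio = np.where(total_energy > 0, edge_energy / total_energy, 0.0)
        return ratio > threshold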
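For the constrained Delaunay triangulation and the lifting step, a hedged sketch follows. It uses Shewchuk's Triangle through the Python `triangle` package as a stand-in for whatever CDT code the authors use (CGAL's Constrained_Delaunay_triangulation_2 would be the natural C++ route), and it assumes the surviving contour edges no longer intersect, so that no extra vertices are inserted.

    import numpy as np
    import triangle  # wrapper around Shewchuk's Triangle

    def lifted_triangles(pts2d, constraints, pts3d):
        """Constrained Delaunay triangulation of the projected tracks,
        lifted to 3D through the corresponding track positions.

        pts2d       : (N, 2) projections of the tracks in one image
        constraints : (M, 2) vertex indices of the selected contour edges
        pts3d       : (N, 3) 3D positions of the same tracks
        """
        # 'p' triangulates the planar straight-line graph, keeping the
        # input segments as constrained edges; with non-crossing input
        # segments no Steiner points are added, so indices stay valid.
        cdt = triangle.triangulate(
            {"vertices": pts2d, "segments": constraints}, "p")
        return pts3d[cdt["triangles"]]   # (T, 3, 3) triangle soup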
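The visibility filter reduces to a segment/triangle intersection test. A standard choice, not necessarily the authors', is the Moller-Trumbore test, sketched below for one sight segment; a real implementation would accelerate the queries with a spatial index such as an AABB tree.

    import numpy as np

    def segment_hits_triangle(p, q, tri, eps=1e-9):
        """Moller-Trumbore test: does the open segment p->q cross
        triangle tri (3x3 array of vertices)?"""
        d = q - p
        e1, e2 = tri[1] - tri[0], tri[2] - tri[0]
        h = np.cross(d, e2)
        det = e1 @ h
        if abs(det) < eps:                 # segment parallel to triangle plane
            return False
        s = p - tri[0]
        u = (s @ h) / det
        if u < 0.0 or u > 1.0:
            return False
        qv = np.cross(s, e1)
        v = (d @ qv) / det
        if v < 0.0 or u + v > 1.0:
            return False
        t = (e2 @ qv) / det
        return eps < t < 1.0 - eps         # hit strictly inside the segment

    # A triangle is rejected as soon as one sight segment
    # (track -> observing camera) pierces it.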
70
00:05:18,300 --> 00:05:19,600
Well-shaped big triangles, in contrast,

71
00:05:19,650 --> 00:05:21,600
may or may not lie on the surface of the scene,

72
00:05:21,700 --> 00:05:25,150
and are classified by measuring the cross-correlation between the images in which they have been seen.

73
00:05:25,900 --> 00:05:29,350
We generate the final scene surface by

74
00:05:29,400 --> 00:05:42,150
computing the Delaunay triangulation of the tracks restricted to the triangle soup.

75
00:05:43,100 --> 00:05:50,500
Outdoor scene example 1.

76
00:05:51,200 --> 00:06:00,500
Outdoor scene example 2.
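Checking a well-shaped big triangle against the images that see it amounts to a photo-consistency measure, typically normalized cross-correlation of the corresponding patches. A minimal sketch, assuming the two patches have already been sampled at corresponding pixels:

    import numpy as np

    def ncc(patch_a, patch_b, eps=1e-12):
        """Normalized cross-correlation of two corresponding patches,
        in [-1, 1]; values near 1 indicate photo-consistency."""
        a = patch_a.ravel() - patch_a.mean()
        b = patch_b.ravel() - patch_b.mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

    # A well-shaped big triangle whose patches correlate strongly across
    # the images that observe it is kept; otherwise it is discarded.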
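The restricted Delaunay triangulation keeps the facets of the 3D Delaunay triangulation of the tracks whose dual Voronoi edges intersect the triangle soup. The sketch below is a deliberately crude stand-in for that construction: instead of intersecting dual edges with the soup, it keeps facets whose centroid lies within a tolerance eps of points sampled on the soup; eps and the sampling density are this sketch's own assumptions.

    import numpy as np
    from scipy.spatial import Delaunay, cKDTree

    def restricted_facets(tracks, soup, eps):
        """Approximate restricted Delaunay facets: facets of the 3D
        Delaunay of `tracks` whose centroid is within eps of the
        triangle soup (a (T, 3, 3) array), sampled at vertices,
        edge midpoints and centroids."""
        samples = np.vstack([
            soup.reshape(-1, 3),                                     # vertices
            soup.mean(axis=1),                                       # centroids
            ((soup + np.roll(soup, 1, axis=1)) / 2).reshape(-1, 3),  # midpoints
        ])
        tree = cKDTree(samples)
        facets = set()
        for tet in Delaunay(tracks).simplices:
            for f in ((0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)):
                facets.add(tuple(sorted(tet[list(f)])))
        kept = [f for f in facets
                if tree.query(tracks[list(f)].mean(axis=0))[0] < eps]
        return np.array(kept)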