Literature Review


Because of the multidisciplinary content of the proposed research, this section reviews three main bodies of literature: 3D vision systems, model matching, and robotic-assisted surgery, including previous pedicle-screw insertion attempts.
First, however, the motivation for using robotic manipulators in the pedicle-screw insertion procedure is outlined.

Motivation for Pedicle Screw Insertion

Studies have shown that screws are placed outside the pedicle in 25% (Gertzbein, 1990), 21% (Weinstein et al., 1988), and 6% (McGowan, 1991) of cases. Post-operative complication rates as high as 25% and mortality rates as high as 1% have been reported.
Moreover, unlike in other interventions, in pedicle screw insertion a single CT image in the longitudinal plane of the pedicle may falsely suggest the best path through the pedicle for a screw of a given diameter. According to the unpublished work of Berlemann, the best path through the pedicle is determined from the three-dimensional reconstruction of the pedicle (Nolte, 1995).

3D Vision Systems

Many techniques have been developed to extract three-dimensional information from images of a scene (Kanande, 1987). With versatility and non-invasiveness as the main requirements, computer vision is the natural choice. Significant effort has been devoted in the past few years to creating a robust computer vision system that is both flexible and easy to use.
Many commercial systems have been developed for this purpose, such as the Prophecy™ and the Cognex™ machine vision systems. These differ in their flexibility, ranging from task-specific production control systems to versatile programmable vision systems. A more detailed discussion of the topic is found in Mrad et al. (1993). Although many have been used successfully for their intended tasks (Bennet et al., 1991; Mrad et al., 1993), even the most flexible ones remain very hardware-dependent because of the high computational cost of the vision algorithms. This limits the other computational tasks that a given project may need to run on the same hardware. In addition, very few systems support 3D vision without mechanical or digital adjustments to the system.
Recent research has aimed at developing faster and more flexible models for 3D data acquisition for vision purposes. Three main methods are currently being investigated to extract the desired 3D characteristics from an image: monocular shape analysis, photometric analysis, and the binocular or stereo vision approach.

The monocular shape methods rely on a viewpoint analysis of a matched model, and are limited to a set of generic shapes. Images are decomposed into spheres and cylinders (Shiu et al., 1989), or into a more general combination of straight lines and arcs (Murkerjee, 1991; Shiu et al., 1990). In addition, studies devoted entirely to the matching of 3D curves can be found (Heisterkamp et al., 1996).

Photometric stereo is a method for estimating surface normal vectors from the input images (Woodham, 1980). It estimates the normals of objects from images taken from the same viewpoint but under different lighting conditions. The main disadvantage of such systems is that cast shadows prevent the correct reconstruction of the 3D shape. Some attempts have been made to eliminate this problem through careful planning of the light source and camera placement (Sakane et al., 1991).
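
As an illustration of the principle only, the following minimal sketch recovers the normal at a single pixel from three intensity measurements. It assumes a Lambertian surface (intensity equal to albedo times the dot product of light direction and normal), known unit light directions, and no shadowing at the pixel; these assumptions and the function names are illustrative and are not taken from the cited work.

```python
import math


def solve3(A, b):
    """Solve the 3x3 linear system A x = b with Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(A)
    if abs(d) < 1e-12:
        raise ValueError("light directions must be linearly independent")
    x = []
    for k in range(3):
        # Replace column k of A by b, as Cramer's rule requires.
        Ak = [[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        x.append(det(Ak) / d)
    return x


def normal_from_intensities(lights, intensities):
    """Return (unit normal, albedo) for one pixel seen under three lights.

    Assumed model (Lambertian): intensities[k] = albedo * dot(lights[k], normal).
    """
    g = solve3(lights, intensities)          # g = albedo * normal
    albedo = math.sqrt(sum(c * c for c in g))
    if albedo == 0.0:
        raise ValueError("pixel is dark under all lights; normal undefined")
    return [c / albedo for c in g], albedo
```

A pixel lying in a cast shadow under one of the lights violates the assumed model, which is the failure case described above.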

Binocular stereo vision follows the same principles as the human 3D vision system. It has gained the most attention in the 3D analysis domain because of its versatility. Its main drawback is the image correspondence problem, which has become the main concern when addressing stereo vision (Sonka et al., 1993). Two main approaches have been proposed to solve this problem: model matching and increasing image correlation. The former reduces to the case of monocular analysis, whereas the latter imposes additional setup and/or computational requirements. In general, the second option is the more prominent; typical implementations rely on structured light, laser scans, multiple baselines, or more than two cameras (Kang et al., 1994), all of which are ad hoc solutions to facilitate the matching procedure. These algorithms begin by preprocessing the image to isolate certain features (e.g., the edges) to be used in the analysis, then use a correlation technique to match corresponding features across the image pair. The main problem with these algorithms is their lack of robustness, especially near occlusions. Many attempts have been made to correct this problem, among them the one proposed by Lan and Mohr (1997).
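
The correspondence step can be illustrated with a minimal sketch of window-based matching along one scanline, followed by triangulation. It assumes a rectified camera pair (so that a point at column x in the left image appears at column x - d in the right image), uses the sum of squared differences as the similarity measure, and converts disparity to depth with the standard relation depth = focal length × baseline / disparity. The function names and parameters are hypothetical and do not correspond to any of the cited systems.

```python
def match_disparity(left_row, right_row, x, half_window, max_disparity):
    """Disparity of pixel x in left_row, found by minimising the sum of
    squared differences (SSD) over a window slid along right_row."""
    if x - half_window < 0 or x + half_window >= len(left_row):
        raise ValueError("window does not fit inside the scanline")
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if x - d - half_window < 0:
            break  # candidate window would leave the right image
        cost = 0.0
        for k in range(-half_window, half_window + 1):
            diff = left_row[x + k] - right_row[x - d + k]
            cost += diff * diff
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px, focal_px, baseline):
    """Triangulated depth, in the units of `baseline`, for a rectified pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity_px
```

In a uniform or occluded region several candidate windows give nearly the same cost, so the minimum is ambiguous; this is the robustness problem noted above.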

Finally, it should be noted that the accuracy of the described acquisition methods depends heavily on the camera acquisition system, and can be at most as precise as that system unless subpixel image restoration is applied. A more detailed discussion of subpixel image acquisition will be included in the accuracy analysis.

Model Matching

Because model matching has numerous applications in model-based object recognition and object localization, a large amount of research has already been reported on the subject. In 2D, model matching can be carried out quickly and accurately, as described for instance in Aaron et al. (1997). In 3D, however, the problem has not yet been solved, and different solutions are still being tried. The following discussion focuses on the surface matching schemes most suitable for medical imaging.
The simplest surface matching approach is the pair-wise feature match, where a small number of feature points are isolated on both surfaces, and an optimal rotation-translation is obtained directly. An improvement over the pair-wise match is to perform a least-squares fit when calculating the rotation-translation transformation; this, of course, requires a larger number of feature points. Finally, pair-wise matching can be generalized to the so-called indexing methods, where a large number of feature points are isolated on each surface, and different transformations are tried. The best transformation is chosen using voting tables that evaluate matching criteria on the indexes associated with each image after each transformation is carried out. In addition, transformation invariants are used to isolate the translational and rotational match. A robust and efficient indexing method was proposed by Baraquet et al. (1997). These methods depend heavily on the shape of the surfaces, and are computationally demanding.
Nevertheless, the most intuitive and most widely used approach in medical image matching (Focus Imaging, 1998) is to minimize a function of the distance between the two surfaces. A computationally efficient algorithm was originally proposed by Besl et al. (1992). Although conceptually sound, the algorithm suffered from a poor convergence rate and was prone to false local minima. For these reasons, several modifications have been proposed, which are enumerated and briefly described in Hilton (1997). The survey by Focus Imaging (1998) addresses the efficiency of those modifications when used for medical imaging.
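
To make the pair-wise match concrete, the sketch below computes a rotation and translation directly from three non-collinear point pairs by building an orthonormal frame on each triple. It assumes exact, noise-free correspondences; with noisy landmarks the least-squares fit mentioned above would be used instead. All function names are hypothetical.

```python
import math


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("points must be distinct and non-collinear")
    return [c / n for c in v]


def frame(p1, p2, p3):
    """Right-handed orthonormal frame (axes as rows) attached to three points."""
    e1 = unit(sub(p2, p1))
    e3 = unit(cross(e1, sub(p3, p1)))
    e2 = cross(e3, e1)
    return [e1, e2, e3]


def rigid_from_three_points(src, dst):
    """Rotation R (3x3) and translation t such that dst[i] = R src[i] + t."""
    F, G = frame(*src), frame(*dst)
    # R maps each source axis onto the matching destination axis.
    R = [[sum(G[k][i] * F[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = [dst[0][i] - sum(R[i][j] * src[0][j] for j in range(3))
         for i in range(3)]
    return R, t


def apply_rigid(R, t, p):
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
```

Because the frame is anchored on the first point and the first edge, any error in those landmarks propagates directly into the result, which is one reason a least-squares fit over more points is preferred in practice.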
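
The distance-minimization idea can be sketched as an iterative closest point loop: pair each data point with its nearest model point, solve for the least-squares rigid transformation of those pairs, apply it, and repeat. The example below is a planar (2D) simplification with brute-force nearest-neighbour search, written only to show the structure of the iteration; it is not the implementation of Besl et al. (1992), and all names are hypothetical.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares angle and translation mapping paired 2D points src to dst."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    s_cos = s_sin = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def apply_rigid_2d(points, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def closest_point(p, model):
    return min(model, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def icp_2d(data, model, max_iterations=50, tolerance=1e-12):
    """Align data to model; returns (aligned points, RMS distance history)."""
    current, history = list(data), []
    for _ in range(max_iterations):
        matches = [closest_point(p, model) for p in current]
        rms = math.sqrt(sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                            for p, q in zip(current, matches)) / len(current))
        history.append(rms)
        if len(history) > 1 and history[-2] - rms < tolerance:
            break  # no further improvement
        theta, tx, ty = fit_rigid_2d(current, matches)
        current = apply_rigid_2d(current, theta, tx, ty)
    return current, history
```

The loop only ever improves the current pairing, so a poor initial pose can leave it at a wrong alignment; this is the local-minimum behaviour noted above, and a reasonable starting estimate is assumed.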

Robotic-Assisted Surgery and Pedicle-Screw Insertion Attempts

In the orthopedic field, probably the best-known application is Robodoc, an image-directed surgical robot that was developed to help surgeons prepare a cavity for a prosthesis in total hip replacement (THR) surgery. Its developers have reported twenty-six successful robot-assisted operations on dogs, and human clinical trials are ongoing at three centers. The system uses digital data from a CT scan of the femur. The developers (Mittelstadt et al., 1993) report a great increase in the accuracy and precision of the joint replacement procedure.
Regarding pedicle-screw insertion, several feasibility studies and experimental trials have already been reported. These are enumerated next, and the image registration used in each is discussed:
Abdel-Malek et al. (1995) use a mold material to obtain an impression of the vertebral structure, which is then CT scanned and compared to the original CT scans of the spine. This method has been recognized as too invasive, which motivates the research on which this thesis is based.
Nolte et al. (1995) use an optoelectronic motion analysis system to recognize and track the orientation of each vertebra. The matching with the CT data is done using a pair-wise point match of three to six predetermined anatomical landmarks; if this fails, surface matching between the two data sets is carried out using 30 to 60 points. The validation and time requirements of the matching algorithm are not discussed. A variation of this procedure was conceived by Lavallee et al. (1994), with the difference that uncalibrated range finders are used.
Peshkin et al. (1995) rely on steel balls mounted on the end effector of the robotic manipulator that performs the drilling. The location of the balls is used to determine the displacement needed to reach a desired point in three different orthographic projections of the vertebra, namely the transverse, A/P, and sagittal views.
Finally, Amiot et al. (1995), in a feasibility study, describe a manual intervention in which a pointing device is used to approximate the location of five predetermined anatomical landmarks on the vertebra. The exact location is calculated using a probabilistic singular value decomposition algorithm, and the most probable rotation and translation are obtained.