Literature Review
Because of the multidisciplinary content of the proposed research, this
section reviews three main bodies of literature:
Work related to robotic-assisted surgery, in which the different attempts to
carry out robot-assisted pedicle-screw insertion are reviewed.
Work related to 3D vision.
Work related to model matching.
First, however, the motivation for the use of robotic manipulators in the
pedicle-screw insertion procedure is outlined.
Motivation for Pedicle Screw Insertion
Studies have shown that screws are placed outside the pedicle in 25%
(Gertzbein, 1990), 21% (Weinstein et al., 1988), and 6% (McGowan, 1991)
of cases. Post-operative complication rates as high as 25% and mortality
as high as 1% have been reported.
Moreover, unlike other interventions, in the case of pedicle screw insertion
a single CT image in the longitudinal plane of the pedicle may falsely
suggest the best path through the pedicle for a screw of a given diameter.
According to the unpublished work of Berlemann, the best path through the
pedicle is determined from the three-dimensional reconstruction of the
pedicle (Nolte, 1995).
3D Vision Systems
Many techniques have been developed to extract three-dimensional information
from images of the scene (Kanade, 1987). However, with a versatile and
non-invasive approach as the main directive, the natural choice is computer
vision. Significant effort has been devoted in the past few years to creating
a robust computer vision system that is both flexible and easy to manipulate.
Many commercial systems have been developed for this purpose, such as the
Prophecy™ and the Cognex™ machine vision systems. These differ in their
flexibility, ranging from task-specific production control systems to
versatile programmable vision systems. A more detailed discussion of the
topic is found in Mrad et al. (1993). Although many have been used
successfully for their intended tasks (Bennet et al. 1991, Mrad et al. 1993),
even the most flexible ones are still very hardware dependent because of the
high computational cost of the vision algorithms. This limits the other
computational tasks that a given project may need to run. In addition, very
few systems support 3D vision without mechanical or digital adjustments to
the system.
Recent research has been aimed at developing faster and more flexible models
for 3D data acquisition for vision purposes. Three main methods are currently
being investigated to extract the desired 3D characteristics from an image:
monocular shape analysis, photometric analysis, and the binocular or stereo
vision approach.
The monocular shape methods rely on a viewpoint analysis of a matched model,
and are limited to a set of generic shapes. Images are broken down into
spheres and cylinders (Shiu et al. 1989), or into a more general combination
of straight lines and arcs (Murkerjee 1991, Shiu et al. 1990). In addition,
studies entirely devoted to the matching of 3D curves (Heisterkamp et al.,
1996) can be found.
Photometric stereo is a method for estimating surface normal vectors from
the input images (Woodham, 1980). It estimates the normals of an object from
images taken from the same viewpoint but under different lighting conditions.
The main disadvantage of such systems is that cast shadows prevent the
correct reconstruction of the 3D shape. Some attempts have been made to
eliminate this problem by careful planning of the light sources and cameras
(Sakane et al., 1991).
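The photometric principle can be illustrated with a minimal sketch. It assumes a Lambertian surface, known distant light directions, and a pixel that is lit in every image. None of these assumptions is taken from the works cited above. Under them, the intensity in image k is I_k = albedo * dot(l_k, n), so three or more images give a linear system for the scaled normal. The function name and the sample values in the usage note are hypothetical.

```python
import numpy as np

def estimate_normal(light_dirs, intensities):
    """Estimate albedo and unit surface normal at a single pixel.

    light_dirs:  (k, 3) unit light directions, k >= 3, not all coplanar.
    intensities: (k,) intensities of the same pixel, one per image.
    Assumes a Lambertian surface and no cast shadow at this pixel.
    """
    L = np.asarray(light_dirs, dtype=float)
    I = np.asarray(intensities, dtype=float)
    # Least-squares solution of L @ g = I, where g = albedo * normal.
    g, *_ = np.linalg.lstsq(L, I, rcond=None)
    albedo = np.linalg.norm(g)
    if albedo == 0.0:
        raise ValueError("pixel is dark in every image; normal is undefined")
    return albedo, g / albedo
```

For example, calling estimate_normal with three light directions and the three intensities observed at one pixel returns the albedo and the unit normal at that pixel. A pixel that lies in a cast shadow in one of the images violates the model, which is the failure mode noted above.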
Binocular stereo vision follows the same principles as the human 3D-vision
system. It has gained the most attention in the 3D-analysis domain because
of its versatility. Its main drawback is the problem of image correspondence,
which has become the main concern when addressing stereo vision (Sonka et
al., 1993). Two main approaches have been proposed to solve this problem:
model matching and increasing image correlation. The former reduces to the
case of monocular analysis, whereas the latter imposes additional setup
and/or computational requirements. In general, the second option is the more
prominent, with typical applications using structured light, laser scans,
multiple baselines, or more than two cameras (Kang et al. 1994), all of
which are ad hoc solutions to facilitate the matching procedure. These
algorithms begin by preprocessing the image to isolate only certain features
(e.g., the edges) to be used in the analysis, then use some correlation
technique to match corresponding points across the image pair. The main
problem with these algorithms is their lack of robustness, especially near
occlusions. Many attempts have been made to correct this problem, one of
which is that proposed by Lan and Mohr (1997).
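The correlation step can be sketched as follows. The sketch assumes rectified images, so that corresponding points lie on the same row, and it correlates raw intensity windows instead of extracted features such as edges. It is a generic normalized cross-correlation search, not the method of any work cited above, and the function and parameter names are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0.0 else float((a * b).sum() / denom)

def match_along_row(left, right, row, col, half=2, max_disp=16):
    """Find the disparity d that best matches left[row, col] to
    right[row, col - d], using a (2*half+1)-square window.

    The caller must keep the window inside the left image; candidate
    windows that would leave the right image end the search.
    """
    ref = left[row - half:row + half + 1, col - half:col + half + 1]
    best_d, best_score = 0, -np.inf
    for d in range(max_disp + 1):
        c = col - d
        if c - half < 0:
            break
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        score = ncc(ref, cand)
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score
```

The search always returns its best-scoring candidate, even when the point is occluded in the second image and no true match exists. A low best score is then the only warning, which illustrates why robustness near occlusions is difficult.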
Finally, it should be noted that the accuracy of the described acquisition
methods depends heavily on the camera acquisition system, and can at most be
as precise as that system unless subpixel image restoration is applied.
A more detailed discussion of subpixel image acquisition will be included
in the accuracy analysis.
Model Matching
Because model matching has numerous applications in model-based object
recognition and object localization, a large amount of research has already
been reported on the subject. In 2D, model matching can be carried out
quickly and accurately, as described for instance in Aaron et al. (1997).
In 3D, however, the problem has not yet been solved, and different solutions
are still being tried out. In the following, the focus is on the surface
matching schemes most suitable for medical imaging.
The simplest surface matching approach is the pair-wise feature match, where
a small number of feature points are isolated on both surfaces, and an
optimal rotation-translation is directly obtained. An improvement over the
pair-wise match is to perform a least-squares fit when calculating the
rotation-translation transformation; of course, this requires a larger
number of feature points. Finally, pair-wise matching can be generalized to
the so-called indexing methods, where a large number of feature points are
isolated on each surface and different transformations are tried out. The
choice of the best transformation is based on voting tables that evaluate
some matching criteria of the indexes associated with each image after each
transformation is carried out. In addition, transformation invariants are
used to isolate the translational and rotational match. A robust and
efficient indexing method was proposed by Baraquet et al. (1997). These
methods depend heavily on the shape of the surfaces, and are computationally
demanding.
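The least-squares fit of a rotation-translation to paired feature points can be sketched as follows. The sketch uses the standard closed-form solution based on the singular value decomposition, which is not necessarily the one used in the works cited above. It assumes the pairing of points is already known, and the function name is hypothetical.

```python
import numpy as np

def fit_rigid(src, dst):
    """Least-squares rotation R and translation t with R @ src[i] + t ~ dst[i].

    src, dst: (n, 3) arrays of paired points, n >= 3 and not collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

With exactly three non-collinear pairs this reduces to the direct pair-wise match. With more pairs, the errors in individual feature locations are averaged out, which is the improvement described above.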
Nevertheless, the most intuitive and most widely used approach in medical
image matching (Focus Imaging 1998) is to minimize a function of the
distance between the two surfaces. A computationally efficient algorithm
was originally proposed by Besl et al. (1992). Although conceptually sound,
the algorithm suffered from a poor convergence rate and was prone to false
local minima. For these reasons, several modifications have been proposed,
an enumeration and short description of which can be found in Hilton (1997).
The survey by Focus Imaging (1998) addresses the efficiency of those
modifications used for medical imaging.
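A distance-minimizing scheme of this kind can be sketched as an iterative closest point loop: each point is paired with its nearest neighbour on the other surface, a least-squares rigid fit is computed, and the two steps are repeated. This is a minimal sketch, not the exact formulation of Besl et al. (1992) or of any later modification. Surfaces are represented as point sets, the nearest-neighbour search is brute force, and all names and default values are hypothetical.

```python
import numpy as np

def _fit_rigid(src, dst):
    """Least-squares R, t for paired points (SVD closed form)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - src_c).T @ (dst - dst_c))
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c

def _closest(moved, dst):
    """Index of the nearest dst point for each moved point, and RMS distance."""
    d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    rms = np.sqrt(d2[np.arange(len(moved)), idx].mean())
    return idx, rms

def icp(src, dst, max_iter=50, tol=1e-12):
    """Align src (n, 3) to dst (m, 3); return R, t and the final RMS distance."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    R_total, t_total = np.eye(3), np.zeros(3)
    moved = src.copy()
    idx, rms = _closest(moved, dst)
    for _ in range(max_iter):
        R, t = _fit_rigid(moved, dst[idx])
        moved = moved @ R.T + t
        # Compose the incremental transform with the accumulated one.
        R_total, t_total = R @ R_total, R @ t_total + t
        idx, new_rms = _closest(moved, dst)
        converged = rms - new_rms < tol
        rms = new_rms
        if converged:
            break
    return R_total, t_total, rms
```

Because the pairing is recomputed from the current pose at every iteration, the result depends on the initial alignment. A poor starting pose can lock the loop onto wrong pairings, which corresponds to the false local minima mentioned above.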
Robotic-Assisted Surgery and Pedicle-Screw Insertion Attempts
In the orthopedic field, probably the best-known application is Robodoc,
an image-directed surgical robot that was developed to help surgeons prepare
a cavity for a prosthesis in total hip replacement (THR) surgery. Its
developers have reported twenty-six successful robot-assisted operations
on dogs, and human clinical trials are ongoing at three centers. The system
uses digital data from a CT scan of the femur. The developers (Mittelstadt
et al. 1993) report a great increase in the accuracy and precision of the
joint replacement procedure.
Concerning pedicle-screw insertion, several feasibility studies and
experimental trials have already been reported. These are enumerated next,
and the image registration used in each is discussed:
Abdel-Malek et al. (1995) use a mold material to get an impression of the
vertebra structure, which is then CT scanned and compared to the original CT
scans of the spine. This method has been recognized to be too invasive, and
is the reason behind the research on which this thesis is based.
Nolte et al. (1995) use an optoelectronic motion analysis system to recognize
and track the orientation of each vertebra. The matching with the CT data
is done using a pair-wise point match of three to six predetermined
anatomical landmarks, and if this fails, surface matching between the two
data sets using 30 to 60 points is carried out. The validation and time
requirements of the matching algorithm are not developed. A variation of
this procedure was conceived by Lavallee et al. (1994), with the difference
of using uncalibrated range finders.
Peshkin et al. (1995) rely on steel balls mounted on the end effector of the
robotic manipulator that is going to do the drilling. The location of the
balls is used to determine the amount of displacement needed to reach a
desired point in three different orthographic projections of the vertebra,
namely the transverse, A/P and sagittal views.
Finally, Amiot et al. (1995), in a feasibility study, describe a manual
intervention with a pointing device to approximate the location of 5
predetermined anatomical landmarks on the vertebra. The exact location is
calculated using a probabilistic singular value decomposition algorithm,
and the most probable rotation and translation are obtained.