Model Matching

To successfully register acquired images of the vertebra bone with respect to the CT cross-sections, surface matching has to be carried out. Moreover, the matching procedure must have the following characteristics:

Precision: to allow an accurate localization of the surface.

Robustness: to allow for a margin of errors in the acquisition system.

Computational efficiency: to allow near real time implementation. For the surgeon in the operating room, the proposed procedure must not add dead time to the surgical intervention. The registration must be done within an estimate time of 20 minutes.

The above requirements will next be refined, and an implementation of a matching algorithm will be presented.

Difficulties

The main problem when dealing with medical images is the large data size, which makes the storage and manipulation difficult.

Storage

Typically, for an operation such as the pedicle screw insertion, the patient's spine is scanned with a high resolution, 1024 by 1024 pixels of 1mm thickness cuts, and 0 spacing between them. Three vertebra (approximately 20 cm) would require 1024?1024?200 bytes of storage (200 Mbytes), assuming all operations needed on the image can be handled by the minimal 8-bit integer type. Clearly, manipulating two 200 Mbytes models on a medium sized machine is not feasible in a near real-time fashion.

Computational Complexity

Many sophisticated refinements are constantly being made to improve the quality of surface matching. These additions however would inevitably increase the overall complexity of the system. In the context of robotic surgery, such add-hock solutions cannot be accepted.

Conceptual Modeling

The proposed algorithm relies on the following representation of 3D data (thereafter referred to as Grid-Model): Each point will be represented by a 1 in one or more 3D matrices originally filled with 0's, as illustrated in Figure 5.1.

Figure 5.1 3D representation of data, the colored cubes represent a value of one at indexes equal to their coordinates, the rest represents the zeros.

Two 3D Matrices

In the case of CT data, this representation conserves the accuracy of the system if the following conditions are met:

The dimensions of the matrices are at least equal to that of the CT image.

The number of filled matrices is the equal to the number of CT cuts.

Any spacing between the cuts must be modeled by an appropriate number of zero matrices, as illustrated in Figure 5.2.

Figure 5.2 If 9 cuts are made with 2mm thickness and 10 mm spacing, then 4 empty matrices have to be inserted between each filled pair.

Correlation

To correlate the two models, the two 3D matrices are overlapped, and a term-comparison is used:

A match is counted when two 1's are in the same grid position.

An error is counted when a 1 and a 0 are in the same grid position.

The total number of matches, and that of error are normalized and used both as indicators of the quality of the fit. Figure 5.3 depicts an example over a single cut.

Figure 5.3 When comparing a green cut with a blue cut, the overlapping brown square are counted as matches, and the rest of the colored squares as errors.

Different combinations can now be attempted to formulate the matching criteria, an example of which is the as follows:

criterion = (match)² - (error)² (5.3)

Rotation-Translation

Originally, each model is centered in its 3D grid, with the two grids being of equal size. A transformation is then applied to one of the models before it is correlated with the other one. Either-one of the models may be used as long as the choice remains consistent and only one model is always moved. Moreover, each grid must incorporate enough clearance to allow the model to be moved without being discarded. Therefore, the use of the CT scans as the moving model would be preferable, because the acquired model may be surrounded by noise, making clearance estimation confusing. Finally, the roll, pitch, and yaw coordinates will be used for rotational movements.

Translation

Translation in the x, y, or z directions corresponds to shifting all the ones according to the grid x, y, or z directions respectively. Figure 5.4 shows two models with a 10-unit translation in the x direction between them.

Figure 5.4 The two models are shifted by 10 units in the x direction.

Pitch and Yaw

The procedure for performing a pitch or yaw rotation is carried out can best be described by Figure 5.5, which shows a filled Grid-Model tilted incrementally with a shearing motion by 1,2...,6 units.

Figure 5.5 A Face view of a filled Grid-Model incrementally pitched.

Yaw and pitch values used the above transformations are related to the actual angles by:

(5.1)

where y is the true yaw or pitch angle, n is the number of cuts, and p is the yaw or pitch value used in the algorithm.

Roll

Rolling is done by transforming to polar coordinates, adding the roll angle, and reverting back. Figure 5.6 shows a front view for a rotation of 25°.

Figure 5.6 A front view of a 25° roll.

Model Implementation

Aspects of the implementation of the Grid-Model under MATLAB® (see Appendix C for a complete listing) will next be addressed.

Storage Problem

Storage requirements can be reduced using the two following properties of the data and the Grid-Model:

The data is binary.

The matrices of the Grid-Model are sparse.

The second property suggests the use of sparse matrices to store the data. However, three-dimensional sparse matrices are not supported in commercial programming environments. An attempt to model 3D sparse matrices was presented by Hemker and Zeeuw (1993), but their work remained as a prototype for PASCAL structures and FORTRAN. In addition, sparse 3D matrices do not make use of the first property, and would thus be at most 12.5% (1 bit out of eight used) efficient in terms of storage.

The proposed implementation represents each 52 cuts in a single floating point 2D matrix (thereafter referred to as Grid-Matrix), where the binary representation of each entry will correspond to the points in the z direction. For example, the entry 132 (0...001000010) in columns 56 of row 45 will mean that there are tow points at [45 56 4] and [45 56 128]. Moreover, the Grid-Matrices can be made sparse to account for the extra clearance that has to be introduced to allow for grid movement. Returning to the example of 5.1.1, the 200 cuts would now need 1024?1024?4 bytes, or 4 Mbytes in full matrix form, and at most 1 Mbytes in sparse matrix form.

Correlation

The matches and error are counted by applying a bit-wise AND and XOR operator respectively on the pairs of matrices. Bit-wise AND is implemented as a mex file under MATLAB, which is an executable C-code file. The correlation procedure was written in MATLAB, but would be more efficient if written directly in C. As an illustration, consider the following Grid-Matrices that hold two 2x2x8 Grid-Models:

(5.3)

(5.4)

The corresponding Grid-Models are shown in Figure 5.7.

Figure 5.7 Comparing A (red) with B (blue): The green dots are matched values and the yellow dots are errors.

The matches M and errors E are calculated as:

(5.5)

(5.6)

The number of ones in M and E can now be calculated, giving 9 matches and 3 errors.

A Final consideration is in the method the number of binary ones are extracted from each entry in M and E. The usual approach would be to test for each bit separately, yielding an efficiency of O(52ⁿ), where n is the number of entries. Clearly, this is not acceptable. The proposed solution is to store a form with its entries equal to the number of ones of its index. The first eight terms of such a matrix would be:

(5.7)

The efficiency of this method is equal to the indexing efficiency of the language, and is therefore maximal. However, the size of the proposed matrix would grow to 2⁵²-1. A good compromise between speed and storage is to divide each entry in four 13-bit numbers, the required matrix would then occupy 65 Kbytes only.

Rotation-Translation

Although the proposed implementation of the Grid-Model is efficient in terms of storage, the implementation is not usable if the efficiency of the translation and rotation are not time efficient.

Translation

A translation in the X or Y directions, would simply be the re-indexing of the corresponding Grid-Matrices. The efficiency for such an operation is clearly optimal

Yaw and Pitch

Tilting the Grid-Model by t is equivalent to dividing the Grid-Model to t parts, then shifting each part by 1 to t/2 depending on how close it is to the center. If t is even, the central part will not move.

The division of the Grid-Model can be achieved by multiplying with an appropriate mask of 1's. For example, isolating the lower fifth part is done by applying bit-wise AND of binary

0000000000000000000000000000000000000000001111111111

to all the entries.

The shift of each entry would now correspond to a translation in the X direction for a pitch angle, and in the Y direction for a yaw angle. Finally, the t parts are recombined using bit-wise OR. The efficiency of the algorithm is linearly proportional to t and the size of the matrix.

Roll

Unfortunately, the roll of the matrix cannot be approximated by some set elementary movements, because of the high distortion it introduces to the image, and therefore no approximation mask can be convoluted with the Grid-Matrix. This would mean transforming each matrix entry to polar coordinates, manipulating, calculating the new Cartesian coordinates, and then interpolating a new position. To save time, all the calculations for the new rotated coordinates are performed off-line and stored in numbered matrices. When a roll operation is invoked on a certain matrix, a reordering of its elements will be carried out according to the appropriate roll matrix. As an illustration, a 6 by 6 roll matrix for 45° is shown:

The interpolation used is the nearest neighbor, which is the fastest interpolation scheme. The distortive implications of this type of interpolation will be discussed, along with the other effects of the transformations in chapter 6.

Search Space

The search space of the matching algorithm represents the different transformations that may be carried out in order to reach the goal: a fit between the two models. The next paragraphs contain a description of the search space using the Grid-Model approach.

Absolute Search Space

A brute force conception of the search space is to start from a zero state, denoted by [0 0 0 0 0 0], where the six zeros correspond to zero values of the roll, pitch, yaw rotations and x, y, z translations, respectively. Each new node will then be represented by a variation of the above parameters, corresponding to the appropriate transformation. An example of a reduced portion of the space is shown in Figure 5.8.

Figure 5.8 A sample of the search space

If the roll, pith, and yaw are constrained to ±45°, and the x, y, z translations to ±45 units, then the possible values correspond to any combination of column entries from the matrix:

such as

[24 -12 41 0 20 13]

The span (size) of such a space is equal to:

Alternative Search Space

A more efficient and intuitive search space can be constructed using the node shown in Figure 5.9, where each of the twelve arrows indicates a certain incremental transformation.

Figure 5.9 The essential node used in the alternative search space.

The new search space still has the same span; however, its new diffused shape makes it easier to "walk" it; i.e., to search for the goal position. A sample of the alternative search space is shown in Figure 5.10. Finally, it should be noted that the new space is a closed graph; i.e., any node (state) can be reached from any other state.

Figure 5.10 A sample of the alternative search space.

Search Mode

Searching for the goal solution is done by walking the above referenced graph, starting from the [0 0 0 0 0 0] state to reach the goal state. Figure 5.11 shows an example of a binary search space with 5 variables; i.e., one that contains a combination of the 32 states of:

Figure 5.11 Representation of a binary 5 variables search space with a few sample points (graph borrowed from Mathematica®)

The most crucial characteristic of a matching algorithm is the way it reaches the goal. To illustrate the concept of the size of a search, consider the goal:

[2 2 -2 1 0 2 -1]

A breadth-first search; i.e., trying to vary all the parameters from lowest absolute value to highest, yields:

[0 0 0 0 0 1] [0 0 0 0 1 0] [0 0 0 1 0 0] [0 0 1 0 0 0] [0 1 0 0 0 0] [1 0 0 0 0 0]

[0 0 0 0 0 -1] [0 0 0 0 -1 0] [0 0 0 -1 0 0] [0 0 -1 0 0 0] [0 -1 0 0 0 0] [-1 0 0 0 0 0]

[1 0 0 0 0 1] [1 0 0 0 1 0] [1 0 0 1 0 0] [1 0 1 0 0 0] [1 1 0 0 0 0]

[1 0 0 0 0 -1] [1 0 0 0 -1 0] [1 0 0 -1 0 0] [1 0 -1 0 0 0] [1 -1 0 0 0 0] ...

where reaching the goal state takes at least 800 trials; therefore, although the breadth-first approach is safest to walk such a space, because the initial conditions are assumed to be 'close' to the goal state, it is still time consuming.

Steepest-Hill

A well-know solution to the goal-reaching problem in artificial intelligence is the steepest-hill climb (Rich and Knight, 1991). As the name implies, a steepest-hill search will always consider the most promising alternative at each node, where the value attributed to each node is the correlation function between the transformed CT model and the acquired one. Ideally, such a climb results in a shortest path (optimal) to the goal. Referring to the example at the beginning of this section, it would take 10 trials instead of 800 to reach the desired match.

Unfortunately, the correlation function does not give such a consistent indication of how close the current state is to the target, as is the case for Figure 5.12, which represents a function with similar behavior. The peaks in Figure 5.12 represent high values of the correlation function.

Figure 5.12 The correlation function exhibits many local maxima near the goal state (graph borrowed from MATLAB®).

There exists many classical solution to the above problem, the most commonly used are the following:

To increase the transformation step, to reach convergence, and then to refine with a smaller step starting from the new state.

To decrease the resolution of the Grid-Model, to reach convergence, and then to refine with a higher resolution starting from the new state.

To use more than one correlation function.

All of the above solutions were attempted, but none showed satisfactory results, especially in the presence of noise.

Volume versus Surface Matching

A proposed method to overcome the problem of poor correlation is by transforming the surfaces to be matched to volumes. The beneficial effect on the correlation function can be visualized by examining Figure 5.13. The best transformation for the blue cylinder to match the red cylinder is to pitch towards the x-axis, or to slide in the directions of the y-axis. At that point, the volume intersection clearly increases. This is not the case for surface intersection, which is not predictable.

Figure 5.13 The volume intersection of the two cylinder yields a better correlation than the surface intersection.

Transforming the CT scans to a volume is clearly a simple task, performed off-line. However, the acquired cuts which contain some noise, will not have a closed shape. Based on the above discussion, the following final procedure is proposed.

The CT scans are transformed into a volume by flooding the inside of the cuts.

The acquired image is kept as a "cloud" of surface points.

The two models are matched using a least square correlation criterion with a large step size.

The step size is lowered and step 3 is repeated until the step equals one, which corresponds to the precision limit of the model.

Perform a constant step match of the acquired model with the original surface CT model.

The use of a varying step size is to accelerate even further the matching, and can also be considered as an extra precaution against local maxima. Moreover, a last step has been added to account for possible a false maximum introduced by volume-to-surface matching. In other words, after the algorithm converges with volume matching, surface matching is performed to validate and possibly refine the macth.

An analysis of the effect of noise on such a matching approach will be addressed in Chapter 6. Figure 5.14 shows the CT model with gaussian noise added to it and Figure 5.15 shows the results of matching with the original CT model.

Figure 5.14 CT model with gaussian noise.

Figure5.15 MATLAB® output of matching a noisy model at [10 6 15 -12 -6 15]. The step size and current state are shown at every instant.

The matching was performed in 19 steps, which is even less than the predicted shortest path of 64 using a non-varying step size. No surface match was necessary in this case, mainly because of the well-behaved nature of the introduced noise.