





Collaborators:
Cián Shaffrey (University of Cambridge, UK), Nick Kingsbury (University
of Cambridge, UK).
Key words:
database retrieval, segmentation, indexing, psychovisual evaluation.
Summary:
To advance a science, methodologically well-defined evaluation
techniques are necessary. In the case of database retrieval systems,
they have not always been used, and there seems to be some confusion
about what such techniques should be. The goal of this work was to
analyse the problem of evaluation abstractly and then apply the analysis
to two completely different databases: one, kindly provided by the IGN
(French National Geographic Institute), consisting of aerial images of
the Ile-de-France region around Paris; the second, kindly provided by
BAL (Bridgeman Art Library) in the UK, consisting of fine art images.
The first step was thus a methodological analysis of the problem of
evaluation in scenarios with differing amounts of knowledge about the
image semantics. The main conclusions were:
- In situations in which the image semantics S is well defined, the
human interpretation h is available, and in which image processing
techniques are close to reproducing the human interpretation,
retrieval is not the issue. It is easier and better defined to check
that the image processing reproduces the correct semantics, i.e.,
that the diagram below nearly commutes. This is the IGN case,
discussed further below.
- In situations in which the image semantics S is not well defined,
and consequently the human interpretation h is not available, the
only choice is to use human subjects to compare the outputs of the
image processing arrow for different methods. Note again that the
emphasis is on the outputs "making sense", that is, on the
semantics, and not on retrieval as such. This is the BAL case,
discussed further below.
- Query-by-example is ill-defined as a retrieval method, in the sense
that the expected output cannot be known. In conjunction with the
use of "relevance classes" for evaluation, things get even
worse. The success of many of the evaluations in the literature says
more about the databases used than the retrieval methods themselves.
- Semantics is inherently linguistic, and must be defined as such.
Reproducing the human interpretation h means good retrieval. Not
reproducing it means bad retrieval.
We are thus led to the use of two very different methods for the two
databases.
For the IGN database, the semantics is well defined, consisting of
conjunctions of statements such as "Region R contains forest".
The human interpretation exists, in the form of land use maps compiled
from existing cartography and field studies, and kindly provided to us
by IAURIF, the Urban Planning Institute for the Ile-de-France region. In
addition, segmentation algorithms can get close to the correct results.
As promised, we are thus in the first situation listed above. Work
continues on this database with the lengthy task of registering the
original data with the land use maps. Once this is done, relatively
simple metrics can be used to measure how close different segmentation
results are to the correct interpretation, thus measuring their
usefulness for retrieval.
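
As an illustration of what such a "relatively simple metric" might look like, here is a minimal Python sketch. It assumes that the segmentation output and the land use map have already been registered and are given as label grids of the same size. The two measures shown (overall pixel agreement and per-class intersection over union) and the function names are our own choices for illustration; the text above does not specify which metrics were to be used.

```python
from collections import Counter


def pixel_agreement(predicted, reference):
    """Fraction of pixels whose predicted label matches the reference label.

    Both arguments are lists of rows of hashable labels (hypothetical
    format); they must have identical shapes.
    """
    if len(predicted) != len(reference) or any(
        len(p_row) != len(r_row) for p_row, r_row in zip(predicted, reference)
    ):
        raise ValueError("label grids must have the same shape")
    total = sum(len(row) for row in reference)
    if total == 0:
        raise ValueError("label grids must not be empty")
    agree = sum(
        p == r
        for p_row, r_row in zip(predicted, reference)
        for p, r in zip(p_row, r_row)
    )
    return agree / total


def per_class_iou(predicted, reference):
    """Intersection over union for each label appearing in either grid."""
    intersection = Counter()
    predicted_count = Counter()
    reference_count = Counter()
    for p_row, r_row in zip(predicted, reference):
        for p, r in zip(p_row, r_row):
            predicted_count[p] += 1
            reference_count[r] += 1
            if p == r:
                intersection[p] += 1
    labels = set(predicted_count) | set(reference_count)
    return {
        label: intersection[label]
        / (predicted_count[label] + reference_count[label] - intersection[label])
        for label in labels
    }


# Toy example with made-up labels: one "forest" pixel is mislabelled "urban".
reference_map = [["forest", "forest", "urban"],
                 ["forest", "water", "urban"]]
segmentation = [["forest", "urban", "urban"],
                ["forest", "water", "urban"]]

print(pixel_agreement(segmentation, reference_map))
print(per_class_iou(segmentation, reference_map))
```

Per-class scores are included because overall agreement alone can hide poor performance on rare land use classes.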
For the BAL database, it is another story. The image semantics is
extremely complicated, in fact practically unbounded, and thus
impossible to define. In addition, human interpretation is very varied
and difficult to characterize, and image processing algorithms have no
hope of actually reproducing whatever sub-semantics can be defined. We
are thus, as promised, in the second situation listed above.
Consequently, we used human subjects and psychovisual experiments to
evaluate various segmentation algorithms.
The results of these experiments show that there is a degree of
consensus among the subjects about which segmentations are better than
others, and also about which segmentation methods are better than
others. One indication is that, even though the users ranked the
segmentation methods pairwise, the results are consistent with a single
total order on the schemes (the graph below has no cycles). The results
indicate that there may be obtainable criteria for what constitutes a
good segmentation from the point of view of human subjects.
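
The consistency check mentioned above (pairwise preferences forming a graph with no cycles) can be sketched in Python as follows. This is only an illustration under stated assumptions: we assume the subjects' judgments have already been reduced to one preferred scheme per compared pair, and the scheme names "A" to "D" are hypothetical. How the judgments were actually aggregated is not described here.

```python
from collections import deque


def consistent_order(preferences):
    """Return schemes ordered best to worst if the pairwise preferences
    contain no cycle, or None if they do.

    `preferences` is an iterable of (better, worse) pairs. The check is a
    standard topological sort (Kahn's algorithm) on the preference graph.
    """
    successors = {}
    indegree = {}
    for better, worse in preferences:
        successors.setdefault(better, set())
        successors.setdefault(worse, set())
        indegree.setdefault(better, 0)
        indegree.setdefault(worse, 0)
        if worse not in successors[better]:
            successors[better].add(worse)
            indegree[worse] += 1

    # Start from schemes that no other scheme was preferred to.
    ready = deque(sorted(s for s, d in indegree.items() if d == 0))
    order = []
    while ready:
        scheme = ready.popleft()
        order.append(scheme)
        for nxt in sorted(successors[scheme]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # If a cycle exists, its members never reach indegree zero.
    return order if len(order) == len(indegree) else None


# Hypothetical outcome in which every pair of four schemes was compared.
pairs = [("A", "B"), ("B", "C"), ("A", "C"),
         ("C", "D"), ("A", "D"), ("B", "D")]
print(consistent_order(pairs))                                  # ['A', 'B', 'C', 'D']
print(consistent_order([("A", "B"), ("B", "C"), ("C", "A")]))   # None
```

When every pair of schemes has been compared, the absence of cycles implies that the resulting order is unique; with missing comparisons, several orders may be compatible with the data.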
[Figure: One of the BAL images.]
[Figure: Pairwise ordering of segmentation schemes.]
Publications:
- "Psychovisual Evaluation of Image Segmentation Algorithms",
Cián W. Shaffrey, Ian H. Jermyn and Nick G. Kingsbury. To appear in
Proceedings of Advanced Concepts for Intelligent Visual Systems (ACIVS),
Ghent, Belgium, September 2002. (PDF)
- "Evaluation Methodologies for Image Retrieval Systems",
Ian H. Jermyn, Cián W. Shaffrey and Nick G. Kingsbury. To appear in
Proceedings of Advanced Concepts for Intelligent Visual Systems (ACIVS),
Ghent, Belgium, September 2002. (PDF)