Title: Multiple text interview clustering with the help of a hierarchical classification algorithm
Author: Stéphane Lapalut
Reference: (to appear in UCIS'96, Poitiers, France, September 1996)
Article: compressed PostScript file (87071 bytes)

Abstract: Besides the factual knowledge written in books, a lot of knowledge can be obtained by questioning experts. This produces a large amount of rich but ill-structured text. To deal with it, we propose the use of a statistical method, top-down hierarchical classification, together with a new interpretation of its results. The analysis gives two kinds of results: first, a segmentation of the texts that reflects their "semantic contexts", which we use to bring out the structure of the texts; and second, classes of significant terms related to these contexts, which help to interpret the meaning of the classes. We briefly describe the method and its formal background, and then apply it to an example borrowed from the knowledge acquisition field: the processing of a corpus of interview texts from several experts. We give the main conclusions of the analysis and conclude with the usefulness of the method and its further development.

Keywords: knowledge acquisition, interview exploitation, text clustering, text structuring, top-down hierarchical classification, statistical method.