Title: Text Clustering to Help Knowledge Acquisition from Documents
Author: Stéphane Lapalut
Reference: Advances in Knowledge Acquisition, 9th European Knowledge Acquisition Workshop, Nottingham, UK, 1996, LNCS 1076, N. Shadbolt, K. O'Hara and G. Schreiber (Eds.), pp. 115-130, Springer-Verlag, 1996
Abstract: In the early stages of the knowledge acquisition process, interviews with experts produce a large amount of rich but ill-structured text. Knowledge engineers need tools to help them exploit these texts. We propose the use of a statistical method, top-down hierarchical classification, together with a new interpretation of its results. The initial statistical analysis proposed by M. Reinert \cite{reinert79,reinert92} gives two kinds of results: first, a segmentation of the texts that reflects their ``semantic contexts'', which we use to bring out the structure of the texts; and second, classes of significant terms belonging to these contexts, which can be related to the experts or to their specialities. In this paper, we describe the method, discuss its empirical validity, compare it with similar approaches, and illustrate its use with examples and results. We conclude with some research directions for extending the exploitation of the analysis results.
Keywords: knowledge acquisition, text clustering, text structuring, top-down hierarchical classification, statistical method.