« Summarizing a data stream by means of knowledge extraction »
Keywords: Data Mining, KDD, Data Streams, Clustering, Pattern Extraction.

Team:
AxIS is a team of INRIA Sophia Antipolis – Méditerranée that works on improving Information Systems (IS) through usage analysis. Data mining is one of the methods AxIS uses to understand the users of an IS. As data become increasingly dynamic and complex, AxIS develops new data mining tools intended to improve the performance of existing technologies, before validating the proposed methods in various fields of application.

Scientific context:
Data mining has recently had to adapt to a new kind of data, data streams, which impose severe constraints. In a data stream, new data arrive in a potentially infinite sequence at a very high rate, and blocking operations are not allowed.
It is not possible to store all the data of a stream, and data mining methods have to face two main challenges:

The storage space is limited, which forces the system to choose what to store and how to store it. Monitoring and security systems, as well as simpler behavior analysis methods, rely on data stream management systems and on the data mining methods that will be proposed for these streams.
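To make the storage constraint concrete, here is a minimal sketch of one classic bounded-memory technique, the Misra-Gries frequent-items summary. It is chosen purely as an illustration of "choosing what to store" and is not the method this thesis will propose; the function name `misra_gries` and its parameter `k` are our own. It reads each item once, never blocks, and keeps at most k - 1 counters, trading exact counts for a guaranteed error bound.

```python
from collections.abc import Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> dict[Hashable, int]:
    """Single-pass Misra-Gries frequent-items summary (illustrative sketch).

    Keeps at most k - 1 counters. Any item occurring more than n / k times
    (n = number of items read) is guaranteed to remain in the result, and
    each stored count underestimates the true count by at most n / k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: dict[Hashable, int] = {}
    for item in stream:  # single pass: only the counters are kept in memory
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter (this implicitly
            # discards the new item too) and drop counters that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For instance, `misra_gries("aabacadaeaaf", 3)` returns `{'a': 5, 'f': 1}`: `'a'` occurs 7 times out of 12 (more than 12 / 3), so it is guaranteed to be kept, and its count is underestimated by 2, within the bound of 4.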

General objective of the thesis:
The goal of this thesis is to propose summarization methods for streaming data, based on data mining techniques such as clustering or frequent pattern extraction. Knowledge extraction is closely linked to the management of the history of the extracted knowledge. This history management will be at the core of this study and will guide the design of the summarization methods.
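History management of extracted knowledge can be illustrated with the tilted-time window idea from [2]: recent statistics are kept at a fine time granularity and older ones at progressively coarser granularities, so the memory needed grows only logarithmically with the length of the stream. The sketch below is a simplified variant of that idea for the frequency history of a single pattern; it is neither the exact FP-stream structure of [2] nor the method this thesis will develop. The class `TiltedTimeWindow`, its parameter `max_per_width` and the merging policy (at most two buckets per width, merging the two oldest when a third appears) are our own assumptions.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Frequency of one pattern aggregated over `width` consecutive batches."""
    count: int
    width: int


class TiltedTimeWindow:
    """Simplified logarithmic tilted-time window (hypothetical sketch).

    Recent batches are kept individually; older ones are merged into buckets
    whose width doubles, so n batches need only O(log n) buckets.
    """

    def __init__(self, max_per_width: int = 2):
        if max_per_width < 1:
            raise ValueError("max_per_width must be at least 1")
        self.max_per_width = max_per_width
        self.buckets: list[Bucket] = []  # newest bucket first

    def add_batch(self, count: int) -> None:
        """Record the pattern's frequency in the newest batch of the stream."""
        self.buckets.insert(0, Bucket(count, 1))
        width = 1
        while True:
            same = [i for i, b in enumerate(self.buckets) if b.width == width]
            if len(same) <= self.max_per_width:
                break
            # Merge the two oldest buckets of this width into one bucket of
            # twice the width, then check whether the next width overflows.
            i, j = same[-2], same[-1]
            merged = self.buckets[i].count + self.buckets[j].count
            self.buckets[i] = Bucket(merged, 2 * width)
            del self.buckets[j]
            width *= 2

    def history(self) -> list[tuple[int, int]]:
        """Return (width, count) pairs, newest first."""
        return [(b.width, b.count) for b in self.buckets]
```

For example, after eight batches with frequencies 1 to 8, `history()` returns `[(1, 8), (1, 7), (2, 11), (4, 10)]`: the two most recent batches are kept individually, while the six older ones are summarized by two coarser buckets.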

Expected results:

The new algorithms will be integrated into the FOCUS platform, which gathers a set of tools developed by our team and will allow researchers and interested industrial partners to better analyze and understand their usage data.

What is required:

Supervisors: Florent Masseglia (INRIA) and Yves Lechevallier (INRIA).

Contact: Florent.Masseglia@sophia.inria.fr

This thesis is part of MIDAS, a project of the ANR (French National Research Agency). The student will work at the INRIA Sophia Antipolis research center on the French Riviera.

Net salary: 1537 €/month for the first two years and 1619 €/month for the third year.

Send a detailed CV (including your M.Sc. results), a letter explaining your motivation for this Ph.D., and at least two reference letters.

Useful links:

[1] A. Marascu and F. Masseglia. Mining sequential patterns from data streams: a centroid approach. Journal of Intelligent Information Systems, 27(3):291-307, November 2006.

[2] C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu. Mining frequent patterns in data streams at multiple time granularities. In Proceedings of the NSF Workshop on Next Generation Data Mining, November 2002.

AxIS team Web site