« Summarizing a data stream by means of knowledge extraction »
Keywords: Data Mining, KDD, Data Streams, Clustering, Pattern Extraction.
Team:
AxIS is a team of INRIA Sophia Antipolis – Méditerranée that works on improving Information Systems (IS) by means of usage analysis. Data mining is one of the methods AxIS uses to understand the users of an IS. As data become more dynamic and complex, AxIS develops new data mining tools intended to improve the performance of existing technologies, then validates the proposed methods in various fields of application.
Scientific context:
Recently, data mining has had to adapt to a new and extremely constrained kind of data: data streams. In a data stream, new data arrive continuously, at a very high rate and in potentially unbounded volume, so blocking operations are not allowed.
Since the data of a stream cannot be stored in full, data mining methods face two main challenges:
How to extract knowledge from such an environment?
How to manage the history of the extracted knowledge?
Storage space is limited, which forces a choice of what to store and how to store it. Monitoring systems, security applications and behavior analysis methods all rely on data stream management systems and on the data mining methods that will be proposed for these streams.
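To make the one-pass, bounded-memory constraint concrete, here is a minimal sketch of the classical Misra-Gries frequent-item summary. It is given only as an illustration of mining under stream constraints and is not a method of the AxIS team; the function name `misra_gries` and the parameter `k` are chosen for this example.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One-pass frequent-item summary keeping at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n
    is guaranteed to be among the returned keys; the stored counts are
    lower bounds on the true frequencies.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the
            # ones that reach zero. The item itself is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Illustrative usage on a tiny, made-up stream.
summary = misra_gries(["a", "b", "a", "c", "a", "a"], k=2)
```

Each element is read once and then discarded, and memory never exceeds `k - 1` counters, whatever the length of the stream.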
General objective of the thesis:
The goal of this thesis is to propose summarization methods for streaming data. These methods will be based on data mining techniques such as clustering or frequent pattern extraction. Knowledge extraction is closely linked to the management of the history of the extracted knowledge. This history management will be the core of the study and will guide the design of the summarization methods.
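As an illustration of what managing the history of extracted knowledge can look like, the sketch below keeps per-batch values (for example, the support of one pattern in each batch) at progressively coarser time granularities, in the spirit of the tilted-time windows discussed in [2]. It is a simplified, hypothetical structure: the class name `TiltedHistory` and the parameter `per_level` are assumptions made for this example, not the model the thesis is expected to produce.

```python
from typing import List


class TiltedHistory:
    """Stores per-batch counts at geometrically coarser granularities."""

    def __init__(self, per_level: int = 2) -> None:
        self.per_level = per_level
        # levels[i] holds summaries that each cover 2**i batches,
        # ordered from newest to oldest.
        self.levels: List[List[int]] = []

    def add(self, count: int) -> None:
        """Record the count observed in the most recent batch."""
        carry = count
        level = 0
        while True:
            if level == len(self.levels):
                self.levels.append([])
            self.levels[level].insert(0, carry)
            if len(self.levels[level]) <= self.per_level:
                break
            # Level is full: merge its two oldest summaries into one
            # coarser summary and push it to the next level.
            oldest = self.levels[level].pop()
            second_oldest = self.levels[level].pop()
            carry = oldest + second_oldest
            level += 1

    def total(self) -> int:
        """Total count over the whole recorded history."""
        return sum(sum(level) for level in self.levels)


# Illustrative usage: eight batches, each contributing a count of 1.
history = TiltedHistory()
for _ in range(8):
    history.add(1)
```

Recent batches stay available at fine granularity while older ones are merged, so the number of stored summaries grows only logarithmically with the number of batches seen.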
Expected results:
New data mining algorithms for mining data streams (clustering, frequent pattern extraction).
A new model for the management of knowledge history.
The new algorithms will be integrated into the FOCUS platform, a set of tools designed by our team that will allow researchers and interested industrial partners to better analyze and understand their usage data.
What is required:
An M.Sc. in Computer Science
Strong knowledge of algorithms and complexity
Good programming skills
Good proficiency in English
Supervisors: Florent Masseglia (INRIA) and Yves Lechevallier (INRIA).
Contact: Florent.Masseglia@sophia.inria.fr
This thesis is part of the MIDAS project of ANR (French National Research Agency). The student will work on the French Riviera, in the INRIA center of Sophia-Antipolis.
Net salary: 1537 €/month for the first two years and 1619 €/month in the third year.
Send a detailed CV (including your M.Sc. results), a letter presenting your motivation for this Ph.D. and at least two reference letters.
Useful links:
[1] Marascu, A. and Masseglia, F. 2006. Mining sequential patterns from data streams: a centroid approach. J. Intell. Inf. Syst. 27, 3 (Nov. 2006), 291-307.
[2] C. Giannella, Jiawei Han, Jian Pei, Xifeng Yan, and P. S. Yu. Mining Frequent Patterns in Data Streams at Multiple Time Granularities. In Proceedings of the NSF Workshop on Next Generation Data Mining, November 2002.