Florent MASSEGLIA

INRIA Sophia-Antipolis

Research themes

My research mainly concerns Knowledge Discovery from Data (KDD) in general, and Data Mining in particular (data mining being one step of the whole KDD process).

For more information on KDD or data mining, you may have a look at the following sites:

  • KDnuggets (maintained by G. Piatetsky-Shapiro).
  • SIGKDD (a leading interest group and conference in this domain).
  • Wikipedia has a single article covering both KDD and data mining.

More precisely, my main research subjects in this ocean are the following:

 

Mining Data Streams

This is a recent subject, motivated by the huge data production rates we see today. It is typically a data miner's dream. Today the choice is between extracting relevant knowledge from static data and extracting merely acceptable knowledge from huge data sets or data streams. Of course, we would all like to extract relevant knowledge from data streams. Unfortunately, these streams are such that available machines cannot run existing algorithms for static data on them. We thus have to rely on approximation (for instance, by ignoring some of the data or by computing only part of the result).
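One simple way to "ignore some of the data" is to keep a fixed-size random sample of the stream and mine that sample instead. The sketch below uses reservoir sampling, a standard technique picked here purely for illustration; it is not a method described on this page. The function name `reservoir_sample` and its parameters `k` and `seed` are assumptions made for the example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream.

    The stream is read once and its length does not need to be known,
    so memory use stays bounded by k (illustrative sketch only).
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item replaces an old one
            # only if the slot falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # A generator stands in for a data stream too large to store.
    sample = reservoir_sample((x * x for x in range(1_000_000)), k=10, seed=42)
    print(sample)
```

The price of this approximation is that any pattern mined from the sample is only an estimate of what holds on the full stream.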

(image is from http://datalib.ed.ac.uk/GRAPHICS/blue_data.gif)
     
 

Clustering

Clustering is a subject I only came to after my Ph.D., since my Ph.D. studies focused on pattern extraction. Then, at Inria, I tried to combine pattern extraction with clustering, and it gave pretty good results. That door led me to clustering, and there are so many interesting problems in this room!

 

(image is from http://img83.imageshack.us/img83/700/penguenvo4.jpg)
     
 

Data Mining for Security

This subject involves monitoring streaming data (past data may also be analyzed for audits, but security is a real-time matter) and classifying incoming data according to specific characteristics. This field therefore brings together data streams, clustering and classification. However, automatic detection systems (for intrusions, for instance) tend to be either a little blind or paranoid, and research still has progress to make in this field. I was responsible for two projects on this subject in 2007 and 2009: ARC SéSur and Color MUTAN.

 

(image is from http://www.network24.co.uk/security/)
     
 

Pattern Mining

This is a classic data mining problem. Do you know the example of the association between Beer, Cookies and Diapers? ("Although this is only an example that professors use to illustrate the concept to students, the explanation of this imaginary phenomenon might be that fathers that are sent out to buy diapers often buy a beer as well, as a reward" - Wikipedia).
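To make the diapers-and-beer example concrete, here is a minimal sketch of the two usual measures behind such an association: support (how often the items appear together) and confidence (how often the consequent appears when the antecedent does). The baskets are invented toy data and the function names `support` and `confidence` are assumptions for this example, not part of any particular library.

```python
from typing import List, Set


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: Set[str], consequent: Set[str],
               transactions: List[Set[str]]) -> float:
    """Estimate of P(consequent | antecedent) over the transactions."""
    base = support(antecedent, transactions)
    if base == 0.0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


# Invented toy baskets, for illustration only.
baskets = [
    {"diapers", "beer", "cookies"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"cookies", "milk"},
]

if __name__ == "__main__":
    print("support(diapers, beer):", support({"diapers", "beer"}, baskets))
    print("confidence(diapers -> beer):",
          confidence({"diapers"}, {"beer"}, baskets))
```

A pattern mining algorithm does essentially this counting, but for every candidate itemset at once, which is where the real difficulty lies.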

 

(image is from http://www.bucchow.com/2008/01/06/footprints-in-the-sand/)
     
 

Web Usage Mining

Well... You take one half of data mining, one half of Web traces and one half of imagination (yes I know, that's three...). You mix everything and you obtain (hopefully) nice knowledge about the users' behavior on a Web site. When data mining was a young research field, this was the only kind of data (or at least the most easily available) that most research teams had for their experiments, so...

(image is from http://www.astrosurf.com/luxorion/Physique/web3d.jpg)
     

Webmaster: Florent Masseglia