Since October 2008, I am a researcher (chargé de recherche) at CNRS in the joint project COATI , I3S (CNRS/UNSA) and INRIA Sophia Antipolis. In particular, I participated to the ANR SPREADS on the analysis of a peer-to-peer storage system and inside the ANR DIMAGREEN on energy efficiency.
Large scale peer-to-peer systems are foreseen as a way to provide highly reliable data storage at low cost. To achieve high durability, such P2P systems encode the user data in a set of redundant fragments and distribute them among the peers (see Figure 1). This redundancy has to be constantly monitored and maintained by a reconstruction process due to the occurrence of peer failures or departures. Indeed, the system performance depends on many parameters that need to be well tuned, such as the factor of redundancy, the code scheme used, the frequency of data repair and the size of a data block. These parameters impact the amount of resources (e.g., bandwidth, storage space, etc.) needed to obtain a certain reliability (e.g., probability to lose data). This thesis aims at providing tools to analyse and predict the performance of large scale data storage systems. Thus, different techniques are studied and applied. They range from formal analysis (using Markov chains and fluid models), simulations, and experimentation (using Grid5000 platform). By comparing to simulations, we show that Markov chain models give correct approximations of the system average behavior, but fails to capture its variations over time. Our main contribution is a new stochastic model based on fluid approximation, that capture the deviations around the mean behavior. We also use queuing models to estimate the repair time under limited bandwidth. We additionally study different ways to distribute fragments (placement strategies), and study a hybrid code that reduces the repair bandwidth consumption. Lastly, we conducted real experimentation on the Grid5000 platform to validate our results.
The objectives of the project are to introduce and analyze energy-aware network designs and managements in order to increase the life-span of telecommunication hardware and to reduce the energy consumption together with the electricity bill. The project is decomposed into three main tasks:
Task 1: Performance evaluation and Measurements. In order to evaluate the energy utilization of a whole network, the power consumption of the different elements of the network, such as a router, has to be studied. This study will also be the basis to propose specific realistic cost functions for the network devices.
Task 2: Network Design. The power consumption of a network is strongly influenced by its topology. We plan to study several topologies and their influence on the power consumption. Also, where to place the servers and what is the optimal number of servers for a domain are interesting questions. For the design of the networks, we plan to use some of the results on the router power consumption found during Task 1.
Task 3: Network Management. This last task is of major importance when the network has already been designed and when we can play mostly only on the routing to save energy. The goal here will be to find energy-aware routing policies that take into account the new characteristics of traffic.
From January 2007 to February 2008, I did a postdoc in the Intel Research Laboratories in Berkeley (US). I worked on methods for providing security to end hosts in typical enterprise environments. The research is focused on making end host security customizable and adaptable by exploring design profiles based on the end host's communication traffic and using these for anomaly detection (see the poster for Intel open house). The first phase consisted of an analysis of various end-host behaviours using a large-scale trace collection of more than 300 corporate machines. The study led a to two publications [GCI+08], [ACG+07] and to two technical reports that are under submission [GCT+07], [GCI+07]. In the current second phase of the project, we are working on new detection algorithms that would be able to face the new threats for networks, e.g. botnets [GCT+08], [BGT+08]. This study just led to a submitted patent disclosure.
I obtained a PhD in Computer Science, Telecommunications and Electronics from the University Paris 6: Networks, algorithmics and analytic combinatorics of very large data sets, defense November 29th 2006. My advisors were Philippe Flajolet (INRIA) and Michèle Soria (Paris 6). The jury was also composed by Anne Doucet (Paris 6), Fabrice Guillemin (France Telecom), Christian Lavault (Paris 13) and Jean-Claude Bermond (CNRS).
During my PhD, I worked in the ALGO Project of INRIA Rocquencourt on probabilistic algorithms for estimating the number of distinct elements (or cardinality) of very large data sets. This project is sponsored by the French National Research Agency. I introduced new families of estimates based on order statistics in [Gir05] and [Gir08]. These algorithms attain a precision of C/sqrt(M) using M units of memory to compare with the linear memory of exact algorithms. These estimates were extensively validated with an optimized version of the algorithm (project page). In a joint work with E. Fusy, we adapt these algorithms to monitor data streams [FuGi07]. Applications for such algorithms are numerous in networking area, for example to monitor the traffic of a router or to detect Denial of Service attacks (see [Gir05] and [FuGi07]), as well as in bioinformatics, for example to detect coding regions in DNA (see [Gir06]).
I am also collaborating with the MASCOTTE Project on fault tolerant networks. The problem was posed by Alcatel Space Industry in order to decrease the launch cost of their satellites. We proposed solutions for small networks in [ABG+06,BGP07]. More generally, these networks correspond to a generalization of selectors. We proved tight linear bounds for large networks and studied a general property of graphs, namely the expansion for subsets of bounded size, in [AGHP08].
In 2002, during an internship of 6 months in the IP group of the Sprint Laboratories of Research in Burlingame California, I worked on IP protection for backbone networks. We proposed a new algorithm based on Tabu Search Meta-heuristic, to optimally map the IP topology on a fiber infrastructure. The optimal mapping maximizes the robustness of a network by minimizing fiber sharing, while simultaneously maintaining ISP delay requirements. In addition the algorithm takes into consideration constraints that are faced by backbone administrators, such as a shortage of wavelengths or priorities among links (see [GNTD03]). This study led to two patents, of which I am coauthor, for Sprint.
In 2003, during my master, I worked on bioinformatics issues with Mireille Régnier. They are four main steps to sequence a genome: duplicate the DNA, cut it into small pieces, read each of them and reassemble them. One of the most difficult problem during the fourth step is to distinguish between pieces of the numerous nearly identical repeats. We proposed a new algorithm to separate repeats using an analysis with generating functions. The same kind of techniques may be used to find single nucleotide polymorphisms (SNPs) that are crucial, for example, to understand genetic diseases ([Gir03], in French).