Since the early days of the Internet, from email to the Web, the client-server architecture has been used for data transfer. However, in just a few years, the peer-to-peer architecture has changed the way we share information. Peer-to-peer communications currently account for between 40% and 80% of Internet backbone traffic. The deployment of peer-to-peer architectures has followed a model that is rare in the history of the Internet. Whereas even the smallest improvement usually requires years of academic evaluation and experimentation before a large-scale deployment, peer-to-peer applications were deployed at large scale through an empirical process. Our understanding of these new applications is still in its infancy, and it is becoming a very active research area.
This project focuses on understanding and improving peer-to-peer content delivery. Indeed, we believe that the value of peer-to-peer lies in its ability to distribute content to a large number of peers without any dedicated infrastructure, and within a delay that is logarithmic in the number of peers.
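The logarithmic delay can be illustrated with an idealized model. This model is our assumption for illustration, not a model of BitTorrent itself: in each round, every peer that already holds the content uploads it to exactly one new peer, so the number of holders doubles every round. The hypothetical function below counts the rounds needed to reach every peer.

```python
def rounds_to_distribute(num_peers):
    """Rounds for one seed to reach num_peers peers in an idealized doubling model.

    Assumption: in each round, every holder of the content (the seed included)
    uploads the whole content to exactly one peer that does not have it yet.
    """
    holders = 1  # only the initial seed holds the content at the start
    rounds = 0
    while holders < num_peers + 1:  # the seed plus every peer must hold it
        holders *= 2
        rounds += 1
    return rounds
```

For example, rounds_to_distribute(1000) returns 10 and rounds_to_distribute(1_000_000) returns 20, whereas a single server uploading to one peer per round would need 1,000 and 1,000,000 rounds. Real swarms are far less regular (heterogeneous bandwidth, content split into pieces, peer arrivals and departures), so this sketch only shows where the logarithm comes from.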
In [LUM_IMC06], we performed an experimental evaluation of BitTorrent, which is the only popular peer-to-peer protocol focused on efficient content delivery. We evaluated its two core mechanisms: its piece selection algorithm, called rarest first, and its peer selection algorithm, called the choke algorithm. We showed that the rarest first algorithm guarantees a diversity of pieces among peers that is close to the ideal one. In particular, in our experiments, replacing the rarest first algorithm with a source coding or network coding solution cannot be justified. We also showed that the choke algorithm, in its latest version, fosters reciprocation and is robust to free riders. In particular, the choke algorithm is fair, and replacing it with a bit-level tit-for-tat solution is not appropriate.
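The idea behind rarest first can be sketched as follows. This is a simplified sketch under our own assumptions, not the mainline code: the local peer counts how many peers in its peer set hold each piece it still needs, and requests from a given remote peer one of the least replicated pieces that this remote peer has. The function name and the set-based representation of bitfields are hypothetical, and the random first policy, strict priority, and end game mode of the real client are not modeled.

```python
import random
from collections import Counter


def rarest_first(my_pieces, remote_pieces, neighbor_bitfields, rng=random):
    """Choose the next piece to request from a remote peer (simplified rarest first).

    my_pieces: set of piece indices the local peer already has.
    remote_pieces: set of piece indices the remote peer has.
    neighbor_bitfields: one set of piece indices per peer in the local peer set.
    Returns a piece index, or None if the remote peer has no piece we need.
    """
    candidates = remote_pieces - my_pieces
    if not candidates:
        return None
    # Count how many peers in the peer set hold each candidate piece.
    counts = Counter()
    for bitfield in neighbor_bitfields:
        counts.update(bitfield & candidates)
    # Keep only the least replicated candidates, then break ties at random.
    rarest_count = min(counts[p] for p in candidates)
    rarest = sorted(p for p in candidates if counts[p] == rarest_count)
    return rng.choice(rarest)
```

For instance, with neighbor bitfields {0, 1, 2}, {0, 1}, and {0, 2, 3}, an empty local bitfield, and a remote peer holding {0, 1, 3}, the sketch selects piece 3, because only one neighbor holds it. Requesting the rarest pieces first is what keeps the replication of pieces balanced across the peer set.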
Focusing on the properties of the choke algorithm [LLKZ_07], we showed that it enables the clustering of peers with similar bandwidth, ensures effective sharing incentives by rewarding contributing peers with high download rates, and achieves high upload utilization for most of the download duration. We also examined the properties of the new choke algorithm in seed state and the impact of the initial seed capacity on the overall performance of a BitTorrent system. In particular, we showed that an underprovisioned initial seed does not enable the clustering of peers and does not guarantee effective sharing incentives. However, we showed that even in such a case, the choke algorithm ensures an efficient utilization of the available resources by forcing fast peers to help other peers with their download. Based on our observations, we offered guidelines to content providers regarding seed provisioning, and discussed a tracker protocol extension that addresses an identified limitation of the protocol.
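The choke algorithm in leecher state can be sketched as a periodic round. This is a simplified sketch based on our assumptions: every 10 seconds, the three interested peers with the highest download rate to the local peer are unchoked, and every 30 seconds (every third round) one additional interested peer is picked at random as the optimistic unchoke. The RemotePeer type and the choke_round function are hypothetical; anti-snubbing and the seed-state algorithm, which orders peers differently, are not modeled.

```python
import random
from dataclasses import dataclass

REGULAR_SLOTS = 3      # assumed number of regular unchoke slots
OPTIMISTIC_PERIOD = 3  # rounds between optimistic unchoke changes (3 x 10 s = 30 s)


@dataclass
class RemotePeer:
    peer_id: str
    interested: bool      # the remote peer wants pieces the local peer has
    download_rate: float  # rate at which the local peer downloads from it (bytes/s)


def choke_round(peers, round_number, current_optimistic=None, rng=random):
    """Run one 10-second choke round in leecher state.

    Returns (set of unchoked peer ids, id of the optimistic unchoke or None).
    """
    interested = [p for p in peers if p.interested]
    # Regular unchokes: the fastest interested peers by download rate to us.
    by_rate = sorted(interested, key=lambda p: p.download_rate, reverse=True)
    regular = {p.peer_id for p in by_rate[:REGULAR_SLOTS]}
    others = [p.peer_id for p in interested if p.peer_id not in regular]
    # Optimistic unchoke: redrawn at random every OPTIMISTIC_PERIOD rounds,
    # or earlier if the current one is no longer an interested non-regular peer.
    if round_number % OPTIMISTIC_PERIOD == 0 or current_optimistic not in others:
        optimistic = rng.choice(others) if others else None
    else:
        optimistic = current_optimistic
    unchoked = set(regular)
    if optimistic is not None:
        unchoked.add(optimistic)
    return unchoked, optimistic
```

Ranking peers by their download rate to the local peer is what rewards reciprocation: a peer that uploads quickly to us keeps one of our regular unchoke slots, while the optimistic unchoke gives other peers a chance and lets us discover peers that may reciprocate better.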
We are also exploring the BitTorrent overlay structure [ALC_07] and its impact on BitTorrent traffic locality [LBLD_09]. Indeed, keeping BitTorrent traffic local significantly reduces inter-ISP traffic without increasing (and often even reducing) the peers' download completion time. We have also shown that, at the scale of the Internet, high BitTorrent locality has the potential to reduce inter-ISP traffic by 40%.
I have instrumented version 4.0.2 of the BitTorrent mainline client, released in May 2005. I also commented parts of the code, in particular the parts dealing with peer and piece selection. However, these comments were intended for my own use and may not be clear out of context. Moreover, they reflect my understanding of the code, which may be wrong.
You can freely download and use my instrumentation of the mainline client, as long as you acknowledge its source. This is the instrumented client used for the experiments in the following publications: [HLLB_09, LBLD_09, MLLK_08, ALC_07, LLKZ_07, LUM_IMC06, LUM_05]. You can find the instrumented client here: Instrumented_BT_mainline_4.0.2_V2.zip. In order to run the client, you need to:
A file named logfileYYYYMMDDhhmmss.log will be generated; it contains the trace of the experiment. The header of this file is a legend that explains the file format, so each log file should be self-contained.
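As an illustration, the hypothetical helpers below locate experiment traces by their file name and decode the timestamp the name carries. They assume only the logfileYYYYMMDDhhmmss.log naming pattern; the format of the trace lines is described by the legend in each file header and is not parsed here.

```python
import glob
import os
import re
from datetime import datetime

# Assumed naming pattern: "logfile" followed by a 14-digit YYYYMMDDhhmmss stamp.
LOG_NAME = re.compile(r"^logfile(\d{14})\.log$")


def log_timestamp(path):
    """Return the datetime encoded in a logfileYYYYMMDDhhmmss.log name, or None."""
    match = LOG_NAME.match(os.path.basename(path))
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    except ValueError:  # e.g. an impossible date such as month 13
        return None


def latest_log(directory="."):
    """Return the path of the most recent trace in directory, or None if there is none."""
    logs = [p for p in glob.glob(os.path.join(directory, "logfile*.log"))
            if log_timestamp(p) is not None]
    return max(logs, key=log_timestamp, default=None)
```

Sorting by the decoded timestamp rather than by file modification time keeps the order stable even if the traces are copied to another machine.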
If you have any comments or questions, you can send me an email at: arnaud.legout@inria.fr