Content Distribution and Storage
This is an outline of my thesis. You can download the PDF.
The exponential growth of Internet traffic, reported to be as high as 41% in peak throughput in 2012 alone, continues to pose challenges to all interested parties. Transfer rates over fiber are approaching physical limits. Therefore other ways to accommodate the growth are needed.
The basic protocols of the Internet are point-to-point in nature. However, the traffic is largely broadcasting, with projections stating that as much as 80-90% of it will be video by 2016. This discrepancy leads to an inefficiency, where multiple copies of essentially the same messages travel in parallel through the same links. In this thesis we study multiple approaches to mitigating this inefficiency.
The contributions are organized by layers and phases of the network life. We look into optimal cache provisioning during network design. Then, we look into putting devices to sleep mode, using caching and cooperation with Content Distribution Networks, when managing an existing network. In the application layer, we look into maintaining balanced trees for media broadcasting. Finally, we analyze data survivability in a distributed backup system, a shift from dissemination to storage application. In the appendix, we consider a graph problem motivated by design of satellite networks.
Chapter 1: Energy Efficient Cache Provisioning
First, we look into a long-term solution for the network layer. We propose to augment all IP routers with transparent caches. For any sufficiently big network, this is bound to take considerable time to roll out. Therefore our proposition concentrates on the network provisioning phase, when an operator is planning deployments for midterm future. We study the problem of reducing power consumption in an Internet Service Provider (ISP) network by designing the content distribution infrastructure managed by the operator. We propose an algorithm to optimally decide where to cache the content inside the ISP network. We evaluate our solution over two case studies driven by operators feedback. Results show that the energy-efficient design of the content infrastructure brings substantial savings, both in terms of energy and in terms of bandwidth required at the peering point of the operator. Moreover, we study the impact of the content characteristics and the power consumption models. Finally, we derive some insights for the design of future energy-aware networks.
The results presented in this chapter have been accepted to [GlobeCom2013].
Chapter 2: Energy Efficient Routing
For more immediate results, it may be sufficient to put caches only into core routers. Their number is usually quite small, even in big ISPs. As we operate on already deployed networks, energy can be saved only by putting devices to sleep mode. In this model we take into account an additional aspect: we study the impact of using in-network caches and content delivery network (CDN) cooperation on an energy-efficient routing. We formulate this problem as Energy Efficient Content Distribution and propose an integer linear program (ILP) and an efficient heuristic algorithm to solve it. The objective is to find a feasible routing, so that the total energy consumption of the network is minimized subject to satisfying all the demands and link capacity. We exhibit the range of parameters (size of caches, popularity of content, demand intensity, etc.) for which caches are useful. Experimental results show that by placing a cache on each backbone router to store the most popular content, along with well choosing the best content provider server for each demand to a CDN, we can save about 20% of power in the backbone, while 16% can be gained solely thanks to the use of caches.
Chapter 3: Maintaining Balanced Trees For Structured Distributed Streaming Systems
Then, we move to the application layer, still keeping to content distribution. We propose and analyze a simple localized algorithm to balance a tree. The motivation comes from live distributed streaming systems in which a source diffuses a content to peers via a tree, a node forwarding the data to its children. Such systems are subject to a high churn, peers frequently joining and leaving the system. It is thus crucial to be able to repair the diffusion tree to allow an efficient data distribution. In particular, due to bandwidth limitations, an efficient diffusion tree must ensure that node degrees are bounded. Moreover, to minimize the delay of the streaming, the depth of the diffusion tree must also be controlled. We propose here a simple distributed repair algorithm in which each node carries out local operations based on its degree and on the subtree sizes of its children. In a synchronous setting, we first prove that starting from any $n$-node tree our process converges to a balanced tree in $O(n^2)$ turns. We then describe a more restrictive model, adding a small extra information to each node, under which we adopt our algorithm to converge in $\Theta(n \log n)$ turns. We then exhibit by simulation that the convergence is much faster (logarithmic number of turns in average) for a random tree.
The results presented in this chapter have been accepted to [Sirocco2013].
Chapter 4: Analysis of the Repair Time in Distributed Storage Systems
Leaving the topic of content distribution, we move to a simple application over the Internet. Particularly, we look into backup storage. Distributed or peer-to-peer storage systems introduce redundancy to preserve the data in case of peer failures or departures. To ensure long-term fault tolerance, the storage system must have a self-repair service that continuously reconstructs lost fragments of redundancy. The speed of this reconstruction process is crucial for the data survival. This speed is mainly determined by available bandwidth, a critical resource of such systems. We propose a new analytical framework that takes into account the correlation of concurrent repairs when estimating the repair time and the probability of data loss. Mainly, we intro- duce queuing models in which reconstructions are served by peers at a rate that depends on the available bandwidth. The models and schemes proposed are validated by mathematical analysis, extensive set of simulations, and experimentation using the Grid'5000 test-bed platform.
The results presented in this chapter have been accepted to [Globe2013].
Appendix: Weighted Improper Colouring
Finally, we move to the link layer, optimizing the deployment of satellite networks. We study a colouring problem motivated by a practical frequency assignment problem and, up to our best knowledge, new. In wireless networks, a node interferes with other nodes, the level of interference depending on numerous parameters: distance between the nodes, geographical topography, obstacles, etc. We model this with a weighted graph $(G,w)$ where the weight function $w$ on the edges of $G$ represents the noise (interference) between the two end-vertices. The total interference in a node is then the sum of all the noises of the nodes emitting on the same frequency. A weighted $t$-improper $k$-colouring of $(G,w)$ is a $k$-colouring of the nodes of $G$ (assignment of $k$ frequencies) such that the interference at each node does not exceed the threshold $t$. We consider here the Weighted Improper Colouring problem which consists in determining the weighted $t$-improper chromatic number defined as the minimum integer $k$ such that $(G,w)$ admits a weighted $t$-improper $k$-colouring. We also consider the dual problem, denoted the Threshold Improper Colouring problem, where, given a number $k$ of colours, we want to determine the minimum real $t$ such that $(G,w)$ admits a weighted $t$-improper $k$-colouring. We first present general upper bounds for both problems; in particular we show a generalisation of Lovász's Theorem for the weighted $t$-improper chromatic number. We then show how to transform an instance of the Threshold Improper Colouring problem into another equivalent one where the weights are either one or $M$, for a sufficiently large $M$. Motivated by the original application, we then study a special interference model on various grids (square, triangular, hexagonal) where a node produces a noise of intensity 1 for its neighbours and a noise of intensity 1/2 for the nodes at distance two. We derive the weighted $t$-improper chromatic number for all values of $t$. Finally, we model the problem using integer linear programming, propose and test heuristic and exact Branch-and-Bound algorithms on random cell-like graphs, namely the Poisson-Voronoi tessellations.