
Content Distribution and Storage

This is an outline of my thesis. You can download the PDF.

Introduction

(Figure: multiple clients.)

The exponential growth of Internet traffic, reported to be as high as 41% in peak throughput in 2012 alone, continues to pose challenges to all interested parties. Transfer rates over fiber are approaching physical limits, so other ways to accommodate the growth are needed.

The basic protocols of the Internet are point-to-point in nature, yet the traffic is largely broadcast-like, with projections stating that as much as 80-90% of it will be video by 2016. This discrepancy leads to an inefficiency in which multiple copies of essentially the same messages travel in parallel through the same links. In this thesis we study multiple approaches to mitigating this inefficiency.

The contributions are organized by layer and by phase of the network's life. First, we look into optimal cache provisioning during network design. Then, for the management of an existing network, we look into putting devices to sleep, using caching and cooperation with Content Distribution Networks. In the application layer, we look into maintaining balanced trees for media broadcasting. Finally, we analyze data survivability in a distributed backup system, a shift from dissemination to storage applications. In the appendix, we consider a graph problem motivated by the design of satellite networks.

Chapter 1: Energy Efficient Cache Provisioning

First, we look into a long-term solution for the network layer. We propose to augment all IP routers with transparent caches. For any sufficiently big network, this is bound to take considerable time to roll out; our proposition therefore concentrates on the network provisioning phase, when an operator is planning deployments for the midterm future. We study the problem of reducing power consumption in an Internet Service Provider (ISP) network by designing the content distribution infrastructure managed by the operator. We propose an algorithm to optimally decide where to cache the content inside the ISP network. We evaluate our solution over two case studies driven by operators' feedback. Results show that the energy-efficient design of the content infrastructure brings substantial savings, both in terms of energy and in terms of bandwidth required at the peering point of the operator. Moreover, we study the impact of the content characteristics and the power consumption models. Finally, we derive some insights for the design of future energy-aware networks.
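To illustrate the kind of trade-off such a provisioning decision resolves, the toy function below compares the transport energy a cache would avoid against the energy of keeping the object stored. All names and numbers are hypothetical and not taken from the thesis; it is a sketch of the idea, not the algorithm of the chapter.

```python
def worth_caching(demand_gb_day, hops_saved, transport_j_per_gb_hop,
                  object_gb, storage_j_per_gb_day):
    """Cache an object only if the daily transport energy it avoids
    exceeds the daily energy of keeping it in the cache."""
    saved = demand_gb_day * hops_saved * transport_j_per_gb_hop
    cost = object_gb * storage_j_per_gb_day
    return saved > cost

# A popular object requested from far away is worth caching:
print(worth_caching(500, 4, 20.0, 2, 1000.0))   # True
# An unpopular, large object is not:
print(worth_caching(1, 1, 20.0, 50, 1000.0))    # False
```

The provisioning problem of the chapter is harder because cache placements interact through shared links, but each placement ultimately faces this energy-versus-energy comparison.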

The results presented in this chapter have been accepted to [GlobeCom2013].

Chapter 2: Energy Efficient Routing

For more immediate results, it may be sufficient to put caches only into core routers; their number is usually quite small, even in big ISPs. As we operate on already deployed networks, energy can be saved only by putting devices to sleep. In this model we take an additional aspect into account: we study the impact of using in-network caches and content delivery network (CDN) cooperation on energy-efficient routing. We formulate this problem as Energy Efficient Content Distribution and propose an integer linear program (ILP) and an efficient heuristic algorithm to solve it. The objective is to find a feasible routing that minimizes the total energy consumption of the network, subject to satisfying all the demands and respecting link capacities. We exhibit the range of parameters (size of caches, popularity of content, demand intensity, etc.) for which caches are useful. Experimental results show that by placing a cache on each backbone router to store the most popular content, along with carefully choosing the best content provider server for each demand to a CDN, we can save about 20% of power in the backbone, while 16% can be gained solely thanks to the use of caches.
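One ingredient of such a heuristic can be sketched as follows: for each demand, among the CDN servers and in-network caches that hold the content, prefer a source whose path wakes the fewest sleeping links, since energy is saved by keeping devices asleep. This is an illustrative simplification, not the algorithm of the chapter, and all names below are hypothetical.

```python
def pick_source(sources, path_links, awake):
    """Return the source whose path to the client powers on the fewest
    sleeping links; ties are broken by path length."""
    def cost(source):
        links = path_links[source]
        newly_awake = sum(1 for link in links if link not in awake)
        return (newly_awake, len(links))
    return min(sources, key=cost)

path_links = {
    "cdn1":  ["a", "b", "c"],   # longer path, but links already powered on
    "cache": ["d", "e"],        # shorter path through sleeping links
}
awake = {"a", "b", "c"}
print(pick_source(["cdn1", "cache"], path_links, awake))  # cdn1
```

Note that the energy-aware choice can be the longer path: hop count is only the tie-breaker once the number of newly awakened links is equal.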

The results presented in this chapter appeared in [ICC2013] and have been submitted to [ComCom].

Chapter 3: Maintaining Balanced Trees For Structured Distributed Streaming Systems

Then, we move to the application layer, still keeping to content distribution. We propose and analyze a simple localized algorithm to balance a tree. The motivation comes from live distributed streaming systems in which a source diffuses content to peers via a tree, each node forwarding the data to its children. Such systems are subject to high churn, with peers frequently joining and leaving the system. It is thus crucial to be able to repair the diffusion tree to allow efficient data distribution. In particular, due to bandwidth limitations, an efficient diffusion tree must ensure that node degrees are bounded. Moreover, to minimize the delay of the streaming, the depth of the diffusion tree must also be controlled. We propose here a simple distributed repair algorithm in which each node carries out local operations based on its degree and on the subtree sizes of its children. In a synchronous setting, we first prove that, starting from any n-node tree, our process converges to a balanced tree in O(n²) turns. We then describe a more restrictive model, adding a small piece of extra information to each node, under which we adapt our algorithm to converge in Θ(n log n) turns. We then show by simulation that the convergence is much faster (a logarithmic number of turns on average) for a random tree.
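To give the flavour of such a local operation (the actual rules of the chapter differ, and the degree bound is omitted here), the sketch below lets a node inspect only the subtree sizes of its children and re-parent one grandchild from its heaviest child to its lightest one:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.children = []

    def size(self):
        """Number of nodes in the subtree rooted here."""
        return 1 + sum(child.size() for child in self.children)

def balance_step(node):
    """One local operation: if the subtree sizes of two children differ
    by more than one, move a grandchild from the heaviest child to the
    lightest. Returns True if something moved."""
    if len(node.children) < 2:
        return False
    heavy = max(node.children, key=Node.size)
    light = min(node.children, key=Node.size)
    if heavy.size() - light.size() <= 1 or not heavy.children:
        return False
    light.children.append(heavy.children.pop())
    return True

# One child carries all the grandchildren, the other none:
root, a, b = Node(0), Node(1), Node(2)
root.children = [a, b]
a.children = [Node(3), Node(4), Node(5)]
while balance_step(root):
    pass
print(sorted(child.size() for child in root.children))  # [2, 3]
```

In the distributed setting every node runs such a step concurrently on purely local information, which is what makes the O(n²) convergence bound non-trivial.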

The results presented in this chapter have been accepted to [Sirocco2013].

Chapter 4: Analysis of the Repair Time in Distributed Storage Systems

Leaving the topic of content distribution, we move to a simple application over the Internet: backup storage. Distributed or peer-to-peer storage systems introduce redundancy to preserve the data in case of peer failures or departures. To ensure long-term fault tolerance, the storage system must have a self-repair service that continuously reconstructs lost fragments of redundancy. The speed of this reconstruction process is crucial for data survival. This speed is mainly determined by the available bandwidth, a critical resource of such systems. We propose a new analytical framework that takes into account the correlation of concurrent repairs when estimating the repair time and the probability of data loss. Mainly, we introduce queuing models in which reconstructions are served by peers at a rate that depends on the available bandwidth. The models and schemes proposed are validated by mathematical analysis, an extensive set of simulations, and experimentation using the Grid'5000 testbed platform.
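A minimal illustration of why correlated repairs matter, assuming a single peer that serves reconstructions in FIFO order (the numbers are hypothetical and the thesis models are far richer): fragments lost in a burst queue behind each other, so the last repair finishes far later than the isolated-repair estimate would suggest.

```python
def repair_times(n_lost, block_mb, peer_bw_mbps):
    """Completion time of each reconstruction when n_lost repairs
    arrive together and one peer serves them one after another."""
    per_block = block_mb * 8 / peer_bw_mbps   # seconds per reconstruction
    return [per_block * (i + 1) for i in range(n_lost)]

times = repair_times(n_lost=10, block_mb=64, peer_bw_mbps=8)
print(times[0])    # an isolated repair: 64.0 seconds
print(times[-1])   # the last of a burst of 10: 640.0 seconds
```

An analysis that treats each repair independently would predict 64 seconds for all ten; the queuing view shows the most exposed fragment stays unprotected ten times longer, which is exactly what drives the probability of data loss.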

The results presented in this chapter have been accepted to [Globe2013].

Appendix: Weighted Improper Colouring

Finally, we move to the link layer, optimizing the deployment of satellite networks. We study a colouring problem motivated by a practical frequency assignment problem and, to the best of our knowledge, new. In wireless networks, a node interferes with other nodes, the level of interference depending on numerous parameters: distance between the nodes, geographical topography, obstacles, etc. We model this with a weighted graph (G,w) where the weight function w on the edges of G represents the noise (interference) between the two end-vertices. The total interference at a node is then the sum of all the noises of the nodes emitting on the same frequency. A weighted t-improper k-colouring of (G,w) is a k-colouring of the nodes of G (an assignment of k frequencies) such that the interference at each node does not exceed the threshold t. We consider here the Weighted Improper Colouring problem, which consists in determining the weighted t-improper chromatic number, defined as the minimum integer k such that (G,w) admits a weighted t-improper k-colouring. We also consider the dual problem, denoted the Threshold Improper Colouring problem, where, given a number k of colours, we want to determine the minimum real t such that (G,w) admits a weighted t-improper k-colouring. We first present general upper bounds for both problems; in particular, we show a generalisation of Lovász's Theorem for the weighted t-improper chromatic number. We then show how to transform an instance of the Threshold Improper Colouring problem into an equivalent one where the weights are either one or M, for a sufficiently large M.

Motivated by the original application, we then study a special interference model on various grids (square, triangular, hexagonal) where a node produces a noise of intensity 1 for its neighbours and a noise of intensity 1/2 for the nodes at distance two. We derive the weighted t-improper chromatic number for all values of t. Finally, we model the problem using integer linear programming, and propose and test heuristic and exact Branch-and-Bound algorithms on random cell-like graphs, namely Poisson-Voronoi tessellations.
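The definitions above can be made concrete with a small brute-force sketch (exponential, so for tiny instances only; the function names are ours): check whether a colouring of (G,w) is weighted t-improper, then find the weighted t-improper chromatic number by trying every k-colouring.

```python
from itertools import product

def is_t_improper(n, w, colours, t):
    """Interference at v = sum of w(u,v) over neighbours u sharing v's
    colour; the colouring is valid if no vertex exceeds threshold t."""
    for v in range(n):
        interference = sum(weight for (a, b), weight in w.items()
                           if colours[a] == colours[b] and v in (a, b))
        if interference > t:
            return False
    return True

def chromatic_t(n, w, t):
    """Minimum k such that (G,w) admits a weighted t-improper k-colouring."""
    for k in range(1, n + 1):
        if any(is_t_improper(n, w, colours, t)
               for colours in product(range(k), repeat=n)):
            return k

# A triangle with unit weights: t = 0 forces a proper 3-colouring,
# while t = 2 tolerates both same-coloured neighbours of each vertex.
w = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
print(chromatic_t(3, w, t=0))  # 3
print(chromatic_t(3, w, t=2))  # 1
```

The same checker, swept over decreasing t for a fixed k, likewise brute-forces the dual Threshold Improper Colouring problem on small instances.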

The results presented in this chapter appeared in [IWOCA2011] and [JDA2012].
