next up previous
Next: Experiment Up: Caching Policies Previous: Static Caching

Optimality of Static Caching

Assume that exactly n documents reside in the Web server which are requested at rates r1 $\geq$ r2 $\geq$ $\cdots$ $\geq$ rn . Assume that the sizes of the documents are identically distributed random variables. Moreover, we shall assume that the cache can hold a fixed number m of documents at any time (note that this assumption clearly does not hold for the ``smallest size'' caching policy). Let Xi denote the index of the document of the i -th request, and Yk,i the indicator function of whether document k is in the cache at the i -th request. We shall assume that {Xi}i is a sequence of independent and identically distributed random variables. Then, the empirical hit rate of caching policy $\pi$ , denoted by H$\scriptstyle\pi$ , can be expressed as

H$\scriptstyle\pi$ = $\displaystyle\lim_{T\to\infty}^{}$$\displaystyle{\textstyle\frac{1}{T}}$$\displaystyle\sum_{i=1}^{T}$$\displaystyle\sum_{k=1}^{n}$$\displaystyle\bf$1(Xi = k)Yk,i.

Assume that the sequence {Xi,Yk,i}i is (jointly) stationary and ergodic so that the theorem of Kingman [6] applies. Then,

H$\scriptstyle\pi$ = $\displaystyle\lim_{T\to\infty}^{}$$\displaystyle{\textstyle\frac{1}{T}}$$\displaystyle\sum_{i=1}^{T}$$\displaystyle\sum_{k=1}^{n}$E$\displaystyle\left[ {\bf 1}(X_i=k) Y_{k,i} \right].$

For any non-anticipative caching policy $\pi$ (i.e., $\pi$ has no knowledge of future requests), Yk,i depends only on the previous requests X1,...,Xi - 1 for any i . Thus,
H$\scriptstyle\pi$ = $\displaystyle\lim_{T\to\infty}^{}$$\displaystyle{\textstyle\frac{1}{T}}$$\displaystyle\sum_{i=1}^{T}$$\displaystyle\sum_{k=1}^{n}$E$\scriptstyle\pi$$\displaystyle\left[ {\bf 1}(X_i=k) Y_{k,i} \right]$   
  = $\displaystyle\lim_{T\to\infty}^{}$$\displaystyle{\textstyle\frac{1}{T}}$$\displaystyle\sum_{i=1}^{T}$$\displaystyle\sum_{k=1}^{n}$E$\scriptstyle\pi$$\displaystyle\left[ {\bf 1}(X_i=k) \right]E$$\scriptstyle\pi$$\displaystyle\left[ Y_{k,i} \right]$   
  = $\displaystyle\lim_{T\to\infty}^{}$$\displaystyle{\textstyle\frac{1}{T}}$$\displaystyle\sum_{i=1}^{T}$$\displaystyle\sum_{k=1}^{n}$rkE$\scriptstyle\pi$$\displaystyle\left[ Y_{k,i} \right]$   
  = $\displaystyle\sum_{k=1}^{n}$rkqk($\displaystyle\pi$),   
where the last equality comes from the stationarity of {Yk,i}i , and qk($\pi$) : = E$\scriptstyle\pi$$\left[ Y_{k,i} \right]$ is the probability that document k is kept in the cache.

Note that $\sum_{k=1}^{n}$qk($\pi$) = $\sum_{k=1}^{n}$E$\scriptstyle\pi$$\left[ Y_{k,i} \right]=$m , and that for all k , 0 $\leq$ qk($\pi$) $\leq$ 1 . Let rn + 1 = 0 . Then,

H$\scriptstyle\pi$ = $\displaystyle\sum_{k=1}^{n}$rkqk($\displaystyle\pi$)   
  = $\displaystyle\sum_{k=1}^{n}$$\displaystyle\sum_{i=k}^{n}$(ri - ri + 1)qk($\displaystyle\pi$)   
  = $\displaystyle\sum_{i=1}^{n}$$\displaystyle\sum_{k=1}^{i}$(ri - ri + 1)qk($\displaystyle\pi$)   
  $\textstyle\leq$ $\displaystyle\sum_{i=1}^{n}$(ri - ri + 1)min (i,m)   
  = $\displaystyle\sum_{i=1}^{m}$ri.   
By definition, the hit rate of the static policy Hs is Hs = $\sum_{i=1}^{m}$ri . Thus, H$\scriptstyle\pi$ $\leq$ Hs .

Note that under the above assumptions (in particular, the documents have identical size), the ``request hit rate'' and the ``byte hit rate'' (see definitions below) coincide.

We assume now that the documents have different sizes, say bi bytes for document i . As in the previous case, we assume that ri is decreasingly ordered. Let B be the size of the cache. Then, the optimal static caching can be formulated as a 0-1 programming problem:

$\displaystyle\begin{array}
{ll}
 \max & \sum_{i=1}^{n}r_ib_iZ_i \  & \ \text{s.t.:} & 
 \sum_{i=1}^{n}b_iZ_i\leq B, \  & \forall i, Z_i=0\ or\ 1\end{array}$

where Zi is the indicator function of whether document i is in the cache. The cost to be maximized is proportional to the byte hitrate

$\displaystyle{\frac{\sum_{i=1}^{n}r_ib_iZ_i}{\sum_{i=1}^{n}r_ib_i}}$.

Let S denote the cost of the solution of this problem. It is easy to see that such a problem is the well-known knapsack problem, which is NP-hard in general. Thus, in our implementation, we consider an approximate solution resulting from relaxation. Indeed, by relaxing the 0-1 constraint on Zi , i.e., by assuming that Zi is a real number ( Zi $\in$ [0,1] ), we can easily see that an optimal solution is {Z1 = 1,Z2 = 1,,Zk = 1 , Zk + 1 = (B - $\sum_{i=1}^{k}$bi)/bk),Zk + 2 = 0,,Zn = 0} , where k = max {j $\leq$ n|$\sum_{i=1}^{j}$biZi $\leq$ B} .

Let Sr be the hitrate of the static caching (using a cache whose size possibly exceeds B ) of documents 1,2,,k + 1 . It is clear that S $\leq$ Sr .

Our approximate solution, denoted by Sa , is obtained by caching documents 1,2,,k and some further documents after k that can be filled in the cache. Again, it is clear that Sa $\leq$ S $\leq$ Sr .

This approximate solution is suboptimal. In practice, however, with the data of WWW documents access, this approximate solution Sa turns out to be very close to the upper bound Sr (the differences being smaller than 0.01% in our numerical experimentation). Therefore, we shall simply take the approximate solution as the optimal static solution. In section 4, the performance of this solution will be reported.


next up previous
Next: Experiment Up: Caching Policies Previous: Static Caching
Nicolas Niclausse
8/25/1997