Analyzing Search Algorithms in Small Worlds
Nicolas Schabanel
Laboratoire d'Informatique Algorithmique: Fondements et Applications (LIAFA), Univ. Paris Diderot, Paris, France.
and Centre National de la Recherche Scientifique (CNRS), Paris, France.
(includes joint work with Emmanuelle Lebhar \& George Giakkoupis)}
The smallworld phenomenon. Milgram revealed in his famous experiment [12] that individuals are not only a few handshakes away from each other, but they are also able to find such short paths between them, in spite of their extremely local view of the worldwide social network. In 2000, Kleinberg [6] proposed a simple random network model that captures this surprising property of social networks. Beyond this natural sociological motivation, his model had an important impact on the design on several peer-to-peer protocols~(e.g., [13]), because it addresses the general question of how decentralized algorithms can find short paths in a partially unknown network.
Kleinberg's model. Kleinberg's small world model consists of a {$d$}-dimensional grid {$\{-n,\ldots, n\}^d$} (representing local acquaintance between individuals, such as geographic or professional) augmented with one "long-range" directed link per node pointing to a random node at distance {$r$} chosen with probability proportional to {$1/r^\alpha$} where {$\alpha$} is a parameter of the model (each long-range link represents a random acquaintance met in the past for instance). Kleinberg defined a decentralized routing algorithm as an algorithm that tries to route locally a message from a node (the source) to another (the target), that is to say, by visiting only neighbors (local or long-range) of already visited nodes (starting from the source). Kleinberg's most striking result is that no decentralized algorithm can find short paths if {$\alpha\neq d$} (i.e., path of length {$polylog(n)$} where {$n$} is the size of the grid), even when the diameter of the augmented graph is {$\Theta(\log n)$} as it was shown later on by [10] [11]}. Only when {$\alpha=d$}, decentralized algorithms may find short paths between random pairs. Indeed, the simple greedy algorithm that simply routes the message to the closest neighbor (local or long-range) of the current message holder computes paths of expected length {$O(\log^2 n)$} [6] (see Fig.1).
Two main types of questions. Kleinberg's model raises two main type of questions. First, from the algorithmic and peer-to-peer network design point of view: Can we beat greedy algorithm? Second, from the sociological point of view: What does this model tell us about real social networks?
One important issue is now to explain how this distribution arises in real social networks. Several simple dynamical models have been proposed [2] [1]}. Both of them give raise to the desired long-range link distribution (only experimentally for [2]), but none of them answers completely to the question since they both make some unsatisfying arbitrary choices: the distribution of the "time to forget" for instance in [1]. Another important question is: what kind of algorithm do people use in social network, if any? How can we tell from the experiment?
The three main types of search algorithms. To try to answer these questions, one may want to study the characteristics of the paths constructed by the various algorithms proposed in the literature. These algorithms are essentially of three types: greedy [6], local exploration based [3] and non-local exploration based [7] [8] [5]. Typical paths of each type of algorithm are given in Fig.2, Fig.3, and Fig.4, respectively. One can observed some important differences: the greedy path contains long-range links of all lengths while the two others tend to follow longer long-range links; the local exploration based algorithm tend to follow fewer long-range links than the non-local one and these are much longer. We can observed also some similarities between them: significant progresses toward the target are made by some very long long-range link that happend every so often. This last similarity was indeed proved to be true for every efficient decentralized algorithm: it is a corollary of the proof of the lower bound by [5].
Measuring the sociological meaningfulness. In order to find the algorithmic model(s) that best suit the experimental data on the recorded human behaviors in "Milgram-like" experiments, we need to exhibit characteristics of the followed paths that: first, take clearly different values for each type of algorithm; and second, that can easily be computed in the recorded paths from the experiments. With E. Lebhar, we have studied various parameters that could match these two criteria. For instance, Fig.5 displays for the greedy and local exploration based algorithms, the exact "load" of the long-range as a function of their length, i.e., the expected number of paths that go through links of that length given some random source and target. One can observed that the behavior of this parameter differs a lot between the greedy and the local-search algorithms and can also be easily recorded from real-life experiment. At the present time, there is no mathematical of these parameters, but we have some hope that the geometric framework that was designed in [5] (see Fig.6 for an illustration of the key useful facts) should help in that matter.
References: