Probabilistic Database

There are currently two main models for defining probabilistic tables: the tuple-level and the attribute-level model. See the Model page for more detailed information about these two models.

ProbDB supports both models for storing tables. The attribute-level model uses less space, but it requires multiple classical tables for each probabilistic table, which makes calculations over the probabilistic attributes less convenient. The tuple-level model, on the other hand, requires more space, but because all the data are in the same table, computations can be more convenient.

Meta-data

In order to store and retrieve probabilistic data and to answer probabilistic queries, we use dedicated metadata tables. They record which attributes hold the probability that a tuple appears in a possible world, and which tables are correlated.

Tuple-Level Model

The simplest way to store a probabilistic table is to use the tuple-level model. In this model, each tuple has its own probability of existing in the world, and any two tuples are independent. We use an extension of this model that adds mutual exclusion between tuples. Each tuple has a special attribute, the tuple_id attribute. All tuples with the same tuple_id are mutually exclusive; tuples with different tuple_id values are independent.

The table ptable1 is an example of a probabilistic table, with id as the tuple_id attribute and prob as the probability attribute.

ptable1
id  name  value1  prob
1   t1    3       0.7
1   t2    5       0.3
2   t3    0       0.8
2   t4    3       0.2

In the table ptable1, the first two tuples are mutually exclusive, and so are the last two. The tuple t1, however, is independent of the tuple t3. There are 4 possible worlds:

World  Tuples
PW13   {t1, t3}
PW14   {t1, t4}
PW23   {t2, t3}
PW24   {t2, t4}
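
The enumeration above can be reproduced mechanically. The following Python sketch is an illustration only, not ProbDB code. It assumes that the probabilities within each tuple_id group sum to 1 (as they do in ptable1), so that exactly one tuple per tuple_id appears in every world; the function name possible_worlds is hypothetical.

```python
from itertools import groupby, product
from math import prod

# Rows of ptable1: (id, name, value1, prob)
PTABLE1 = [
    (1, "t1", 3, 0.7),
    (1, "t2", 5, 0.3),
    (2, "t3", 0, 0.8),
    (2, "t4", 3, 0.2),
]


def possible_worlds(rows):
    """Return a list of (tuple names, probability), one entry per world.

    Assumes the probabilities within each tuple_id group sum to 1,
    so every world contains exactly one tuple from each group.
    """
    ordered = sorted(rows, key=lambda r: r[0])
    # One group per tuple_id: the tuples inside a group are mutually exclusive.
    groups = [list(g) for _, g in groupby(ordered, key=lambda r: r[0])]
    worlds = []
    # Groups are independent, so a world picks one tuple from each group
    # and its probability is the product of the chosen probabilities.
    for choice in product(*groups):
        names = tuple(r[1] for r in choice)
        worlds.append((names, prod(r[3] for r in choice)))
    return worlds


for names, p in possible_worlds(PTABLE1):
    print(names, round(p, 2))
```

Under these assumptions, the worlds PW13, PW14, PW23 and PW24 have probabilities 0.56, 0.14, 0.24 and 0.06, which sum to 1.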

We store the probabilistic metadata of the database in tables. The metadata of tuple-level tables is stored in the table tuple_level_tables, as follows:

tuple_level_tables
table_name  tuple_id_attr  prob_attr
ptable1     id             prob
...         ...            ...
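
To illustrate how such a metadata table might be used, the sketch below registers ptable1 in tuple_level_tables and looks up its tuple_id and probability columns before querying it. This is an assumption-laden example: the storage engine used by ProbDB is not stated here, SQLite is chosen only because it ships with Python, and the helper names tuple_level_columns and probability_mass_per_id are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ptable1 (id INTEGER, name TEXT, value1 INTEGER, prob REAL);
    INSERT INTO ptable1 VALUES
        (1, 't1', 3, 0.7), (1, 't2', 5, 0.3),
        (2, 't3', 0, 0.8), (2, 't4', 3, 0.2);
    CREATE TABLE tuple_level_tables (
        table_name TEXT PRIMARY KEY, tuple_id_attr TEXT, prob_attr TEXT);
    INSERT INTO tuple_level_tables VALUES ('ptable1', 'id', 'prob');
""")


def tuple_level_columns(conn, table_name):
    """Return (tuple_id_attr, prob_attr) for a registered tuple-level table."""
    row = conn.execute(
        "SELECT tuple_id_attr, prob_attr FROM tuple_level_tables "
        "WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    if row is None:
        raise KeyError(f"{table_name} is not registered as a tuple-level table")
    return row


def probability_mass_per_id(conn, table_name):
    """Sum the probabilities of each group of mutually exclusive tuples."""
    # The lookup raises if table_name is not registered, so only names found
    # in the metadata table are interpolated into the query below.
    id_attr, prob_attr = tuple_level_columns(conn, table_name)
    sql = (
        f'SELECT "{id_attr}", SUM("{prob_attr}") FROM "{table_name}" '
        f'GROUP BY "{id_attr}" ORDER BY "{id_attr}"'
    )
    return conn.execute(sql).fetchall()


print(tuple_level_columns(conn, "ptable1"))
print(probability_mass_per_id(conn, "ptable1"))
```

The point of the design is that a query never hard-codes which column is the tuple_id or the probability: both are read from the metadata first.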

Attribute-Level Model

With the attribute-level model, we assume that the tables have probabilistic attributes. A tuple in a probabilistic table can have probabilistic values, each of which is a set of mutually exclusive pairs (tuple, probability).

In this model, the table ptable1 can be represented as follows:

id  values
1   name  value1  prob
    t1    3       0.7
    t2    5       0.3
2   name  value1  prob
    t3    0       0.8
    t4    3       0.2

Briefly speaking, each probabilistic value in the attribute-level model looks like a tuple-level probabilistic table. This is how we will store it in the database.

The table attribute_level_tables stores the metadata of the probabilistic tables defined by the attribute-level model.

attribute_level_tables
table_name        tuple_id_attr  table_prob_attr
ptable1_attr_lev  values         ptable1
...               ...            ...

The table ptable1_attr_lev has one probabilistic attribute called values, which refers to the tuple_id of the table ptable1.

ptable1_attr_lev
id  values
1   1
2   2
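
The following sketch shows one way the nested representation could be rebuilt from these tables: it follows the metadata from attribute_level_tables to the referenced tuple-level table and joins the two. It is an illustration under stated assumptions, not ProbDB's implementation. SQLite is used only for convenience, the function name expand_probabilistic_values is hypothetical, and the selected columns (name, value1, prob) are specific to the ptable1 example.

```python
import sqlite3

# In-memory SQLite database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ptable1 (id INTEGER, name TEXT, value1 INTEGER, prob REAL);
    INSERT INTO ptable1 VALUES
        (1, 't1', 3, 0.7), (1, 't2', 5, 0.3),
        (2, 't3', 0, 0.8), (2, 't4', 3, 0.2);
    -- "values" is an SQL keyword, so the column name must be quoted.
    CREATE TABLE ptable1_attr_lev (id INTEGER, "values" INTEGER);
    INSERT INTO ptable1_attr_lev VALUES (1, 1), (2, 2);
    CREATE TABLE tuple_level_tables (
        table_name TEXT, tuple_id_attr TEXT, prob_attr TEXT);
    INSERT INTO tuple_level_tables VALUES ('ptable1', 'id', 'prob');
    CREATE TABLE attribute_level_tables (
        table_name TEXT, tuple_id_attr TEXT, table_prob_attr TEXT);
    INSERT INTO attribute_level_tables
        VALUES ('ptable1_attr_lev', 'values', 'ptable1');
""")


def expand_probabilistic_values(conn, table_name):
    """Map each row id of an attribute-level table to its alternatives."""
    # Which attribute is probabilistic, and which table stores its values?
    ref_attr, value_table = conn.execute(
        "SELECT tuple_id_attr, table_prob_attr FROM attribute_level_tables "
        "WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    # Which column of the value table is its tuple_id?
    (id_attr,) = conn.execute(
        "SELECT tuple_id_attr FROM tuple_level_tables WHERE table_name = ?",
        (value_table,),
    ).fetchone()
    # The selected columns below assume the ptable1 example schema.
    sql = (
        f'SELECT a.id, p.name, p.value1, p.prob '
        f'FROM "{table_name}" AS a JOIN "{value_table}" AS p '
        f'ON p."{id_attr}" = a."{ref_attr}" ORDER BY a.id, p.name'
    )
    result = {}
    for row_id, name, value1, prob in conn.execute(sql):
        result.setdefault(row_id, []).append((name, value1, prob))
    return result


print(expand_probabilistic_values(conn, "ptable1_attr_lev"))
```

Each key of the returned dictionary is a row of ptable1_attr_lev, and each list holds the mutually exclusive alternatives of its values attribute.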
