There are currently two main models for defining probabilistic tables: the tuple-level and the attribute-level model. See the Model page for more detailed information about these two models.
ProbDB can store tables under either model. The attribute-level model uses less space, but it requires multiple classical tables for each probabilistic table and is therefore less convenient for computations over the probabilistic attributes. The tuple-level model, on the other hand, requires more space, but because all the data are in the same table, computations can be more convenient.
Meta-data
To store and retrieve probabilistic data and to answer probabilistic queries, we use dedicated metadata tables. They record which attribute holds the probability that a tuple appears in a possible world, and which tables are correlated.
Tuple-Level Model
The simplest way to store a probabilistic table is to use the tuple-level model. In this model, each tuple has its own probability of existing in a possible world, and any two tuples are independent. We use an extension of this model that adds mutual exclusion between tuples. Each tuple has a special attribute, the tuple_id attribute. All tuples with the same tuple_id are mutually exclusive; tuples with different tuple_id values are independent.
The table ptable1 is an example of a probabilistic table, with id as the tuple_id attribute and prob as the probability attribute.
id | name | value1 | prob |
---|---|---|---|
1 | t1 | 3 | 0.7 |
1 | t2 | 5 | 0.3 |
2 | t3 | 0 | 0.8 |
2 | t4 | 3 | 0.2 |
In the table ptable1, the first two tuples are mutually exclusive, and so are the last two. However, the tuple t1 is independent of the tuple t3. There are 4 possible worlds:
World | Tuples |
---|---|
PW13 | {t1, t3} |
PW14 | {t1, t4} |
PW23 | {t2, t3} |
PW24 | {t2, t4} |
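The enumeration above can be sketched in a few lines of Python. This is only an illustration of the semantics, not ProbDB code: the names `PTABLE1` and `possible_worlds` are hypothetical, and the sketch assumes that the probabilities within each tuple_id group sum to 1, as they do in ptable1, so that every world contains exactly one tuple per group.

```python
from collections import defaultdict
from itertools import product

# Rows of ptable1 as (id, name, value1, prob); id is the tuple_id attribute.
PTABLE1 = [
    (1, "t1", 3, 0.7),
    (1, "t2", 5, 0.3),
    (2, "t3", 0, 0.8),
    (2, "t4", 3, 0.2),
]


def possible_worlds(rows):
    """Return a list of (tuple names, probability), one entry per world.

    Assumption: the probabilities in each tuple_id group sum to 1, so a
    world always picks exactly one tuple from every group.
    """
    groups = defaultdict(list)
    for tuple_id, name, _value, prob in rows:
        groups[tuple_id].append((name, prob))

    worlds = []
    # Groups are independent: a world is one choice per group, and its
    # probability is the product of the chosen tuples' probabilities.
    for choice in product(*(groups[key] for key in sorted(groups))):
        names = tuple(name for name, _ in choice)
        prob = 1.0
        for _, p in choice:
            prob *= p
        worlds.append((names, prob))
    return worlds


for names, prob in possible_worlds(PTABLE1):
    print(names, round(prob, 2))
```

For ptable1 the four worlds {t1, t3}, {t1, t4}, {t2, t3} and {t2, t4} get probabilities 0.56, 0.14, 0.24 and 0.06, which sum to 1.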
We store the probabilistic metadata of the database in tables. The metadata of tuple-level tables are stored in the table tuple_level_tables, as follows:
table_name | tuple_id_attr | prob_attr |
---|---|---|
ptable1 | id | prob |
... | ... | ... |
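As a rough sketch of how this metadata might be used, the following Python snippet builds ptable1 and tuple_level_tables in an in-memory SQLite database, looks up which columns play the tuple_id and probability roles, and sums the probability mass of each group. SQLite, the column types, and the helper names `tuple_level_metadata` and `group_probabilities` are assumptions made for illustration; the actual storage backend and API of ProbDB may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ptable1 (id INTEGER, name TEXT, value1 INTEGER, prob REAL);
    INSERT INTO ptable1 VALUES
        (1, 't1', 3, 0.7), (1, 't2', 5, 0.3),
        (2, 't3', 0, 0.8), (2, 't4', 3, 0.2);

    CREATE TABLE tuple_level_tables (
        table_name TEXT PRIMARY KEY, tuple_id_attr TEXT, prob_attr TEXT);
    INSERT INTO tuple_level_tables VALUES ('ptable1', 'id', 'prob');
""")


def tuple_level_metadata(conn, table_name):
    """Look up (tuple_id_attr, prob_attr) for a registered tuple-level table."""
    row = conn.execute(
        "SELECT tuple_id_attr, prob_attr FROM tuple_level_tables"
        " WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    if row is None:
        raise KeyError(f"{table_name} is not registered as a tuple-level table")
    return row


def group_probabilities(conn, table_name):
    """Return (tuple_id, total probability) for each mutually exclusive group."""
    id_attr, prob_attr = tuple_level_metadata(conn, table_name)
    # Column names come from the metadata table; table_name was validated
    # by the lookup above, so it is safe to interpolate here.
    sql = (
        f'SELECT "{id_attr}", SUM("{prob_attr}") FROM "{table_name}"'
        f' GROUP BY "{id_attr}" ORDER BY "{id_attr}"'
    )
    return conn.execute(sql).fetchall()


print(tuple_level_metadata(conn, "ptable1"))
print(group_probabilities(conn, "ptable1"))
```

Reading the column roles from the metadata rather than hard-coding them lets the same query logic work for any table registered in tuple_level_tables.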
Attribute-Level Model
In the attribute-level model, tables have probabilistic attributes. A tuple of a probabilistic table can have probabilistic values, each of which is a set of mutually exclusive (tuple, probability) pairs.
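The table ptable1 can be represented as follows, where each value is the set of ((name, value1), probability) pairs that share the same id:
id | values |
---|---|
1 | {((t1, 3), 0.7), ((t2, 5), 0.3)} |
2 | {((t3, 0), 0.8), ((t4, 3), 0.2)} |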
In short, each probabilistic value in the attribute-level model looks like a tuple-level probabilistic table, and this is how we store it in the database.
The table attribute_level_tables stores the metadata of the probabilistic tables defined by the attribute-level model.
table_name | tuple_id_attr | table_prob_attr |
---|---|---|
ptable1_attr_lev | values | ptable1 |
... | ... | ... |
The table ptable1_attr_lev has one probabilistic attribute called values which refers to the tuple_id of the table ptable1.
id | values |
---|---|
1 | 1 |
2 | 2 |
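To make the indirection concrete, here is a minimal Python sketch that resolves each values reference in ptable1_attr_lev into its alternatives by joining against ptable1, using both metadata tables to find the join columns. SQLite, the column types, and the function name `expand_attribute_level` are assumptions for illustration only, and the selected columns (id, name, value1) are specific to this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ptable1 (id INTEGER, name TEXT, value1 INTEGER, prob REAL);
    INSERT INTO ptable1 VALUES
        (1, 't1', 3, 0.7), (1, 't2', 5, 0.3),
        (2, 't3', 0, 0.8), (2, 't4', 3, 0.2);
    CREATE TABLE tuple_level_tables (
        table_name TEXT, tuple_id_attr TEXT, prob_attr TEXT);
    INSERT INTO tuple_level_tables VALUES ('ptable1', 'id', 'prob');

    -- "values" is an SQL keyword, so it is quoted as an identifier.
    CREATE TABLE ptable1_attr_lev (id INTEGER, "values" INTEGER);
    INSERT INTO ptable1_attr_lev VALUES (1, 1), (2, 2);
    CREATE TABLE attribute_level_tables (
        table_name TEXT, tuple_id_attr TEXT, table_prob_attr TEXT);
    INSERT INTO attribute_level_tables
        VALUES ('ptable1_attr_lev', 'values', 'ptable1');
""")


def expand_attribute_level(conn, table_name):
    """Return (id, name, value1, prob) rows for every alternative value."""
    # Which column is probabilistic, and which tuple-level table backs it.
    attr, ref_table = conn.execute(
        "SELECT tuple_id_attr, table_prob_attr FROM attribute_level_tables"
        " WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    # Which columns of the backing table hold the tuple_id and probability.
    ref_id, ref_prob = conn.execute(
        "SELECT tuple_id_attr, prob_attr FROM tuple_level_tables"
        " WHERE table_name = ?",
        (ref_table,),
    ).fetchone()
    sql = (
        f'SELECT a.id, r.name, r.value1, r."{ref_prob}"'
        f' FROM "{table_name}" AS a JOIN "{ref_table}" AS r'
        f' ON a."{attr}" = r."{ref_id}" ORDER BY a.id, r.name'
    )
    return conn.execute(sql).fetchall()


for row in expand_attribute_level(conn, "ptable1_attr_lev"):
    print(row)
```

Each row of ptable1_attr_lev expands into the mutually exclusive alternatives stored under the referenced tuple_id, which recovers the nested representation of ptable1 shown earlier.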