Check Replication Variance from Experiment
This script performs a one-way analysis of variance (ANOVA) on an experiment, with the replication as the factor variable and, as the response variable, the observation recorded for a vector, scalar or statistic. The analysis is useful for showing how much the observations change when the RNG seed is changed. The script relies on The R Project. It generates two outputs: a text file with the ANOVA table and the Tukey multiple comparisons of means, and a graphical representation of this Tukey test. A simple explanation of the Tukey multiple comparison of means can be found here.
The input file needs to have the following headers: Experiment Replica Module Obs. The analysis is performed on the second column (Replica) against the fourth column (Obs), using all rows that belong to the same experiment (rows whose first column is exactly the same).
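To make the expected layout concrete, here is a minimal Python sketch of the row selection described above. It is not the script's own code (the script uses R). It assumes the columns are whitespace-separated, which the description does not state, and the function name `load_observations`, the module names and the sample values are all hypothetical.

```python
import io
from collections import defaultdict


def load_observations(lines, experiment):
    """Group the Obs values by Replica for the rows of one experiment."""
    rows = iter(lines)
    header = next(rows).split()
    # Locate the columns by header name instead of hard-coding positions.
    exp_col = header.index("Experiment")
    rep_col = header.index("Replica")
    obs_col = header.index("Obs")

    groups = defaultdict(list)
    for line in rows:
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # skip blank lines
        if fields[exp_col] == experiment:
            groups[fields[rep_col]].append(float(fields[obs_col]))
    return dict(groups)


# Hypothetical sample data, invented for illustration only.
sample = io.StringIO(
    "Experiment Replica Module Obs\n"
    "Model1 0 ap[0] 3\n"
    "Model1 0 ap[1] 5\n"
    "Model1 1 ap[0] 4\n"
    "Model2 0 ap[0] 9\n"
)
groups = load_observations(sample, "Model1")
print(groups)
```

Only the Model1 rows are kept, and the Module column is ignored: the replica is the only factor in the analysis.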
An example of the script's usage:
$ checkReplicationVariance.sh APs-numGivenUp.data Model1
Notice that the input data file contains 3 models (Model1, Model2 and Model3), and we are checking the replication only for the first one. The same check can be run for the other experiments in the same input file just by changing the second parameter when calling the script.
The graphical representation of the Tukey test looks like the following image. The text output follows the image.
$ cat APs-numGivenUp.data-Model1.anova
...
               Df Sum Sq Mean Sq F value Pr(>F)
data.replicas   9    4.1     0.5  0.0887 0.9998
Residuals     730 3728.1     5.1
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = data.obs ~ data.replicas)
$data.replicas
             diff       lwr       upr     p adj
1-0 -1.351351e-01 -1.314170 1.0439000 0.9999982
2-0 -4.054054e-02 -1.219576 1.1384946 1.0000000
3-0 -1.351351e-01 -1.314170 1.0439000 0.9999982
...
8-0 -1.216216e-01 -1.300657 1.0574135 0.9999993
9-0 -2.297297e-01 -1.408765 0.9493054 0.9998246
2-1  9.459459e-02 -1.084441 1.2736297 0.9999999
3-1  0.000000e+00 -1.179035 1.1790352 1.0000000
...
8-1  1.351351e-02 -1.165522 1.1925487 1.0000000
9-1 -9.459459e-02 -1.273630 1.0844406 0.9999999
3-2 -9.459459e-02 -1.273630 1.0844406 0.9999999
4-2  0.000000e+00 -1.179035 1.1790352 1.0000000
5-2  4.054054e-02 -1.138495 1.2195757 1.0000000
...
8-7 -2.702703e-02 -1.206062 1.1520081 1.0000000
9-7 -1.351351e-01 -1.314170 1.0439000 0.9999982
9-8 -1.081081e-01 -1.287143 1.0709270 0.9999997
Note: Each ... marks output that has been omitted.
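For readers who want to see where the columns of the ANOVA table come from, the following Python sketch computes the degrees of freedom, sums of squares and F value of a one-way ANOVA by hand. It is an illustration of the standard formulas, not the script's implementation (the script delegates to R's aov). The function name `one_way_anova` and the toy data are hypothetical. The p-value in the Pr(>F) column needs the F distribution and is left out.

```python
def one_way_anova(groups):
    """Return (df_between, df_within, ss_between, ss_within, f_value).

    `groups` maps each replica label to its list of observations.
    """
    all_obs = [x for obs in groups.values() for x in obs]
    grand_mean = sum(all_obs) / len(all_obs)

    ss_between = 0.0  # variation of the replica means around the grand mean
    ss_within = 0.0   # variation of the observations inside each replica
    for obs in groups.values():
        mean = sum(obs) / len(obs)
        ss_between += len(obs) * (mean - grand_mean) ** 2
        ss_within += sum((x - mean) ** 2 for x in obs)

    df_between = len(groups) - 1             # number of replicas minus one
    df_within = len(all_obs) - len(groups)   # observations minus replicas
    # Each mean square is a sum of squares divided by its degrees of freedom;
    # the F value is the ratio of the two mean squares.
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return df_between, df_within, ss_between, ss_within, f_value


# Hypothetical toy data: three replicas with three observations each.
toy = {"0": [1.0, 2.0, 3.0], "1": [2.0, 3.0, 4.0], "2": [3.0, 4.0, 5.0]}
print(one_way_anova(toy))
```

In the sample output, the 9 degrees of freedom for data.replicas correspond to the ten replicas 0 to 9. An F value far below 1, as in that table, means the replica means differ much less than the observations within each replica do.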
Analysis: In the chart, each replica contrast (each horizontal line) has a small vertical mark roughly in its middle. This mark indicates the estimated difference in means for that contrast, and the horizontal line is its confidence interval. A dashed vertical line in the middle of the chart marks zero; zero means there is no difference between the two replicas of a contrast. For example, when comparing replica 3 against replica 0, the mean difference lies slightly to the left of zero, but zero is still included within the interval. If the interval of some contrast does not contain zero, the two replicas produce statistically different results. That means replication is changing the results too much, so something is wrong in the simulation. Replication must make the results vary, but not so much as to produce significant differences.
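The same check can be done on the text output instead of the chart. The Python sketch below flags any contrast whose interval (lwr, upr) excludes zero. The three rows are copied from the sample output above; the function name `significant_contrasts` is hypothetical, and reading the rows from the .anova file is left out.

```python
# Rows copied from the sample Tukey output: (contrast, diff, lwr, upr).
contrasts = [
    ("3-0", -1.351351e-01, -1.314170, 1.0439000),
    ("9-0", -2.297297e-01, -1.408765, 0.9493054),
    ("2-1", 9.459459e-02, -1.084441, 1.2736297),
]


def significant_contrasts(rows):
    """Return the names of the contrasts whose interval does not contain zero."""
    return [name for name, _diff, lwr, upr in rows if not (lwr <= 0.0 <= upr)]


flagged = significant_contrasts(contrasts)
print(flagged or "no contrast excludes zero")
```

For these rows every interval contains zero, so nothing is flagged, which matches the p adj values close to 1 in the output.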
Download the Script