A couple of weeks back I advised the folks over at Brulosophy to switch to an upper-tailed binomial proportions test for determining significant results in their *exbeeriments.* I also created a p-value calculator for them, which you can now use for your own *exbeeriments*!

In case you were wondering, the statistical analysis associated with the triangle test compares the proportion of test participants who correctly identified the odd beer out to the proportion of tasters that would be expected to correctly identify the odd beer purely by random chance. The greater the proportion of correct participants, the more evidence there is against the “random chance” null hypothesis. As we are only interested in whether the different beers can be correctly distinguished, this will be an “upper-tailed” test. That is to say, the potential result of the odd beer out being correctly identified less often than we’d expect under random chance isn’t of particular interest to us (and is not particularly likely either).

In plain terms, our null hypothesis is that the true proportion of the population that can correctly identify the odd beer out is 1/3. Our alternative hypothesis is that the true proportion of the population that can correctly identify the odd beer out is greater than 1/3.

The exact p-value can be calculated using the binomial distribution. Specifically, the p-value is the probability of observing at least as many correct tasters as were actually observed, if the population proportion is in fact equal to 1/3.
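As a rough sketch, the exact calculation could be written in Python like this. The function name `triangle_p_value` and its arguments are made up for illustration; this is not the code behind the calculator, just the upper-tail binomial sum described above.

```python
from math import comb


def triangle_p_value(correct, tasters, chance=1/3):
    """Upper-tailed exact binomial p-value.

    Returns P(X >= correct) when X ~ Binomial(tasters, chance),
    i.e. the probability of seeing at least this many correct
    tasters if everyone were guessing.
    """
    return sum(
        comb(tasters, k) * chance**k * (1 - chance) ** (tasters - k)
        for k in range(correct, tasters + 1)
    )


# Example: 5 of 7 tasters pick the odd beer out.
print(round(triangle_p_value(5, 7), 4))  # 0.0453
```

With 5 correct out of 7, the p-value works out to 99/2187, or about 0.045, which falls just under a 0.05 cutoff.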

An approximate p-value can be calculated by assuming that, under the null hypothesis, the estimated proportion follows a normal distribution with mean equal to 1/3. The approximate method may be used when the sample size is 25 or greater.
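A minimal sketch of the approximate version follows. It assumes the usual standard error under the null, sqrt((1/3)(2/3)/n), and applies no continuity correction; neither detail is spelled out above, so treat both as assumptions. The function name `approx_p_value` is hypothetical.

```python
from math import erfc, sqrt


def approx_p_value(correct, tasters, chance=1/3):
    """Upper-tailed p-value from the normal approximation.

    Assumes the sample proportion is roughly normal with mean
    `chance` and standard error sqrt(chance * (1 - chance) / n).
    No continuity correction is applied (an assumption).
    """
    p_hat = correct / tasters
    std_err = sqrt(chance * (1 - chance) / tasters)
    z = (p_hat - chance) / std_err
    # Upper tail of the standard normal: 1 - Phi(z)
    return 0.5 * erfc(z / sqrt(2))


# Example: 14 of 30 tasters pick the odd beer out.
print(approx_p_value(14, 30))
```

Comparing this against the exact binomial p-value for the same counts is an easy way to see how close the approximation is at a given sample size.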

Thank you so much for your help!!

I assume the null hypothesis for the true proportion of the population that can identify the odd beer was set at 1/3 because it’s a three-way comparison of two identical beers and one different one?

Yes, exactly. If making a random selection, a taster would have a 1/3 chance of correctly identifying the odd beer out due to the fact that three samples are presented.

Justin, are there any rules about the minimum sample size or anything else to be aware of?

Also, what can be said about a p-value of 0.04 vs. a p-value of 0.02, for example? I’ve always heard that significance is significance, there is no scale, but we already chose 0.05 as the cutoff, so it seems odd to say that there is no difference between 0.02 and 0.04.

The rules about minimum sample size only apply when using the normal approximation.

We select our significance level prior to conducting our statistical test. As you know, if the p-value is less than the significance level, we reject the null hypothesis in favour of the alternative. While a smaller p-value indicates more evidence against the null hypothesis, the p-value is only used to make the binary conclusion of either rejecting or failing to reject the null hypothesis, so in effect, all that we are really taking from it is whether it falls below that threshold of significance or not.

Where it may matter just how many participants were correct is in deriving an estimate of what the true population proportion is. While 5/7 and 6/7 would both be significant results in the triangle test, the latter leads to a “more extreme” estimate of the true population proportion.
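To put numbers on that, here is a small sketch (the helper names are made up for illustration) that prints the simple point estimate, correct divided by tasters, next to the exact upper-tailed p-value for both results.

```python
from math import comb


def upper_tail(correct, tasters, chance=1/3):
    """Exact P(X >= correct) for X ~ Binomial(tasters, chance)."""
    return sum(
        comb(tasters, k) * chance**k * (1 - chance) ** (tasters - k)
        for k in range(correct, tasters + 1)
    )


def summarize(correct, tasters):
    """Return (point estimate of the proportion, exact p-value)."""
    return correct / tasters, upper_tail(correct, tasters)


for correct in (5, 6):
    estimate, p_value = summarize(correct, 7)
    print(f"{correct}/7: estimate = {estimate:.3f}, p-value = {p_value:.4f}")
# 5/7: estimate = 0.714, p-value = 0.0453
# 6/7: estimate = 0.857, p-value = 0.0069
```

Both p-values are under 0.05, so the test conclusion is the same, but the estimated proportion moves from about 0.71 to about 0.86.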

I think that makes sense. Thanks.

Hey, could the 1/3 null hypothesis be bigger, say 2/4, if all of the participants were more trained/skilled at tasting? Is 1/3 used here because you are running the tests with different groups of people?

Have you explained somewhere how you selected this 1/3? You must have some arguments for this.

Yes. 1/3 as the null hypothesis is the “random selection” null hypothesis. This is the proportion of participants who would be correct, on average, if they had all selected the odd sample out at random.

Anything statistically higher than a proportion of 1/3 suggests that at least some of the participants are able to discern a difference between the samples (i.e. they are not only selecting by random chance), though it is expected that at least some, if not most, participants will still perform no better than random chance.