A couple of weeks back I advised the folks over at Brulosophy to switch to an upper-tailed binomial proportions test for determining significant results in their *exbeeriments.* I also created a p-value calculator for them, which you can now use for your own *exbeeriments*!

In case you were wondering, the statistical analysis associated with the triangle test compares the proportion of test participants who correctly identified the odd beer out to the proportion of tasters that would be expected to correctly identify the odd beer purely due to random chance. The greater the proportion of correct participants, the more evidence there is against the “random chance” null hypothesis. As we are only interested in whether the different beers can be correctly distinguished, this will be an “upper-tailed” test. That is to say, the potential result of the odd beer out being correctly identified less often than we’d expect under random chance isn’t of particular interest to us (and is not particularly likely either).

In plain terms, our null hypothesis is that the true proportion of the population that can correctly identify the odd beer out is 1/3. Our alternative hypothesis is that the true proportion of the population that can correctly identify the odd beer out is greater than 1/3.

The exact p-value can be calculated using the binomial distribution. Specifically, the p-value is the probability of observing at least as many correct tasters as were actually observed, if the population proportion is in fact equal to 1/3.
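As a rough illustration of the exact calculation, here is a minimal Python sketch using only the standard library. The function name `triangle_p_value` and the 5-out-of-7 example panel are made up for illustration; this is not the code behind the calculator mentioned above.

```python
from math import comb

def triangle_p_value(correct, total, p0=1/3):
    """Upper-tailed exact binomial p-value.

    Returns P(X >= correct) for X ~ Binomial(total, p0), i.e. the chance of
    seeing at least this many correct tasters if everyone were guessing.
    """
    return sum(
        comb(total, k) * p0**k * (1 - p0) ** (total - k)
        for k in range(correct, total + 1)
    )

# Hypothetical panel: 5 of 7 tasters pick the odd beer out.
print(round(triangle_p_value(5, 7), 4))  # 0.0453
```

With a 0.05 significance level, that hypothetical result would just clear the bar.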

An approximate p-value can be calculated by assuming that, under the null hypothesis, the estimated proportion follows a normal distribution with mean equal to 1/3. The approximate method may be used when the sample size is at least 25.
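The approximation can be sketched along the same lines. This version assumes the usual standard error under the null, sqrt(p0 × (1 − p0) / n), and applies no continuity correction; both are my assumptions here, and a particular calculator may differ on either point. The function name and the 15-of-30 example are hypothetical.

```python
from math import erfc, sqrt

def approx_triangle_p_value(correct, total, p0=1/3):
    """Upper-tailed normal-approximation p-value for a triangle test.

    Assumes the sample proportion is roughly Normal(p0, p0 * (1 - p0) / total)
    under the null hypothesis. No continuity correction is applied.
    """
    p_hat = correct / total
    standard_error = sqrt(p0 * (1 - p0) / total)
    z = (p_hat - p0) / standard_error
    # Upper-tail probability of a standard normal: 1 - Phi(z).
    return 0.5 * erfc(z / sqrt(2))

# Hypothetical panel: 15 of 30 tasters pick the odd beer out.
print(approx_triangle_p_value(15, 30))
```

For small panels the exact binomial calculation is the safer choice, since the normal curve is only a stand-in for the binomial tail.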

Thank you so much for your help!!

I assume the null hypothesis for the true proportion of the population that can identify the odd beer was set at 1/3 because it’s a three-way comparison of two identical samples and one different one?

Yes, exactly. If making a random selection, a taster would have a 1/3 chance of correctly identifying the odd beer out due to the fact that three samples are presented.

Justin, are there any rules about the minimum sample size or anything else to be aware of?

Also, what can be said about a p-value of 0.04 vs a p-value of 0.02, for example? I’ve always heard that significance is significance and there is no scale, but we already chose 0.05 as the cutoff, so it seems odd to say that there is no difference between 0.02 and 0.04.

The rules about minimum sample size only apply when using the normal approximation.

We select our significance level prior to conducting our statistical test. As you know, if the p-value is less than the significance level, we reject the null hypothesis in favour of the alternative. While a smaller p-value indicates more evidence against the null hypothesis, the p-value is only used to make the binary decision to reject or fail to reject the null hypothesis, so in effect all we are really taking from it is whether it falls below that threshold of significance or not.

Where it may matter just how many participants were correct is in deriving an estimate of what the true population proportion is. While 5/7 and 6/7 would both be significant results in the triangle test, the latter leads to a “more extreme” estimate of the true population proportion.

I think that makes sense. Thanks.

Hi Justin,

I’m wondering about effect size calculations for these statistics. I often find that many Brulosophy studies find a significant difference from chance by only a few (sometimes 1) tasters. This seems like an appropriate way to add important nuance to what they’re trying to do. I’m interested in your thoughts on this. I have not looked at the math real close yet. Thanks!

The estimate of effect size would be max(0, x/n – 1/3), where x is the number of correct participants and n is the total number of participants. I include the max bit because a negative effect size isn’t really interpretable with this test.
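For anyone who wants to try it, that estimate is a one-liner in Python. The function name `effect_size` and the example panels are just for illustration.

```python
def effect_size(correct, total, p0=1/3):
    """Estimated effect size max(0, x/n - 1/3): how far the observed
    proportion of correct tasters sits above the guessing rate."""
    return max(0.0, correct / total - p0)

# Hypothetical panels of 7 tasters.
print(round(effect_size(5, 7), 3))  # 0.381
print(round(effect_size(6, 7), 3))  # 0.524
print(effect_size(2, 7))            # 0.0 (below chance, clipped to zero)
```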

To reject the null hypothesis, all we need is for the confidence interval for the effect size to not include 0. The lower bound of that CI may be barely above 0 in some cases (e.g. experiments in which we just meet the threshold for significance). Brulosophy has considered providing results like this, such as CIs for effect size, but it’s less approachable for most people compared to something like “more people picked the odd beer out than would be expected if everyone selected at random”, so it’s left at that.
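To make the confidence interval idea concrete, here is a sketch of one way to get a lower bound. It uses the one-sided exact (Clopper-Pearson) bound, found by bisection on the binomial tail; that choice of interval is my assumption for the example, not necessarily what would be reported, and the function names are made up. The bound on the effect size is the bound on the proportion minus 1/3.

```python
from math import comb

def upper_tail(correct, total, p):
    """P(X >= correct) for X ~ Binomial(total, p)."""
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(correct, total + 1)
    )

def proportion_lower_bound(correct, total, alpha=0.05, tol=1e-10):
    """One-sided Clopper-Pearson lower confidence bound for the true
    proportion: the value of p at which P(X >= correct) equals alpha."""
    if correct == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The tail probability increases with p, so move up while it is
        # still below alpha and down otherwise.
        if upper_tail(correct, total, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical panel: 5 of 7 correct. Subtract 1/3 to get the lower
# bound on the effect size.
print(proportion_lower_bound(5, 7) - 1/3)
```

With this construction, the lower bound on the effect size is above 0 exactly when the exact upper-tailed p-value is below alpha, so the interval and the test always agree.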

Hey, could the null hypothesis be bigger than 1/3, say 2/4, if all of the participants were more trained or skilled at tasting? And is 1/3 used here because you are doing tests with different groups of people?

Have you explained somewhere how you selected this 1/3? You must have some arguments for this.

Yes. 1/3 as the null hypothesis is the “random selection” null hypothesis. This is the proportion of correct participants on average if they had all selected the odd-sample-out randomly.

Anything statistically higher than a proportion of 1/3 suggests that at least some of the participants are able to discern a difference between the samples (i.e. they are not only selecting via random chance), though it is expected that at least some, if not most, participants will still perform no better than random chance.