After coming across Scott Janish’s exciting hop visualizations and tools yesterday, I needed to explore this Hopunion dataset for myself. I was excited to throw some statistical clustering and dimension reduction methods at the hop varieties on the basis of their oil composition. The oil composition of hops drives their distinct flavour and aroma contribution to beer… you can see why a brewer would be interested. In a simple case this information could be used to inform hop substitution, but more excitingly it could perhaps help develop new hop combinations in beer.
The dataset consists of 58 different hop varieties from the 2014 harvest year. For each hop there are alpha and beta acid measurements, total oil, and then the composition of that oil by percentage of beta-pinene, myrcene, linalool, caryophyllene, farnesene, humulene, geraniol, and other. I didn’t feel that the “other” oil category could be used to indicate hop similarity, so it was excluded. I also felt that I could make better hop comparisons by looking at the total amounts of the various oils rather than their percent composition, so those quantities were calculated. Alpha and beta acids were also excluded from this analysis.
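A minimal sketch of that percent-to-amount conversion, in Python. The post does not say what tooling was used, so the language, the field names, the units, and the example values below are all assumptions for illustration; none of the numbers come from the Hopunion data.

```python
# The seven oils kept for the analysis; "other" is deliberately left out.
OILS = [
    "beta_pinene", "myrcene", "linalool", "caryophyllene",
    "farnesene", "humulene", "geraniol",
]

def absolute_oils(total_oil, percents):
    """Convert percent-of-total-oil figures into absolute amounts.

    total_oil -- total oil content (units assumed, e.g. mL per 100 g)
    percents  -- dict mapping oil name to its percentage of total oil
    """
    return {oil: total_oil * percents[oil] / 100.0 for oil in OILS}

# Hypothetical hop, made-up numbers.
example_hop = {
    "total_oil": 2.0,
    "percents": {
        "beta_pinene": 0.5, "myrcene": 50.0, "linalool": 0.5,
        "caryophyllene": 10.0, "farnesene": 0.0, "humulene": 20.0,
        "geraniol": 0.5, "other": 18.5,
    },
}

amounts = absolute_oils(example_hop["total_oil"], example_hop["percents"])
```

Working with absolute amounts means two hops with the same percentage profile but very different total oil no longer look identical.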
I started with agglomerative and divisive hierarchical clustering algorithms. The average silhouette method of determining the number of clusters suggested two large clusters in both cases. I was honestly expecting that several distinct classes of hops would have been revealed with this method, but that was not the case. I continued with another clustering method, parsimonious Gaussian mixture models… again this approach led to identifying two large clusters… I would have to look more closely at this, but I’m curious whether these clusters correlate well with an aroma/bittering hop distinction, breeding, or perhaps growing location.
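For readers unfamiliar with the average silhouette method, here is a rough, dependency-free Python sketch of the idea: cluster the data for several candidate values of k, score each result by its mean silhouette width, and keep the k with the highest score. The average-linkage rule, the Euclidean distance, and the toy 2-D points are assumptions chosen for illustration; this is not the analysis actually run on the hop data.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points, k):
    """Average-linkage agglomerative clustering; returns one label per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels

def average_silhouette(points, labels):
    """Mean silhouette width; singleton clusters score 0 by convention."""
    n = len(points)
    scores = []
    for i in range(n):
        own = [dist(points[i], points[j]) for j in range(n)
               if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(                 # mean distance to nearest other cluster
            sum(dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n

# Toy data (made up): two well-separated groups of three points.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
scores_by_k = {k: average_silhouette(toy, agglomerative(toy, k)) for k in (2, 3, 4)}
best_k = max(scores_by_k, key=scores_by_k.get)
```

On real data one would more likely reach for an existing library implementation; the point here is only to show what "the silhouette method suggested two clusters" means mechanically.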
I next looked to dimension reduction methods, both principal component analysis and multidimensional scaling, for the purpose of creating a hop variety visualization as well as developing some hop substitution suggestions (which I’ll post at a later date). I settled on multidimensional scaling as I felt the resulting visual could be interpreted more directly. Basically, if a hop is close to another in the plot, they are similar. If they are far apart, they are dissimilar. So, while it’s not possible to create the 7-dimensional scatterplot that would be required to visualize the complete oil composition data, this figure allows you to explore the similarity of these hop varieties with a more than sufficient 3-dimensional approximation. The plot is interactive, and may take a minute to fully load. Thanks to Scott and Hopunion for this work break!
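To make the multidimensional scaling step more concrete, here is a small sketch of classical (Torgerson) MDS using NumPy. Several things are assumptions: the post does not say which MDS variant or distance measure was used, and the 7-column matrix below is random stand-in data, not the hop measurements.

```python
import numpy as np

def classical_mds(X, n_components=3):
    """Classical MDS on Euclidean distances between the rows of X.

    Returns an (n_samples, n_components) array of coordinates whose
    pairwise distances approximate those in the original space.
    """
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distance between every pair of rows.
    diff = X[:, None, :] - X[None, :, :]
    D2 = (diff ** 2).sum(axis=-1)
    n = D2.shape[0]
    # Double-centre the squared distances to get a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # Keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(vals)

# Stand-in data: 10 hypothetical "hops" described by 7 oil amounts each.
rng = np.random.default_rng(0)
X = rng.random((10, 7))
coords = classical_mds(X, n_components=3)

# Nearest neighbour of the first row in the reduced 3-D space,
# i.e. the closest point to it in a plot of `coords`.
d_to_first = np.linalg.norm(coords - coords[0], axis=1)
d_to_first[0] = np.inf
nearest_to_first = int(np.argmin(d_to_first))
```

Each row of `coords` is one point in the 3-D plot, and distances between rows are what "close" and "far" refer to when reading such a figure.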