The p-value of tea: the algae specialist and the tea-value

How a cup of tea turned an everyday occurrence into a legend in science

Jan 08, 2023

Muriel Bristol, born in 1888, was a British scientist who specialized in the study of algae. She worked at Rothamsted Research (Harpenden, England). Rothamsted is one of the oldest agricultural research institutions; some of the great legends of statistics worked there, such as Fisher, Yates, and Cochran. Many of the techniques we use in clinical research were born there to study plant breeding. But today's story is about a cup of tea.

One ordinary day in 1920, Ronald Fisher, for some the most brilliant statistician of the 20th century, offered Bristol a cup of tea with some milk in the common room at Rothamsted. Bristol declined, stating that she preferred to pour the milk into the cup first and then the tea, as she liked the taste better that way. Fisher, who had only been at the institution a short time, replied that this made no sense. Bristol insisted that she could tell the difference perfectly well. One of those present, William Roach, who was in love with Bristol, proposed an experiment. Eight cups of tea were prepared in random order: in some the tea was poured first, in others the milk.

Bristol correctly identified the pouring order of the tea and milk in all 8 cups.

After the humiliation of his defeat, Fisher was intrigued by the experiment they had just performed. He wondered: How many cups should be used in the test? Should they be paired? In what order should the cups be presented? What should be done about chance variations in temperature, sweetness, etc.? What conclusion can be drawn from a perfect score or one with one or more errors?

These questions turned an everyday occurrence into a legend in science: prompted by this experiment, Fisher developed a statistical test that we still use today, Fisher's exact test. It tests for an association between two categorical variables (e.g., the actual order of the tea/milk mixture and the order our algae expert reported). It starts from a null hypothesis (which Fisher stubbornly defended): Bristol is not able to differentiate the type of mixture. The test gives the probability (p-value, or let me make a bad joke: tea-value :p) of obtaining a result at least as extreme as the one observed if the null hypothesis were true, that is, if Bristol were merely guessing. When this probability is very low, we reject the null hypothesis in favor of the alternative: Bristol is able to do so. This very experiment also gave rise to Fisher's concept of the null hypothesis.
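To make the tea-value concrete, here is a minimal sketch of the calculation, assuming the design Fisher later published: eight cups, four of each kind, with the taster knowing she must pick four as "milk first". Under the null hypothesis, the number of milk-first cups she picks correctly follows a hypergeometric distribution. The function names are my own, chosen for illustration.

```python
from math import comb

def p_correct_milk_first(k, cups_per_kind=4):
    """Probability of picking exactly k true milk-first cups by pure guessing.

    The taster picks `cups_per_kind` cups out of 2 * cups_per_kind, so the
    number of correct picks is hypergeometric.
    """
    n = cups_per_kind
    return comb(n, k) * comb(n, n - k) / comb(2 * n, n)

def one_sided_p_value(observed, cups_per_kind=4):
    """P(at least `observed` correct milk-first picks) under the null hypothesis."""
    return sum(p_correct_milk_first(k, cups_per_kind)
               for k in range(observed, cups_per_kind + 1))

# Full null distribution of the score, from 0 to 4 correct milk-first picks.
for k in range(5):
    print(k, "correct:", round(p_correct_milk_first(k), 4))

print("perfect score:", round(one_sided_p_value(4), 4))       # 1/70
print("one cup of each kind swapped:", round(one_sided_p_value(3), 4))  # 17/70
```

A perfect score has only 1 chance in 70 of happening by luck (about 1.4%), while a single swap already pushes the p-value to 17/70 (about 24%). This suggests why eight cups leave no room for error: with this design, only a perfect score is convincing.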

Today we use Fisher's exact test instead of the chi-squared test when we compare categorical variables and the sample size of the experiment is small, although it can also be used with large samples.
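The following sketch illustrates why the exact test is preferred for small samples, using the perfect-score table from the tea experiment (assuming four cups of each kind). It uses only the standard library: the chi-squared p-value relies on the identity that, with one degree of freedom, the upper tail equals erfc(sqrt(x / 2)). The variable names are illustrative, and the "expected counts below 5" threshold is a common rule of thumb, not a strict law.

```python
from math import comb, erfc, sqrt

# Perfect-score 2x2 table: rows = true order, columns = taster's answer.
a, b = 4, 0   # milk-first cups: called "milk first", called "tea first"
c, d = 0, 4   # tea-first cups:  called "milk first", called "tea first"

n = a + b + c + d
row1, row2, col1, col2 = a + b, c + d, a + c, b + d

# Pearson chi-squared statistic for a 2x2 table (no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
# Upper-tail probability of chi-squared with 1 degree of freedom.
p_chi2 = erfc(sqrt(chi2 / 2))

def table_prob(k):
    """Hypergeometric probability of a table with k in the top-left cell."""
    return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

# Two-sided exact p-value: sum over all tables no more likely than the observed one.
p_obs = table_prob(a)
k_values = range(max(0, col1 - row2), min(row1, col1) + 1)
p_exact_two_sided = sum(table_prob(k) for k in k_values
                        if table_prob(k) <= p_obs * (1 + 1e-9))

# Every cell has the same expected count here, far below the usual threshold of 5.
smallest_expected = min(row1, row2) * min(col1, col2) / n

print(f"chi-squared p-value:   {p_chi2:.4f}")
print(f"exact two-sided p:     {p_exact_two_sided:.4f}")
print(f"smallest expected cell: {smallest_expected}")
```

With expected counts of only 2 per cell, the chi-squared approximation gives a p-value several times smaller than the exact one (2/70 two-sided, 1/70 one-sided), so it overstates the evidence.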

Fisher described this experiment in his book The Design of Experiments in 1935:

"A lady declares that by tasting a cup of tea made with milk she can discriminate whether the milk or the tea infusion was first added to the cup. Our experiment consists in mixing eight cups of tea, four in one way and four in the other, and presenting them to the subject in a random order. The subject has been told in advance of what the test will consist, namely that these shall be four of each kind, and that they shall be presented to her in a random order, that is an order not determined arbitrarily by human choice, but by the actual manipulation of the physical apparatus used in games of chance"
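The design in the quote can be mimicked with a small simulation. This is only a sketch under stated assumptions: the cups are shuffled by a seeded pseudo-random generator (standing in for Fisher's "physical apparatus used in games of chance"), and the taster has no ability at all, so she labels four cups of each kind at random. The function name and the number of trials are my own choices.

```python
import random

def simulate_perfect_scores(trials=100_000, seed=0):
    """Estimate how often a taster with no ability gets all eight cups right."""
    rng = random.Random(seed)
    cups = ["milk first"] * 4 + ["tea first"] * 4
    perfect = 0
    for _ in range(trials):
        truth = cups[:]
        rng.shuffle(truth)   # random presentation order
        guess = cups[:]
        rng.shuffle(guess)   # random labels, constrained to four of each kind
        if guess == truth:
            perfect += 1
    return perfect / trials

estimate = simulate_perfect_scores()
print(f"simulated: {estimate:.4f}   exact 1/70: {1 / 70:.4f}")
```

The simulated frequency should land close to 1/70, the chance of a perfect score by guessing alone.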

The next time you see a sentence like this in a scientific article, remember Bristol, Fisher, and Roach: "Categorical outcomes were compared with the use of a chi-square test (with Fisher correction when needed)."

Some interesting facts:

Tea is the second most consumed beverage on Earth, after water.

Muriel Bristol and William Roach were married a year after this experiment.

Rothamsted is home to the Park Grass Experiment, a study in agriculture that began in 1856 and is still ongoing.

Several algae were named after Muriel Bristol, such as Chlamydomonas muriella and probably the genus Muriella.

Ronald Fisher is considered one of the most brilliant statisticians in history, but in other aspects of his life he has been criticized: he supported eugenics and made statements and supported writings that can be considered racist.

References:

Box, J. F. R. A. Fisher: The Life of a Scientist, 1978.

Sturdivant, R. Lady testing tea.

Fisher, R. A. The Design of Experiments, 1935.

Fisher, R. A. The Principles of Experimentation, Illustrated by a Psycho-physical Experiment, Section 8. The Null Hypothesis, 1971.

Wikipedia articles: they are hyperlinked in the text.

