
Paradoxes of probability and other statistical strangeness

Stephen Woodcock, University of Technology Sydney

Statistics is a useful tool for understanding the patterns in the world around us. But our intuition often lets us down when it comes to interpreting those patterns. In this series we look at some of the common mistakes we make and how to avoid them when thinking about statistics, probability and risk.


You don’t have to wait long to see a headline proclaiming that some food or behaviour is associated with either an increased or a decreased health risk, or often both. How can it be that seemingly rigorous scientific studies can produce opposite conclusions?

Nowadays, researchers can access a wealth of software packages that can readily analyse data and output the results of complex statistical tests. While these are powerful resources, they also make it easier for people without a full statistical understanding to misread the subtleties within a dataset and draw wildly incorrect conclusions.

Here are a few common statistical fallacies and paradoxes and how they can lead to results that are counterintuitive and, in many cases, simply wrong.


Simpson’s paradox

What is it?

This is where trends that appear within different groups disappear when data for those groups are combined. When this happens, the overall trend might even appear to be the opposite of the trends in each group.

One example of this paradox is where a treatment can be detrimental in all groups of patients, yet can appear beneficial overall once the groups are combined.

How does it happen?

This can happen when the sizes of the groups are uneven. A trial with careless (or unscrupulous) selection of the numbers of patients could conclude that a harmful treatment appears beneficial.

Example

Consider the following double-blind trial of a proposed medical treatment. A group of 120 patients (split into subgroups of sizes 10, 20, 30 and 60) receives the treatment, and 120 patients (split into subgroups of corresponding sizes 60, 30, 20 and 10) receive no treatment.

The overall results make it look like the treatment was beneficial to patients, with a higher recovery rate for patients with the treatment than for those without it.


However, when you drill down into the various groups that made up the cohort in the study, you see that in every group the recovery rate was 50% higher for patients who had no treatment.


But note that the size and age distribution of each group is different between those who took the treatment and those who didn’t. This is what distorts the numbers. In this case, the treatment group is disproportionately stacked with children, whose recovery rates are typically higher, with or without treatment.
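The original results table is not reproduced here, so the short Python sketch below invents recovery counts (an assumption, not the study's data) for subgroups of the sizes given above. The counts are chosen so that untreated patients recover 50% more often in every subgroup, yet the pooled figures favour the treatment:

```python
from fractions import Fraction

# Hypothetical (patients, recovered) counts for each subgroup. The subgroup
# sizes follow the trial described above; the recovery counts are illustrative.
treated = [(10, 2), (20, 4), (30, 12), (60, 36)]
untreated = [(60, 18), (30, 9), (20, 12), (10, 9)]


def rate(patients, recovered):
    """Recovery rate for one subgroup, as an exact fraction."""
    return Fraction(recovered, patients)


def pooled_rate(groups):
    """Recovery rate after all subgroups are combined into one cohort."""
    patients = sum(p for p, _ in groups)
    recovered = sum(r for _, r in groups)
    return Fraction(recovered, patients)


# Within every subgroup, the untreated rate is 1.5 times the treated rate...
for (tp, tr), (up, ur) in zip(treated, untreated):
    t, u = rate(tp, tr), rate(up, ur)
    print(f"treated {float(t):.0%}  untreated {float(u):.0%}  ratio {u / t}")

# ...yet pooling the subgroups reverses the comparison.
print(f"pooled: treated {float(pooled_rate(treated)):.0%}, "
      f"untreated {float(pooled_rate(untreated)):.0%}")
```

Exact `Fraction` arithmetic is used so the per-subgroup ratios print as exactly 3/2 rather than as rounded decimals. The reversal comes entirely from the largest treated subgroup also being the one with the highest recovery rate.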


Base rate fallacy

What is it?

This fallacy occurs when we disregard important information when making a judgement on how likely something is.

If, for example, we hear that someone loves music, we might think it’s more likely they’re a professional musician than an accountant. However, there are many more accountants than there are professional musicians. Here we have neglected the fact that the base rate of accountants is far higher than that of musicians, so we were unduly swayed by the information that the person likes music.

How does it happen?

The base rate fallacy occurs when the base rate for one option is substantially higher than for another.

Example

Consider testing for a rare medical condition, such as one that affects only 4% (1 in 25) of a population.

Let’s say there is a test for the condition, but it’s not perfect. If someone has the condition, the test will correctly identify them as being ill around 92% of the time. If someone doesn’t have the condition, the test will correctly identify them as being healthy 75% of the time.

So if we test a group of people, and find that over a quarter of them are diagnosed as being ill, we might expect that most of these people really do have the condition. But we’d be wrong.


In a typical sample of 300 patients, for every 11 people correctly identified as unwell, a further 72 are incorrectly identified as unwell.
The Conversation, CC BY-ND

According to our numbers above, of the 4% of patients who are ill, almost 92% will be correctly diagnosed as ill (that is, about 3.67% of the overall population). But of the 96% of patients who are not ill, 25% will be incorrectly diagnosed as ill (that’s 24% of the overall population).

What this means is that, of the approximately 27.67% of the population who are diagnosed as ill, only around 3.67% of the population are actually ill. So of the people who were diagnosed as ill, only around 13% (that is, 3.67%/27.67%) actually are unwell.
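The calculation above is Bayes’ theorem in miniature. Here is a minimal Python sketch using the standard terms sensitivity (the 92% rate at which ill people test positive) and specificity (the 75% rate at which healthy people test negative); the function name is our own:

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that someone who tests positive really has the condition."""
    true_positives = prevalence * sensitivity               # ill and flagged as ill
    false_positives = (1 - prevalence) * (1 - specificity)  # healthy but flagged as ill
    return true_positives / (true_positives + false_positives)


prevalence, sensitivity, specificity = 0.04, 0.92, 0.75
ppv = positive_predictive_value(prevalence, sensitivity, specificity)
print(f"P(ill | diagnosed ill) = {ppv:.1%}")

# The same figures expressed as head counts in a sample of 300 patients.
n = 300
print(f"correctly flagged: {n * prevalence * sensitivity:.0f}, "
      f"wrongly flagged: {n * (1 - prevalence) * (1 - specificity):.0f}")
```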

Worryingly, when a famous study asked general practitioners to perform a similar calculation to inform patients of the correct risks associated with mammogram results, just 15% of them did so correctly.


Will Rogers paradox

What is it?

This occurs when moving something from one group to another raises the average of both groups, even though no values actually increase.

The name comes from the American comedian Will Rogers, who joked that “when the Okies left Oklahoma and moved to California, they raised the average intelligence in both states”.

Former New Zealand Prime Minister Rob Muldoon provided a local variant on the joke in the 1980s, regarding migration from his nation into Australia.

How does it happen?

When a datapoint is reclassified from one group to another, if the point is below the average of the group it is leaving, but above the average of the one it is joining, both groups’ averages will increase.

Example

Consider the case of six patients whose life expectancies (in years) have been assessed as being 40, 50, 60, 70, 80 and 90.

The patients who have life expectancies of 40 and 50 have been diagnosed with a medical condition; the other four have not. This gives an average life expectancy within diagnosed patients of 45 years and within non-diagnosed patients of 75 years.

If an improved diagnostic tool is developed that detects the condition in the patient with the 60-year life expectancy, then the average within both groups rises by 5 years.
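The reclassification can be replayed in a few lines of Python, using the six life expectancies from the example:

```python
from statistics import mean

# Life expectancies (in years) from the example above.
diagnosed = [40, 50]
undiagnosed = [60, 70, 80, 90]
print("before:", mean(diagnosed), mean(undiagnosed))

# The moved value (60) is below the average of the group it leaves (75)
# but above the average of the group it joins (45).
assert mean(diagnosed) < 60 < mean(undiagnosed)

# A better diagnostic tool reclassifies the 60-year patient. Nobody's life
# expectancy changes; one value simply moves between the groups.
undiagnosed.remove(60)
diagnosed.append(60)
print("after: ", mean(diagnosed), mean(undiagnosed))
```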


Berkson’s paradox

What is it?

Berkson’s paradox can make it look like there’s an association between two independent variables when there isn’t one.

How does it happen?

This happens when we have a set with two independent variables, which means they should be entirely unrelated. But if we only look at a subset of the whole population, it can look like there is a negative trend between the two variables.

This can occur when the subset is not an unbiased sample of the whole population. It has been frequently cited in medical statistics. For example, if patients only present at a clinic with disease A, disease B or both, then even if the two diseases are independent, a negative association between them may be observed.

Example

Consider the case of a school that recruits students based on both academic and sporting ability. Assume that these two skills are totally independent of each other. That is, in the whole population, an excellent sportsperson is just as likely to be strong or weak academically as is someone who’s poor at sport.

If the school admits only students who are excellent academically, excellent at sport or excellent at both, then within this group it would appear that sporting ability is negatively correlated with academic ability.

To illustrate, assume that every potential student is ranked on both academic and sporting ability from 1 to 10. There is an equal proportion of people in each band for each skill. Knowing a person’s band in either skill does not tell you anything about their likely band in the other.

Assume now that the school only admits students who are at band 9 or 10 in at least one of the skills.

If we look at the whole population, the average academic rank of the weakest sportspeople and that of the best sportspeople are equal (both 5.5).

However, within the set of admitted students, the average academic rank of the elite sportspeople is still that of the whole population (5.5), but the average academic rank of the weakest sportspeople is 9.5, wrongly implying a negative correlation between the two abilities.
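A small Python sketch of the admissions example. It assumes one representative student for every combination of academic and sporting band, which gives equal proportions in each band and makes the two skills independent by construction:

```python
from itertools import product

BANDS = range(1, 11)  # bands 1 (weakest) to 10 (strongest)

# One representative student per (academic, sport) combination.
population = list(product(BANDS, BANDS))

# The school admits anyone in band 9 or 10 for at least one skill.
admitted = [(acad, sport) for acad, sport in population if acad >= 9 or sport >= 9]


def mean_academic(students, sport_band):
    """Average academic band among students in the given sporting band."""
    ranks = [acad for acad, sport in students if sport == sport_band]
    return sum(ranks) / len(ranks)


for sport_band in (1, 10):
    print(f"sport band {sport_band}: "
          f"population {mean_academic(population, sport_band)}, "
          f"admitted {mean_academic(admitted, sport_band)}")
```

Only the admission filter differs between the two columns, so the apparent trade-off between sport and study is created entirely by how the subset was selected.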


Multiple comparisons fallacy

What is it?

This is where unexpected trends can occur through random chance alone in a data set with a large number of variables.

How does it happen?

When looking at many variables and mining for trends, it is easy to overlook how many possible trends you are testing. For example, with 1,000 variables, there are almost half a million (1,000×999/2) potential pairs of variables that might appear correlated by pure chance alone.

While each pair is extremely unlikely to look dependent, the chances are that from the half million pairs, quite a few will look dependent.
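This is easy to see by simulation. The Python sketch below is illustrative only: it generates 50 variables of pure random noise (a much smaller stand-in for the 1,000 above), tests every pair, and counts how many pairs cross a conventional significance cutoff. The cutoff of |r| > 0.197 is the approximate two-sided 5% critical value of the Pearson correlation for 100 observations:

```python
import random
from itertools import combinations
from math import sqrt

random.seed(1)  # fixed seed so the run is repeatable

N_VARIABLES = 50   # illustrative size, far smaller than 1,000
N_SAMPLES = 100    # observations per variable
THRESHOLD = 0.197  # approx. |r| for p < 0.05 (two-sided) when n = 100

# Every variable is independent random noise, so any correlation is pure chance.
data = [[random.gauss(0, 1) for _ in range(N_SAMPLES)] for _ in range(N_VARIABLES)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


pairs = list(combinations(range(N_VARIABLES), 2))  # 50 x 49 / 2 pairs
flagged = [(i, j) for i, j in pairs if abs(pearson(data[i], data[j])) > THRESHOLD]
print(f"{len(pairs)} pairs tested, {len(flagged)} look 'significant' by chance")
```

Roughly 5% of the 1,225 pairs should be flagged, even though no variable has anything to do with any other.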

Example

The Birthday paradox is a classic example of the multiple comparisons fallacy.

In a group of 23 people (assuming each of their birthdays is an independently chosen day of the year with all days equally likely), it is more likely than not that at least two of the group have the same birthday.

People often disbelieve this, recalling that it is rare that they meet someone who shares their own birthday. If you just pick two people, the chance they share a birthday is, of course, low (roughly 1 in 365, which is less than 0.3%).

However, with 23 people there are 253 (23×22/2) pairs of people who might have a common birthday. So by looking across the whole group you are testing whether any one of these 253 pairings, each of which on its own has only about a 0.3% chance of coinciding, does indeed match. With so many possible pairs, a coincidental match becomes more likely than not.

For a group of as few as 40 people, a shared birthday is more than eight times as likely as no shared birthday.

The probability of no shared birthdays drops as the number of people in a group increases.
The Conversation, CC BY-ND
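The birthday figures can be computed exactly. This Python sketch makes the same assumption as the text (365 equally likely, independently chosen birthdays) and multiplies together the chances that each successive person misses all the earlier birthdays:

```python
from math import comb, prod


def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming independent birthdays spread uniformly over `days` days."""
    p_all_distinct = prod((days - k) / days for k in range(n))
    return 1 - p_all_distinct


for n in (22, 23, 40):
    p = p_shared_birthday(n)
    print(f"{n} people, {comb(n, 2)} pairs: "
          f"P(shared) = {p:.3f}, odds {p / (1 - p):.1f} to 1")
```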

Stephen Woodcock, Senior Lecturer in Mathematics, University of Technology Sydney

This article was originally published on The Conversation. Read the original article.

Here’s the best way to shuffle a pack of cards – with a little help from some maths

Graham Kendall, University of Nottingham

Shuffling a pack of cards isn’t as easy as you think, not if you want to truly randomise the cards. Most people will give a pack a few shuffles with the overhand or riffle methods (where the pack is split and the two halves are interleaved). But research has shown this isn’t enough to produce a sufficiently random order to make sure the card game being played is completely fair and to prevent people cheating.

As I wrote in a recent article about card counting, not having an effective shuffling mechanism can be a serious problem for casinos:

Players have used shuffle tracking, where blocks of cards are tracked so that you have some idea when they will appear. If you are given the option to cut the pack, you try and cut the pack near where you think the block of cards you are tracking is so that you can bet accordingly. A variant on this is to track aces as, if you know when one is likely to appear, you have a distinct advantage over the casino.


So how can you make sure your cards are well and truly shuffled?

To work out how many ways there are of arranging a standard 52-card deck, we multiply 52 by all the numbers that come before it (52 x 51 x 50 … 3 x 2 x 1). This is referred to as “52 factorial” and is usually written as “52!” by mathematicians. The answer is so big it’s easier to write it using scientific notation as 8.0658175e+67, which means it’s a number beginning with 8, followed by 67 more digits.
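Python’s arbitrary-precision integers can compute 52! exactly, which confirms both the scientific-notation value and the digit count:

```python
from math import factorial

orderings = factorial(52)  # 52 x 51 x ... x 2 x 1, computed exactly
digits = str(orderings)
print(f"52! = {orderings:.7e}")
print(f"{len(digits)} digits, starting {digits[:8]}")
```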

To put this into some sort of context, if you dealt one million hands of cards every second, it would take you about 2.6e+54 years (roughly 2.6 septendecillion years) to deal the same number of hands as there are ways to arrange a deck of cards.
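A quick check of this comparison in Python, assuming a 365.25-day year:

```python
from math import factorial

HANDS_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumes a 365.25-day year

# One "hand" here means one full arrangement of the 52-card deck.
years = factorial(52) / HANDS_PER_SECOND / SECONDS_PER_YEAR
print(f"about {years:.2e} years")
```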

You would think that it would be easy to get a random order from that many permutations. In fact, every arrangement is, in a sense, random. Even one where the cards are ordered by suit and then rank could be considered random. It is only the interpretation we put on this order that would make most people not consider it random. This is like the common belief that the lottery is less likely to throw up the numbers one to six, when in reality that combination is just as probable as any other.

In theory, you could shuffle a deck so that the cards emerged in number order (all the aces, followed by all the twos, followed by all the threes and so on), with each set of numbers in the same suit order (say spades, hearts, diamonds and clubs). Most people would not consider this random, but it is just as likely to appear as any other specific arrangement of cards (very unlikely). This is an extreme example, but you could also come up with an arrangement that would be seen as random when playing bridge, because it offered the players no advantage, but not random for poker, because it produced consistently strong hands.

But what would a casino consider random? Mathematicians have developed several ways of measuring how random something is. Two of these, variation distance and separation distance, measure how far the spread of possible orders produced by a shuffling method is from perfectly even, where every arrangement is equally likely. They have a value of 1 for a deck of cards still in perfect order (sorted by numbers and suits) and fall towards 0 as shuffling mixes the cards more thoroughly. When the values are less than 0.5, the deck is considered randomly shuffled. More simply, if you can guess too many cards in a shuffled deck, then the deck is not well shuffled.


Persi Diaconis is a mathematician who has been studying card shuffling for over 25 years. Together with Dave Bayer, he worked out that to produce a mathematically random pack, you need to use a riffle shuffle seven times if you’re using the variation distance measure, or 11 times using the separation distance. The overhand shuffle, by comparison, requires 10,000 shuffles to achieve randomness.
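Where does the figure of seven come from? Bayer and Diaconis modelled a riffle shuffle with the Gilbert-Shannon-Reeds model, an idealised shuffle in which the cut follows a binomial distribution and cards drop from each half with probability proportional to that half’s size; real human shuffles only approximate it. Under that model, the chance of any particular ordering after a given number of shuffles depends only on how many “rising sequences” it contains, which leads to a closed-form variation distance. A sketch of that calculation in Python (the function names are our own):

```python
from fractions import Fraction
from math import comb, factorial


def rising_sequence_counts(n):
    """counts[r] = number of orderings of n cards with exactly r rising
    sequences (the Eulerian numbers), built up one card at a time."""
    counts = [0, 1]  # one card: a single ordering with one rising sequence
    for m in range(2, n + 1):
        new = [0] * (m + 1)
        for r in range(1, m + 1):
            same = r * counts[r] if r < len(counts) else 0
            extra = (m - r + 1) * counts[r - 1]
            new[r] = same + extra
        counts = new
    return counts


def riffle_variation_distance(n, shuffles):
    """Variation distance between the distribution of an n-card deck after
    `shuffles` GSR riffle shuffles and the uniform distribution."""
    a = 2 ** shuffles
    uniform = Fraction(1, factorial(n))
    counts = rising_sequence_counts(n)
    total = Fraction(0)
    for r in range(1, n + 1):
        # Probability of any ONE ordering that has r rising sequences.
        # math.comb returns 0 when r > a: such orderings cannot occur yet.
        p = Fraction(comb(a + n - r, n), a ** n)
        total += counts[r] * abs(p - uniform)
    return total / 2


for k in range(1, 11):
    print(k, f"{float(riffle_variation_distance(52, k)):.3f}")
```

Under this model the distance stays close to 1 for the first few shuffles, then falls sharply, dropping below 0.5 for the first time at seven shuffles. Exact fractions are used because the individual probabilities are far smaller than ordinary floating-point numbers can represent accurately.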

“The usual shuffling produces a card order that is far from random,” Diaconis has said. “Most people shuffle cards three or four times. Five times is considered excessive”.

But five is still lower than the number required for an effective shuffle. Even dealers in casinos rarely shuffle the required seven times. The situation is worse when more than one deck is used, as is the case in blackjack. If you are shuffling two decks, you should shuffle nine times and for six decks you need to shuffle twelve times.


Many casinos now use automatic shuffling machines. This not only speeds up the games but also means that shuffles can be more random, as the machines can shuffle for longer than the dealers. These shuffling machines also stop issues such as card counting and card tracking.

But even these machines are not enough. In another study, Diaconis and his colleagues were asked by a casino to look at a new design of card-shuffling machine that the casino had built. The researchers found that the machine was not sufficiently random, as it simply did not shuffle enough times. But using the machine twice would resolve the problem.

So next time you’re at a casino, take a look at how many times the dealers shuffle. The cards may not be as random as you think they are, which could be to your advantage.

Graham Kendall, Professor of Computer Science and Provost/CEO/PVC, University of Nottingham

This article was originally published on The Conversation. Read the original article.