The law of large numbers states that as the number of trials increases, sample values tend to converge on the expected result. The two forms of this law lay the foundation for both statistics and probability theory.

In this post, I explain both forms of the law, simulate them in action, and explain why they’re crucial for statistics and probability!

## Weak Law of Large Numbers

There are two forms of the law of large numbers, but the differences are primarily theoretical. The weak and strong laws of large numbers both apply to a sequence of values for independent and identically distributed (i.i.d.) random variables: *X₁*, *X₂*, …, *Xₙ*.

The weak law of large numbers states that as *n* increases, the sample statistic of the sequence converges in probability to the population value. The weak law of large numbers is also known as Khinchin’s law.

Here’s what that means. Suppose you specify a nonzero difference between the theoretical value and the sample value. For example, you might define a difference between the theoretical probability for coin toss results (0.50) and the actual proportion you obtain over multiple trials. As the number of trials increases, the probability that the actual difference will be smaller than this predefined difference also increases. This probability converges on 1 as the sample size approaches infinity.

This idea applies even when you define tiny differences between the actual and expected values. You just need a larger sample!
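You can watch this happen in a quick simulation. The sketch below is my own Python illustration (not one of the Statistics101 scripts discussed later): it estimates the probability that a fair coin’s sample proportion lands within a predefined difference (0.05) of the theoretical 0.50, for several numbers of flips. The probability climbs toward 1 as the trial count grows, just as the weak law says.

```python
import random

def prob_within(n_flips, eps, n_sims=2000, seed=1):
    """Estimate the probability that the sample proportion of heads
    falls within eps of the theoretical 0.50 after n_flips tosses."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if abs(heads / n_flips - 0.5) < eps:
            hits += 1
    return hits / n_sims

# The probability of landing within 0.05 of 0.50 rises toward 1
# as the number of flips grows.
for n in (10, 100, 1000):
    print(n, prob_within(n, eps=0.05))
```

The exact probabilities depend on the random seed, but the increasing pattern does not.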

## Strong Law of Large Numbers

The strong law of large numbers states that as the sample size or the number of trials increases, the sample statistic converges on the population value almost surely, that is, with probability 1. For example, the sample mean will converge on the population mean as the sample size increases. The strong law of large numbers is also known as Kolmogorov’s strong law.
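For readers who want the notation, here are the standard textbook statements of the two laws for the sample mean of i.i.d. variables with mean μ (these formulas are my addition, not part of the original post):

```latex
% Weak law: convergence in probability
\text{for every } \varepsilon > 0, \qquad
\lim_{n \to \infty} P\!\left( \left| \bar{X}_n - \mu \right| > \varepsilon \right) = 0

% Strong law: almost-sure convergence
P\!\left( \lim_{n \to \infty} \bar{X}_n = \mu \right) = 1
```

The weak law says large deviations become increasingly improbable; the strong law says the sequence of sample means itself converges with probability 1.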

Both laws apply to various characteristics, ranging from the means for continuous variables to the proportions for Bernoulli trials. I’ll simulate both of these scenarios next!

## Simulations for the Law of Large Numbers

While there are mathematical proofs for both laws of large numbers, I will simulate them using my favorite random sampling program, Statistics101! You can download it for free.

Here are my scripts for the IQ example and the coin toss example. You can perform the simulations yourself and see the results. I include example graphs below that I created using these scripts. I exported the data into Excel for prettier graphs, but Statistics101 produces graphs too. Your simulations won’t match mine, but they should follow the same overall pattern that I discuss.

### IQ Example

Imagine that we’re studying IQ scores. We randomly select 100 participants and measure their IQs. As we gather subjects, we assess each person’s IQ and then recalculate the sample mean with each additional person. This process produces a sequence of sample means as the sample size increases from 1 to 100. If the law of large numbers holds true, we’d expect the sample means to converge on the population mean as the sample size increases. Let’s see!

For this population, I’ll define the population distribution of IQ scores as following a normal distribution with a mean of 100 and a standard deviation of 15.
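If you don’t have Statistics101 handy, here’s a small Python sketch of the same idea (my own illustration, not the script from the post): it draws IQ scores from a normal distribution with mean 100 and standard deviation 15, recomputing the running sample mean as each participant is added.

```python
import random

def running_means(n, mu=100, sigma=15, seed=2):
    """Draw n IQ scores from Normal(mu, sigma) and return the sample
    mean recomputed after each additional participant."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for i in range(1, n + 1):
        total += rng.gauss(mu, sigma)
        means.append(total / i)
    return means

means = running_means(100)
# Early means bounce around; later means settle near 100.
print(means[:3], means[-3:])
```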

As you can see, the sample means converge on the population mean IQ value of 100. At the beginning of the sequence, they’re more erratic, but they stabilize and converge on the correct value as the sample size increases.

### Coin Flipping Example

Now, let’s look at coin flips. Each flip is a Bernoulli trial because there are precisely two outcomes, heads or tails. The data are binary, and the number of events follows the binomial distribution defined by a proportion of events. For this scenario, we’ll define an event as heads in the coin toss. A coin toss is one trial. The law of large numbers predicts that as the number of trials increases, the sample proportion will converge on the expected value of 0.50.
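The simulation logic can be sketched in Python like this (again my own illustration, separate from the Statistics101 script): flip a fair coin repeatedly and record the running proportion of heads after each flip.

```python
import random

def running_proportions(n_flips, seed=3):
    """Flip a fair coin n_flips times and record the proportion of
    heads after each flip."""
    rng = random.Random(seed)
    heads = 0
    props = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5
        props.append(heads / i)
    return props

props = running_proportions(10_000)
# The proportion drifts early on but hugs 0.50 by the end.
print(props[9], props[99], props[-1])
```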

It works! The sample proportion becomes more stable and converges on the expected probability value of 0.50 as the sample size increases.

## Practical Implications of the Law of Large Numbers

The law of large numbers is essential to both statistics and probability theory.

For statistics, both laws of large numbers indicate that larger samples produce estimates that are consistently closer to the population value. These properties become important in inferential statistics, where you use samples to estimate the properties of populations. That’s why you always hear statisticians saying that large sample sizes are better!

**Related post**: Inferential versus Descriptive Statistics

In probability theory, as the number of trials increases, the relative frequency of observed events will converge on the expected probability value. If you flip a coin four times, it’s not surprising to get three heads (75%). However, after 100 coin flips, the percentage will very likely be close to 50%.

These laws bring a type of order to random events. For example, if you’re talking about flipping coins, rolling dice, or games of chance, you are more likely to observe an unusual sequence of events over the short run. However, as the number of trials grows, the overall outcomes converge on the expected probability.

Consequently, casinos with a large volume of traffic can predict their earnings for games of chance. Their earnings will converge on a predictable percentage over a large number of games. You might beat the house with several lucky hands, but in the long run, the house always wins!

**Related post**: Fundamentals of Probability

## When the Laws of Large Numbers Fail

There are specific situations where the laws of large numbers fail to converge on the expected value as the sample size or the number of trials increases. When the data follow the Cauchy distribution, the sample values can’t converge on an expected value because the Cauchy distribution does not have one. Similarly, the laws don’t apply to Pareto distributions with a shape parameter of 1 or less because their expected value is infinite.
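You can see the Cauchy failure in a short simulation (my own sketch, using the standard inverse-CDF trick that tan(π(U − ½)) of a uniform U is standard Cauchy): unlike the IQ example, the running means never settle down, because occasional extreme draws keep yanking them away from any fixed value.

```python
import math
import random

def cauchy_running_means(n, seed=4):
    """Running sample means of standard Cauchy draws. Because the
    Cauchy distribution has no mean, rare extreme observations keep
    displacing the running mean, so it never converges."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for i in range(1, n + 1):
        # Inverse-CDF sampling: tan(pi * (U - 0.5)) is standard Cauchy.
        total += math.tan(math.pi * (rng.random() - 0.5))
        means.append(total / i)
    return means

means = cauchy_running_means(100_000)
# Even after 100,000 draws, the running mean keeps jumping around.
print(means[99], means[9_999], means[-1])
```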

Emikel says

Hello Jim,

Does the law of large numbers also apply to dependent events? I’m curious about this because of Markov chains. Markov chains use semi-dependent events or states when calculating the probability of moving from one state to the next. Another example could be standard playing cards, like the probability of getting two aces in a row over a large number of samples. I read somewhere that this law could also apply to dependent events. Is this true? Thank you and have a good day.

Sincerely,

Emikel

Jim Frost says

Hi Emikel,

I believe it can apply in those cases, but you have to be careful in doing so. You need to know exactly what the dependent condition is that affects the next outcome, and then the law of large numbers applies to that conditional probability. For example, suppose event B has a 60% chance of occurring if event A occurs, but only a 30% chance if event A does not occur. You’ll need to track whether A occurs and record the results accordingly. With a large number of outcomes in each group, you’d expect the observed frequencies to close in on those theoretical frequencies of 60% and 30%, respectively.
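Jim’s 60%/30% example can be checked directly with a quick simulation (my own sketch; the 50% chance of event A occurring is an arbitrary assumption added for illustration):

```python
import random

def conditional_frequencies(n_trials, p_a=0.5, p_b_given_a=0.6,
                            p_b_given_not_a=0.3, seed=5):
    """Simulate dependent events: B's probability depends on whether
    A occurred. Returns the observed frequency of B in each branch."""
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # a_occurred -> [b_count, trials]
    for _ in range(n_trials):
        a = rng.random() < p_a
        b = rng.random() < (p_b_given_a if a else p_b_given_not_a)
        counts[a][0] += b
        counts[a][1] += 1
    return {a: c[0] / c[1] for a, c in counts.items()}

freqs = conditional_frequencies(100_000)
# Observed frequencies close in on the theoretical 0.60 and 0.30.
print(freqs[True], freqs[False])
```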

Alejandra Carranza says

I love these topics thanks for sharing!

Animesh Tulsyan says

What are the Cauchy distribution and the Pareto distribution?