The point-biserial correlation coefficient measures the strength and direction of the relationship between a continuous variable and a dichotomous variable (a variable with exactly two categories). It is a special case of Pearson’s correlation, used when one variable is quantitative and the other is binary, such as test score vs. pass/fail, or income vs. employed/unemployed.
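The point-biserial correlation is appropriate when the binary variable is naturally dichotomous, meaning it represents two distinct groups rather than an artificial cutoff imposed on a continuous variable. Dichotomizing a continuous variable (such as splitting income into “high” vs. “low”) discards information and typically weakens the observed relationship, which can lead to misleading interpretations.
The coefficient ranges from -1 to 1, with values near 0 indicating little or no linear association. The sign depends on how the two groups are coded: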
- A positive value indicates that the group coded as 1 tends to have higher values on the continuous variable.
- A negative value indicates that the group coded as 1 tends to have lower values.
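The sign rule above follows from the way the coefficient is usually written. The notation here is an assumption added for illustration: M_1 and M_0 are the means of the continuous variable in the groups coded 1 and 0, n_1 and n_0 are the group sizes, n = n_1 + n_0, and s_n is the standard deviation of the continuous variable across all observations, computed with n in the denominator.

```latex
r_{pb} = \frac{M_1 - M_0}{s_n} \sqrt{\frac{n_1 n_0}{n^2}}
```

Because the square-root term and s_n are always positive, the sign of the coefficient is the sign of M_1 - M_0: positive when the group coded 1 has the higher mean, negative when it has the lower mean.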
Although the formula for point-biserial correlation looks slightly different, the result is numerically identical to Pearson’s correlation computed with the binary variable coded numerically. So, if you run a Pearson correlation on a dataset with a continuous variable and a 0/1 binary variable, you’ll get the same result as the point-biserial correlation. Any two distinct numeric codes give the same magnitude, but 0 and 1 are the convention. Always note which category is coded as 1, because swapping the codes reverses the sign of the correlation.
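The equivalence with Pearson’s correlation, and the effect of swapping the codes, can be checked with a short sketch. This is a minimal illustration in plain Python, not a reference implementation: the function names `pearson_r` and `point_biserial` and the data are made up for this example, and the point-biserial function assumes the population standard deviation (dividing by n).

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def point_biserial(scores, groups):
    """Point-biserial correlation; `groups` must contain only 0 and 1."""
    g1 = [s for s, g in zip(scores, groups) if g == 1]
    g0 = [s for s, g in zip(scores, groups) if g == 0]
    n, n1, n0 = len(scores), len(g1), len(g0)
    mean_all = sum(scores) / n
    # Population standard deviation of all scores (denominator n).
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    m1 = sum(g1) / n1
    m0 = sum(g0) / n0
    return (m1 - m0) / sd * math.sqrt(n1 * n0 / n ** 2)


# Made-up illustrative data: eight scores and a 0/1 group label for each.
scores = [52, 61, 58, 70, 66, 75, 63, 80]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

r_pb = point_biserial(scores, groups)
r_pearson = pearson_r(scores, groups)

# Swap which group is coded 1 to see the sign reverse.
flipped = [1 - g for g in groups]
r_flipped = point_biserial(scores, flipped)

print(f"point-biserial: {r_pb:.4f}")
print(f"Pearson:        {r_pearson:.4f}")
print(f"codes swapped:  {r_flipped:.4f}")
```

The first two printed values should agree up to floating-point rounding, and the third should be the same number with the opposite sign.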
For example, a researcher investigates whether gender (coded 0 for male, 1 for female) is related to multitasking ability. The point-biserial correlation is 0.35, suggesting that females (coded 1) tend to score higher on multitasking tasks, with a moderate positive association.