Are you puzzled by strange statistical terms or abbreviations? Are you looking for a statistical dictionary that explains these terms in plain English? You’re in the right place! Jim’s Statistics Glossary lists and explains the most commonly used terms in statistics. It is the best place for those learning statistics to start familiarizing themselves with statistical jargon. If you would like me to explain something that is not listed here, please contact me.

- a
- Alpha: The significance level, also known as alpha or α, is a measure of the strength of the evidence that must be present in your sample before you will reject the null hypothesis and conclude that the effect is statistically significant. The researcher determines the significance level before(...)
- Alternative hypothesis: The alternative hypothesis is one of two mutually exclusive hypotheses in a hypothesis test. The alternative hypothesis states that a population parameter does not equal a specified value. Typically, this value is the null hypothesis value associated with no effect, such as zero. If your(...)
- Attribute variables: A categorical variable has values that you can put into a countable number of distinct groups based on a characteristic. For a categorical variable, you can assign categories but the categories have no natural order. If the variable has a natural order, it is an ordinal variable. Categorical(...)
- b
- Biased estimator: A sample statistic that estimates a population parameter. The value of the estimator is referred to as a point estimate. There are several different types of estimators. If the expected value of the estimator equals the population parameter, the estimator is an unbiased estimator. If(...)
- Binary logistic regression: Binary logistic regression models the relationship between a set of predictors and a binary response variable. A binary response has only two possible values, such as win and lose. Use a binary regression model to understand how changes in the predictor values are associated with changes in(...)
- Binary variables: If you can place an observation into only two categories, you have a binary variable. For example, pass/fail data are binary. With binomial data, you can calculate and assess proportions and percentages.
- c
- Categorical variables: A categorical variable has values that you can put into a countable number of distinct groups based on a characteristic. For a categorical variable, you can assign categories but the categories have no natural order. If the variable has a natural order, it is an ordinal variable. Categorical(...)
- CI: A confidence interval is a range of values, derived from sample statistics, which is likely to contain the value of an unknown population parameter. Because of their random nature, it is unlikely that two samples from a given population will yield identical confidence intervals. But, if you(...)
- Coefficient of determination: R-squared is the percentage of the response variable variation that is explained by a linear model. It is always between 0 and 100%. R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the(...)
- Coefficients: Regression coefficients are estimates of the unknown population parameters and describe the relationship between a predictor variable and the response. In linear regression, coefficients are the values that multiply the predictor values. Suppose you have the following regression equation: y =(...)
- Confidence interval: A confidence interval is a range of values, derived from sample statistics, which is likely to contain the value of an unknown population parameter. Because of their random nature, it is unlikely that two samples from a given population will yield identical confidence intervals. But, if you(...)
- Confidence interval of the prediction: A confidence interval of the prediction provides a range of values for the mean response associated with specific predictor settings. For example, for a 95% confidence interval of the prediction of [7 8], you can be 95% confident that the mean response will fall within this range. The(...)
- Continuous variables: Continuous variables can take on almost any numeric value and can be meaningfully divided into smaller increments, including fractional and decimal values. You often measure a continuous variable on a scale. For example, when you measure height, weight, and temperature, you have continuous(...)
- Correlation: A correlation between variables indicates that as one variable changes in value, the other variable tends to change in a specific direction. A correlation coefficient measures both the direction and the strength of this tendency to vary together. A positive correlation indicates that as(...)
- d
- Descriptive statistics: Descriptive statistics are numbers that summarize data, such as the mean, standard deviation, percentages, rates, counts, and range. Descriptive statistics simply describe the data but do not try to generalize beyond the data. For example, we can describe starting salaries of college majors(...)
- e
- Effect: The effect is the difference between the true population parameter and the null hypothesis value. Effect is also known as population effect or the difference. For example, the mean difference between the health outcome for a treatment group and a control group is the effect. The true(...)
- Estimator: A sample statistic that estimates a population parameter. The value of the estimator is referred to as a point estimate. There are several different types of estimators. If the expected value of the estimator equals the population parameter, the estimator is an unbiased estimator. If(...)
- f
- Factors
- Fitted line plots: Fitted line plots display the fitted values for all predictor values in your observation space. Use these plots to assess model fit by comparing how well the fitted values follow the observed values.
- Fitted values: A fitted value is a statistical model’s prediction of the mean response value when you input the values of the predictors, factor levels, or components into the model. Suppose you have the following regression equation: y = 3X + 5. If you enter a value of 5 for the predictor, the fitted value(...)
- Fixed and Random factors: In ANOVA, factors are either fixed or random. In general, if the investigator controls the levels of a factor, the factor is fixed. The investigator gathers data for all factor levels she is interested in. On the other hand, if the investigator randomly sampled the levels of a factor from a(...)
- h
- Hypothesis tests: A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. These two statements are called the null hypothesis and the alternative hypothesis. Hypothesis tests are not 100% accurate because they use a(...)
- i
- Inferential statistics: Inferential statistics use a random sample to draw conclusions about the population. Typically, it is not practical to obtain data from every member of a population. Instead, we collect a random sample from a small proportion of the population. From the sample, statistical procedures can infer(...)
- l
- Linear least squares: Ordinary least squares, or linear least squares, estimates the parameters in a regression model by minimizing the sum of the squared residuals. This method draws a line through the data points that minimizes the sum of the squared differences between the observed values and the corresponding(...)
- m
- Mode: The mode is the value that occurs most frequently in a set of observations. You can find the mode simply by counting the number of times each value occurs in a data set. For example, if the weights of five apples are 5, 5, 6, 7, and 8, the apple weight mode is 5 because it is the most(...)
- n
- Nominal logistic regression: Nominal logistic regression models the relationship between a set of predictors and a nominal response variable. A nominal response has at least three groups which do not have a natural order, such as scratch, dent, and tear.
- Nominal variables: Nominal variables have at least three categories and there is no natural order to these categories. For example, science fiction, drama, and comedy are nominal data.
- Null hypothesis: The null hypothesis is one of two mutually exclusive hypotheses in a hypothesis test. The null hypothesis states that a population parameter equals a specified value. If your sample contains sufficient evidence, you can reject the null hypothesis and conclude that the effect is statistically(...)
- o
- OLS: Ordinary least squares, or linear least squares, estimates the parameters in a regression model by minimizing the sum of the squared residuals. This method draws a line through the data points that minimizes the sum of the squared differences between the observed values and the corresponding(...)
- Ordinal logistic regression: Ordinal logistic regression models the relationship between a set of predictors and an ordinal response variable. An ordinal response has at least three groups which have a natural order, such as hot, medium, and cold.
- Ordinal variables: Ordinal variables have at least three categories and the categories have a natural order. The categories are ranked but the differences between ranks may not be equal. For example, first, second, and third in a race are ordinal data. The difference in time between first and second place might(...)
- Ordinary least squares: Ordinary least squares, or linear least squares, estimates the parameters in a regression model by minimizing the sum of the squared residuals. This method draws a line through the data points that minimizes the sum of the squared differences between the observed values and the corresponding(...)
- Outliers
- p
- P-value: A p-value is the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is true for the populations. P-values are calculated based on your sample data and under the assumption that the null hypothesis is true. Lower p-values indicate greater(...)
- Parameter: Parameters are the unknown values of an entire population, such as the mean and standard deviation. Samples can estimate population parameters but their exact values are usually unknowable. Parameters are also the constant values that appear in probability functions. These parameters define(...)
- Pearson product moment correlation: A correlation between variables indicates that as one variable changes in value, the other variable tends to change in a specific direction. A correlation coefficient measures both the direction and the strength of this tendency to vary together. A positive correlation indicates that as(...)
- PI: A prediction interval is a range of values that is likely to contain the value of a single new observation given specified settings of the predictors. For example, for a 95% prediction interval of [5 10], you can be 95% confident that the next new observation will fall within this(...)
- Poisson variables: Poisson variables are a count of the presence of a characteristic, result, or activity over a constant amount of time, area, or other length of observation. Poisson data are evaluated in counts per a constant unit size. With a Poisson variable, you can calculate and assess a rate of occurrence.
- Population: In statistics, a population is the complete set of all objects or people of interest. Typically, studies define their population of interest at the outset. Populations can have a finite but potentially very large size. For example, all valves produced by a specific manufacturing(...)
- Power: Statistical power in a hypothesis test is the probability that the test can detect an effect that truly exists. If an effect truly exists at the population level, it’s entirely possible that a test based on a sample can fail to detect this effect. The higher the power, the more likely the(...)
- Predicted values: A fitted value is a statistical model’s prediction of the mean response value when you input the values of the predictors, factor levels, or components into the model. Suppose you have the following regression equation: y = 3X + 5. If you enter a value of 5 for the predictor, the fitted value(...)
- Prediction intervals: A prediction interval is a range of values that is likely to contain the value of a single new observation given specified settings of the predictors. For example, for a 95% prediction interval of [5 10], you can be 95% confident that the next new observation will fall within this(...)
- q
- Qualitative variables: A categorical variable has values that you can put into a countable number of distinct groups based on a characteristic. For a categorical variable, you can assign categories but the categories have no natural order. If the variable has a natural order, it is an ordinal variable. Categorical(...)
- r
- R-squared: R-squared is the percentage of the response variable variation that is explained by a linear model. It is always between 0 and 100%. R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the(...)
- Random factors: In ANOVA, factors are either fixed or random. In general, if the investigator controls the levels of a factor, the factor is fixed. The investigator gathers data for all factor levels she is interested in. On the other hand, if the investigator randomly sampled the levels of a factor from a(...)
- Regression analysis: Regression analysis models the relationships between a response variable and one or more predictor variables. Use a regression model to understand how changes in the predictor values are associated with changes in the response mean. You can also use regression to make predictions based on the(...)
- Regression coefficients: Regression coefficients are estimates of the unknown population parameters and describe the relationship between a predictor variable and the response. In linear regression, coefficients are the values that multiply the predictor values. Suppose you have the following regression equation: y =(...)
- Reliability: In statistics, reliability is the consistency of a measure. If you measure the same thing many times, are the measurements consistent? A highly reliable measure is more consistent than a measure with low reliability. A measure can be reliable but not valid. In other words, you can obtain(...)
- Residuals
- s
- Sample: A sample is a subset of the entire population. In inferential statistics, the goal is to use the sample to learn about the population. Consequently, the sample typically is selected in a manner that allows it to be an unbiased representation of the entire population. Drawing a random sample is(...)
- Significance level: The significance level, also known as alpha or α, is a measure of the strength of the evidence that must be present in your sample before you will reject the null hypothesis and conclude that the effect is statistically significant. The researcher determines the significance level before(...)
- Skewed data: Skewed data are not equally distributed on both sides of the distribution; that is, the distribution is not symmetrical. Use a histogram to easily see whether your data are skewed. When you refer to skewed data, you can describe it as either right skewed or left skewed. Data are skewed right when(...)
- Spearman rank-order correlation: A correlation between variables indicates that as one variable changes in value, the other variable tends to change in a specific direction. A correlation coefficient measures both the direction and the strength of this tendency to vary together. A positive correlation indicates that as(...)
- Standard scores: In statistics, standardization is the process of putting different variables on the same scale. This process allows you to compare scores between different types of variables. Typically, to standardize variables, you calculate the mean and standard deviation for a variable. Then, for each(...)
- Standard error of the regression: The standard error of the regression (S), also known as the standard error of the estimate, represents the average distance that the observed values fall from the regression line. Conveniently, it tells you how wrong the regression model is on average using the units of the response variable.(...)
- Standardization: In statistics, standardization is the process of putting different variables on the same scale. This process allows you to compare scores between different types of variables. Typically, to standardize variables, you calculate the mean and standard deviation for a variable. Then, for each(...)
- Statistical inference: Inferential statistics use a random sample to draw conclusions about the population. Typically, it is not practical to obtain data from every member of a population. Instead, we collect a random sample from a small proportion of the population. From the sample, statistical procedures can infer(...)
- Statistics: The field of statistics is the science of learning from data. When statistical principles are correctly applied, statistical analyses tend to produce accurate results. What’s more, the analyses even account for real-world uncertainty in order to calculate the probability of being(...)
- t
- Type I error: In a hypothesis test, a type I error occurs when you reject a null hypothesis that is actually true. In other words, a statistically significant test result suggests that a population effect exists but, in reality, it does not exist. The difference you observed in the sample is the product of(...)
- Type II error: In a hypothesis test, a type II error occurs when you fail to reject a null hypothesis that is actually false. In other words, you obtain an insignificant test result even though a population effect actually exists. Some combination of a small sample size, inherent variability in the data, and(...)
- u
- Unbiased estimator: A sample statistic that estimates a population parameter. The value of the estimator is referred to as a point estimate. There are several different types of estimators. If the expected value of the estimator equals the population parameter, the estimator is an unbiased estimator. If(...)
- v
- Validity: In statistics, validity is the degree to which an assessment measures what it is supposed to measure. A test that is not reliable cannot be valid. If repeated measurements are inconsistent, they're not a valid measure of the characteristic.
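
If seeing a few of these terms in action helps, here is a minimal Python sketch of four of them: the mode, standardization, ordinary least squares for a single predictor, and R-squared. It is an illustration, not part of the glossary itself. The apple weights come from the Mode entry; the `x` and `y` values and the helper names `standardize`, `ols_fit`, and `r_squared` are made up for this example, and the standardization step assumes the usual z-score formula with the sample standard deviation.

```python
from statistics import mean, mode, stdev

# Mode: the most frequent value (apple weights from the Mode entry).
apple_weights = [5, 5, 6, 7, 8]
apple_mode = mode(apple_weights)


def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def ols_fit(x, y):
    """Fit y = intercept + slope * x by minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the response variation explained by the fitted line (0 to 1)."""
    y_bar = mean(y)
    fitted = [intercept + slope * xi for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [8.1, 10.9, 14.2, 16.8, 20.0]

z_scores = standardize(y)
slope, intercept = ols_fit(x, y)
r2 = r_squared(x, y, slope, intercept)

print("mode:", apple_mode)
print("z-scores:", [round(z, 3) for z in z_scores])
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("R-squared:", round(r2, 4))
```

The regression pieces are written out by hand so that each quantity in the definitions (residuals, fitted values, explained variation) is visible; in practice you would more likely use a statistics package.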