In the field of statistics, data are vital. Data are the information that you collect to learn, draw conclusions, and test hypotheses. After all, statistics is the science of learning from data. However, there are different types of variables, and they record various kinds of information. Crucially, the type of information determines what you can learn from it, and, importantly, what you cannot learn from it. Consequently, it’s essential that you understand the different types of data. [Read more…] about Guide to Data Types and How to Graph Them in Statistics
Binary data occur when you can place an observation into one of only two categories. They tell you that an event occurred or that an item has a particular characteristic. For instance, an inspection process produces binary pass/fail results. Or, when a customer enters a store, there are two possible outcomes: sale or no sale. In this post, I show you how to use the binomial, geometric, negative binomial, and hypergeometric distributions to glean more information from your binary data. [Read more…] about Maximize the Value of Your Binary Data with the Binomial and Other Probability Distributions
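As a rough illustration of what two of these distributions compute, here is a minimal sketch using only the Python standard library. The function names and the example probabilities are assumptions made for illustration and do not come from the post.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k events in n independent trials,
    where each trial has event probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def geometric_pmf(k: int, p: float) -> float:
    """Probability that the first event occurs on trial k (k = 1, 2, ...)."""
    return (1 - p) ** (k - 1) * p


# Hypothetical inspection process: each item passes with probability 0.9.
prob_all_ten_pass = binomial_pmf(10, 10, 0.9)

# Hypothetical store: each customer buys with probability 0.5.
prob_first_sale_on_third_customer = geometric_pmf(3, 0.5)
```

Both functions assume independent trials with a constant event probability, which is the usual requirement for these distributions.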
Anecdotal evidence is a story told by individuals. It comes in many forms that can range from product testimonials to word of mouth. It’s often testimony, or a short account, about the truth or effectiveness of a claim. Typically, anecdotal evidence focuses on individual results, is driven by emotion, and is presented by individuals who are not subject-area experts. [Read more…] about Learn How Anecdotal Evidence Can Trick You!
The field of statistics is the science of learning from data. Statistical knowledge helps you use the proper methods to collect the data, employ the correct analyses, and effectively present the results. Statistics is a crucial process behind how we make discoveries in science, make decisions based on data, and make predictions. Statistics allows you to understand a subject much more deeply. [Read more…] about The Importance of Statistics
Regression analysis mathematically describes the relationship between independent variables and the dependent variable. It also allows you to predict the mean value of the dependent variable when you specify values for the independent variables. In this regression tutorial, I gather together a wide range of posts that I’ve written about regression analysis. My tutorial helps you go through the regression content in a systematic and logical order. [Read more…] about Regression Tutorial with Analysis Examples
In a previous blog post, I introduced the basic concepts of hypothesis testing and explained the need for performing these tests. In this post, I’ll build on that and compare various types of hypothesis tests that you can use with different types of data, explore some of the options, and explain how to interpret the results. Along the way, I’ll point out important planning considerations, related analyses, and pitfalls to avoid. [Read more…] about Comparing Hypothesis Tests for Continuous, Binary, and Count Data
In this blog post, I explain why you need to use statistical hypothesis testing and help you navigate the essential terminology. Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a random sample. These inferences include estimating population properties such as the mean, differences between means, proportions, and the relationships between variables. [Read more…] about Statistical Hypothesis Testing Overview
Regression analysis mathematically describes the relationship between a set of independent variables and a dependent variable. There are numerous types of regression models that you can use. This choice often depends on the kind of data you have for the dependent variable and the type of model that provides the best fit. In this post, I cover the more common types of regression analyses and how to decide which one is right for your data. [Read more…] about Choosing the Correct Type of Regression Analysis
Interaction effects occur when the effect of one variable depends on the value of another variable. Interaction effects are common in regression analysis, ANOVA, and designed experiments. In this blog post, I explain interaction effects, how to interpret them in statistical designs, and the problems you will face if you don’t include them in your model. [Read more…] about Understanding Interaction Effects in Statistics
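To make "the effect of one variable depends on the value of another" concrete, here is a small sketch of a two-variable model with an interaction term. The coefficient values are hypothetical and chosen only for illustration.

```python
def predict(x1: float, x2: float, b0: float, b1: float, b2: float, b3: float) -> float:
    """Fitted value from a model with an interaction term:
    y = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def effect_of_x1(x2: float, b1: float, b3: float) -> float:
    """Change in y for a one-unit increase in x1, holding x2 fixed.
    With an interaction term, this depends on the value of x2."""
    return b1 + b3 * x2


# Hypothetical coefficients, for illustration only.
b0, b1, b2, b3 = 1.0, 2.0, -1.0, 0.5

effect_when_x2_is_0 = effect_of_x1(0.0, b1, b3)  # equals b1
effect_when_x2_is_4 = effect_of_x1(4.0, b1, b3)  # equals b1 + 4 * b3
```

If `b3` were zero, the effect of `x1` would be the same at every value of `x2`, which is the no-interaction case.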
Use regression analysis to describe the relationships between a set of independent variables and the dependent variable. Regression analysis produces a regression equation where the coefficients represent the relationship between each independent variable and the dependent variable. You can also use the equation to make predictions.
As a statistician, I should probably tell you that I love all statistical analyses equally, like parents with their kids. But, shhh, I have a secret! Regression analysis is my favorite because it provides tremendous flexibility, which makes it useful in so many different circumstances. In fact, I’ve described regression analysis as taking correlation to the next level!
In this blog post, I explain the capabilities of regression analysis, the types of relationships it can assess, how it controls the variables, and generally why I love it! You’ll learn when you should consider using regression analysis. [Read more…] about When Should I Use Regression Analysis?
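For the simplest case of one independent variable, the regression equation and its use for prediction can be sketched as follows. This is an ordinary least squares fit written with the standard library only; the data values and function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.
    Returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_mean(x, intercept, slope):
    """Predicted mean of the dependent variable at a given x."""
    return intercept + slope * x


# Hypothetical data that lie exactly on the line y = 1 + 2x.
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction_at_5 = predict_mean(5, intercept, slope)
```

The slope is the coefficient: the change in the mean of the dependent variable for a one-unit change in the independent variable.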
Log-log plots display data in two dimensions where both axes use logarithmic scales. When one variable changes as a constant power of another, a log-log graph shows the relationship as a straight line. In this post, I’ll show you why these graphs are valuable and how to interpret them. [Read more…] about Using Log-Log Plots to Determine Whether Size Matters
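The straight line follows from taking logarithms of a power relationship: if y = a·x^b, then log y = log a + b·log x, so the slope on a log-log plot is the exponent b. A minimal sketch, assuming positive data and using made-up values, estimates a and b by fitting a line to the logged data.

```python
import math


def fit_power_law(xs, ys):
    """Estimate a and b in y = a * x**b by least squares on
    log10(x) and log10(y). Requires all x and y to be positive.
    Returns (a, b)."""
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    b = sxy / sxx                       # slope on the log-log scale = exponent
    a = 10 ** (mean_y - b * mean_x)     # intercept on the log scale = log10(a)
    return a, b


# Hypothetical data generated from y = 3 * x**2.
xs = [1, 2, 4, 8, 16]
ys = [3 * x**2 for x in xs]
a, b = fit_power_law(xs, ys)
```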
Standardization is the process of putting different variables on the same scale. In regression analysis, there are some scenarios where it is crucial to standardize your independent variables or risk obtaining misleading results.
In this blog post, I show when and why you need to standardize your variables in regression analysis. Don’t worry, this process is simple and helps ensure that you can trust your results. In fact, standardizing your variables can reveal essential findings that you would otherwise miss! [Read more…] about When Do You Need to Standardize the Variables in a Regression Model?
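One common way to standardize is to subtract the mean and divide by the standard deviation, which gives each variable a mean of 0 and a standard deviation of 1. The sketch below assumes that definition and uses hypothetical values.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical independent variable measured on an arbitrary scale.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z_scores = standardize(heights_cm)
```

Subtracting the mean without dividing by the standard deviation (centering) is a related option that keeps the variable's original units.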
With the arrival of fall in the Northern Hemisphere, it’s flu season again.
Do you debate getting a flu shot every year? I do get flu shots every year. I realize that they’re not perfect, but I figure they’re a low-cost way to reduce my chances of a crummy week suffering from the flu.
The media report that flu shots have an effectiveness of approximately 68%. But, what does that mean exactly? What is the absolute reduction in risk? Are there long-term benefits?
In this blog post, I explore the effectiveness of flu shots from a statistical viewpoint. We’ll statistically analyze the data ourselves so we can go beyond the simplified accounts that the media present. I’ll also model the long-term outcomes you can expect with regular flu vaccinations. By the time you finish this post, you’ll have a crystal-clear picture of flu shot effectiveness. Some of the results surprised me! [Read more…] about Flu Shots, How Effective Are They?
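The difference between relative and absolute risk reduction can be sketched with a little arithmetic. This assumes the reported effectiveness is a relative risk reduction, which is the conventional definition of vaccine effectiveness. The 10% baseline infection risk is purely hypothetical and is not taken from the post.

```python
def vaccinated_risk(baseline_risk: float, effectiveness: float) -> float:
    """Risk for a vaccinated person, treating effectiveness as the
    relative risk reduction compared with the unvaccinated baseline."""
    return baseline_risk * (1 - effectiveness)


def absolute_risk_reduction(baseline_risk: float, effectiveness: float) -> float:
    """Difference in risk between unvaccinated and vaccinated people."""
    return baseline_risk - vaccinated_risk(baseline_risk, effectiveness)


# Hypothetical baseline: 10% of unvaccinated people catch the flu in a season.
baseline = 0.10
effectiveness = 0.68  # the approximate figure reported by the media

arr = absolute_risk_reduction(baseline, effectiveness)  # about 0.068
```

Under these assumptions, the absolute reduction is about 6.8 percentage points, far smaller than the 68% headline figure, because the relative reduction applies to a baseline risk that is itself small.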
In statistics, the degrees of freedom (DF) indicate the number of independent values that can vary in an analysis without breaking any constraints. Degrees of freedom are an important idea that appears in many contexts throughout statistics, including hypothesis tests, probability distributions, and regression analysis. Learn how this fundamental concept affects the power and precision of your statistical analysis!
In this blog post, I bring this concept to life in an intuitive manner. I’ll start by defining degrees of freedom. However, I’ll quickly move on to practical examples in a variety of contexts because they make this concept easier to understand. [Read more…] about Degrees of Freedom in Statistics
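One way to see the idea, sketched here with made-up numbers: once the sample mean is fixed, the deviations from the mean must sum to zero, so only n − 1 of them can vary freely. The last deviation is determined by the others.

```python
from statistics import mean


def last_deviation_two_ways(values):
    """Return the final deviation from the mean computed directly, and the
    same deviation reconstructed from the other n - 1 deviations using the
    constraint that all deviations sum to zero."""
    m = mean(values)
    deviations = [v - m for v in values]
    direct = deviations[-1]
    implied = -sum(deviations[:-1])
    return direct, implied


# Hypothetical sample of five observations: 4 degrees of freedom remain
# after estimating the mean.
direct, implied = last_deviation_two_ways([2.0, 4.0, 6.0, 8.0, 15.0])
```

This constraint is why the sample variance divides by n − 1 rather than n.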
Typically, quality improvement analysts use control charts to assess business processes and don’t have hypothesis tests in mind. Did you know that control charts also provide tremendous benefits in other settings, including hypothesis testing? Spoiler: control charts check an assumption that we often forget about for hypothesis tests! [Read more…] about Use Control Charts with Hypothesis Tests
Nonlinear regression analysis cannot calculate P values for the independent variables in your model. Why not? And, what do you use instead? Those are the topics of this blog post. [Read more…] about Why Are There No P Values in Nonlinear Regression?
Regression is a very powerful statistical analysis. It allows you to isolate and understand the effects of individual variables, model curvature and interactions, and make predictions. Regression analysis offers high flexibility but presents a variety of potential pitfalls. Great power requires great responsibility!
In this post, I offer five tips that will not only help you avoid common problems but also make the modeling process easier. I’ll close by showing you the difference between the modeling process that a top analyst uses versus the procedure of a less rigorous analyst. [Read more…] about Five Regression Analysis Tips to Avoid Common Problems
The ability to reproduce experimental results should be related to P values. After all, both of these statistical concepts have similar foundations.
- P values help you separate the signal of population level effects from the noise in sample data.
- Reproducible results support the notion that the findings can be generalized to the population rather than applying only to a specific sample.
So, P values are related to reproducibility in theory. But, does this relationship exist in the real world? In this blog post, I present the findings of an exciting study that answers this question! [Read more…] about What is the Relationship Between the Reproducibility of Experimental Results and P Values?
Precision in predictive analytics refers to how close the model’s predictions are to the observed values. The more precise the model, the closer the data points are to the predictions. When you have an imprecise model, the observations tend to be further away from the predictions, thereby reducing the usefulness of the predictions. If you have a model that is not sufficiently precise, you risk making costly mistakes! [Read more…] about Understand Precision in Predictive Analytics to Avoid Costly Mistakes
Heteroscedasticity means unequal scatter. In regression analysis, we talk about heteroscedasticity in the context of the residuals or error term. Specifically, heteroscedasticity is a systematic change in the spread of the residuals over the range of measured values. Heteroscedasticity is a problem because ordinary least squares (OLS) regression assumes that all residuals are drawn from a population that has a constant variance (homoscedasticity).
To satisfy the regression assumptions and be able to trust the results, the residuals should have a constant variance. In this blog post, I show you how to identify heteroscedasticity, explain what produces it and the problems it causes, and work through an example to show you several solutions. [Read more…] about Heteroscedasticity in Regression Analysis
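As an informal numeric companion to a residual plot, the sketch below compares the variance of the residuals at high fitted values with the variance at low fitted values. This is a simple illustrative check with made-up numbers, not a formal test and not necessarily the method used in the post.

```python
from statistics import variance


def residual_spread_ratio(fitted, residuals):
    """Sort residuals by fitted value, split them into a lower and an upper
    half, and return variance(upper half) / variance(lower half).
    A ratio far from 1 hints at a non-constant spread."""
    pairs = sorted(zip(fitted, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return variance(high) / variance(low)


# Hypothetical residuals whose spread grows with the fitted value.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
ratio = residual_spread_ratio(fitted, residuals)
```

A ratio well above 1, as in this example, corresponds to the fan or cone shape that appears in a plot of residuals against fitted values.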