Difference between Descriptive and Inferential Statistics

By Jim Frost

Descriptive and inferential statistics are two broad categories in the field of statistics. In this blog post, I show you how both types of statistics are important for different purposes. Interestingly, some of the statistical measures are similar, but the goals and methodologies are very different.

Descriptive Statistics

Image of a person holding a pen with a calculator and graphs. — Both descriptive and inferential statistics help make sense out of row after row of data!

Use descriptive statistics to summarize and graph the data for a group that you choose. This process allows you to understand that specific set of observations.

Descriptive statistics describe a sample. That’s pretty straightforward. You simply take a group that you’re interested in, record data about the group members, and then use summary statistics and graphs to present the group properties. With descriptive statistics, there is no uncertainty because you are describing only the people or items that you actually measure. You’re not trying to infer properties about a larger population.

The process involves taking a potentially large number of data points in the sample and reducing them down to a few meaningful summary values and graphs. This procedure allows us to gain more insight into the data and visualize it, rather than simply poring over row upon row of raw numbers!

Common tools of descriptive statistics

Descriptive statistics frequently use the following statistical measures to describe groups:

Central tendency: Use the mean or the median to locate the center of the dataset. This measure tells you where most values fall.

Dispersion: How far out from the center do the data extend? You can use the range or standard deviation to measure the dispersion. A low dispersion indicates that the values cluster more tightly around the center. Higher dispersion signifies that data points fall further away from the center. We can also graph the frequency distribution.

Skewness: This measure tells you whether the distribution of values is symmetric or skewed. See: Skewed Distributions

You can present this summary information using both numbers and graphs. These are the standard descriptive statistics, but there are other descriptive analyses you can perform, such as assessing the relationships of paired data using correlation and scatterplots.
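
As a rough illustration, here is how the measures above could be computed with Python’s standard library. The scores are made-up values, not the data from this post, and the skewness formula is the simple moment-based version, which is only one of several definitions in use.

```python
import statistics

# Hypothetical test scores (illustrative only, not the post's dataset).
scores = [66, 70, 72, 75, 78, 79, 80, 82, 85, 88, 96]

def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    # Moment-based skewness: average cubed deviation divided by the
    # cubed population standard deviation. 0 means symmetric.
    pop_sd = statistics.pstdev(values)
    skew = sum((v - mean) ** 3 for v in values) / n / pop_sd ** 3
    return {
        "mean": mean,
        "median": statistics.median(values),
        "range": (min(values), max(values)),
        "stdev": sd,
        "skewness": skew,
    }

summary = describe(scores)
for name, value in summary.items():
    print(name, value)
```

A positive skewness value suggests a longer right tail, and a negative value suggests a longer left tail.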

Related posts: Measures of Central Tendency and Measures of Dispersion

Example of descriptive statistics

Suppose we want to describe the test scores in a specific class of 30 students. We record all of the test scores and calculate the summary statistics and produce graphs. Here is the CSV data file: Descriptive_statistics.

Statistic	Class value
Mean	79.18
Range	66.21 – 96.53
Proportion >= 70	86.7%

These results indicate that the mean score of this class is 79.18. The scores range from 66.21 to 96.53, and the distribution is symmetrically centered around the mean. A score of at least 70 on the test is acceptable. The data show that 86.7% of the students have acceptable scores.
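
The three values in the table are simple to calculate once every score in the group is recorded. The sketch below uses a small set of hypothetical scores (not the 30 scores in the CSV file), and the function name and the passing threshold parameter are illustrative choices. In the post’s class of 30, a proportion of 86.7% corresponds to 26 students scoring at least 70.

```python
# Hypothetical scores for a small class (illustrative only).
scores = [66.2, 71.5, 68.9, 74.0, 79.3, 82.1, 85.4, 90.0, 77.7, 96.5]

def class_summary(values, passing=70):
    """Summarize one fully measured group: mean, range, and share passing."""
    mean = sum(values) / len(values)
    n_pass = sum(1 for v in values if v >= passing)
    return {
        "mean": round(mean, 2),
        "range": (min(values), max(values)),
        "proportion_passing": n_pass / len(values),
    }

result = class_summary(scores)
print(result)
```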

Collectively, this information gives us a pretty good picture of this specific class. There is no uncertainty surrounding these statistics because we gathered the scores for everyone in the class. However, we can’t take these results and extrapolate to a larger population of students.

We’ll do that later.

A good exploratory tool for descriptive statistics is the five-number summary, which presents a set of distributional properties for your sample.
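
Here is a minimal sketch of a five-number summary (minimum, first quartile, median, third quartile, maximum) using Python’s statistics module. The scores are hypothetical, and quartile conventions differ between software packages, so other tools may report slightly different quartiles for the same data.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    # method="inclusive" treats the data as the whole group being described.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

scores = [66, 70, 72, 75, 78, 79, 80, 82, 85, 88, 96]  # hypothetical
print(five_number_summary(scores))
```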

Related post: Analyzing Descriptive Statistics in Excel

Inferential Statistics

Inferential statistics takes data from a sample and makes inferences about the larger population from which the sample was drawn. Because the goal of inferential statistics is to draw conclusions from a sample and generalize them to a population, we need to have confidence that our sample accurately reflects the population. This requirement affects our process. At a broad level, we must do the following:

Define the population we are studying.
Draw a representative sample from that population.
Use analyses that incorporate the sampling error.

We don’t get to pick a convenient group. Instead, random sampling allows us to have confidence that the sample represents the population. This process is a primary method for obtaining samples that mirror the population on average. Random sampling produces statistics, such as the mean, that do not tend to be too high or too low. Using a random sample, we can generalize from the sample to the broader population. Unfortunately, gathering a truly random sample can be a complicated process. Learn more about Making Statistical Inferences.

You can use probability sampling methods, which rely on random selection, to collect a representative sample.

In contrast, convenience sampling doesn’t tend to obtain representative samples. These samples are easier to collect but the results are minimally useful.
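
To see why this matters, here is a small simulation sketch. Everything in it is an assumption made for illustration: the population is 10,000 made-up scores grouped into 100 schools, the convenience sample is simply the first school, and the random samples are drawn with random.sample.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 scores, grouped by school so that
# neighbouring records are similar (schools differ in average score).
population = []
for school in range(100):
    school_mean = random.gauss(80, 6)
    population.extend(random.gauss(school_mean, 7) for _ in range(100))

true_mean = statistics.mean(population)

# Convenience sample: the first school only (easy to reach, not representative).
convenience_mean = statistics.mean(population[:100])

# Random samples: average the sample mean over many independent draws.
random_means = [statistics.mean(random.sample(population, 100)) for _ in range(500)]
average_random_mean = statistics.mean(random_means)

print(round(true_mean, 2), round(convenience_mean, 2), round(average_random_mean, 2))
```

Averaged over many draws, the random-sample means land very close to the population mean, which is what it means for them not to tend too high or too low. The single-school mean can miss in either direction, and nothing in that sample tells you by how much.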

Pros and cons of working with samples

You gain tremendous benefits by working with a random sample drawn from a population. In most cases, it is simply impossible to measure the entire population to understand its properties. The alternative is to gather a random sample and then use the methodologies of inferential statistics to analyze the sample data.

While samples are much more practical and less expensive to work with, there are tradeoffs. Typically, we learn about the population by drawing a relatively small sample from it. We are a very long way off from measuring all people or objects in that population. Consequently, when you estimate the properties of a population from a sample, the sample statistics are unlikely to equal the actual population value exactly.

For instance, your sample mean is unlikely to equal the population mean exactly. The difference between the sample statistic and the population value is the sampling error. Inferential statistics incorporate estimates of this error into the statistical results.

In contrast, summary values in descriptive statistics are straightforward. The average score in a specific class is a known value because we measured all individuals in that class. There is no uncertainty.

Standard analysis tools of inferential statistics

The most common methodologies in inferential statistics are hypothesis tests, confidence intervals, and regression analysis. Interestingly, these inferential methods can produce similar summary values as descriptive statistics, such as the mean and standard deviation. However, as I’ll show you, we use them very differently when making inferences.

Hypothesis tests

Hypothesis tests use sample data to answer questions like the following:

Is the population mean greater than or less than a particular value?
Are the means of two or more populations different from each other?

For example, if we study the effectiveness of a new medication by comparing the outcomes in a treatment and control group, hypothesis tests can tell us whether the drug’s effect that we observe in the sample is likely to exist in the population. After all, we don’t want to use the medication if it is effective only in our specific sample. Instead, we need evidence that it’ll be useful in the entire population of patients. Hypothesis tests allow us to draw these types of conclusions about entire populations.
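
To make the idea concrete, here is a minimal sketch of one simple kind of hypothesis test, an exact permutation test for a difference in means. This is not necessarily the procedure you would use in practice (t-tests are the more common choice), and the treatment and control values are hypothetical. The logic is: if the treatment had no effect, the group labels would be arbitrary, so we check how often a relabeling of the same data produces a difference at least as large as the one we observed.

```python
from itertools import combinations

def permutation_test(group_a, group_b):
    """Two-sided exact permutation test for a difference in means.

    Returns the observed difference (mean of a minus mean of b) and the
    p-value: the share of all possible regroupings whose absolute
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = sum(group_a) / n_a - sum(group_b) / n_b

    extreme = 0
    count = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = sum_a / n_a - (total - sum_a) / n_b
        count += 1
        # Small tolerance guards against floating-point rounding.
        if abs(diff) >= abs(observed) - 1e-9:
            extreme += 1
    return observed, extreme / count

# Hypothetical outcome scores (illustrative only).
treatment = [12.1, 14.3, 13.8, 15.0, 12.9]
control = [10.2, 11.0, 9.8, 10.9, 11.5]

difference, p_value = permutation_test(treatment, control)
print(round(difference, 2), round(p_value, 4))
```

A small p-value means that a difference this large would rarely arise from arbitrary labeling alone, which is evidence that the effect is not limited to this particular sample. Enumerating every regrouping is only practical for small groups.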

Confidence intervals (CIs)

In inferential statistics, a primary goal is to estimate population parameters. These parameters are the unknown values for the entire population, such as the population mean and standard deviation. These parameter values are not only unknown but almost always unknowable. Typically, it’s impossible to measure an entire population. The sampling error I mentioned earlier produces uncertainty, or a margin of error, around our estimates.

Suppose we define our population as all high school basketball players. Then, we draw a random sample from this population and calculate the mean height of 181 cm. This sample estimate of 181 cm is the best estimate of the mean height of the population. However, it’s virtually guaranteed that our estimate of the population parameter is not exactly correct.

Confidence intervals incorporate the uncertainty and sampling error to create a range of values the actual population value is likely to fall within. For example, a confidence interval of [176 186] indicates that we can be confident that the real population mean falls within this range.
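
Here is a sketch of how such an interval can be calculated from summary values. It uses the normal approximation (a critical value of about 1.96 for 95% confidence), which is reasonable for larger samples. For small samples, a critical value from the t-distribution is the usual choice. The sample standard deviation of 8 cm and the sample size of 100 are assumptions chosen for illustration, as is the function name.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    # Critical value: about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sample_sd / sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean height 181 cm, standard deviation 8 cm, n = 100.
low, high = mean_confidence_interval(181, 8, 100)
print(round(low, 1), round(high, 1))
```

The margin of error shrinks as the sample size grows and widens as the data become more variable.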

Related post: Understanding Confidence Intervals

Regression analysis

Regression analysis describes the relationship between a set of independent variables and a dependent variable. This analysis incorporates hypothesis tests that help determine whether the relationships observed in the sample data actually exist in the population.

For example, the fitted line plot below displays the relationship in the regression model between height and weight in adolescent girls. Because the relationship is statistically significant, we have sufficient evidence to conclude that this relationship exists in the population rather than just our sample.
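
As a rough sketch of what sits behind a fitted line, the code below computes the least-squares slope and intercept for one independent variable, along with the t-statistic used to test whether the slope differs from zero. The heights and weights are hypothetical and are not the data behind the plot. Converting the t-statistic to a p-value requires the t-distribution with n - 2 degrees of freedom, which statistical software provides.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, plus the slope's t-statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted coefficients).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    slope_se = sqrt(sse / (n - 2) / sxx)
    t_stat = slope / slope_se
    return slope, intercept, t_stat

# Hypothetical heights (m) and weights (kg); not the data behind the post's plot.
heights = [1.40, 1.45, 1.50, 1.55, 1.60, 1.65]
weights = [38.0, 41.5, 44.0, 49.5, 52.0, 57.5]

slope, intercept, t_stat = fit_line(heights, weights)
print(round(slope, 1), round(intercept, 1), round(t_stat, 1))
```

The slope describes the relationship in the sample. The t-statistic is what carries the inference about the population: the further it is from zero, the stronger the evidence that the relationship is not just a feature of this sample.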

Related post: When Should I Use Regression Analysis?

Example of inferential statistics

For this example, suppose we conducted our study on test scores for a specific class as I detailed in the descriptive statistics section. Now we want to perform an inferential statistics study for that same test. Let’s assume it is a statewide test. By using the same test, but now with the goal of drawing inferences about a population, I can show you how that changes the way we conduct the study and the results that we present.

In descriptive statistics, we picked the specific class that we wanted to describe and recorded all of the test scores for that class. Nice and simple. For inferential statistics, we need to define the population and then draw a random sample from that population.

Let’s define our population as 8th-grade students in public schools in the State of Pennsylvania in the United States. We need to devise a random sampling plan to help ensure a representative sample. This process can actually be arduous. For the sake of this example, assume that we are provided a list of names for the entire population, draw a random sample of 100 students from it, and obtain their test scores. Note that these students will not be in one class but will come from many different classes in different schools across the state.

Inferential statistics results

For inferential statistics, we can calculate the point estimate for the mean, standard deviation, and proportion for our random sample. However, it is staggeringly improbable that any of these point estimates are exactly correct, and there is no way to know for sure anyway. Because we can’t measure all subjects in this population, there is a margin of error around these statistics. Consequently, I’ll report the confidence intervals for the mean, standard deviation, and the proportion of satisfactory scores (>=70). Here is the CSV data file: Inferential_statistics.

Statistic	Population Parameter Estimate (CIs)
Mean	77.4 – 80.9
Standard deviation	7.7 – 10.1
Proportion scores >= 70	77% – 92%

Given the uncertainty associated with these estimates, we can be 95% confident that the population mean is between 77.4 and 80.9. The population standard deviation (a measure of dispersion) is likely to fall between 7.7 and 10.1. And, the population proportion of satisfactory scores is expected to be between 77% and 92%.
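
One way to think about “95% confident” is in terms of the procedure: if we repeated the sampling many times and built an interval each time, about 95% of those intervals would contain the true population mean. The simulation sketch below illustrates this under stated assumptions. The population mean of 79 and standard deviation of 9 are hypothetical values (in a real study they are unknown), and the intervals use the normal approximation.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(7)  # fixed seed so the sketch is repeatable

TRUE_MEAN, TRUE_SD = 79.0, 9.0   # hypothetical population values
SAMPLE_SIZE, REPEATS = 100, 2000
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    margin = z * stdev(sample) / sqrt(SAMPLE_SIZE)
    center = mean(sample)
    if center - margin <= TRUE_MEAN <= center + margin:
        covered += 1

coverage = covered / REPEATS
print(coverage)
```

The printed share should be close to 0.95. Any single interval either contains the population mean or it does not. The 95% describes how often the method succeeds across repeated samples.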

Another key inferential statistic is the standard error of the mean. To learn more about it, read my post The Standard Error of the Mean.

Differences between Descriptive and Inferential Statistics

As you can see, the difference between descriptive and inferential statistics lies in the process as much as it does in the statistics that you report.

For descriptive statistics, we choose a group that we want to describe and then measure all subjects in that group. The statistical summary describes this group with complete certainty (outside of measurement error).

For inferential statistics, we need to define the population and then devise a sampling plan that produces a representative sample. The statistical results incorporate the uncertainty that is inherent in using a sample to understand an entire population. The sample size becomes a vital characteristic. The law of large numbers states that as the sample size grows, the sample statistics (e.g., the sample mean) will converge on the population value.
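
Here is a small simulation sketch of the law of large numbers. The population mean of 80 and standard deviation of 9 are hypothetical, and the draws come from a normal distribution purely for convenience.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

POPULATION_MEAN, POPULATION_SD = 80.0, 9.0  # hypothetical values

def sample_mean(n):
    """Mean of n independent draws from the hypothetical population."""
    return sum(random.gauss(POPULATION_MEAN, POPULATION_SD) for _ in range(n)) / n

errors = {}
for n in (10, 100, 1_000, 10_000, 100_000):
    errors[n] = abs(sample_mean(n) - POPULATION_MEAN)
    print(n, round(errors[n], 3))
```

The error will not necessarily shrink at every single step, because each sample is random, but it tends toward zero as the sample size increases.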

A study using descriptive statistics is simpler to perform. However, if you need evidence that an effect or relationship between variables exists in an entire population rather than only your sample, you need to use inferential statistics.

If you’re learning about statistics and like the approach I use in my blog, check out my Introduction to Statistics book! It’s available at Amazon and other retailers.

Comments

Grivin says

December 8, 2022 at 5:17 pm

It is generally not recommended to use a chi-squared test or cross-tabulation on data from purposive sampling because the sample is not representative of the population. Purposive sampling is a non-random sampling method in which the researcher deliberately selects the sample based on specific criteria, such as individuals who are experts in a particular field or who have a specific characteristic. This means that the sample is not randomly selected from the population and may not be representative of the population as a whole.

- Jim Frost says
  
  December 8, 2022 at 7:17 pm
  
  Hi Grivin,
  
  Thanks for writing! I certainly agree with your comment. In general, all inferential statistics are questionable when you don’t have a representative sample. That includes all hypothesis tests, including chi-square tests.
  
  To collect a representative sample, you need to use a probability sampling method. You mention purposive sampling, which is one of several types of NON-probability sampling. You don’t expect non-probability methods to produce representative samples. For more information, read my post about Sampling Methods.
  
ejersa wako says

May 5, 2022 at 3:30 pm

can I use both descriptive and inferential statistics for one research?

Paulus Mungeyi says

August 26, 2021 at 9:20 am

Hi, I have conducted purposive sampling. As I collected data, I came to realize that the population size could be bigger than the figures I have. I am now analysing the data. Can I use a chi-square test and cross-tabulation with data from purposive sampling?

Binyam says

March 12, 2021 at 4:16 am

Interesting Explanation!

Preet Kaur says

November 27, 2020 at 3:40 am

I’m pursuing my PhD on child labour laws in India. For my research methodology, I adopted the ‘Questionnaire’ (closed-ended questions) and ‘Interview Schedule’ (structured questions) methods. Both of my methods fall under quantitative research. Now the question is what statistical tool I should use to test my hypothesis. Can I use inferential or descriptive statistics? If it’s inferential statistics, would my hypothesis be tested with a parametric test?

Rod says

November 7, 2020 at 3:31 pm

You say under Confidence intervals that, “For example, a confidence interval of [176 186] indicates that we can be confident that the real population mean falls within this range.” Do you mean that there is a, e.g., 95% probability that the interval [176 186] covers the population parameter (mean)? Since the population parameter (mean) is an unknown constant and no probability statement concerning its value can be made, wouldn’t the 95% probability relate to the estimation procedure and not a single calculated interval? What I mean is that if the method of deriving a CI from a new sampling of the population data is done a large number of times, the number of CIs containing the population parameter will tend towards 95%. So, for any one CI calculated, the population parameter either will or will not be within the CI. Thank you for sharing your knowledge.

TICHAFARA MUSINGARIMI says

November 4, 2020 at 5:19 am

Well written article with clear explanations and examples.
Is the book yet published?
I am interested to have a read…

- Jim Frost says
  
  November 4, 2020 at 10:54 pm
  
  Hi Tichafara,
  
  Yes, I have three books published, all are in both ebook and print formats! For more information, and free samples, go to My Webstore!
  
Ryan Xing says

November 1, 2020 at 10:38 pm

Hi Jim,

Thank you so much for your plain but informative illustration! The use of examples is the biggest strength that suits novice researchers like me. Currently, I don’t have specific questions but I may request some help from you some day. Thank you in advance!

Ryan Xing

Ib du says

October 23, 2020 at 7:14 am

Hi Jim,
I am doing statistical analysis to see the radiation dose to heart of two methods for breast cancer treatment, 3D versus IMRT.
I have two groups of patients, one treated with 3D (6 samples) and another treated with IMRT (4 samples).
What is a good test to use knowing that the radiation dose to heart is a critical?
thank you

Kim Brennan says

October 15, 2020 at 7:09 pm

If I am conducting research that there is an increased likelihood of adolescent loneliness and isolation stemming from the coronavirus pandemic, I would use inferential analysis, correct? If not, what data analysis technique would be best to use? I am trying to show that there is a positive relationship between adolescent loneliness and isolation due to the coronavirus.

AYAZ says

September 30, 2020 at 8:01 am

HI Jim
if some one want to study TUBERCULOSIS Patients in a district. he collects data from tuberculosis patients in that particular district who have been registered from 1st July 2019 to 31st DEC 2019.
what sampling method have been used here if he can collect information from 80% of the said patients.
is it a convenient sampling or else other. please explain ?

- Jim Frost says
  
  September 30, 2020 at 3:55 pm
  
  Hi Ayaz,
  
  It’s not entirely certain. It sounds like they’ve defined their population as that specific group. However, how did the researchers obtain the subjects? If they obtain data from all tuberculosis patients in that district during that timeframe, then it’s not a sample at all. It is the population under study. You’d use descriptive statistics to describe that complete group. However, it sounds like the researcher wants to collect 80% of those patients, which makes it a sample. The researcher could draw either a random sample or a convenience sample from that population. He could use either to get up to 80%. Hopefully, it’s a random sample though!
  
Madhav says

June 15, 2020 at 2:26 am

Hi, very good blog! In medicine to help us to understand degree of abnormality ,ex: caliber of a blood vessel
we use z score. Will there be t score as well for that and can we use that?
Regards

- Jim Frost says
  
  June 15, 2020 at 3:27 pm
  
  Hi Madhav,
  
  I’m not sure of the context but I suspect that they’re using Z-scores to show how an individual’s blood vessel compares to the average blood vessel. You would not use t-scores for that purpose. However, if you were performing a hypothesis test on the mean differences between blood vessels for two groups, you’d use a t-test, which does use t-scores. Read my post about performing t-tests for more information.
  
  Additionally, stay tuned, as I will be releasing my brand new book about hypothesis testing very soon!
  
Madhav says

June 15, 2020 at 2:06 am

Hi Jim, this is great stuff! Would you please help me to understand the following question? In medical research, there are broadly 2 types of studies. One is descriptive and the other analytical. In descriptive studies we use the operational verb ‘to estimate’ and in analytical studies ‘to determine’. Analytical studies are to test the hypothesis and descriptive ones to generate hypotheses. Do descriptive studies use inferential statistics too?
Thanks in advance!

- Jim Frost says
  
  June 15, 2020 at 3:34 pm
  
  Hi Madhav,
  
  Descriptive studies simply describe the group that is measured. The results are not generalized beyond that group. There is no need to use inferential procedures in a descriptive study. There is also no need for estimates because you’re measuring all subjects. For example, if you are simply describing the test results of a class, you know the average score. It is not an estimate of a population value. You don’t test hypotheses in descriptive studies.
  
  Inferential studies will generalize the results beyond the group and draw inferences about a larger population. All scientific studies use inferential statistics because they don’t want to know whether an effect exists just in a small group of subjects. However, scientific studies can include descriptive statistics about the subjects for informational purposes and to verify they are not unusual in some manner. These studies do derive estimates of the population values and test hypotheses about the properties of the population.
  
  This post describes all of this information throughout. I’d read through it again more carefully.
  
Maria says

April 22, 2020 at 4:05 pm

Hi,
Thanks for the post. I am completing a paper and my instructor asked me to use inferential statistics to lend credence to my results. I am confused because I only have 11 participants. I have collected the following data from these 11 participants : baseline and week five questionnaires, and 5 weeks of weekly adherence reports. Is it possible to use inferential statistics to present my results?
Thank you,
BM

Luis Villafuerte says

April 17, 2020 at 12:37 pm

Hi Jim,

Thank a lot for your answer. You definitely help me a lot with how to manage the results of my experiment.

Abu ouf says

April 17, 2020 at 11:25 am

Thanks alot,,,,very informative

Luis Villafuerte says

April 3, 2020 at 11:31 pm

Hi Jim, I performed experiments in which I measured strain for two different materials while varying the external temperature. Is it possible to use descriptive statistics to show the results and then inferential statistics to compare the behavior of both materials, or do I have to choose just one of the two?

- Jim Frost says
  
  April 5, 2020 at 6:58 pm
  
  Hi Luis,
  
  If you want to apply the results from your sample beyond just the sample, you’ll need to be sure to use a representative sampling method and to use inferential procedures that incorporate estimates of the sampling error. You can include the descriptive statistics, but be sure to mention the results of the inferential procedures, such as statistical significance and confidence intervals. For example, you can say that the difference in mean strength of the two materials is X (descriptive), and that the difference is statistically significant and the CI is [y z] (inferential).
  
  I hope this helps!
  
Chief says

February 16, 2020 at 5:24 pm

Hello sir,
Your explanations has really helped me to understands most of the concepts. God bless you and I wish you continue this way.

Bakary S Dibba says

February 4, 2020 at 11:04 am

Very helpful thanks a lot

Bethel says

January 30, 2020 at 12:18 am

Hi Jim, thank you for supporting me on the topic I didn’t know before. God bless.
And, I’ve one question if you have time, tell me in detail. All types of inferential statistical tools and describe for which purpose could applied it.

- Jim Frost says
  
  January 31, 2020 at 4:38 pm
  
  Hi Bethel,
  
  That’s an extremely broad question, which I can’t answer in a blog comment. However, in this post, I do mention the key tools of inferential statistics. So, look through this blog post for your answers!
  
niroshan says

December 18, 2019 at 7:38 am

Thank you.This is very helpful me to understand the difference of these two..

Omar says

December 11, 2019 at 6:35 pm

thank you sir , i wanna ask you one question is there any relationship between the sampling techniques used to gather data and descriptive/inferential statistics ?

- Jim Frost says
  
  December 13, 2019 at 10:31 am
  
  Hi Omar,
  
  I hope you’ve read this blog post thoroughly. It should be clear from this post that for descriptive statistics you just pick the group(s) you’re interested in and measure all people/items in them. You don’t collect a sample from those groups but instead measure all members of the group(s). No sampling in descriptive statistics. For inferential statistics, you do take a sample of the larger population and that sample must be representative of that population.
  
Darren Y says

December 6, 2019 at 3:13 am

Amazing explanation! Much better than the textbook given to me. Thanks!

africa55group says

August 14, 2019 at 6:33 pm

You simplify complex processes. Amazing

Blessing says

August 2, 2019 at 11:24 pm

Thanks, I now understand the difference between descriptive stat and inferential stat

Muhammad Ayub Sabir says

July 25, 2019 at 8:18 am

It was very helpful. Thanks Sir

Ayodele Marcus says

May 24, 2019 at 7:29 am

Thanks so much for the clear explanations and your time. A quick question please, can i use inferential statistics to test the hypothesis of ” there is no significant relationship between congestion and the ambient air condition”? Also if Yes, was method is most appropriate regression analysis or t test or ANOVA.
Thanks

- Jim Frost says
  
  May 28, 2019 at 9:47 am
  
  Hi, inferential statistics are a collection of procedures that allow you to use random samples drawn from a population to make conclusions about the entire population. Assuming you can define a population for your study area of ambient air condition and draw a random sample from it, you can probably use inferential statistics. The correct analysis to use depends on the goals of your analysis and the type of variables that you use. Because I don’t know that information, I couldn’t tell you whether t-tests or ANOVA are the correct procedure.
  
Nompumelelo says

March 6, 2019 at 7:31 am

thank you Sir, very easy to understand.

Ashish Kharloya says

February 11, 2019 at 12:55 pm

Fantastic description, makes the concepts really clear for someone who wants a revision of these topics. Much appreciated, Sir!

Raja Wajahat says

February 1, 2019 at 7:51 am

One of the best article i have witnessed on that topic, thanks sir 🙂

- Jim Frost says
  
  February 1, 2019 at 9:53 am
  
  Thank you, Raja! I really appreciate that!
  
Lance says

January 15, 2019 at 10:23 pm

I like how you simplify the words just to make the topic much clear.

- Jim Frost says
  
  January 15, 2019 at 11:47 pm
  
  Thanks, Lance!
  
Anum says

December 26, 2018 at 5:54 am

Looking fwd fr ur book it will b a great help

- Jim Frost says
  
  December 29, 2018 at 6:50 pm
  
  Thank you, Anum!
  
Zed says

October 20, 2018 at 11:57 am

Pretty cleared about this concept now 🙂, you are doing a great job 👍

- Jim Frost says
  
  October 21, 2018 at 1:04 am
  
  Thank you, Zed. I really appreciate the nice comment!
  
AMANUEL TAFESSE says

October 18, 2018 at 4:25 am

Thanks a lots for your clear and conscious note posts. I understood the better know-how on the area of descriptive and inferential statistics.

- Jim Frost says
  
  October 18, 2018 at 2:02 pm
  
  You’re very welcome, Amanuel. I appreciated your nice comment!
  
MARIA SOCORRO QUIDER GUIBONE says

October 17, 2018 at 8:18 am

A great help for us who are studying statistics. Thank you for making it easier for us to understand this subject. God bless.

- Jim Frost says
  
  October 17, 2018 at 10:44 am
  
  Thank you, Maria!
  
Ajay S says

October 5, 2018 at 2:41 am

Hi Sir,

Nice article, I had a question….

I have a dataset which is skewed to the right and when I perform “Descriptive Statistics” it provides MEAN as one of the parameter (Mean = sum/Number of data points), but when I fit the same data to a distribution and I found “Weibull” to be a best fit and calculate “Mean” [Mean of Weibull = Scale *Gamma(1+1/Beta)], now the “Descriptive Statistics” Mean and Weibull Mean have same value, how is this possible when the formulas of calculating Mean are different for each approach?

- Jim Frost says
  
  October 5, 2018 at 9:25 am
  
  Hi,
  
  Just a guess but either beta equals 1, or the descriptive statistics procedure simply uses the general calculation of the mean rather than the Weibull specific calculation.
  
Rosa M says

September 23, 2018 at 2:15 pm

This was so unbelievably helpful! Thanks for making this so easy to understand!

- Jim Frost says
  
  September 24, 2018 at 10:33 am
  
  You’re very welcome, Rosa! I’m glad it was helpful!
  
Motlatsi says

September 5, 2018 at 3:15 pm

hy Jim you are inspirational worldwide by helping us thank you so much im now a distinction student in statistics all because of you,you are a blessing to us

- Jim Frost says
  
  September 6, 2018 at 12:46 am
  
  Hi Motlatsi,
  
  Thank you so much for your very kind comment! I really appreciate it. I put a lot of work into my website because I want to make statistics easier to learn for all.
  
  That all said, I’m sure you put in a lot of hard work learning statistics! Congratulations on being such a great student!
  
  I wish you the very best!
  
Nick says

July 24, 2018 at 9:53 am

Jim,

Thank you for the insight! I wish someone told me this earlier. To follow up with another similar question, most example problems also state “assume alpha = 0.05.” Someone told me that in practice, we use alpha from similar research topics found in industry that pertains to your own. Would you agree with that statement?

-Nick

- Jim Frost says
  
  July 24, 2018 at 2:32 pm
  
  Hey again Nick,
  
  You bet!!
  
  As for significance levels, in the field, the most commonly used alpha by far is 0.05. I almost never see a different value. The most I see is that analysts will adjust the significance level when they’re making many comparisons, such as between the factor levels in an ANOVA.
  
  I do agree with the practice of seeing what others in your industry have used and their rationale. For example, if a Type I error is particularly costly, dangerous, or bad in whatever way, you might change the significance level to 0.01. If a Type II error is particularly bad, you might change alpha to 0.10. Although, I’m always leery of increasing alpha from 0.05 to say 0.10. Simulation studies show that p-values near 0.05 actually reflect very weak evidence of an effect–so decreasing the strength of evidence you require (e.g., by increasing alpha from 0.05 to 0.10) doesn’t seem like a good idea. I cover this a bit at the end of my post about interpreting p-values. But, I can often imagine a need to lower alpha to something like 0.01.
  
  So, I do agree with the principle, but I often don’t see it in practice. Although, I think 0.05 is often a good value to use, so that’s probably part of why it is so ubiquitous. It’s probably a good value to use unless you can identify a specific and important reason to use a different value. And, that information is what you might gain by looking at similar research topics in your industry.
  
prem shankar Mishra says

July 23, 2018 at 2:34 pm

Thank you, Sir! It’s really, really nice. I have found a very simple way to understand the things that you have covered so well, sir. Thank you once again, sir.

Arliezl D. Mancio says

July 22, 2018 at 11:39 pm

I’d been doing a lot of reading but was still confused… Thank you so much for the information you shared… And now everything is clear…

Nick says

July 21, 2018 at 3:45 pm

Thank you so much for such great content! I use your posts frequently to grasp all the material I’m currently studying in school. I do have one question I cannot wrap my head around. I was hoping you could help explain.

I would certainly agree that we can gain value by analyzing random samples because it is sometimes impossible to measure the entire population. With that being said, let us for a moment consider methods described in textbooks: estimating the population mean using the Z statistic (when the pop. st. dev. is known) or the t statistic (when the pop. st. dev. is unknown). If we cannot measure the entire population and are unable to get a population standard deviation or a population mean as a result, how can we use these methods or construct a confidence interval if we actually know nothing about the population? Most problems in textbooks state (assume population mean is xxx or std. dev. is yyy). To me, this does not sound practical… How is this process done in industry?

- Jim Frost says
  
  July 21, 2018 at 10:27 pm
  
  Hi Nick,
  
  I’m glad that my posts help you out!
  
  You’re entirely correct about when to use t-values versus Z-scores. Because you almost never know the population standard deviation, you rarely use Z-tests for a mean in practice. Instead, you estimate the population standard deviation with the sample standard deviation and use the t-distribution, which accounts for the extra uncertainty in that estimate. After all, if you knew the population standard deviation, wouldn’t you probably also know the population mean? I don’t know why some statistics classes and textbooks use that test and assume you know the population standard deviation. I suppose it’s a slightly simpler case than using the t-distribution, which changes depending on your degrees of freedom.
  
  If you need to test hypotheses or find confidence intervals about a population mean and you’re using a sample, you’ll almost always use t-tests and t-values.
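
Here is a minimal sketch of what that looks like when nothing about the population is known: the sample supplies both the mean and the standard deviation, and a t critical value replaces the Z value. The data are hypothetical, and the critical value 2.262 (two-sided 95%, 9 degrees of freedom) is hardcoded from a standard t-table so the example needs only the Python standard library.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of 10 measurements, invented for illustration.
sample = [98, 102, 101, 97, 105, 99, 100, 103, 96, 104]

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
std_error = s / math.sqrt(n)      # estimated standard error of the mean

# Two-sided 95% critical value for df = n - 1 = 9, taken from a standard t-table.
T_CRIT_DF9 = 2.262
# The Z critical value (about 1.96), shown for comparison only.
z_crit = NormalDist().inv_cdf(0.975)

t_interval = (mean - T_CRIT_DF9 * std_error, mean + T_CRIT_DF9 * std_error)
# What you would get by wrongly treating s as the known population value.
z_interval = (mean - z_crit * std_error, mean + z_crit * std_error)

print(f"mean = {mean:.2f}, s = {s:.2f}, standard error = {std_error:.2f}")
print(f"95% t-interval: ({t_interval[0]:.2f}, {t_interval[1]:.2f})")
print(f"95% z-interval: ({z_interval[0]:.2f}, {z_interval[1]:.2f})")
```

The t-interval comes out wider than the Z-based one because the t critical value is larger, which reflects the added uncertainty of estimating the standard deviation from a small sample.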
  
karma says

July 10, 2018 at 5:38 am

Happened to discover your website recently and have been going through it. Very helpful!

Thank you.

- Jim Frost says
  
  July 10, 2018 at 2:30 pm
  
  Thanks, Karma! I’m glad you have found it to be helpful!
  
Patrik Silva says

March 29, 2018 at 10:58 am

You’re welcome Jim!
Your blog is pulling me into statistics every time I read any of your posts.
Statistics is nice and beautiful.
I am a geographer and I like modelling. I would like to see some of your posts discuss spatial statistics, if you know something that might be useful.

I will be here every time with you, teacher.

Thank you again, Jim.

Patrik Silva says

March 29, 2018 at 12:17 am

I am getting addicted to your blog, Jim Frost.
I think this is what should be taught in the first statistics class, before going into any math and formulas.
I am safe here; at least I know who can help me resolve my doubts.

Thank you Jim, God bless you always.

- Jim Frost says
  
  March 29, 2018 at 12:22 am
  
  Hi Patrik, you have no idea how much your kind comments mean to me! Thank you!
  
SUBROTO CHATTERJEE says

March 24, 2018 at 7:48 am

Your blog explains statistics in a very student-friendly manner. Importantly, your explanations of various terms are nicely illustrated. Could you write more on multivariate statistical analysis? Thanks.

- Jim Frost says
  
  March 24, 2018 at 5:52 pm
  
  Hi, thanks so much! I strive to make statistics as easy to understand as possible. Your nice comments mean a lot to me!
  
  I’ll try to write more about multivariate analyses in the future.
  
Ndamona Namalemo says

March 23, 2018 at 2:10 pm

Please help me with this assignment. These are the questions:
1. Define descriptive statistics and inferential statistics
2. Explain the difference between descriptive statistics and inferential statistics

- Jim Frost says
  
  March 23, 2018 at 2:22 pm
  
  Hi Ndamona, the information you need to answer your questions is in this blog post. You’re in the right place!
  
John Sneed says

March 12, 2018 at 9:53 pm

This was a good introduction and an important help to me. I wish you had gone into a little more detail about standard deviation. I also wish there were a link to print this page. It is the kind of page I could go back to from time to time to refresh what I have learned. I am John and I am a PhD student in education. Thanks for this help.

- Jim Frost says
  
  March 12, 2018 at 10:42 pm
  
  Hi John, I’m happy to hear that you found this helpful. I’m also adding new content all the time. As for the standard deviation, I write about it in a different post about Measures of Variability. You might find that helpful.
  
Carlo Lauro says

February 21, 2018 at 3:48 pm

Still waiting for your reply

- Jim Frost says
  
  February 21, 2018 at 4:07 pm
  
  Hi Carlo, that’s a very broad question–I could write an entire book about that topic. Is there something more specific you want to know?
  
Evelyn says

February 19, 2018 at 3:17 pm

Just discovered this website today. Very helpful. Thank you, Jim.

- Jim Frost says
  
  February 21, 2018 at 3:08 pm
  
  Hi Evelyn, thank you for your kind words! I’m glad you found it to be helpful!
  
Anandaraj says

February 5, 2018 at 10:02 pm

Very good one. Explains the basics well. Thanks

daboo says

February 5, 2018 at 7:43 am

Thank you so much. I continually need brief explanations like this about statistics. I also need other material, especially about Bayesian distributions, because I’m a postgraduate student writing a thesis on maternal mortality using a Bayesian model.

rama krishna reddy says

February 5, 2018 at 6:59 am

I am a data scientist, and I enjoy going through your articles. Thank you, Jim.

- Jim Frost says
  
  February 5, 2018 at 10:26 am
  
  Hi Rama, I’m glad that you find my posts to be helpful!
  
Aayush says

February 4, 2018 at 9:27 pm

Hello sir, I want to know: what is the need for interval estimation when we already have point estimation?

- Jim Frost says
  
  February 4, 2018 at 10:55 pm
  
  Hi Aayush, that is a great question! I talk about this in the Example of Inferential Statistics section. It is possible to calculate the point estimate for the population. However, it’s virtually guaranteed that this estimate is wrong by some amount. So, the question becomes, how far off is the point estimate likely to be?
  
  Confidence intervals answer this question. The narrower the intervals, the more precise the estimate. With narrow intervals, you can be reasonably sure that the point estimate isn’t too far wrong. However, if the CI is wide, you know that the point estimate could be far from the true value. In that case, don’t place too much confidence in the point estimate! Interval estimation provides additional information about the precision of the point estimate.
  
  I hope this helps clarify things!
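
To make the precision idea concrete, here is a small sketch of how the same point estimate comes with a wide or a narrow interval depending on the sample size. All numbers are hypothetical, and the two-sided 95% t critical values are hardcoded from a standard t-table so that only the Python standard library is needed.

```python
import math

# Two-sided 95% t critical values from a standard t-table,
# keyed by sample size n (degrees of freedom = n - 1).
T_CRIT_95 = {10: 2.262, 30: 2.045, 100: 1.984}


def margin_of_error(sample_sd: float, n: int) -> float:
    """Half-width of the 95% confidence interval for a mean."""
    return T_CRIT_95[n] * sample_sd / math.sqrt(n)


# Hypothetical values, invented for illustration.
point_estimate = 50.0
sample_sd = 10.0

margins = {n: margin_of_error(sample_sd, n) for n in T_CRIT_95}
for n, m in margins.items():
    low, high = point_estimate - m, point_estimate + m
    print(f"n = {n:3d}: {point_estimate:.1f} +/- {m:.2f} -> ({low:.2f}, {high:.2f})")
```

The point estimate is 50 in every row, but only the interval shows how much weight it can bear: the margin shrinks steadily as the sample grows.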
  
Jerry Tuttle says

February 4, 2018 at 2:28 pm

I have seen definitions of sample standard deviation in social science textbooks using an n denominator for descriptive statistics and an n-1 denominator for inferential statistics. I have never seen a math book using the n denominator for descriptive statistics. Any comment on why the social science world goes off in a different direction here?

- Jim Frost says
  
  February 5, 2018 at 10:36 pm
  
  Hi Jerry, I don’t know why social science takes that route. I can tell you that in statistics the correct formula to use for standard deviation depends on whether the data are the entire group or population or a sample from a larger population.
  
  When the data are the entire group (descriptive statistics), the denominator is n. However, if you are using a sample to estimate a population value (inferential), you use n-1. This is because the sample mean is itself estimated from the data, which uses up one degree of freedom. Dividing by n-1 corrects the tendency of the n denominator to underestimate the population variance.
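
The two denominators can be compared directly. This is a minimal sketch using Python's standard `statistics` module, where `pstdev` divides by n and `stdev` divides by n-1; the data values are hypothetical.

```python
import math
import statistics

# Hypothetical values, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treat the data as the entire group: divide by n.
sd_n = statistics.pstdev(data)

# Treat the data as a sample from a larger population: divide by n - 1.
sd_n_minus_1 = statistics.stdev(data)

# The same two results computed by hand, to make the denominators explicit.
mean = sum(data) / len(data)
sum_sq = sum((x - mean) ** 2 for x in data)
by_hand_n = math.sqrt(sum_sq / len(data))
by_hand_n_minus_1 = math.sqrt(sum_sq / (len(data) - 1))

print(f"n denominator:     {sd_n:.3f}")
print(f"n - 1 denominator: {sd_n_minus_1:.3f}")
```

The n-1 version is always slightly larger, and the gap shrinks as the sample size grows.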
  
ANN MARY CHACKO says

February 4, 2018 at 8:16 am

Thank you Jim for making things simpler and better. I am Ann, PhD Scholar from India

- Jim Frost says
  
  February 4, 2018 at 10:56 pm
  
  Hi Ann, you’re very welcome! I’m so glad that you find my posts to be helpful! I love India! I’ve been there several times!
  
Carlo Lauro says

February 4, 2018 at 7:26 am

Very useful presentation of the topic. What about their use in big data analysis?

Sol says

February 4, 2018 at 3:23 am

Many thanks for this post. You’re a godsend. Have you authored any books?

- Jim Frost says
  
  February 5, 2018 at 1:24 am
  
  Hi Sol, You’re very welcome! 🙂 And, that’s a timely question. I’m working on my first book at the moment!
  