Inference for a Mean
Hypothesis Testing and Confidence Intervals
Introduction
Statistical Inference is the practice of using data sampled from a population to make conclusions about population parameters.
The two primary methods of statistical inference are:
- Hypothesis Testing
- Confidence Intervals
Hypothesis testing is the practice of gathering evidence in an attempt to disprove a hypothesis. We either reject the conventional wisdom or fail to reject it.
Confidence intervals are a way to use data to estimate an unknown population parameter, often the population mean, $\mu$.
This chapter lays the foundation for Hypothesis Testing.
Review: distribution of sample means
Hypothesis testing depends on the assumption that the sample mean is normally distributed.
Recall that the distribution of sample means is normal when:
- The underlying population is normally distributed
- The sample size, n, is sufficiently large (
for this class) for the Central Limit Theorem to apply
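The second condition can be checked informally by simulation. The sketch below is only an illustration: the exponential population, the sample size of 40, and the seed are assumptions chosen for the demonstration, not values from this chapter.

```r
# Sketch: sample means drawn from a skewed population look roughly normal
# when the sample size is large.
set.seed(1)        # arbitrary seed so the run is reproducible
n <- 40            # illustrative sample size (an assumption)

# 5000 sample means, each from n draws of an exponential population
# with mean 1 and standard deviation 1
sample_means <- replicate(5000, mean(rexp(n, rate = 1)))

mean(sample_means) # should be close to the population mean, 1
sd(sample_means)   # should be close to 1 / sqrt(n)

# The histogram should look approximately bell-shaped
hist(sample_means, breaks = 40,
     main = "Sample means, n = 40", xlab = "Sample mean")
```

Rerunning with a much smaller n (for example, 3) should show a visibly skewed histogram, which is why the sample size condition matters.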
Hypothesis Testing
When we know the population standard deviation, $\sigma$
NOTE: In practice, we do not typically know the population standard deviation. To help us make the connection between probability calculations for means and hypothesis testing, we assume we know the population standard deviation, $\sigma$.
Testing for the True Mean Length of a Foot-long Sandwich
One might expect a foot-long sandwich to be 12 inches long, at least on average. Consumers will naturally allow for some level of variation. But if enough customers are convinced they are getting consistently short-changed, they may loudly complain, or even attempt a lawsuit.
The Null Hypothesis
Proper scientific inquiry is based on attempts to disprove a claim. The claim representing the “status quo,” the commonly held belief or the usual value is called the null hypothesis. In the case of foot-long sandwiches, our null hypothesis is that the true mean length of foot-long sandwiches is 12 inches.
We present the null hypothesis in the following way:
$$H_0: \mu = 12$$
The null hypothesis is standing trial. We seek evidence against $H_0$.
The Alternative Hypothesis
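The alternative hypothesis, $H_a$, is the competing claim that we accept if the data provide enough evidence against the null hypothesis. Because the concern here is that customers are being short-changed, the alternative is that the true mean length is less than 12 inches: $H_a: \mu < 12$.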
There are other possible alternatives that will depend on the context of the research question. Sometimes we may want to test if something is higher than a proposed value. Sometimes we are not sure if something is higher or lower at the outset, so we could test if something is not equal to a proposed value.
The alternative hypothesis could have been written in any of these forms:
- $H_a: \mu \ne 12$ (two-sided hypothesis; two-tailed)
- $H_a: \mu < 12$ (one-sided hypothesis; left-tailed)
- $H_a: \mu > 12$ (one-sided hypothesis; right-tailed)
It is important that the null and alternative hypotheses be determined prior to collecting the data. It is not appropriate to use the data from your study to choose the alternative hypothesis that will be used to test the same data! This is an example of using data twice, once to choose the test and again to conduct the test. It is okay to use data from a previous study to determine your null and alternative hypotheses, but it is an improper use of the statistical procedures to use the data to define and conduct a hypothesis test.
KEY POINTS:
- Hypotheses are statements about population parameters
- The null hypothesis will always be a statement of equality
- We never PROVE the null hypothesis is true; we can only fail to disprove it.
That last point is important. Data collected are only evidence against $H_0$.
Test Statistic
We use Test Statistics to determine how likely our results are assuming the null hypothesis is true. We will use many different test statistics throughout this course, but the first one will be very familiar: the Z-score.
When testing a null hypothesis, the
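Suppose for now that the standard deviation of sandwich lengths is known to be $\sigma = 0.5$ inches and that the distribution of lengths is normal. If we sampled 21 random sandwiches from shops in a region and got a sample mean length of 11.82 inches, we can calculate a Z-score:
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{11.82 - 12}{0.5 / \sqrt{21}} \approx -1.65$$
Where $\bar{x} = 11.82$ is the sample mean, $\mu_0 = 12$ is the mean stated in the null hypothesis, $\sigma = 0.5$ is the population standard deviation, and $n = 21$ is the sample size.
Based on the results from the formula above, $z \approx -1.65$: the sample mean is about 1.65 standard errors below 12 inches.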
We can then use this Z-score to calculate the probability of observing such a result or more extreme, if the null hypothesis is true. This is the evidence we are going to use to make a decision about the null hypothesis.
P-value
In other words, our
KEY DEFINITION: A P-value is the probability of observing a test statistic as extreme, or more extreme than the one we observed in our sample, if the null hypothesis is true.
We use “as or more extreme” because the direction (greater than or less than) depends on our alternative hypothesis.
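To make the role of direction concrete, here is a minimal sketch of how the tail calculation would change with each form of the alternative hypothesis, using the sandwich numbers from this chapter. The variable names are illustrative only.

```r
# Z-score for the sandwich sample (values from the example in this chapter)
z <- (11.82 - 12) / (0.5 / sqrt(21))

# Left-tailed alternative (mu < 12): area below z
p_left <- pnorm(z)

# Right-tailed alternative (mu > 12): area above z
p_right <- pnorm(z, lower.tail = FALSE)

# Two-sided alternative (mu != 12): area in both tails beyond |z|
p_two <- 2 * pnorm(-abs(z))

round(c(left = p_left, right = p_right, two_sided = p_two), 4)
```

The same sample gives very different P-values depending on the alternative, which is one reason the hypotheses must be fixed before looking at the data.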
In the case against the sandwich shop, we can use pnorm() to get the P-value:
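# Z-score for the sample mean, assuming the null hypothesis (mean = 12) is true
z <- (11.82 - 12) / (0.5 / sqrt(21))
# Lower-tail probability, P(Z <= z)
pnorm(z)
[1] 0.04949937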
Conclusion: If the true population mean were 12 inches, there would be a 4.95% chance of obtaining a sample mean of 11.82 inches or less for a sample of size 21.
At this point we have to make a decision. Is that probability small enough to reject the null hypothesis in favor of the alternative? Are foot-long sandwiches LESS than 12 inches long on average?
When the P-value is very small, we have strong evidence to reject the null hypothesis. But how small is small enough?
Level of Significance, $\alpha$
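We need a number that can be used to determine if the P-value is small enough to reject the null hypothesis. This number is called the level of significance, $\alpha$.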
We will use the same decision rule for all hypothesis tests:
- If the P-value is less than $\alpha$, we reject the null hypothesis.
- If the P-value is greater than $\alpha$, we fail to reject the null hypothesis.
Memory Aid: Some students find it helpful to remember the decision rule using the couplet:
If $P$ is low, reject $H_0$.
Where “low” means less than $\alpha$.
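The decision rule can be sketched in a few lines of R. The value alpha = 0.05 below is an assumption made for illustration; the rest uses the sandwich numbers from this chapter.

```r
alpha <- 0.05                          # assumed level of significance
z <- (11.82 - 12) / (0.5 / sqrt(21))   # Z-score from the sandwich example
p_value <- pnorm(z)                    # left-tailed P-value

# Apply the decision rule: reject only when the P-value is below alpha
decision <- if (p_value < alpha) {
  "reject the null hypothesis"
} else {
  "fail to reject the null hypothesis"
}
decision
```

Because the P-value here is only slightly below 0.05, a stricter choice such as alpha = 0.01 would lead to the opposite decision.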
The level of significance,
Type I and Type II Errors
This is not an infallible process. We may end up rejecting a null hypothesis that is, in fact, true. Or fail to reject a null hypothesis that is, in fact, false.
This is because researchers will sometimes get a very high or very low sample mean purely by chance. Perhaps their sampling methods were not as random as they had supposed. We may reject or fail to reject a null hypothesis because of dumb luck.
Type I Error: Rejecting a TRUE null hypothesis.
Assuming the null hypothesis is true, the level of significance ($\alpha$) is the probability of committing a Type I error.
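A simulation can illustrate how often a true null hypothesis is rejected. This is a sketch under assumptions: alpha = 0.05, the seed, and the number of repetitions are chosen for illustration, while the mean, standard deviation, and sample size come from the sandwich example.

```r
set.seed(42)       # arbitrary seed so the run is reproducible
alpha <- 0.05      # assumed level of significance
mu0   <- 12        # mean under the null hypothesis
sigma <- 0.5       # known population standard deviation
n     <- 21        # sample size

# Run one left-tailed Z-test on data generated with the null hypothesis TRUE.
# Returns TRUE when the test (wrongly) rejects the null hypothesis.
one_test <- function() {
  x <- rnorm(n, mean = mu0, sd = sigma)
  z <- (mean(x) - mu0) / (sigma / sqrt(n))
  pnorm(z) < alpha
}

rejections <- replicate(10000, one_test())
mean(rejections)   # proportion of Type I errors; should be near alpha
```

Every rejection in this simulation is a Type I error, because the data were generated with a true mean of exactly 12.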
Let’s demonstrate
The red area,
The most common choice for
We can set
Type II Error: Failing to Reject a FALSE null hypothesis.
If the
A level of significance of
Visualizing Error
The below graph illustrates the relationship between Type I and Type II errors. The red distribution represents the Null Hypothesis, the sampling distribution of sample mean-lengths of foot-long sandwiches, assuming
The blue distribution represents the “TRUE” distribution with a mean,
If we set
While we don’t know what the “TRUE” distribution is, we can see the relationship between moving the cutoff based on
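The trade-off between the two error types can also be computed directly. In the sketch below, the "true" mean of 11.8 inches is purely hypothetical, as are the two values of alpha being compared; the other numbers come from the sandwich example.

```r
mu0     <- 12      # mean under the null hypothesis
sigma   <- 0.5     # known population standard deviation
n       <- 21      # sample size
mu_true <- 11.8    # hypothetical true mean (an assumption for illustration)
se      <- sigma / sqrt(n)   # standard error of the sample mean

# Probability of a Type II error for a left-tailed test at a given alpha
type2 <- function(alpha) {
  # Reject the null hypothesis when the sample mean falls below this cutoff
  cutoff <- mu0 + qnorm(alpha) * se
  # Chance the sample mean lands above the cutoff when the mean is mu_true
  pnorm(cutoff, mean = mu_true, sd = se, lower.tail = FALSE)
}

type2(0.05)
type2(0.01)
```

Lowering alpha moves the cutoff further into the tail, which reduces the chance of a Type I error but increases the chance of a Type II error.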