Confidence Intervals for a Mean

Introduction

Lesson Outcomes

By the end of this lesson, you should be able to:

Explain the meaning of a level of confidence
Create a confidence interval for a single mean with \(\sigma\) known using the following steps:
1. Find the point estimate (\(\bar{x}\))
2. Calculate the margin of error for the given level of confidence
3. Calculate a confidence interval from the point estimate and the margin of error
4. Interpret the confidence interval
5. Check the requirements for the confidence interval
Explain how the margin of error is affected by the sample size and level of confidence

Statistical inference is the practice of using data sampled from a population to draw conclusions about population parameters.

The two primary methods of statistical inference are:

Confidence Intervals
Hypothesis Testing

This chapter lays the foundation for confidence intervals.

Background

Point Estimators

We have learned about several statistics. Remember, a statistic is any number computed based on data. The sample statistics we have discussed are used to estimate population parameters.

	Sample Statistic	Population Parameter
Mean	\(\bar x\)	\(\mu\)
Standard Deviation	\(s\)	\(\sigma\)
Variance	\(s^2\)	\(\sigma^2\)
\(\vdots\)	\(\vdots\)	\(\vdots\)

The statistics above are called point estimators because each is a single number used to estimate a population parameter. Population parameters are generally unknown.

Consider the mean. If \(\mu\) is unknown, how do we know if \(\bar{x}\) is close to it?

The short answer is that we will never know for sure if \(\bar{x}\) is close to \(\mu\). This does not mean that we are helpless. We can rely on the Central Limit Theorem and the Law of Large Numbers to predict a range of reasonable values for \(\mu\) based on our sample.

DEFINITION: A confidence interval is a range of values, derived from sample data, that is likely to contain the value of an unknown population parameter.

We start by looking at confidence intervals for an unknown population mean, \(\mu\).

Review: Distribution of Sample Means

Confidence intervals rely on the assumption that the sample mean, \(\bar x\), is normally distributed.

Recall that the distribution of sample means is normal (or approximately normal) when either of the following is true:

The underlying population is normally distributed
The sample size, \(n\), is sufficiently large (\(n \ge 30\) for this class) for the Central Limit Theorem to apply

Thought Question: If we have a good sample from a population and can trust that the sampling distribution of the mean is approximately normal, how frequently would a sample mean be within 2 standard deviations of the true population mean?

Remember, the standard deviation of \(\bar x\) is \(\frac{\sigma}{\sqrt{n}}\). Two standard deviations of \(\bar x\) is therefore \(2 \frac{\sigma}{\sqrt{n}}\).

ANSWER: If we collect a random sample from a population and \(\bar x\) is normally distributed, then about 95% of the time (based on the 68-95-99.7 rule!) the sample mean \(\bar x\) will be within \(2 \frac{\sigma}{\sqrt{n}}\) units of the population mean \(\mu\).

This means that about 95% of the time, we will get a sample mean within 2 standard deviations (of \(\bar x\)) of the true population mean.

Flipping this around: if we take our sample mean, \(\bar{x}\), and build an interval that extends 2 standard deviations, \(2\frac{\sigma}{\sqrt{n}}\), above and below it, the interval will contain the true population mean about 95% of the time.
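The claim above can be checked with a small simulation. The following is a minimal sketch, not part of the original lesson: the population values (mean 50, standard deviation 10), the sample size of 36, and the function name `coverage` are all hypothetical choices made only for illustration. It draws many random samples from a normal population and counts how often the interval \(\bar{x} \pm 2\frac{\sigma}{\sqrt{n}}\) contains \(\mu\).

```python
import math
import random

# Hypothetical population (illustration only): normal, mean 50, std. dev. 10.
MU, SIGMA, N = 50.0, 10.0, 36
TRIALS = 10_000

def coverage(trials=TRIALS, seed=1):
    """Fraction of samples whose interval x_bar +/- 2*sigma/sqrt(n) contains MU."""
    rng = random.Random(seed)
    m = 2 * SIGMA / math.sqrt(N)  # two standard deviations of x_bar
    hits = 0
    for _ in range(trials):
        # Draw one sample of size N and compute its mean.
        x_bar = sum(rng.gauss(MU, SIGMA) for _ in range(N)) / N
        if x_bar - m <= MU <= x_bar + m:
            hits += 1
    return hits / trials

print(f"Fraction of intervals containing mu: {coverage():.3f}")
```

The printed fraction should land close to 0.95. It varies slightly with the seed and the number of trials.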

An Approximate 95% Confidence Interval

The equation of an approximate 95% confidence interval would be:

\[ CI = \bar{x} \pm 2 \frac{\sigma}{\sqrt{n}}\]

The part that we are adding and subtracting from our point estimate is called the Margin of Error. We use the letter \(m\) to denote the margin of error:

\[m = 2 \frac{\sigma}{\sqrt{n}}\]

Using this definition for \(m\), our confidence interval can be written as

\[( \bar x - m, ~ \bar x + m )\]

Confidence Intervals

Recall that the area under the normal curve within 2 standard deviations of the mean is only approximately 95%.
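To see how approximate the 68-95-99.7 rule is, the areas can be computed directly. This is a small sketch using Python's standard library `statistics.NormalDist`. The helper name `central_area` is made up for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def central_area(z):
    """Area under the standard normal curve between -z and z."""
    return Z.cdf(z) - Z.cdf(-z)

for k in (1, 2, 3):
    print(f"Area within {k} standard deviation(s): {central_area(k):.4f}")
# Area within 1 standard deviation(s): 0.6827
# Area within 2 standard deviation(s): 0.9545
# Area within 3 standard deviation(s): 0.9973
```

Two standard deviations capture slightly more than 95% of the area, which is why the multiplier 2 gives only an approximate 95% interval.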

We want to be more precise in our confidence intervals and may want to choose a level of confidence different from 95%.

The generalized formula for a confidence interval is

\[ CI = \bar{x} \pm z^*\frac{\sigma}{\sqrt{n}}\]
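One way to see where this formula comes from is sketched below. The sketch assumes that \(\bar{x}\) is normally distributed with mean \(\mu\) and standard deviation \(\frac{\sigma}{\sqrt{n}}\). Here \(C\) denotes the level of confidence written as a proportion (for example, 0.95), and \(z^*\) is chosen so that the area under the standard normal curve between \(-z^*\) and \(z^*\) equals \(C\). Then

\[ P\left(-z^* \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z^*\right) = C \]

Rearranging the inequality inside the probability to isolate \(\mu\) gives

\[ P\left(\bar{x} - z^*\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z^*\frac{\sigma}{\sqrt{n}}\right) = C \]

so the interval \(\bar{x} \pm z^*\frac{\sigma}{\sqrt{n}}\) contains \(\mu\) for a proportion \(C\) of all random samples.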

Thought Question: What would an approximate \(z^*\) value be for a 68% confidence interval? 99.7%?

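The \(z^*\) is a percentile from the standard normal distribution that corresponds to a desired area under the curve between it and its negative. For example, for a 95% level of confidence, \(z^* = 1.96\) is the 97.5th percentile: 95% of the area lies between \(-1.96\) and \(1.96\), with 2.5% in each tail.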

NOTE: 1.96 is the \(z^*\) (rounded to two decimal places) for a 95% confidence interval, which is very close to the approximate value of 2 used above.

Common \(z^*\) values for confidence intervals are:

Confidence Level	\(z^*\)
0.99	2.576
0.95	1.960
0.90	1.645
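The table values can be reproduced from the standard normal distribution. The sketch below assumes Python's standard library `statistics.NormalDist`. The function name `z_star` is a hypothetical helper, not something defined in this lesson.

```python
from statistics import NormalDist

def z_star(confidence):
    """z* such that the area between -z* and z* equals `confidence`."""
    # The two tails share the leftover area equally, so z* sits at the
    # (1 + confidence) / 2 percentile of the standard normal distribution.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.99, 0.95, 0.90):
    print(f"{level:.2f}  {z_star(level):.3f}")
# 0.99  2.576
# 0.95  1.960
# 0.90  1.645
```

The same helper gives values close to 1 and 3 for confidence levels of 0.68 and 0.997, in line with the 68-95-99.7 rule.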

A 95% confidence interval will miss the true population mean 5% of the time, because 5% of the time the sample mean will land in one tail or the other of the sampling distribution just by chance.

Interpretation

Confidence intervals are typically reported using parentheses like: (lower limit, upper limit). We say that we are \(C\%\) confident that the true population mean is between [lower limit] and [upper limit].

Average GRE Scores of BYU-I Students

The published population standard deviation of scores on the quantitative portion of the Graduate Record Examination (GRE) is \(\sigma=8.3\).

Suppose we take a random sample of \(n=100\) BYU-I students who have taken the GRE and find that their average score was \(\bar{x}=162.1\).

We can calculate the 99% confidence interval:

\[ 162.1 \pm 2.576\frac{8.3}{\sqrt{100}} = (159.96, 164.24)\]

The interpretation of the above confidence interval would be:

I am 99% confident that the true population mean GRE score for BYU-I students is between 159.96 and 164.24.
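The arithmetic in this example can be reproduced with a few lines of code. This is a minimal sketch: the function name `confidence_interval` and its parameter names are made up for illustration, while the numbers come from the GRE example above.

```python
import math

def confidence_interval(x_bar, sigma, n, z_star):
    """Return (lower, upper) for x_bar +/- z_star * sigma / sqrt(n)."""
    margin = z_star * sigma / math.sqrt(n)  # margin of error, m
    return x_bar - margin, x_bar + margin

# Values from the GRE example; 2.576 is the z* for 99% confidence.
lower, upper = confidence_interval(x_bar=162.1, sigma=8.3, n=100, z_star=2.576)
print(f"({lower:.2f}, {upper:.2f})")
# (159.96, 164.24)
```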

Margin of Error

QUESTION: What happens to the margin of error, \(z^*\frac{\sigma}{\sqrt{n}}\), as the sample size, \(n\), increases?

QUESTION: What happens to the margin of error, \(z^*\frac{\sigma}{\sqrt{n}}\), as our confidence level increases? (See the table of \(z^*\) values above.)

Consider that if I make a wide enough interval, I can be 100% confident. But to get 100% confidence, my interval will be useless. For example, I can be 100% confident that the true population average height of BYU-I students is between 2 feet and 100 feet. More confidence means we need a wider interval.
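Both questions above can be explored numerically. The sketch below is one possible illustration, not a prescribed method. It reuses \(\sigma = 8.3\) from the GRE example. The sample sizes, the confidence levels, and the function name `margin_of_error` are arbitrary choices for demonstration.

```python
import math
from statistics import NormalDist

def margin_of_error(confidence, sigma, n):
    """Margin of error z* * sigma / sqrt(n), with sigma known."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return z_star * sigma / math.sqrt(n)

SIGMA = 8.3  # population standard deviation from the GRE example

# Larger samples shrink the margin of error (95% confidence held fixed).
for n in (100, 400, 1600):
    print(f"n = {n:4d}: m = {margin_of_error(0.95, SIGMA, n):.2f}")

# Higher confidence widens it (n = 100 held fixed).
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: m = {margin_of_error(level, SIGMA, 100):.2f}")
```

Because \(n\) appears under a square root, quadrupling the sample size cuts the margin of error in half.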