pnorm(1.25)
[1] 0.8943502
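When is the sample mean normally distributed? This happens when either of two conditions is satisfied: the parent population is normally distributed, or the sample size is large enough for the Central Limit Theorem to apply.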
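For the mean of \(n\) draws from a random variable with mean \(\mu\) and standard deviation \(\sigma\), the sample mean \(\bar{x}\) has mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\).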
Previously, you learned how to use the normal probability applet and R to convert a \(z\)-score into an area under the normal curve.
1a. Compare this to the R output using pnorm()
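If the sample mean is normally distributed, then we can consider the sample mean, \(\bar{x}\), as a normal random variable and compute its \(z\)-score:
\[ z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
Notice that we just replaced the “value” with the normal random variable, the “mean” with its mean, and the “standard deviation” with its standard deviation.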
The pnorm() function can then be applied to this \(z\)-score to get the probability that a randomly selected sample mean will be above or below a given value of \(\bar{x}\).
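The calculation described above can be sketched in R. This is a minimal illustration, not the lesson's own code: the helper name `z_for_mean` and the input values (a population mean of 100, a standard deviation of 15, a sample of 25, and a sample mean of 106) are hypothetical and chosen only so the arithmetic is easy to follow.

```r
# Hypothetical helper (not part of the lesson): z-score for a sample mean.
z_for_mean <- function(xbar, mu, sigma, n) {
  sigma_xbar <- sigma / sqrt(n)   # standard deviation of the sample mean
  (xbar - mu) / sigma_xbar
}

# Illustrative values only; these are assumptions, not data from the lesson.
z_example <- z_for_mean(xbar = 106, mu = 100, sigma = 15, n = 25)

p_below <- pnorm(z_example)       # P(sample mean is below 106)
p_above <- 1 - pnorm(z_example)   # P(sample mean is above 106)
```

Here the standard deviation of the sample mean is 15 / 5 = 3, so the \(z\)-score is 6 / 3 = 2, and the two probabilities add up to 1.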
Worked Example: Finding the Area under a Normal Curve (Based on a Sample Mean)
After finding the \(z\)-score, we use the pnorm() function to find the area under the curve (i.e., the probability). Suppose that a sample of size
The
The probability that
[1] 2
pnorm(2)
[1] 0.9772499
1 - pnorm(2)
[1] 0.02275013
The area to the right of \(z = 2\) is 0.02275013, or about 0.0228.
In this example, the sample mean was automatically normally distributed because the parent population was normally distributed. This will always be true, no matter what size sample is drawn.
Worked Example: Finding the Area under a Normal Curve (Based on a Sample Mean)
What do we do if the parent population is not normally distributed? If the sample size is large, then the Central Limit Theorem guarantees that the sample mean will be approximately normally distributed. Based on this, we can still do normal probability calculations for the mean of a random sample.
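A small simulation can make the Central Limit Theorem claim concrete. The sketch below is an illustration under stated assumptions: the exponential population (mean 1, standard deviation 1), the seed, the sample size of 40, and the number of repetitions are all arbitrary choices, not values from the lesson.

```r
set.seed(221)      # arbitrary seed so the sketch is reproducible
n_sim <- 40        # size of each simulated sample (assumed)
reps  <- 10000     # number of simulated samples (assumed)

# Each replicate draws n_sim values from a right-skewed population
# (exponential with mean 1 and standard deviation 1) and keeps the mean.
xbars <- replicate(reps, mean(rexp(n_sim, rate = 1)))

mean_of_xbars <- mean(xbars)   # should be close to the population mean, 1
sd_of_xbars   <- sd(xbars)     # should be close to 1 / sqrt(n_sim)
```

Running `hist(xbars)` afterwards should show a roughly bell-shaped histogram, even though the parent population is strongly skewed.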
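The distribution of the weekly costs incurred by Global Solutions Unlimited is right skewed. The population mean of the costs is $26,400 and the standard deviation is $23,200. A random sample of \(n = 40\) weeks is selected. What is the probability that the sample mean will be less than $20,000?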
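Since the number of observations is large, the Central Limit Theorem assures that the sample mean will be approximately normally distributed. The \(z\)-score for the sample mean is \(z = (\bar{x} - \mu)/(\sigma/\sqrt{n})\).
Using R, we can compute this \(z\)-score and find the area to its left: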
xbar <- 20000
mu <- 26400
sigma <- 23200
n <- 40
sigma_xbar <- sigma/sqrt(n)
z <- (xbar-mu) / sigma_xbar
z
[1] -1.744705
pnorm(z)
[1] 0.04051812
1 - pnorm(z)
[1] 0.9594819
NOTE: Recall that the Normal Probability Applet rounds the
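The area to the left of \(z = -1.744705\) is 0.04051812, so the probability that the sample mean is less than $20,000 is about 0.0405.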
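As a cross-check on the weekly-cost example, the same areas can be obtained without computing the \(z\)-score by hand, because pnorm() accepts `mean` and `sd` arguments. This is an alternative sketch rather than the lesson's method; the variable names `p_left` and `p_right` are introduced here for illustration.

```r
xbar  <- 20000
mu    <- 26400
sigma <- 23200
n     <- 40

# Area to the left of xbar under the distribution of the sample mean,
# which has mean mu and standard deviation sigma / sqrt(n).
p_left <- pnorm(xbar, mean = mu, sd = sigma / sqrt(n))

# Area to the right, using lower.tail = FALSE instead of 1 - pnorm(...).
p_right <- pnorm(xbar, mean = mu, sd = sigma / sqrt(n), lower.tail = FALSE)
```

Both approaches standardize the same value, so `p_left` agrees with the area found from the \(z\)-score.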
We will now consider a complete example that shows how these probabilities are used in practice.
The United States Government decided to open some land near a uranium enrichment facility to public use. After a few years of the public hiking, biking, and sometimes even hunting on this land, workers from the facility noticed that there were several unnatural-looking mounds in the earth near the area. Because this land was once used by the facility and nobody knew the origin of these piles, the government closed public access to the land until they could assess if the mounds were safe.
Step 1: Design the study.
Measurements were taken from the mounds to assess one of the contaminants, lead. The tests involved are very expensive. Each sample costs about $600 to process. The Environmental Protection Agency (EPA) has set a “No Action Level” (NAL) for lead. If the mean concentration in the soil of the contaminant is less than the NAL, then the area can be declared safe for public use. If the concentration of the contaminant reaches or exceeds the NAL, the site must be cleaned additionally before it is declared safe. The NAL for lead is 50 milligrams of lead per kilogram of soil (mg/kg).
The hypotheses for this test are:
\[ H_0: \mu = 50 \text{ mg/kg} \]
\[ H_a: \mu < 50 \text{ mg/kg} \]
In environmental testing, we always assume the site is dirty. That is, our null hypothesis is that the mean level of contamination is at the NAL. We gather data to determine if there is sufficient evidence to support rejecting the null hypothesis.
Step 2: Collect Data
Scientists collected
Step 3: Describe the Data
Step 4: Make Inferences
We assume the null hypothesis is true and gather evidence against this assumption. We will find the probability that the mean lead concentration is less than
Since the sample size is large (
First, we compute the \(z\)-score:
[1] -6.791663
[1] 5.542414e-12
Using the pnorm() function, we find the area to the left of \(z = -6.791663\) to be about \(5.54 \times 10^{-12}\).
The
Since the
Step 5: Take Action
There is sufficient evidence to suggest that the mean lead level is less than 50 mg/kg. We conclude that the lead concentration in the soil is low enough that it is not a danger to the public. Based on this and other similar test results, the government has reopened public access to this area.
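The inference step in this example follows the same pattern as the earlier worked examples, with the No Action Level of 50 mg/kg playing the role of the assumed mean. The sketch below uses hypothetical inputs: the sample mean of 45, standard deviation of 10, and sample size of 64 are NOT the scientists' measurements and are labeled as assumptions in the code. Only the NAL of 50 mg/kg comes from the text.

```r
nal <- 50            # No Action Level for lead, in mg/kg (from the text)

# Hypothetical inputs for illustration only; NOT the study's data.
xbar_lead  <- 45     # assumed sample mean, mg/kg
sigma_lead <- 10     # assumed standard deviation, mg/kg
n_lead     <- 64     # assumed number of soil samples

# z-score of the sample mean, assuming the true mean equals the NAL.
z_lead <- (xbar_lead - nal) / (sigma_lead / sqrt(n_lead))

# Probability of a sample mean this low or lower if the true mean were 50.
p_lead <- pnorm(z_lead)
```

With these assumed inputs the \(z\)-score is -5 / 1.25 = -4. A very small value of `p_lead` would count as evidence against the assumption that the mean concentration is at the NAL.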
The parent population is Normally distributed.
The Central Limit Theorem guarantees the distribution of \(\bar{x}\) is approximately normal when the sample size is large.
The collection of all possible sample means
Once we have determined that the sample mean is Normally distributed, we can compute probabilities with a \(z\)-score and pnorm().
A z-score for a sample mean is calculated as:
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
When the distribution of sample means is normally distributed, we can use a z-score and R to calculate the probability that a sample mean is above, below, or between some given value (or values).
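The "between" case is the only one not worked out in the examples, so here is a minimal sketch of it. It reuses the weekly-cost figures (mean $26,400, standard deviation $23,200, sample size 40); the upper bound of $30,000 and the variable names are assumptions added for illustration.

```r
mu    <- 26400
sigma <- 23200
n     <- 40
sigma_xbar <- sigma / sqrt(n)   # standard deviation of the sample mean

lower <- 20000                  # lower bound from the worked example
upper <- 30000                  # hypothetical upper bound (assumed)

z_lower <- (lower - mu) / sigma_xbar
z_upper <- (upper - mu) / sigma_xbar

# Area between the two z-scores: area left of the upper value
# minus area left of the lower value.
p_between <- pnorm(z_upper) - pnorm(z_lower)
```

Subtracting the two left-tail areas leaves only the area between the bounds, which is the probability that the sample mean falls in that range.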