1 - pnorm(33, 21, 4)
[1] 0.001349898
# Equivalently:
pnorm(33, 21, 4, lower.tail = FALSE)
[1] 0.001349898
In your reading, you learned about the normal distribution, a probability model we can use to calculate probabilities for quantities that are normally distributed.
For example, scores on the ACT exam are normally distributed with a mean of 21 and a standard deviation of 4. Suppose you really want to get into Stanford, which requires a 33 or higher on the ACT. We can calculate the probability of a random test taker getting into Stanford (if we’re basing it on test score alone).
We can first calculate a Z-score, which is the number of standard deviations an ACT score is away from the mean.
I always recommend drawing a picture first. Without doing any math, you can usually make a pretty good guess about what the Z-score will be.
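Before doing any math, what is the minimum Z-score required to get into Stanford?
Answer: about 3. A score of 33 is 12 points above the mean of 21, and 12 points is three standard deviations of 4.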
That is significantly above average!
The probability of getting a score higher than 33 is the area under the curve to the right of the red line. That’s a small area!
We can calculate a Z-score using the formula
\[Z = \frac{x - \mu}{\sigma} \]
where \(\mu\) is the mean of the normal distribution and \(\sigma\) is the standard deviation.
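As a quick check of the formula, here is a minimal sketch in R using the ACT numbers from the example above. The variable names are only illustrative.

```r
# ACT example from the text: mean 21, standard deviation 4, cutoff score 33
x     <- 33
mu    <- 21
sigma <- 4

# Z-score: how many standard deviations x lies from the mean
z <- (x - mu) / sigma
z
```

A positive Z-score means the value is above the mean; a negative one means it is below.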
NOTE: Z-scores follow what is called a standard normal distribution, which means the distribution is centered at \(\mu = 0\) with a standard deviation of \(\sigma = 1\).
Recall that to get an area under the curve, we need calculus. Fortunately, we have computers to do the heavy lifting for us.
We will use the function pnorm() to calculate the areas under the curve for specified values. The p in pnorm() stands for probability, and norm stands for the normal distribution.
The pnorm(x, mu, sigma) function takes the value, x, we wish to evaluate, the mean, \(\mu\), and the standard deviation, \(\sigma\) (in R's own documentation these arguments are named q, mean, and sd). We can use pnorm() in the original units of the data and put in \(\mu\) and \(\sigma\). Sticking with our example of Stanford admissions, we can calculate the probability of getting a value greater than 33.
By default, pnorm() gives the area to the LEFT of the given value. If we want the area to the right, we can use lower.tail = FALSE or, equivalently, 1 - pnorm(33, 21, 4). Either way, the result is 0.001349898.
Equivalently, we can put the Z-score into pnorm().
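A short sketch of that equivalence, using the ACT example (a score of 33 with mean 21 and standard deviation 4). When no mean or standard deviation is supplied, pnorm() falls back to its defaults of mean 0 and sd 1, which is the standard normal distribution. The variable names are illustrative.

```r
# Z-score for an ACT score of 33
z <- (33 - 21) / 4

# Right-tail area computed from the Z-score (standard normal defaults)
p_from_z <- pnorm(z, lower.tail = FALSE)

# Right-tail area computed in the original units of the data
p_from_raw <- pnorm(33, mean = 21, sd = 4, lower.tail = FALSE)

# The two approaches should agree
all.equal(p_from_z, p_from_raw)
```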
It’s easy to create a calculator in R that will calculate Z and the corresponding probabilities for us automatically.
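One possible version of such a calculator is sketched below. The function name normal_calc and its arguments are hypothetical, chosen for illustration; it only wraps the Z-score formula and pnorm().

```r
# Hypothetical helper: returns the Z-score and both tail areas for a value x
# from a normal distribution with mean mu and standard deviation sigma.
normal_calc <- function(x, mu, sigma) {
  z <- (x - mu) / sigma
  list(
    z          = z,
    prob_below = pnorm(z),                     # area to the left of x
    prob_above = pnorm(z, lower.tail = FALSE)  # area to the right of x
  )
}

# ACT example: a score of 33 with mean 21 and standard deviation 4
act <- normal_calc(33, 21, 4)
act$z
act$prob_above
```

Returning a list keeps the Z-score and both probabilities together, so the same call answers "less than" and "greater than" questions.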
NOTE: Z is what we call a standard normal random variable. If X is normally distributed, subtracting the mean and dividing by the standard deviation gives the standard normal distribution, which has a mean of 0 and an SD of 1.
If IQ is normally distributed with a mean of 100 and a Standard Deviation of 11, what’s the probability of a randomly selected person having an IQ GREATER than 127?
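One way to set this up, as a sketch: because we want the area to the right of 127, the call uses lower.tail = FALSE. The variable name is illustrative.

```r
# P(IQ > 127) when IQ is normal with mean 100 and standard deviation 11
p_above_127 <- pnorm(127, mean = 100, sd = 11, lower.tail = FALSE)
p_above_127
```

The Z-score here is \((127 - 100)/11 \approx 2.45\), so we should expect a small probability, under 1%.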
What about the probability of a randomly selected person having an IQ LESS than 85?
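For a "less than" question, the default left-tail area from pnorm() is exactly what we need. A minimal sketch, with an illustrative variable name:

```r
# P(IQ < 85) when IQ is normal with mean 100 and standard deviation 11
p_below_85 <- pnorm(85, mean = 100, sd = 11)
p_below_85
```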
If IQ scores have a mean of 100 and a standard deviation of 11, we can find the area between two numbers by subtracting the left-tail area of the lower number from the left-tail area of the higher number, leaving the area between.
What is the probability that a randomly selected individual will have an IQ between 84 and 109?
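The subtraction of two left-tail areas can be sketched as follows, with illustrative variable names:

```r
# Left-tail areas for the upper and lower IQ values (mean 100, sd 11)
area_below_109 <- pnorm(109, mean = 100, sd = 11)
area_below_84  <- pnorm(84,  mean = 100, sd = 11)

# P(84 < IQ < 109): the area left over between the two values
p_between <- area_below_109 - area_below_84
p_between
```

Always subtract the smaller left-tail area from the larger one; reversing the order gives a negative number, which cannot be a probability.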
Finding percentiles of a normal distribution is easy in R! Remember that a percentile is the value below which a given percentage of observations fall. So someone who scores in the 92nd percentile scores above 92% of the population.
NOTE: We calculated percentiles from a set of data using the function quantile(). This can be used on data regardless of the distribution of the data.
To get percentiles for a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), we use the qnorm() function.
The q in qnorm() stands for quantile (which is a synonym for percentile and is the term R uses).
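As a sketch of how qnorm() is used, the example below finds the 92nd percentile of the IQ distribution from earlier (mean 100, standard deviation 11). Pairing the 92nd percentile with the IQ numbers is our own choice for illustration. Note that qnorm() takes a probability between 0 and 1, so 92% is entered as 0.92.

```r
# 92nd percentile of IQ, assuming a normal distribution with mean 100, sd 11
iq_92 <- qnorm(0.92, mean = 100, sd = 11)
iq_92

# Round trip: the area to the left of that value should be 0.92,
# because pnorm() and qnorm() are inverses of each other
pnorm(iq_92, mean = 100, sd = 11)
```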