The Normal Distribution (Class)

The Normal Distribution

In your reading, you learned about the normal distribution which is a probability model that can calculate probabilities for certain types of events that follow a normal distribution.

Stanford Admissions

For example, scores on the ACT exam are normally distributed with a mean of 21 and a standard deviation of 4. Suppose you really want to get into Stanford which requires a 33 or higher on the ACT. We can calculate the probability of a random test taker getting into Stanford (if we’re basing it on test score alone.)

We can first calculate a Z-score, which is the number of standard deviations an ACT score is away from the mean.

I always recommend drawing a picture first. Without doing any math, you can usually make a pretty good guess about what the Z-score will be.

Before doing any math, what is the minimum z-score required to get into Stanford?

Answer:

That is significantly above average!

The probability of getting a score higher than 33 is the area under the curve to the right of the red line. That’s a small area!

The Math

We can calculate a Z-score using the formula

\[Z = \frac{x - \mu}{\sigma} \]

where \(\mu\) is the mean of the normal distribution and \(\sigma\) is the standard deviation.

NOTE: Z-scores follow what is called a standard normal distribution which means it is centered at \(\mu = 0\) with a standard deviation \(\sigma=1\).

Recall that to get an area under the curve, we need the calculus. Fortunately, we have computers to do the heavy lifting for us.

pnorm()

We will use the function pnorm() to calculate the areas under the curve for specified values. The p in pnorm() stands for probability and norm obviously stands for the normal distribution.

The pnorm(x, mu, sigma) function takes the value, x, we wish to evaluate, the mean, \(\mu\), and standard deviation, \(\sigma\). We can use pnorm() in the original units of the data and put in \(\mu\) and \(\sigma\). Sticking with our example of Stanford admissions, we can calculate the probability of getting a value greater than 33.

By default, pnorm() gives the area to the LEFT of the given value. If we want the area to the right, we can use the lower.tail = FALSE or equivalently 1-pnorm(33,21,4).

1-pnorm(33,21,4)
[1] 0.001349898
# Equivalently:
pnorm(33, 21,4, lower.tail = FALSE)
[1] 0.001349898

Equivalently, we can put the z-score into pnorm()

It’s easy to create a calculator in R that will calculate Z for us and probabilities automatically.

NOTE: Z is what we call the Standard Normal Distribution. It has a mean of 0 and a SD of 1. If X is normally distributed, subtracting the mean and dividing by the standard deviation gives the standard normal distribution with a mean of 0 and an SD = 1.

# To calculate a z-score, you can use R like a calculator:

x <- 33
mu <- 21
stdev <- 4
z <- (x-mu) / stdev

z
[1] 3
# By default, pnorm() gives us the area to the LEFT of our observation

pnorm(x, mean = mu, sd = stdev)
[1] 0.9986501
1 - pnorm(x, mean = mu, sd = stdev)
[1] 0.001349898
# Equivalently:  

pnorm(z) ## Left Tail
[1] 0.9986501
1-pnorm(z) ## Right Tail
[1] 0.001349898

Practice

IQ Scores

If IQ is normally distributed with a mean of 100 and a Standard Deviation of 11, what’s the probability of a randomly selected person having an IQ GREATER than 127?

What about the probability of a randomly selected person having an IQ LESS than 85?

x <- 
mu <- 
stdev <- 

z <- (x-mu) / stdev
  
pnorm(z)  # LESS THAN
[1] 0.9986501
1-pnorm(z) # GREATER THAN
[1] 0.001349898

Area BETWEEN 2 Points

If IQ scores have a mean of 100 and standard deviation of 11, we can find the area between two numbers be subtracting the left tail area of the lower number from the left tail area of the higher number, leaving the area between.

What is the probability that a randomly selected individual will have an IQ between 84 and 109?

mu <- 
stdev <- 

LowerPoint <- 
UpperPoint <- 

# Area Between:
pnorm(UpperPoint, mu, stdev) - pnorm(LowerPoint, mu, stdev) 
Error in eval(expr, envir, enclos): object 'UpperPoint' not found

Percentiles

Finding percentiles of a normal distribution is easy in R! Remember that a percentile is the value below which a given percentage of observations fall. So someone who scores in the 92nd-percentile scores above 92% of the population.

NOTE: We calculated the percentile from a set of data using the function quantile(). This can be used on data regardless of the distribution of the data.

To get percentiles for a normal distribution with mean, \(\mu\), and standard deviation, \(\sigma\), we use the qnorm() function.

The q in qnorm() stands for quantile (which is a synonym for percentile and for whatever reason is what R uses for percentiles).

# for a normal distribution with mean, mu, and standard deviation, stdev, we can find a percentile using the following code:

mu <- 
stdev <- 
percentile <- 

qnorm(percentile, mu, stdev)
Error in eval(expr, envir, enclos): object 'percentile' not found