We need the following libraries and custom functions to evaluate the examples in this file. Expand the code button on the right to see them.
library(knitr)
#Shades a rug diagram for a probability mass function.
#Inputs:
# x - a vector of data points
# p - a corresponding vector of probabilities or frequencies
#All rectangles are 1 unit wide.
draw_pmf <- function(x,p){
xs <- c(rbind(x-1/2,x-1/2,x+1/2,x+1/2))
px <- c(rbind(0,p,p,0))
par(mar=c(2.5,2.5,0.25,0.25))
plot.new()
plot(xs,px,type="l")
polygon(xs,px,col="gray")
}
#Shades a rug diagram (the area under a function f) from a to b.
#Inputs:
# f - a function f(x)
# a - left end of the rug
# b - right end of the rug
# num_points - how many points are sent into f for plotting.
draw_rug <- function(f,a,b,num_points=100){
x <- c(a,seq(a,b,(b-a)/num_points),b,a)
y <- c(0,f(seq(a,b,(b-a)/num_points)),0,0)
par(mar=c(2.5,2.5,0.25,0.25))
plot(x,y,type = "l")
polygon(x,y,col="gray")
}
#Draws rectangles over the top of a given function.
#Depending on the method, the left endpoint, right endpoint, or midpoint of the top of each rectangle touches the function.
# f - a function f(x)
# a - left end of graph
# b - right end of graph
# num_rectangles - how many rectangles to plot.
# method - One of "left", "right", or "mid". Defaults to mid.
draw_rect_approx <- function(f,a,b,num_rectangles, method = "mid"){
n <- num_rectangles
dx <- (b-a)/n
x <- c(a,seq(a,b,dx/100),b,a)
y <- c(0,f(seq(a,b,dx/100)),0,0)
par(mar=c(2.5,2.5,0.25,0.25))
plot(x,y,type = "l")
if(method == "left"){
xi <- seq(a+0*dx/2,b-dx/2,dx)
lines(xi,f(xi),type = "h")
lines(xi,f(xi),type = "s")
lines(c(xi[n],xi[n]+dx),f(c(xi[n],xi[n])),type = "l")
lines(c(xi[n],xi[n]+dx),f(c(xi[n],xi[n])),type = "h")
}
else if(method == "right"){
xi <- seq(a+dx,b+dx/2,dx)
lines(xi-dx,f(xi),type = "h")
lines(xi-dx,f(xi),type = "s")
lines(c(xi[n]-dx,xi[n]),f(c(xi[n],xi[n])),type = "l")
lines(c(xi[n]-dx,xi[n]),f(c(xi[n],xi[n])),type = "h")
}
else{#Use midpoint
xi <- seq(a+dx/2,b,dx)
lines(xi-dx/2,f(xi),type = "h")
lines(xi-dx/2,f(xi),type = "s")
lines(c(xi[n]-dx/2,xi[n]+dx/2),f(c(xi[n],xi[n])),type = "l")
lines(c(xi[n]-dx/2,xi[n]+dx/2),f(c(xi[n],xi[n])),type = "h")
}
}
Suppose a class takes a test and there are three scores of 70, five scores of 85, one score of 90, and two scores of 95. Let’s calculate the mean class score, \(\bar x\), three different ways to emphasize three ways of thinking about the mean. We’ll emphasize the pattern of the calculation, rather than the final answer, so we’ll write out each calculation completely first, before simplifying.
Have you ever sat on a seesaw with someone whose weight is quite different from yours? Maybe you’ve tried balancing something with your finger? To find the right place to sit on the seesaw, or to find the correct balancing point, we need to locate the center-of-mass of a system.
When our weight matches another person’s weight, we can sit the same distance away from the center of the seesaw, and things are in balance. This is because the center-of-mass of the two individuals lies directly in the middle. Abstractly, we could think of people as rectangles (as in the figure below), where we track weight by the area of each rectangle, and imagine the \(x\)-axis as a tray that holds up the rectangles. When we view the problem in terms of area, rather than mass, we use the term centroid instead of center-of-mass. If one person is sitting at \(x_1=3\) and another is sitting at \(x_2=6\) (with equal weight, so equal area), then the central point, balancing point, center-of-mass, or centroid, is directly between the two rectangles at \(\bar x =4.5\). Note that there is also a \(y\)-coordinate to the center of mass, but for the purposes of our class we won’t need to discuss it.
x <- c(3,6)
num <- c(2,2)
draw_pmf(x,num)
How does the picture above change if the weights (areas) of the two objects are different? Suppose the first rectangle has an area of \(A_1= 2\) centered at \(x_1=3\), and the second object has an area of \(A_2=4\) centered at \(x_2=6\) (see the image below).
x <- c(3,6)
num <- c(2,4)
draw_pmf(x,num)
Remember that we’ve imagined the \(x\)-axis as a tray holding these rectangles up. If we placed our finger under the point \(x=4.5\), the larger rectangle would cause the tray to tilt to the right. At what point \(\bar x\) would we place our finger to balance the system? We call this point the centroid of the system (technically this will only give us the \(x\)-coordinate of the centroid, but we’ll continue to refer to it as the centroid).
If we place our finger under the \(x\)-axis at the centroid \(\bar x = 5\), then the system is perfectly balanced. Here’s a video that walks through this entire computation and visually shows the result using legos.
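As a quick check, we can carry out this weighted-average computation in R using the positions and areas above.
xi <- c(3,6) #centroids of the two rectangles
Ai <- c(2,4) #areas of the two rectangles
sum(xi*Ai)/sum(Ai)
## [1] 5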
The exercise above can be done with any number of rectangles. In general, the centroid of a bunch of objects with area \(A_i\) whose individual centroids are located at \(x_i\) is given by \[\bar x = \frac{\sum x_iA_i}{\sum A_i}.\] We can use this same geometric idea to locate the average exam score, from the opening example. With 4 rectangles centered at \(x_1=70\), \(x_2 = 85\), \(x_3 = 90\), \(x_4 = 95\), with areas \(A_1 = 3\), \(A_2 = 5\), \(A_3 = 1\), \(A_4 = 2\), the picture below shows the geometric object whose centroid is given by \(\bar x \approx 83.18\).
x <- c(70,85,90,95)
num <- c(3,5,1,2)
draw_pmf(x,num)
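As a quick check in R, the weighted-average formula gives the stated centroid.
sum(x*num)/sum(num) #weighted average of the scores, using x and num from above
## [1] 83.18182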
Instead of using counts for area (as in \(A_1 = 3\), \(A_2 = 5\), \(A_3 = 1\), \(A_4 = 2\)), we could instead use the proportion of the total located at each point (so \(A_1 = \frac{3}{11}\), \(A_2 = \frac{5}{11}\), \(A_3 = \frac{1}{11}\), \(A_4 = \frac{2}{11}\)). Using these proportions for area gives the following geometric object.
x <- c(70,85,90,95)
num <- c(3/11,5/11,1/11,2/11)
draw_pmf(x,num)
If we ignore the \(y\)-axis, this second image looks almost identical to the first, and has the exact same centroid along the \(x\)-axis. With a total area of \(\sum A_i = 1\), the centroid formula simplifies to \[\bar x = \frac{\sum x_i A_i}{\sum A_i} = \frac{\sum x_i A_i}{1} = \sum x_i A_i.\] We use the word centroid when referring to a geometric center. In the next section, we’ll start calling this the expected value of a random variable, written \(\text{E}[X]\) (or \(\mu\)).
Each of these exercises computes an expected value for the next section.
Let’s turn our attention to discrete random variables and probability. We’ll see that the centroid discussed above gives us additional insight into understanding random variables.
Suppose we toss a coin three times and record the number of heads. Let the random variable \(X\) represent this total number of heads. Note that the possible outcomes of counting the number of heads are \(0,1,2,3\) (these are discrete values, hence we call this a discrete random variable). These outcomes arise from the 8 possible results of tossing a coin three times (TTT, TTH, THT, THH, HTT, HTH, HHT, HHH).
This gives us a function (called a probability mass function) which we can summarize in the table and graphs below.
x <- c(0,1,2,3)
p <- c(1/8,3/8,3/8,1/8)
tbl <- data.frame(outcome=x, probability = p )
kable(tbl, align = "c")
outcome | probability |
---|---|
0 | 0.125 |
1 | 0.375 |
2 | 0.375 |
3 | 0.125 |
plot(x,p)
draw_pmf(x,p)
Note that in the graph above, we’ve given each rectangle a width of 1 unit so that areas match probabilities. To connect this theoretical model to something physical, we can think of this graphical representation as a rug, made by sewing 4 rectangles of different heights to the \(x\)-axis. The centroid of said rug is located directly in the center of the graph at \(x=1.5\). We can obtain this via the computation \[\bar x = \sum x_i A_i = 0\left(\frac{1}{8}\right)+1\left(\frac{3}{8}\right)+2\left(\frac{3}{8}\right)+3\left(\frac{1}{8}\right) = 1.5.\]
We computed the above in the context of a physical rug, and the word centroid and symbol \(\bar x\) are appropriate in this context. For a random variable, the vocabulary and notation change, but the computations are the same.
The expected value of a discrete random variable \(X\) with outcomes \(x_i\) having probability \(p_i\) is given by \[\text{E}[X] = \sum x_i p_i.\]
We compute the expected value below for the random variable \(X\) representing tossing 3 coins and counting the number of heads.
x <- c(0,1,2,3)
p <- c(1/8,3/8,3/8,1/8)
sum(x*p)
## [1] 1.5
The expected value of a random variable provides a measure of what would happen if we repeated something many times, and then averaged the results. If we were to run this experiment 8 million times, then about 1 million times we would have 0 heads, 3 million times we’d have 1 head, 3 million times we’d have 2 heads, and 1 million times we’d have 3 heads. The mean of these values is \[\bar x = 0(\frac{1000000}{8000000}) + 1(\frac{3000000}{8000000}) + 2(\frac{3000000}{8000000}) + 3(\frac{1000000}{8000000}).\] This simplifies to the same computation we had before, resulting in \(\bar x = 1.5\). When we write \(\text{E}[X]\) instead of \(\bar x\), we’re referring to a distribution average. We can think of this as a limit of averages as we increase the number of times we repeat an experiment. We use \(\bar x\) to represent a sample mean (or mean of the data) and \(\text{E}[X]\) to represent the population mean (or mean of the distribution).
Before we move on to the next example, let’s pause and notice another key fact about all discrete random variables.
Notice that the sum of the probabilities equals 1. Why is this true for all probability mass functions?
Notice that the sum of the areas of the rectangles equals 1. Having made the observation above, let’s compute the probability \(P(X\leq 2)\). We’ll do this in two ways.
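For instance, one way adds the probabilities of the outcomes 0, 1, and 2 directly; another uses the fact that the probabilities sum to 1 and subtracts \(P(X=3)\). A quick check in R:
p <- c(1/8,3/8,3/8,1/8)
sum(p[1:3]) #P(X=0) + P(X=1) + P(X=2)
## [1] 0.875
1 - p[4] #1 - P(X=3)
## [1] 0.875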
The process above, of computing the probability that a random variable takes on any value less than or equal to some given value, is so common that we’ve given it a name and most software programs have a simple way to compute it.
The cumulative distribution function (CDF) of a random variable \(X\) is the function \(F(x) = P(X\leq x)\). The domain of the CDF is all real numbers. The CDF computes the probability that \(X\) takes on a value less than or equal to \(x\). We’ll often use a capital \(F\) to denote the CDF of \(X\).
Above we computed \(F(2)\). We can use the cumsum() function in R to rapidly compute the CDF at a few values for this discrete random variable, and then display the results in the table below.
x <- c(0,1,2,3)
p <- c(1/8,3/8,3/8,1/8)
cdf <- cumsum(p)
tbl <- data.frame(outcome=x, probability = p, CDF = cdf )
kable(tbl, align = "c")
outcome | probability | CDF |
---|---|---|
0 | 0.125 | 0.125 |
1 | 0.375 | 0.500 |
2 | 0.375 | 0.875 |
3 | 0.125 | 1.000 |
We now let \(X\) be the discrete random variable obtained by rolling two 6-sided dice and recording their sum. The possible outcomes are the integers from 2 up to 12. (If you’ve played Settlers of Catan before, then the dots on the number cards provide a visual of the probability mass function). There are 36 equally likely results for rolling two six-sided fair dice, and the probability mass function for \(X\) is given in the table and diagram below.
x <- seq(2,12)
p <- c(1/36,2/36,3/36,4/36,5/36,6/36,5/36,4/36,3/36,2/36,1/36)
tbl <- data.frame(outcome=x, prob = p)
kable(tbl, align = "c")
outcome | prob |
---|---|
2 | 0.0277778 |
3 | 0.0555556 |
4 | 0.0833333 |
5 | 0.1111111 |
6 | 0.1388889 |
7 | 0.1666667 |
8 | 0.1388889 |
9 | 0.1111111 |
10 | 0.0833333 |
11 | 0.0555556 |
12 | 0.0277778 |
draw_pmf(x,p)
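As a quick check, this probability mass function can also be generated directly in R from the 36 equally likely rolls; here is a sketch using outer() and table().
rolls <- outer(1:6, 1:6, "+") #all 36 possible sums of two dice
table(rolls)/36 #matches the probabilities in the table above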
Some Solutions: The table below provides some of the values of the CDF of \(X\).
x <- seq(2,12)
p <- c(1/36,2/36,3/36,4/36,5/36,6/36,5/36,4/36,3/36,2/36,1/36)
cdf <- cumsum(p)
tbl <- data.frame(outcome=x, prob = p, CDF = cdf)
kable(tbl, align = "c")
outcome | prob | CDF |
---|---|---|
2 | 0.0277778 | 0.0277778 |
3 | 0.0555556 | 0.0833333 |
4 | 0.0833333 | 0.1666667 |
5 | 0.1111111 | 0.2777778 |
6 | 0.1388889 | 0.4166667 |
7 | 0.1666667 | 0.5833333 |
8 | 0.1388889 | 0.7222222 |
9 | 0.1111111 | 0.8333333 |
10 | 0.0833333 | 0.9166667 |
11 | 0.0555556 | 0.9722222 |
12 | 0.0277778 | 1.0000000 |
The expected value is computed below.
x <- seq(2,12)
p <- c(1/36,2/36,3/36,4/36,5/36,6/36,5/36,4/36,3/36,2/36,1/36)
sum(x*p)
## [1] 7
Gambling can be phrased in terms of probability, and we can give a mathematical reason why gambling is a bad idea. As a humorous aside, in The True Meaning of Smekday by Adam Rex (the book behind DreamWorks Animation’s 2015 movie Home), we find the quote
At one point, an alien searching for a casino shouts: “I WAS TOLD TO GO TO THE LARGE OFFENSIVELY COLORED BUILDING WHERE HUMANS WHO ARE BAD AT MATH GIVE AWAY THEIR MONEY.”
Here’s a simple gambling game we’ll explore. You pay 1 dollar to play. You roll two dice and multiply the results together. If you roll a product of 20 or more, you get $4 (your dollar back, plus 3 more). Otherwise, you get nothing (you lose your dollar). There are two outcomes, namely you lose a buck, or you gain 3 bucks. We’ll let \(X\) be the random variable with outcomes of \(-1\) and \(3\) resulting from this game.
To get a probability mass function, we need the probability of getting a product of 20 or above. Let \(Y\) be the random variable which is the product of rolling two dice. The probability mass function for \(Y\) is shown below.
x <- c(1,2,3,4,5,6,8,9,10,12,15,16,18,20,24,25,30,36)
p <- c(1/36,2/36,2/36,3/36,2/36,4/36,2/36,1/36,2/36,4/36,2/36,1/36,2/36,2/36,2/36,1/36,2/36,1/36)
cdf <- cumsum(p)
tbl <- data.frame(outcome=x, prob = p, CDF = cdf)
kable(tbl, align = "c")
outcome | prob | CDF |
---|---|---|
1 | 0.0277778 | 0.0277778 |
2 | 0.0555556 | 0.0833333 |
3 | 0.0555556 | 0.1388889 |
4 | 0.0833333 | 0.2222222 |
5 | 0.0555556 | 0.2777778 |
6 | 0.1111111 | 0.3888889 |
8 | 0.0555556 | 0.4444444 |
9 | 0.0277778 | 0.4722222 |
10 | 0.0555556 | 0.5277778 |
12 | 0.1111111 | 0.6388889 |
15 | 0.0555556 | 0.6944444 |
16 | 0.0277778 | 0.7222222 |
18 | 0.0555556 | 0.7777778 |
20 | 0.0555556 | 0.8333333 |
24 | 0.0555556 | 0.8888889 |
25 | 0.0277778 | 0.9166667 |
30 | 0.0555556 | 0.9722222 |
36 | 0.0277778 | 1.0000000 |
draw_pmf(x,p)
From the table above, we quickly see that \(F_Y(18) = P(Y\leq 18)= 0.777778\), with exact value \(F_Y(18) = \frac{7}{9}\). Because 19 is not a possible product of two dice, we know that \[P(Y\geq 20)=1-P(Y<20) =1-P(Y\leq 18)= 1-\frac{7}{9} = \frac{2}{9}.\]
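A quick way to confirm this in R is to count, among the 36 equally likely rolls, how many products are 20 or more.
products <- outer(1:6, 1:6, "*") #all 36 possible products
mean(products >= 20) #proportion of products that are 20 or more, 8/36 = 2/9
## [1] 0.2222222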
We now have the details we need to examine the gambling game represented by the random variable \(X\), since we now have \(P(X=-1) = \frac{7}{9}\) and \(P(X=3) = \frac{2}{9}\).
The code below can be used to help you verify some of your work above.
x <- c(-1,3)
p <- c(7/9,2/9)
cdf <- cumsum(p)
tbl <- data.frame(outcome=x, prob = p, CDF = cdf)
kable(tbl, align = "c")
outcome | prob | CDF |
---|---|---|
-1 | 0.7777778 | 0.7777778 |
3 | 0.2222222 | 1.0000000 |
draw_pmf(x,p)
c(sum_of_prob = sum(p), expected_value = sum(x*p))
## sum_of_prob expected_value
## 1.0000000 -0.1111111
y <- c(1,2,3,4,5,6,8,9,10,12,15,16,18,20,24,25,30,36)
py <- c(1/36,2/36,2/36,3/36,2/36,4/36,2/36,1/36,2/36,4/36,2/36,1/36,2/36,2/36,2/36,1/36,2/36,1/36)
c(sum_of_prob = sum(py), expected_value = sum(y*py))
## sum_of_prob expected_value
## 1.00 12.25
We’ll look at another example related to the tropical storms example we did earlier this semester. Let \(X\) represent the number of tropical storms in a year that make landfall in Florida. We already saw that this discrete random variable follows a Poisson distribution \(f(x;\lambda) =\lambda^x \frac{e^{-\lambda}}{x!}\), and we estimated the parameter to be \(\lambda \approx 5.904762\). The following code computes and graphs the first 20 values of the probability mass function (it only displays the first 10 in the table).
lambda <- 5.904762
x <- seq(0,20)
p <- lambda^x *exp(-lambda)/factorial(x)
n <-10
tbl <- data.frame(outcome=x[1:n], prob = p[1:n])
kable(tbl, align = "c")
outcome | prob |
---|---|
0 | 0.0027264 |
1 | 0.0160989 |
2 | 0.0475302 |
3 | 0.0935514 |
4 | 0.1380997 |
5 | 0.1630892 |
6 | 0.1605005 |
7 | 0.1353882 |
8 | 0.0999294 |
9 | 0.0655621 |
draw_pmf(x,p)
sum(x*p)
## [1] 5.904738
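As an aside, R has a built-in Poisson probability mass function, dpois(), so we could compute the same values without typing the formula ourselves. A quick check, using the lambda defined above:
dpois(0:9, lambda) #same probabilities as the table above
sum(0:20 * dpois(0:20, lambda)) #expected value over the first 21 outcomes, again close to lambda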
We now let \(X\) be the random variable that returns the result of rolling a single 6 sided fair die. There are 6 possible outcomes, and they each have an equal probability (hence the die is fair).
x <- seq(1,6)
p <- rep(1/6,6)
draw_pmf(x,p)
We now let \(X\) be the random variable that returns the result of rolling a single 6 sided unfair die. There are 6 possible outcomes, but this time the probabilities of rolling each number depend on the number. Rolling a 2 is twice as likely as rolling a 1. Rolling a 3 is 3 times as likely as rolling a 1. Rolling \(n\) is \(n\) times as likely as rolling a 1. A graph of the probability mass function is given below.
x <- seq(1,6)
p <- c(1/21,2/21,3/21,4/21,5/21,6/21)
draw_pmf(x,p)
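A quick check in R confirms that these probabilities sum to 1 and gives the expected value of this unfair die.
x <- seq(1,6)
p <- c(1/21,2/21,3/21,4/21,5/21,6/21)
c(sum_of_prob = sum(p), expected_value = sum(x*p)) #expected value is 91/21, about 4.33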
In the discrete probability (random variable) examples above, we were able to attach a probability to each individual event. Our goal now is to generalize what we did above to work with continuous probabilities. We can actually reframe every probability problem into the context of dropping a dart at a random spot on a rug. The rug diagrams we construct will be very similar to the ones in the discrete case, but the computational approach will change.
Let’s suppose for a minute that we have a rug (as shown below) and a trap door on the ceiling above the rug.
g <- function(x){2+x*0}
draw_rug(g,0,5)
Each time we open the trap door, a dart falls from above to a random spot on the rug. Assume the rug is oriented on the coordinate plane with the lower left corner at the origin and the bottom side of the rug along the \(x\)-axis. (We will always think of our rugs as having the bottom side of the rug along the \(x\)-axis.) Each point on the rug has the same probability of being hit as any other.
When a dart falls on the point \((x,y)\), we’ll record just the \(x\)-coordinate (so a number between 0 and 5), and let \(X\) represent this random variable. Answer each of the following questions (most of them have answers that appear later on) by considering the area of a rug related to the questions.
We can use a simulation to analyze the last question. Below we pick 1000 random numbers between 0 and 5, and then find their mean. We can also view the first 10 numbers, just to verify that they are not discrete integers, but rather take on any value between 0 and 5 (which is why we call this a continuous random variable).
n <- 1000
x <- runif(n, min = 0, max = 5)
#Here are 10 of the values.
x[1:10]
## [1] 1.0930773 4.5306774 1.7534713 1.7918301 4.0504664 2.0723985 4.7847418
## [8] 0.8473842 4.1266219 1.2615469
#And here is the mean of all 1000.
mean(x)
## [1] 2.578134
n <- 1000000
x <- runif(n, min = 0, max = 5)
#This gives the mean of 1000000
mean(x)
## [1] 2.500779
We computed the mean \(x\)-value in several samples. As the sample size increases, the mean approaches a limit, which we call the expected value of the random variable, and denote using \(\text{E}[X]\) (or \(\mu\)). The expected value gives us a way of discussing a theoretical mean when the set of possible outcomes is infinite.
For this rectangular rug, notice that the centroid (geometric center) of the rug is at the point \((x,y) = (2.5,1)\). The fact that the expected value of our random variable \(X\) matched the \(x\)-coordinate of the centroid of our rug is more than just a coincidence. In general, we can find the expected value of a random variable by obtaining the geometric center.
We chose a rectangular rug to start with for a reason. Many connections to geometry will generalize quickly as we move away from a rectangular rug. Here are some key points.
Let \(g(x)=\begin{cases}5 & 0\leq x\leq 6\\ 0 &\text{otherwise}\end{cases}\). Consider a rug that lies under this function and above the \(x\)-axis. Let \(X\) represent the random variable that we obtain by randomly dropping a dart on the rug and recording the \(x\)-value.
We now swap to a triangular rug with a corresponding trap door on the ceiling above the rug. The rug is shown below, and lies below the function \(g(x) = x\) for \(0\leq x\leq 5\).
g <- function(x){x}
draw_rug(g,0,5)
Each time we open the trap door, a dart falls from above to a random spot on the rug where each point on the rug has the same probability of being hit as any other. When a dart falls on the point \((x,y)\), we’ll record just the \(x\)-coordinate (so a number between 0 and 5), and let \(X\) represent this random variable. Answer the following questions by considering area arguments.
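As with the rectangular rug, a simulation can estimate the mean \(x\)-value for this triangular rug. Here is a sketch that uses rejection sampling: we pick random points in the enclosing square and keep only those that land under the line \(y=x\).
n <- 100000
x <- runif(n, min = 0, max = 5)
y <- runif(n, min = 0, max = 5)
#Keep only the x-values of points that landed on the rug (below the line y = x).
mean(x[y <= x]) #approximately 10/3, the x-coordinate of the triangle's centroid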
Let \(g(x)=\begin{cases}2x & 0\leq x\leq 6\\ 0 &\text{otherwise}\end{cases}\). Consider a rug that lies under this function and above the \(x\)-axis. Let \(X\) represent the random variable that we obtain by randomly dropping a dart on the rug and recording the \(x\)-value.
We now swap to a parabolic rug with a corresponding trap door on the ceiling above the rug. The rug is shown below, and lies below the function \(g(x) = x^2\) for \(0\leq x\leq 4\). We call this shape a “parabolic spandrel,” a term we’ll need when looking up areas and centroids.
g <- function(x){x^2}
draw_rug(g,0,4)
Again, each time we open the trap door, a dart falls from above to a random spot on the rug where each point on the rug has the same probability of being hit as any other. When a dart falls on the point \((x,y)\), we’ll record just the \(x\)-coordinate (so a number between 0 and 4), and let \(X\) represent this random variable.
Download this list of centroids of common shapes. The bottom of the first page includes a parabolic spandrel (the shape of our rug above).
Let \(g(x)=\begin{cases}5x^2 & 0\leq x\leq 2\\ 0 &\text{otherwise}\end{cases}\). Consider a rug that lies under this function and above the \(x\)-axis. Let \(X\) represent the random variable that we obtain by randomly dropping a dart on the rug and recording the \(x\)-value.
In all three of the preceding examples, we started with a function \(g(x)\) and considered the random variable \(X\) that arose from selecting the \(x\)-coordinate of a point randomly chosen from the area under the function \(g(x)\). We then computed the cumulative distribution function \(F(x)\) of \(X\), as well as created a normalized rug function \(f(x) = kg(x)\) by selecting a value for \(k\) so that the area under \(kg(x)\) equaled 1. In all cases, we found that the derivative of the cumulative distribution function equaled the normalized rug function, in other words \[F'(x) = kg(x) = f(x).\] In all these examples, we started with a given rug function and then computed \(F(x)\) from it. In practice, we’ll start with a given random variable \(X\), from which we have the cumulative distribution function \(F(x) = P(X\leq x)\). Taking the derivative of the CDF gives us a function \(f(x) = F'(x)\) that we can think of as a normalized rug function. This leads to the following definition.
The probability density function \(f(x)\) of a random variable \(X\) with cumulative distribution function \(F(x)\) is the derivative of \(F(x)\).
Because the derivative of \(F(x)\) will always yield a normalized rug function, the following two properties will always hold for probability density functions.
It turns out that any function \(f(x)\) that satisfies the two properties above can be interpreted as the probability density function of a random variable. If the function is non-negative, and the area under the function is finite, then we can normalize the function (divide by the area) to make it a PDF.
We’ll discuss the reason for the name probability density function after the next section.
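As a concrete numerical check of the relationship \(F'(x)=f(x)\), here is a sketch using the triangular rug from earlier, whose CDF works out to \(F(x) = x^2/25\) on \(0\leq x\leq 5\).
cdf_F <- function(x){x^2/25} #CDF of the triangular rug (area under g(x) = x on [0,5], normalized)
pdf_f <- function(x){2*x/25} #normalized rug function k*g(x)
h <- 1e-6
(cdf_F(2 + h) - cdf_F(2))/h #numerical derivative of the CDF at x = 2
pdf_f(2) #matches, about 0.16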
We’ll now focus on continuous random variables related to rugs that are formed by sewing together several rectangular rugs.
Our next example is a rug that can be thought of as two different rugs, sewn together at \(x=2.5\). One rug is 2.5 units wide by 2 units tall, while the other is 1.5 units wide by 4 units tall.
g <- function(x){ifelse(x<2.5,2,4)}
draw_rug(g,0,4,num_points=1000)
#The 1000 is needed to cause R to plot enough points
#to avoid a visual glitch with drawing piecewise defined functions.
Again we’ll let the random variable \(X\) be the \(x\)-coordinate of a dart that drops from the ceiling with an equal chance of hitting any point on the rug.
Let’s start by computing the expected value of \(X\). We can do so by finding the \(x\)-coordinate of the centroid of this rug. One option is to chop the rug into two rugs (split it at 2.5), find the centroid of each, and then use \(\frac{\sum x_i A_i}{\sum A_i}\) to obtain the result.
We then compute the center of mass of the two rugs, shown below.
xi <- c(1.25,3.25)
Ai <- c(5,6)
c(total_area =sum(Ai),
expected_value = sum(xi*Ai)/sum(Ai))
## total_area expected_value
## 11.000000 2.340909
Another option is to chop the rug into 8 rugs with equal width, shown below. Note that the width of each rectangle is \(\Delta x = \frac{4}{8} = 0.5\).
g <- function(x){ifelse(x<2.5,2,4)}
draw_rect_approx(g,0,4,8)
We now use the same weighted average formula \(\frac{\sum x_i A_i}{\sum A_i}\) to compute the \(x\)-coordinate of the centroid. The benefit of using equally spaced rectangles is that we have a simple formula for computing the width of each rectangle, namely the total width divided by the number of rectangles. By letting \(x_i\) be the \(x\)-coordinate of the centroid of each rectangle, we have that the area of the \(i\)th rug is \(A_i = g(x_i) \Delta x\). The code below gives \(x_i\) and \(A_i\) in a table, followed by the total area and expected value (centroid).
dx <- 0.5
xi <- seq(dx/2,4,dx)
Ai <- g(xi)*dx
tbl <- data.frame(x_i = xi, A_i = Ai)
kable(tbl, align = "c")
x_i | A_i |
---|---|
0.25 | 1 |
0.75 | 1 |
1.25 | 1 |
1.75 | 1 |
2.25 | 1 |
2.75 | 2 |
3.25 | 2 |
3.75 | 2 |
c(total_area =sum(Ai),
expected_value = sum(xi*Ai)/sum(Ai))
## total_area expected_value
## 11.000000 2.340909
The latter method, using rugs of equal width, allows us to use similar code when we double the number of rectangles. Note that this makes the width of each rectangle smaller, so for this example we have \(\Delta x = 4/16 = 0.25\).
draw_rect_approx(g,0,4,16)
dx <- 0.25
xi <- seq(dx/2,4,dx)
Ai <- g(xi)*dx
tbl <- data.frame(x_i = xi, A_i = Ai)
kable(tbl, align = "c")
x_i | A_i |
---|---|
0.125 | 0.5 |
0.375 | 0.5 |
0.625 | 0.5 |
0.875 | 0.5 |
1.125 | 0.5 |
1.375 | 0.5 |
1.625 | 0.5 |
1.875 | 0.5 |
2.125 | 0.5 |
2.375 | 0.5 |
2.625 | 1.0 |
2.875 | 1.0 |
3.125 | 1.0 |
3.375 | 1.0 |
3.625 | 1.0 |
3.875 | 1.0 |
c(total_area =sum(Ai),
expected_value = sum(xi*Ai)/sum(Ai))
## total_area expected_value
## 11.000000 2.340909
Why care about adding more rectangles? Because just about anything that isn’t a rectangle can still be approximated well with rectangles. Let’s illustrate this using two examples we’ve already explored, namely the triangular rug, and the rug under a parabola. Note that in general, if there are \(n\) rectangles between \(a\) and \(b\), then we have \(\Delta x = \frac{b-a}{n}\).
For the triangular rug from the previous section, approximating the rug by using 20 rectangles (so \(\Delta x = \frac{5-0}{20} = 0.25\)), yields the exact area and an expected value of 3.33125, rather than the exact value \(\text{E}[X] = 3.3333\bar3\). Our approximation is pretty close. To get a more accurate approximation, we can increase the number of rectangles in our approximation, as shown below.
g <- function(x){x}
a <- 0
b <- 5
n <- 20
dx <- (b-a)/n
draw_rect_approx(g,a,b,n)
xi <- seq(a+dx/2,b,dx)
Ai <- g(xi)*dx
c(total_area =sum(Ai),
expected_value = sum(xi*Ai)/sum(Ai))
## total_area expected_value
## 12.50000 3.33125
For our parabola function, approximating the rug by using 50 rectangles yields an area approximation of 21.3312 instead of the exact area of \(A = 21.333\bar 3\), and an expected value approximation of 2.9997, which is only 0.0003 away from the actual value of \(\text{E}[X] = 3\).
g <- function(x){x^2}
a <- 0
b <- 4
n <- 50
dx <- (b-a)/n
draw_rect_approx(g,a,b,n)
xi <- seq(a+dx/2,b,dx)
Ai <- g(xi)*dx
c(total_area =sum(Ai),
expected_value = sum(xi*Ai)/sum(Ai))
## total_area expected_value
## 21.3312 2.9997
By using thin rectangles, we can use the approach above to approximate area under most functions. This enables us to compute probabilities, as well as find expected values, of the corresponding random variables. If the approximation isn’t good enough, we increase the number of rectangles.
Let’s end the unit by looking at some examples that are not just rugs that lie under lines or parabolas. We can approximate almost anything with thin rectangles. For example, if our rug is the region under the function \(g(x) = -x^4+2x^3-2x^2+3x+4\) for \(0\leq x\leq 2\), then with 25 rectangles (shown below) we get a decent approximation of the total area under \(g\), as well as the \(x\)-coordinate of the centroid.
g <- function(x){-x^4+2*x^3-2*x^2+3*x+4}
a <- 0
b <- 2
n <- 25
dx <- (b-a)/n
draw_rect_approx(g,a,b,n)
xi <- seq(a+dx/2,b,dx)
Ai <- g(xi)*dx
c(total_area =sum(Ai),
expected_value = sum(xi*Ai)/sum(Ai))
## total_area expected_value
## 10.2709309 0.9873295
While the area and expected value approximations are not exact, we can obtain more precision by increasing the number of rectangles.
Some distributions, such as the exponential, normal, or gamma distribution, allow ends of the rug to stretch to \(\pm\infty\). For these rugs, we can do similar rectangular approximations, though the subtle details are a tad more involved. The example below illustrates using 1000 rectangles for the exponential distribution \(f(x) = 3 e^{-3x}\) with \(x\geq 0\), where we’ve restricted the function to \(x\leq 30\).
f <- function(x){3 * exp(-3*x)}
a <- 0
b <- 30
n <- 1000
dx <- (b-a)/n
draw_rect_approx(f,a,b,n)
xi <- seq(a+dx/2,b,dx)
Ai <- f(xi)*dx
c(total_area =sum(Ai),
expected_value = sum(xi*Ai)/sum(Ai))
## total_area expected_value
## 0.9996626 0.3335583
The total exact area under the exponential distribution with PDF \(f(x) = 3 e^{-3x}\) is 1 (which we approximated as \(A\approx 0.9996626\)) with expected value \(\text{E}[X] = \frac{1}{3} = 0.333\bar 3\) (which we approximated as \(\text{E}[X] \approx 0.3335583\)). Not too bad for an approximation that resulted from just using lots of rectangles.
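One way to see the approximation improve is to rerun the same rectangle computation with more rectangles. Here is a sketch that wraps the code above in a function of \(n\).
riemann_exp <- function(n){
  dx <- (30 - 0)/n
  xi <- seq(dx/2, 30, dx)
  Ai <- 3*exp(-3*xi)*dx
  c(total_area = sum(Ai), expected_value = sum(xi*Ai)/sum(Ai))
}
riemann_exp(10000) #noticeably closer to 1 and 1/3 than the n = 1000 run above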
So how did people compute the expected value exactly? They took a limit as the number of rectangles increased to infinity. This process goes by the name of “Riemann Sums” and leads to the “Definite Integral.” This is precisely the content of the next section.
What kinds of functions can we apply this process to? We’ll restrict ourselves to a specific set of functions that help us analyze probability. We will require that \(g(x)\geq 0\), as we don’t want negative probabilities. In addition, we will require that the area under \(g(x)\) is finite (but we may allow left and right bounds to become infinite, provided the total area under \(g(x)\) remains finite). To simplify probability computations, we’ll sometimes require that the total area under our function is 1. If the area \(A\) under a function \(g(x)\) is not 1, note that letting \(k=1/A\) means the area under the normalized function \(f(x) = kg(x)\) is 1. This leads us again to the properties that define probability density functions.
Consider a rug that lies under the function \(g(x) = 4-x\) for \(0\leq x\leq 4\), with corresponding random variable \(X\) obtained by taking the \(x\)-coordinate of a randomly chosen point on the rug. The code below uses 8 rectangles to approximate the area under \(g\), as well as the expected value of \(X\). Increase the number of rectangles to 20, and then 100. Then use geometric arguments to find the exact values for the area and expected value.
g <- function(x){4-x}
a <- 0
b <- 4
n <- 8
dx <- (b-a)/n
draw_rect_approx(g,a,b,n)
xi <- seq(a+dx/2,b,dx)
Ai <- g(xi)*dx
c(total_area =sum(Ai),
expected_value = sum(xi*Ai)/sum(Ai))
## total_area expected_value
## 8.00000 1.34375
Repeat the previous problem using the function \(g(x) = 9-x^2\) for \(0\leq x\leq 3\).
Repeat the previous problem using the function \(g(x) = \sqrt{25-x^2}\) for \(0\leq x\leq 5\). Note that the rug is the region in the first quadrant that lies inside the circle \(x^2+y^2 =25\) (so a quarter of the circle). This geometric fact will help you identify the exact values for the area and expected value from a list of known centroids.
Repeat the previous problem using a function of your choice, obtaining just an approximation for the area and expected value. You can pick a function from a list of known centroids if you want to compare your approximation to the exact value.
Why do we use the words probability density? The word density comes from the notion of mass per unit length. A thin rope made of twine has a lower mass per length than a steel wire of the same thickness.
We’ve connected probabilities to areas. A normalized rug function \(f(x)\) provides just the height of thin rectangles. We must multiply the height by a width \(\Delta x\) in order to get a probability \(p=f(x) \Delta x\). This means \(f(x) = \frac{p}{\Delta x}\) is a probability \(p\) per length \(\Delta x\), so a probability density.
Remember that a probability density function alone does not give us a probability. We need an interval of \(x\)-values (a width) to provide meaningful probabilities.
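Here is a small numerical illustration, using the exponential density \(f(x) = 3e^{-3x}\) from above: the density at a point times a small width approximates the probability of landing in that thin strip.
f <- function(x){3*exp(-3*x)}
dx <- 0.01
f(0.5)*dx #density times width, about 0.0067
x <- rexp(1000000, rate = 3) #simulate many draws from this distribution
mean(x > 0.5 - dx/2 & x < 0.5 + dx/2) #proportion landing in the thin strip near 0.5, also about 0.0067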
From here on out we’ll use \(f(x)\) to represent any function, not just normalized rug functions.
In the previous section we approximated the area under a function \(f(x)\) from \(a\leq x \leq b\) by using rectangles. We chose the number \(n\) of equal width rectangles which gave the width as \(\Delta x = \frac{b-a}{n}\). We then chose the height of each rectangle to be the value of the function at the midpoint \(x_i\) of the rectangle. The area of each rectangle is hence \(A_i = f(x_i)\Delta x\). Our approximation for the area is then the sum of these values, namely \[\sum_{i=1}^n A_i = \sum_{i=1}^n f(x_i)\Delta x.\] This sum is an example of a more general concept, namely a Riemann sum.
Let \(f(x)\) be defined on \(a\leq x\leq b\). Let \(n\) be a positive integer, and divide the interval \([a,b]\) into \(n\) subintervals of equal width. From each subinterval choose a point \(x_i\). We call \[\sum_{i=1}^n f(x_i)\Delta x\] a Riemann sum for \(f\).
In the definition above, each subinterval has an equal width of \(\Delta x = \frac{b-a}{n}\). A more general definition of Riemann sums exists that allows the widths of the intervals to change, but for simplicity we will keep the width constant.
Another observation is that we can choose the point \(x_i\) as any point inside the \(i\)th subinterval. Up till now, we always chose the midpoint of each subinterval, as that point corresponded to the rectangle’s centroid. Other common choices are to use the left endpoint, or the right endpoint, of each subinterval. Let’s briefly explore these two alternatives.
Let \(f(x) = 7-x\) for \(1\leq x\leq 5\). We can think of this as a rug, drawn below, and a quick check with geometry reveals that the area of the rug is \(A=16\).
f <- function(x){7-x}
a <- 1
b <- 5
draw_rug(f,a,b)
Let’s approximate this area using Riemann sums. With \(n=8\) rectangles, we have \(\Delta x = \frac{5-1}{8} = 0.5\). If we let \(x_i\) be the left end point of each subinterval and then use \(f(x_i)\) as the height of each rectangle we obtain the following rectangular approximation to the area under \(f\). Notice how the left edge of each rectangle ends precisely on the function.
f <- function(x){7-x}
a <- 1
b <- 5
n <- 8
draw_rect_approx(f,a,b,n,method = "left")
Each of the 8 rectangles above has the same width, but the height of the \(i\)th rectangle is equal to \(f(x_i)\), where \(x_i\) is the left endpoint of the \(i\)th rectangle. The points \(x_i\), heights \(f(x_i)\), and corresponding areas \(A_i = f(x_i)\Delta x\), along with the Riemann sum \(\sum f(x_i)\Delta x\), are displayed with the code below. Notice that the approximation is greater than \(A=16\), which occurs precisely because each rectangle is larger than the region it is approximating.
dx <- (b-a)/n
xi <- seq(a,b-dx,dx)
kable(data.frame(left_point = xi, function_at_xi = f(xi), area_i = f(xi)*dx))
left_point | function_at_xi | area_i |
---|---|---|
1.0 | 6.0 | 3.00 |
1.5 | 5.5 | 2.75 |
2.0 | 5.0 | 2.50 |
2.5 | 4.5 | 2.25 |
3.0 | 4.0 | 2.00 |
3.5 | 3.5 | 1.75 |
4.0 | 3.0 | 1.50 |
4.5 | 2.5 | 1.25 |
c(riemann_sum_using_left_endpoints = sum(f(xi)*dx))
## riemann_sum_using_left_endpoints
## 17
Let’s repeat the above, but this time we’ll let \(x_i\) be the rightmost point of each rectangle. Notice in the diagram below that the right edge of each rectangle touches the function.
xi <- seq(a+dx,b,dx)
draw_rect_approx(f,a,b,n,method = "right")
Now each rectangle has an area that is less than the region the rectangle is trying to approximate. The height of each rectangle is now determined by the rightmost \(x\)-value of each rectangle. Because each area \(A_i = f(x_i)\Delta x\) is an under approximation, the Riemann sum computed below will be less than the true area \(A=16\).
kable(data.frame(right_point = xi, function_at_xi = f(xi), area_i = f(xi)*dx))
right_point | function_at_xi | area_i |
---|---|---|
1.5 | 5.5 | 2.75 |
2.0 | 5.0 | 2.50 |
2.5 | 4.5 | 2.25 |
3.0 | 4.0 | 2.00 |
3.5 | 3.5 | 1.75 |
4.0 | 3.0 | 1.50 |
4.5 | 2.5 | 1.25 |
5.0 | 2.0 | 1.00 |
c(riemann_sum_using_right_endpoints = sum(f(xi)*dx))
## riemann_sum_using_right_endpoints
## 15
When we use the midpoint as our choice for \(x_i\), and because the function \(f(x) = 7-x\) is a line, each rectangle happens to have exactly the same area as the trapezoid it approximates. As such, a Riemann sum that uses midpoints gives the exact area, as seen below.
xi <- seq(a+dx/2,b,dx)
draw_rect_approx(f,a,b,n,method = "mid")
c(riemann_sum_using_midpoints = sum(f(xi)*dx))
## riemann_sum_using_midpoints
## 16
f <- function(x){9-x^2}
a <- -2
b <- 3
n <- 20
draw_rect_approx(f,a,b,n,method = "mid")
dx <- (b-a)/n
xi <- seq(a+dx/2,b,dx)
c(riemann_sum = sum(f(xi)*dx))
Let’s examine what happens as we increase the number \(n\) of subintervals used in a Riemann sum. Using the function \(f(x) = 7-x\) for \(1\leq x\leq 5\), with 20 subintervals and picking right endpoints for our \(x_i\)’s, we obtain the following diagram and Riemann sum of 15.6 (an underestimate of the total area).
f <- function(x){7-x}
a <- 1
b <- 5
n <- 20
draw_rect_approx(f,a,b,n,method = "right")
dx <- (b-a)/n
xi <- seq(a+dx,b,dx)
c(riemann_sum_using_right_endpoints = sum(f(xi)*dx))
## riemann_sum_using_right_endpoints
## 15.6
The following table shows the Riemann sum (using right endpoints) as we increase \(n\). \[\begin{array}{c|c} n&\sum_{i=1}^nf(x_i)\Delta x\\ \hline 8 & 15 \\ 20 & 15.6 \\ 100 & 15.92 \\ 1000& 15.992 \\ 10000& 15.9992 \\ 100000& 15.99992 \end{array}\] From this table it appears that \[\lim_{n\to \infty }\sum_{i=1}^nf(x_i)\Delta x = 16.\] By computing a limit of Riemann sums we were able to extract the exact area of \(A = 16\). We call a limit of Riemann sums a definite integral.
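The values in the table above can be reproduced with a short helper function; here is a sketch.
f <- function(x){7 - x}
a <- 1
b <- 5
riemann_right <- function(n){
  dx <- (b - a)/n
  xi <- seq(a + dx, b, dx)
  sum(f(xi)*dx)
}
sapply(c(8, 20, 100, 1000, 10000), riemann_right) #approaches 16 as n grows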
For a function \(f(x)\) defined on \(a\leq x\leq b\), the definite integral of \(f\) from \(a\) to \(b\) is \[\int_a^b f(x) dx = \lim_{n\to \infty }\sum_{i=1}^nf(x_i)\Delta x,\] provided the limit exists. If the limit exists, we say that \(f\) is integrable on \([a,b]\).
There are several names attached to pieces of a definite integral \(\displaystyle\int_a^b f(x) dx\).
Note that R is not designed to compute definite integrals exactly; rather, we can approximate them quite well using Riemann sums. Most computer algebra systems can perform these computations exactly and quite easily. We’ll use Mathematica in our course to compute definite integrals. For example, to compute \(\displaystyle\int_1^5 (7-x) dx\) in Mathematica, we type the following code.
Integrate[7-x,{x,1,5}]
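(If you want a purely numerical check without leaving R, the built-in integrate() function approximates the same quantity; it is not symbolic, so it returns an approximation rather than an exact value.)
integrate(function(x){7 - x}, lower = 1, upper = 5)$value #approximately 16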
Let’s now look at several examples. For the definite integral \(\displaystyle\int_0^4 x^2 dx\), the integrand is \(f(x)=x^2\), the variable of integration is \(x\), the lower bound is \(a=0\), and the upper bound is \(b=4\). The corresponding Mathematica code is
Integrate[x^2,{x,0,4}]
Typing this into Mathematica yields \(\displaystyle\int_0^4 f(x) dx = \frac{64}{3}.\) Note that this is the area of the parabolic rug shown below.
f <- function(x){x^2}
draw_rug(f,0,4)
Note that the functions \(f(x) = x^2\) and \(f(t) = t^2\), both defined over all real numbers, are the same function. The name we attach to the input variable does not change the nature of the function. As such, we sometimes call the variable of integration a “dummy” variable. We get \(\displaystyle\int_0^4 f(t) dt = \frac{64}{3}\) and \(\displaystyle\int_0^4 f(x) dx = \frac{64}{3}.\)
Verify each of the following with Mathematica.
The last two examples above show that for a function with various inputs, such as \(f(c,x) = cx^2\), specifying which variable is the variable of integration matters.
Consider now the function \[f(x) = \begin{cases}3x & 0\leq x \leq 2\\ 0 &\text{otherwise}\end{cases}.\] This function is defined for all real numbers, but is only nonzero for \(0\leq x \leq 2\). To compute \(\displaystyle\int_{0}^{5} f(x) dx\), we note that \(f(x) = 0\) for \(2\leq x\leq 5\), and as such \[\displaystyle\int_{0}^{5} f(x) dx = \displaystyle\int_{0}^{2} f(x) dx = \displaystyle\int_{0}^{2} 3x dx = 6.\] The last step of the computation we performed in Mathematica. In a similar fashion, with infinite limits we obtain the same result \[\displaystyle\int_{-\infty}^{\infty} f(x) dx = \displaystyle\int_{0}^{2} f(x) dx = \displaystyle\int_{0}^{2} 3x dx = 6.\]
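As a numerical sanity check, we could also evaluate this in R by coding the piecewise definition and using integrate().
f <- function(x){ifelse(0 <= x & x <= 2, 3*x, 0)}
integrate(f, 0, 2)$value #approximately 6
integrate(f, 0, 5)$value #still approximately 6, since f is zero on [2,5]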
Consider now the function \[f(x;\lambda) = \lambda e^{-\lambda x} = \begin{cases}\lambda e^{-\lambda x} & x\geq 0\\ 0 & x<0\end{cases},\] where \(\lambda\) is assumed to be some positive real number. We compute \[\displaystyle\int_{-\infty}^{\infty} f(x) dx = \displaystyle\int_{0}^{\infty} f(x) dx = \displaystyle\int_{0}^{\infty} \lambda e^{-\lambda x} dx = 1,\] where the last step is obtained in Mathematica using the code below.
Assuming[\[Lambda] > 0, Integrate[\[Lambda] Exp[-\[Lambda] x], {x, 0, Infinity}]]
Note two things above. First, we wrapped the integral in an Assuming command so that Mathematica treats \(\lambda\) as a positive real number. Second, Mathematica writes the Greek letter \(\lambda\) as \[Lambda]; this is merely a typesetting issue, and not crucial to computations. Try running the following lines of code in Mathematica to see what happens if we leave either of the above off.
Integrate[\[Lambda] Exp[-\[Lambda] x], {x, 0, Infinity}]
Assuming[lambda > 0, Integrate[lambda Exp[-lambda x], {x, 0, Infinity}]]
To finish this section, remember that for a non-negative function \(f(x)\) defined over the interval \([a,b]\), a Riemann sum \(\sum f(x_i) \Delta x\) provides an approximation to the area under \(f(x)\) above \([a,b]\), which is given exactly by the definite integral \(\displaystyle\int_a^b f(x)dx\). As such, we can often compute a definite integral by giving a geometric argument, as shown in the following examples.
Use the code below to check the last computation with Mathematica.
Integrate[Sqrt[9 - (x - 4)^2], {x, 1, 7}]
If you have not yet, go back through the section and use Mathematica to verify each definite integral computation.
Use an area argument, rather than software or Riemann sums, to state the value of each definite integral. Check your solution with software.
With the definite integral defined, we now return our attention to probability and continuous random variables. We state several definitions, two of which we’ve seen before but now we state in terms of definite integrals.
A probability density function \(f(x)\) is a nonnegative (\(f(x)\geq 0\)) function that satisfies \(\displaystyle\int_{-\infty}^{\infty}f(x)dx = 1\).
Note that if a function is zero everywhere except on an interval \(a\leq x\leq b\), then the above simplifies to \(\displaystyle\int_a^b f(x) dx = 1\). This same principle applies to the next 3 definitions.
The expected value of a continuous random variable \(X\) with probability density function \(f(x)\) is \[\text{E}[X] = \int_{-\infty}^{\infty} x f(x) dx.\]
The variance of a continuous random variable \(X\) with probability density function \(f(x)\) is \[\text{Var}[X] = \int_{-\infty}^{\infty} (x-\text{E}[X])^2 f(x) dx.\]
The standard deviation of a continuous random variable \(X\) with probability density function \(f(x)\) is equal to the square root of the variance, so \(\sigma_X = \sqrt{\text{Var}[X]}\).
When the context is clear, we’ll drop the subscript and simply use \(\sigma\) for the standard deviation.
Note that for a discrete random variable \(X\) where outcome \(x_i\) has probability \(p_i\), we defined the expected value to be \(\text{E}[X] = \sum x_i p_i\). For a continuous random variable with probability density function \(f(x)\), multiplying the density \(f(x)\) by a length \(dx\) gives a probability. Replacing \(\sum\) with \(\int_a^b\), then \(x_i\) with \(x\), and finally \(p_i\) with \(f(x)dx\) shows how \[\sum x_i p_i \quad\text{becomes}\quad \underbrace{\int_a^b}_{\sum} \underbrace{x}_{x_i}\underbrace{f(x) dx}_{p_i}.\] Using this same correspondence, we could obtain formulas for the variance and standard deviation of discrete random variables. You may encounter the expected value, variance, and standard deviation of random variables in future courses. The goal in this section is to just practice using definite integrals to compute them.
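Although we’ll use Mathematica for exact definite integrals, here is a numerical sketch in R, using integrate(), of how the three definitions fit together for the (hypothetical) density \(f(x) = 3x^2\) on \(0\leq x\leq 1\).
f <- function(x){3*x^2} #a density: the area under f on [0,1] is 1
EX <- integrate(function(x){x*f(x)}, 0, 1)$value #expected value; the exact value is 3/4
VarX <- integrate(function(x){(x - EX)^2*f(x)}, 0, 1)$value #variance; the exact value is 3/80
c(expected_value = EX, variance = VarX, std_dev = sqrt(VarX))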
Time to practice. Remember to use Mathematica to compute all definite integrals. (You may need the Assuming[] command.)