This project outline and background information have been provided to assist you as you complete your project. You should assume the reader of your work has no knowledge or access to this information.
How long does an LED light bulb last? Lumens are a measure of how much light you get from a bulb. When you first turn on an LED bulb, the lumen output slightly increases for a while, going above the initial brightness. While LEDs do not “burn out”, after peaking the lumen output stays relatively constant before it starts to decrease in lumen output. In the bulb data we will use, lumen measures are normalized to the initial intensity of the bulb, so that we can compare different bulbs.
In 2008, the US Department of Energy launched the Bright Tomorrow Lighting Prize (or L Prize) to encourage the development of high-efficiency replacement for the incandescent light bulb. To win the prize the bulb needed a lifetime longer than 25,000 hours (almost 3 years). Source
We do not have three years of data on our bulbs so we will use mathematical models to predict the lifetime. Our work in this project relies on using loglikelihood functions to fit several assumed mathematical models. We will (1) use optimization to fit deterministic models to data and (2) use the fitted models to provide information about an LED bulb.
In this project, we’ll be fitting the data to deterministic models, functions \(f(t)\), that give the lumen output of LED bulbs (as a percent of the initial lumens) after \(t\) hours. The input of the models is time, \(t\), measured in hours since the bulb is turn on. The output of the models is bulb intensity, \(f(t)\), measured as a percent of initial bulb intensity. By choosing to normalize bulb intensity in this way, we have fixed the initial output as 100% of the original intensity, \(f(0) = 100\). For this project, we will use 80% of the initial intensity1 as the threshold for determining the lifetime of a light bulb. This means once the bulb intensity decreases below 80% we will consider this the life of the bulb (in other words we will consider the bulb “burned out”).
Create an R Markdown file.
Consider the following general models.
Assume the errors are independent and normally distributed (with mean of 0 and standard deviation of 1). Assume \((t_i,y_i)\) is a list of 44 data points to be provided. Show that the loglikelihood functions for the errors when fitting \(f_1\), \(f_2\), \(f_4\), \(f_5\), and \(f_6\) to the 44 data points are as given below. Write out your solutions (include all the steps) and include a picture of each of your calculations (or you can use the Latex Cheat Sheet if you would like to try to type out your calculations). Here is a computation done for \(\ell_1\) that you are welcome to use for guidance.
The final answers above are provided to help you know when you’re finished. Your job is to lead your reader through your calculations to get to these answers. Make sure you include enough steps, including explanations, throughout your calculations so a reader at the calculus 1 level can understand your solution. What properties of logarithms and sums did you use? Be explicit.
Organize your work into a cohesive analysis and submit the html file on Canvas.
Create a new R Markdown file.
Use the seed=
argument in the
led_bulb()
function to set the seed and use the following
code to read in the light bulb data.
#If needed, refer to Project 1 Task 1 to install the data4led package.
library(data4led)
#Change the DDDD below to your assigned seed, and then load the data for that randomly selected bulb.
#This is part of what makes your work reproducible.
bulb <- led_bulb(1,seed = DDDD)
## Error in eval(expr, envir, enclos): object 'DDDD' not found
This code creates a data frame called “bulb”. The bulb data frame contains measurements for one randomly selected bulb at many time points. You will need to set the seed so that you will have your own random, but reproducible, data with which to work. Please set the seed as the four digit number from the class list of assigned seeds.
The bulb data frame, a table, includes the columns (1) “id”, the identification number for your randomly selected bulb, (2) “hours”, the number of hours since the bulb has turned on, (3) “intensity”, the lumen output of the bulb, (4) “normalized_intensity”, the lumen at that time divided by the lumen of your bulb at time 0, and (5) “percent_intensity”, the bulb intenstity as a percent of the original lumen (notice the first row in this column is 100).
Beneath the instructions for this task, there are two sections titled “Maximum Likelihood Method for \(f_2\)” and “Maximum Likelihood Method for \(f_6\).” Each is a cohesive analysis that applies the maximum likelihood method to the functions \(f_2\) and \(f_6\), using the random seed 123.
For each of the 3 remaining models (\(f_1\), \(f_4\), \(f_5\)), use the maximum likelihood method (assuming the errors follow a standard normal distribution) to obtain the parameters for the model with the maximum likelihood, given the data corresponding to your seed. For each model, you will need to do the following:
Include an image from the Shiny App below showing you have found all the correct parameter values. Think of this as a look up table that can either be included at the top of your document (as an executive summary) or at the bottom of your document (as a conclusion). Note: when using fitted models it is best practice not to round in any preliminary calculations, so make sure you use all known decimal places for the parameter values, NOT the rounded values, when you use the fitted model in plots or other places. However, when presenting the fitted models in your narrative, it’s customary to round parameters values to 3 significant figures.
CHECK YOUR WORK:
Organize your work into a cohesive analysis and submit the html file on Canvas.
Consider the model \(f_2(t; a_1, a_2) = 100 + a_1t + a_2t^2\). The function \(f_2\) models the brightness of a lightbulb, measured as a percent of the original intensity of the lightbulb, given the number of hours the lightbulb as been on, \(t\). We will fit \(f_2\) to the list of 44 measurements, \((t_i, y_i)\), obtained from the data4led package using the seed 123.
Assuming the residuals (or errors) are independent and normally distributed (with mean 0 and standard deviation 1), the loglikelihood function for these errors is \[\ell_2(a_1,a_2; \mathbf{t},\mathbf{y}) = 44\ln\left(\frac{1}{\sqrt{2\pi}}\right) + \sum_{i=1}^{44} \left(-\frac{1}{2}(y_i - 100 - a_1t_i - a_2t_i^2)^2\right).\] We want to find the maximum of \(\ell_2\). The first partials of \(\ell_2\) are
To find the critical points of \(\ell_2\), we set each partial derivative above equal to zero and then solve \[\left\{ \begin{array}{ll} \left(\sum_{i=1}^{44} (y_i - 100)t_i\right) - \left(\sum_{i=1}^{44}t_i^2\right)a_1 - \left(\sum_{i=1}^{44}t_i^3\right)a_2 &= 0 \\ \left(\sum_{i=1}^{44} (y_i - 100)t_i^2\right) - \left(\sum_{i=1}^{44}t_i^3\right)a_1 - \left(\sum_{i=1}^{44}t_i^4\right)a_2 &= 0. \end{array} \right.\] We notice that this system is of the form \[\left\{ \begin{align*} b_1 - c_{11}a_1 - c_{12}a_2 &= 0 \\ b_2 - c_{21}a_1 - c_{22}a_2 &= 0, \end{align*} \right.\] with
Notice that \(\sum_{i=1}^{44} (y_i - 100)t_i\), \(\sum_{i=1}^{44}t_i^2\), \(\sum_{i=1}^{44} t_i^3\), \(\sum_{i=1}^{44} (y_i - 100)t_i^2\), and \(\sum_{i=1}^{44} t_i^4\) are just constants that depend on the given data. We can calculate (and store) their values using the R code below.
library(data4led)
bulb <- led_bulb(1,seed=123) #Remember to use your assigned seed!
t <- bulb$hours
y <- bulb$percent_intensity
c.11 <- sum(t^2)
c.12 <- sum(t^3)
c.22 <- sum(t^4)
b.1 <- sum((y-100)*t)
b.2 <- sum((y-100)*t^2)
Since we noticed this system is of a general form we have already solved, then we can use the solution from previous work. We found that the solution to this system is \[a_2 = \frac{c_{11}b_2 - c_{12}b_1}{c_{11}c_{22} - c_{12}^2}\text{ and }a_1 = \frac{b_1 - c_{12}a_2}{c_{11}}.\] Below we use R to calculate \(a_1\) and \(a_2\) using the formula above.
best.a2 <- (c.11*b.2 - c.12*b.1)/(c.11*c.22 - c.12^2)
best.a1 <- (b.1 - c.12*best.a2)/c.11
#While we calculate them in reverse order, let's display them in order
best.a1
## [1] 0.001190918
best.a2
## [1] -1.743522e-07
The critical point for \(\ell_2\) is \((a_1,a_2) = (0.0011909, -1.7435215\times 10^{-7})\). Let’s use the second derivative test to confirm that this critical point is the location of a maximum of \(\ell_2\). The second partials of \(\ell_2\) are below. We will need the second partials for the second derivative test.
When then compute \[D = \left(\frac{\partial^2\ell_2}{\partial a_1^2}\right)\left( \frac{\partial^2\ell_2}{\partial a_2^2}\right) - \left(\frac{\partial^2\ell_2}{\partial a_2 \partial a_1}\right)^2 = \left(- \sum_{i=1}^{44}t_i^2\right)\left(- \sum_{i=1}^{44}t_i^4\right) - \left(- \sum_{i=1}^{44}t_i^3\right)^2.\] To use the second derivative test, we need numerical values for both \(D\) and \(\frac{\partial^2\ell_2}{\partial a_1^2}\). The code below computes both these values.
D <- (-c.11)*(-c.22) - (-c.12)^2
D
## [1] 1.23003e+23
-c.11 #the second partial with respect to a1 twice
## [1] -328767530
We see that \(D = 1.2300296\times 10^{23}\), which means \(D>0\). The fact that \(D>0\), together with \(\frac{\partial^2\ell_2}{\partial a_1^2} = -3.2876753\times 10^{8}<0\) means that our critical point corresponds to a local maximum (by the second derivative test). The completes the computations for the maximum likelihood method.
Our best fit model is \[f_2(t) = 100 + (0.0011909)t +(-1.7435215\times 10^{-7})t^2\] where \(t \geq 0\). Let’s confirm this fit visually with the following R code.
f2 <- function(x,a0=0,a1=0,a2=1){
a0 + a1*x + a2*x^2
}
a0 <- 100
a1 <- best.a1
a2 <- best.a2
x <- seq(-10,80001,2)
par(mfrow=c(1,2),mar=c(2.5,2.5,1,0.25))
plot(t,y,xlab="Hour ", ylab="Intensity(%) ", pch=16,main='f2')
lines(x,f2(x,a0,a1,a2),col=2)
plot(t,y,xlab="Hour ", ylab="Intensity(%) ", pch=16, xlim = c(-10,80000),ylim = c(-10,120))
lines(x,f2(x,a0,a1,a2),col=2)
Notice that the graph of the fitted function does appear to provide a good visual fit to the data, as seen in the image above. The story told by this model suggests that the light bulb will burn out (hit 80% intensity) somewhere between 10 and 20 thousand hours (we could compute this exactly with uniroot).
We now consider the function \(f_6(t; a_1, a_2) = 100 + a_1t + a_2(1 - e^{-0.0003t})\). The loglikelihood function we previously computed to be \[\ell_6(a_1,a_2; \mathbf{t},\mathbf{y}) = 44\ln\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{1}{2}\sum_{i=1}^{44} (y_i - 100 - a_1t_i - a_2(1 - e^{-0.0003t_i}))^2.\] We will now apply the maximum likelihood method (assuming the errors are normally distributed) to identify the best fit parameters for this model using the data from our seed. Let’s first load the data, using the code below.
library(data4led)
bulb <- led_bulb(1,seed=123) #Remember to use your assigned seed!
t <- bulb$hours
y <- bulb$percent_intensity
To find the critical values of \(\ell_6\) we need to solve the system of linear equations obtained by setting each partial derivative equal to zero, which means we must solve the system \[\left\{ \begin{align*} \left(\sum_{i=1}^{44} (y_i - 100)t_i\right) - \left(\sum_{i=1}^{44}t_i^2\right)a_1 - \left(\sum_{i=1}^{44}t_i(1 - e^{-0.0003t_i})\right)a_2 &= 0 \\ \left(\sum_{i=1}^{44} (y_i - 100)(1 - e^{-0.0003t_i})\right) - \left(\sum_{i=1}^{44}t_i(1 - e^{-0.0003t_i})\right)a_1 - \left(\sum_{i=1}^{44}(1 - e^{-0.0003t_i})^2\right)a_2 &= 0. \end{align*} \right.\] Once again we notice this system of equation is of the form \[\left\{ \begin{align*} b_1 - c_{11}a_1 - c_{12}a_2 &= 0 \\ b_2 - c_{21}a_1 - c_{22}a_2 &= 0. \end{align*} \right.\] where \(\sum_{i=1}^{44} (y_i - 100)t_i\), \(\sum_{i=1}^{44}t_i^2\), \(\sum_{i=1}^{44}t_i(1 - e^{-0.0003t_i})\), \(\sum_{i=1}^{44} (y_i - 100)(1 - e^{-0.0003t_i})\), and \(\sum_{i=1}^{44}(1 - e^{-0.0003t_i})^2\) are just constants that depend on the given data. Again we can calculate (and store) their values using R, as well as solve the system, using the code below.
c.11 <- sum(t^2)
c.12 <- sum(t*(1-exp(-0.0003*t)))
c.22 <- sum((1-exp(-0.0003*t))^2)
b.1 <- sum((y-100)*t)
b.2 <- sum((y-100)*(1-exp(-0.0003*t)))
best.a2 <- (c.11*b.2 - c.12*b.1)/(c.11*c.22 - c.12^2)
best.a1 <- (b.1 - c.12*best.a2)/c.11
#While we calculate them in reverse order, let's display them in order
best.a1
## [1] -0.0008219878
best.a2
## [1] 7.442637
To finish the maximum likelihood method, we now use the second derivative test. The second partials of \(\ell_6\) are
The code below uses R to compute \[ \begin{align*} D &= \left(\frac{\partial^2\ell_6}{\partial a_1^2}\right)\left( \frac{\partial^2\ell_6}{\partial a_2^2}\right) - \left(\frac{\partial^2\ell_6}{\partial a_2 \partial a_1}\right)^2 \\ &= \left(-\sum_{i=1}^{44}t_i\right)\left(- \sum_{i=1}^{44}(1-e^{-0.0003t_i})^2\right) - \left(-\sum_{i=1}^{44}t_i(1-e^{-0.0003t_i})\right)^2 \end{align*}\] and \(\frac{\partial^2\ell_6}{\partial a_1^2}\).
D <- (-c.11)*(-c.22) - (-c.12)^2
D
## [1] 73001831
-c.11 #the second partial with respect to a1 twice
## [1] -328767530
Because \(D = 7.3001831\times 10^{7}>0\) and \(\frac{\partial^2\ell_2}{\partial a_1^2} = -3.2876753\times 10^{8}<0\), we know that our critical point corresponds to a local maximum (by the second derivative test). The completes the computations for the maximum likelihood method.
Our best fit model is \[f_6(t) = 100 + (-8.2198784\times 10^{-4})t +(7.4426368)(1 - e^{-0.0003t})\] where \(t \geq 0\). Let’s confirm this fit visually with the following R code.
f6 <- function(x,a0=100,a1=0,a2=1){
a0 + a1*x + a2*(1-exp(-0.0003*x))
}
a0 <- 100
a1 <- best.a1
a2 <- best.a2
x <- seq(-10,80001,2)
par(mfrow=c(1,2),mar=c(2.5,2.5,1,0.25))
plot(t,y,xlab="Hour ", ylab="Intensity(%) ", pch=16,main='f6')
lines(x,f6(x,a0,a1,a2),col=2)
plot(t,y,xlab="Hour ", ylab="Intensity(%) ", pch=16, xlim = c(-10,80000),ylim = c(-10,120))
lines(x,f6(x,a0,a1,a2),col=2)
The figure above verifies that our fitted function does indeed provide a good visual fit for the model. For this model, the light bulb will burn out (hit 80% intensity) somewhere around 30,000 hours (again uniroot can find this time exactly).
uniroot()
function in R to find the approximate
solution for where each of your five fitted models is at 80% of the
initial intensity. So solve the equation \(f_i(t) = 80\) for each of the five fitted
models.
This number is a simplified story for illustrative purposes only.↩︎