# Use this R-Chunk to import all your datasets!

Background

How long does an LED light bulb last? Lumens are a measure of how much light you get from a bulb. When you first turn on an LED bulb, the lumen output increases slightly for a while, rising above the initial brightness. LEDs do not “burn out”; instead, after peaking, the lumen output stays relatively constant before it starts to decrease. In the bulb data we will use, lumen measures are normalized to the initial intensity of the bulb so that we can compare different bulbs.

In 2008, the US Department of Energy launched the Bright Tomorrow Lighting Prize (or L Prize) to encourage the development of a high-efficiency replacement for the incandescent light bulb. To win the prize, the bulb needed a lifetime longer than 25,000 hours (almost 3 years of continuous use). Source

We do not have three years of data on our bulbs so we will use mathematical models to predict the lifetime. Our work in this project relies on using loglikelihood functions to fit several assumed mathematical models. We will (1) use optimization to fit deterministic models to data and (2) use the fitted models to provide information about an LED bulb.

In this project, we will fit deterministic models, functions \(f(t)\), that give the lumen output of LED bulbs (as a percent of the initial lumens) after \(t\) hours. The input of the models is time, \(t\), measured in hours since the bulb was turned on. The output of the models is bulb intensity, \(f(t)\), measured as a percent of initial bulb intensity. By choosing to normalize bulb intensity in this way, we have fixed the initial output as 100% of the original intensity, \(f(0)=100\). For this project, we will use 80% of the initial intensity as the threshold for determining the lifetime of a light bulb. This means that once the bulb intensity decreases below 80%, we will consider the bulb “burned out”, and the time at which this happens is the lifetime of the bulb.
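The 80% threshold rule can be sketched in R as a root-finding problem. This is a minimal illustration, not the project's method: the function `f_example` and its decay rate of 0.0008 percentage points per hour are hypothetical values chosen only so that the answer is easy to check by hand.

```r
# Hypothetical model used only for illustration: 100% at t = 0, losing
# 0.0008 percentage points per hour (an assumed rate, not a fitted one).
f_example <- function(t) 100 - 0.0008 * t

# The lifetime is the time at which the output crosses the 80% threshold,
# i.e. the root of f(t) - 80. The search interval is an assumption and must
# bracket the crossing (f is above 80 at one end and below 80 at the other).
threshold <- 80
lifetime <- uniroot(function(t) f_example(t) - threshold,
                    interval = c(0, 1e5), tol = 1e-8)$root
lifetime
```

For this assumed rate the crossing can also be found by hand: 20 / 0.0008 = 25,000 hours. For a fitted model, the same call applies once `f_example` is replaced by the fitted function.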

Task 1

Consider the following general models.

  • \(f_1(t;a_1)=100+a_1t\) where \(t\geq 0\)
  • \(f_2(t;a_1,a_2)=100+a_1t+a_2t^2\) where \(t\geq 0\)
  • \(f_4(t;a_1,a_2)=100+a_1t+a_2\ln(0.005t+1)\) where \(t\geq 0\)
  • \(f_5(t;a_1)=100e^{-0.00005t}+a_1te^{-0.00005t}\) where \(t\geq 0\)
  • \(f_6(t;a_1,a_2)=100+a_1t+a_2(1-e^{-0.0003t})\) where \(t\geq 0\)
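The five models above translate directly into R functions. The function names `f1`, `f2`, `f4`, `f5`, and `f6` are our own choice for this sketch, and the parameter values in the final check are arbitrary placeholders.

```r
# The five candidate models from the list above; a1 and a2 are the
# parameters to be fitted and t is time in hours.
f1 <- function(t, a1)     100 + a1 * t
f2 <- function(t, a1, a2) 100 + a1 * t + a2 * t^2
f4 <- function(t, a1, a2) 100 + a1 * t + a2 * log(0.005 * t + 1)
f5 <- function(t, a1)     100 * exp(-0.00005 * t) + a1 * t * exp(-0.00005 * t)
f6 <- function(t, a1, a2) 100 + a1 * t + a2 * (1 - exp(-0.0003 * t))

# Every model equals 100 at t = 0 regardless of the parameter values
# (the values -1 and 2 below are arbitrary).
at_zero <- c(f1(0, -1), f2(0, -1, 2), f4(0, -1, 2), f5(0, -1), f6(0, -1, 2))
at_zero
```

Because each function is vectorized in `t`, it can be applied to the whole vector of observation times at once.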

We now assume the errors are independent and normally distributed (with mean of 0 and standard deviation of 1) and that \((t_i,y_i)\) is a list of 44 data points to be provided. We will compute the loglikelihood functions for the errors when fitting \(f_1\), \(f_2\), \(f_4\), \(f_5\), and \(f_6\) to these 44 data points. Recall that the likelihood function for a single normally distributed random variable with mean 0 and standard deviation 1 is \[f(r) = \frac{1}{\sqrt{2\pi}}e^{-r^2/2}.\] Note that in our work below, the variable \(r\) will represent the residual, or error, between our data and model.
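As a quick check of the formula above, R's built-in `dnorm` evaluates the same density. The residual values in `r` are hypothetical and chosen only for illustration.

```r
# Hypothetical residual values, used only to compare the two computations.
r <- c(-1.5, 0, 0.7)

# Density written out from the formula: (1 / sqrt(2 * pi)) * exp(-r^2 / 2).
by_formula <- exp(-r^2 / 2) / sqrt(2 * pi)

# dnorm with mean 0 and sd 1 is the standard normal density.
all.equal(by_formula, dnorm(r, mean = 0, sd = 1))
```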

Loglikelihood for \(f_1\)

For the model \(f_1(t;a_1)=100+a_1t\) where \(t\geq 0\) with 44 data points \((t_i,y_i)\), note that each residual is given by \(r_i = y_i - 100 - a_1t_i\). We will assume these errors are normally distributed with mean 0 and standard deviation 1. The likelihood function for a single error is \[f(r_i) = \frac{1}{\sqrt{2\pi}}e^{-r_i^2/2} = \frac{1}{\sqrt{2\pi}}e^{-\left(y_i - 100 - a_1t_i\right)^2/2}.\] We then assume the errors are independent of each other, which means we can multiply the individual likelihoods to obtain the likelihood of all 44 observations as \[L_1(a_1;\textbf{t},\textbf{y}) = \prod_{i=1}^{44}f(r_i) = \prod_{i=1}^{44}\frac{1}{\sqrt{2\pi}}e^{-\left(y_i - 100 - a_1t_i\right)^2/2}.\] We obtain the loglikelihood function by computing \[\begin{align*} \ell_1(a_1;\textbf{t},\textbf{y}) &= \ln(L_1(a_1;\textbf{t},\textbf{y})) &\text{(definition)}\\ &= \ln\left(\prod_{i=1}^{44}\frac{1}{\sqrt{2\pi}}e^{-\left(y_i - 100 - a_1t_i\right)^2/2}\right) &\text{(substitution)}\\ &= \sum_{i=1}^{44}\ln\left(\frac{1}{\sqrt{2\pi}}e^{-\left(y_i - 100 - a_1t_i\right)^2/2}\right) &\text{(logs turn products into sums)}\\ &= \sum_{i=1}^{44}\left(\ln\left(\frac{1}{\sqrt{2\pi}}\right)+\ln\left(e^{-\left(y_i - 100 - a_1t_i\right)^2/2}\right)\right)&\text{(another product to sum)}\\ &= \sum_{i=1}^{44}\ln\left(\frac{1}{\sqrt{2\pi}}\right)+\sum_{i=1}^{44}\ln\left(e^{-\left(y_i - 100 - a_1t_i\right)^2/2}\right)&\text{(separate the sum)}\\ &= \ln\left(\frac{1}{\sqrt{2\pi}}\right)\sum_{i=1}^{44}1+\sum_{i=1}^{44}\ln\left(e^{-\left(y_i - 100 - a_1t_i\right)^2/2}\right)&\text{(pull out constant)}\\ &= \ln\left(\frac{1}{\sqrt{2\pi}}\right)\sum_{i=1}^{44}1+\sum_{i=1}^{44}\left(-\frac{1}{2}\left(y_i - 100 - a_1t_i\right)^2\ln\left(e\right)\right)&\text{(bring power down)}\\ &= 44\ln\left(\frac{1}{\sqrt{2\pi}}\right)-\frac{1}{2}\sum_{i=1}^{44}\left(y_i - 100 - a_1t_i\right)^2&\text{(simplify; } \ln(e)=1 \text{ and 44 ones sum to 44)}. \end{align*}\]
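The final expression for the loglikelihood of \(f_1\) can be evaluated and maximized in R. The sketch below is illustrative only: the vectors `hours` and `intensity` are simulated stand-ins for the 44 data points (not the project's bulb data), and the slope of -0.0008, the search interval, and all function names are assumptions made for this example.

```r
set.seed(1)
# Hypothetical data, NOT the project's bulb data: 44 time points and
# intensities simulated around an assumed slope with sd-1 normal noise.
hours     <- seq(0, 4300, by = 100)          # 44 time points
intensity <- 100 - 0.0008 * hours + rnorm(44)

# Closed form from the last line of the derivation.
loglik1 <- function(a1) {
  44 * log(1 / sqrt(2 * pi)) - 0.5 * sum((intensity - 100 - a1 * hours)^2)
}

# Direct evaluation: sum of log standard normal densities of the residuals.
loglik1_direct <- function(a1) {
  sum(dnorm(intensity - 100 - a1 * hours, mean = 0, sd = 1, log = TRUE))
}

# The two computations agree at any trial value of a1.
all.equal(loglik1(-0.001), loglik1_direct(-0.001))

# Maximize numerically. The interval is an assumption; the tolerance is
# tightened because a1 is small relative to optimize()'s default tolerance.
fit <- optimize(loglik1, interval = c(-0.01, 0.01),
                maximum = TRUE, tol = 1e-10)

# Setting the derivative of the loglikelihood to zero gives the same estimate.
a1_exact <- sum(hours * (intensity - 100)) / sum(hours^2)
c(numeric = fit$maximum, exact = a1_exact)
```

Since the first term of the loglikelihood does not depend on \(a_1\), maximizing it is the same as minimizing the sum of squared residuals, which is why the numerical maximizer matches the least-squares formula.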

Loglikelihood for \(f_2\)

Loglikelihood for \(f_4\)

Loglikelihood for \(f_5\)

Loglikelihood for \(f_6\)