Bias & Variance

What is Bias?

In statistics, the bias (or bias function) of an estimator is the difference between this estimator’s expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. Otherwise the estimator is said to be biased. In statistics, “bias” is an objective statement about a function, and while not a desired property, it is not pejorative, unlike the ordinary English use of the term “bias”. (Wikipedia)

Biased Variance Estimate

The following expression, which divides the sum of squared deviations by \(n\), is a biased estimator of the population variance \(\sigma^2\).

\[\frac{\sum_{i=1}^n(x_i-\bar{x})^2}{n}\]

The following R code shows a simulation that illustrates the bias. I have written a function, biasVar, that computes the biased variance estimate. The biasV variable holds 5000 such estimates, each computed from a fresh sample of size 5 (n=5) drawn from a normal distribution with mean 5 and variance 25.

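# Biased variance estimate: sum of squared deviations divided by n
biasVar = function(x){ sum((x-mean(x))^2)/length(x) }

# 5000 estimates, each from a new sample of size 5 with true variance 25
biasV = replicate(5000,biasVar(rnorm(5,mean=5,sd=sqrt(25))))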
mean(biasV)

## [1] 19.9774

Unbiased Variance Estimate

Now let’s do the same thing again, this time using the unbiased variance estimate. R’s built-in var() function divides by n-1 rather than n.

unbiasedV = replicate(5000,var(rnorm(5,mean=5,sd=sqrt(25))))
mean(unbiasedV)

## [1] 25.41366

Comparison

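We see that, on average, the biased estimator is too small: its mean is near 20 while the true variance is 25. Its expected value is in fact \(\frac{n-1}{n}\sigma^2\), so it underestimates \(\sigma^2\) by \(\frac{\sigma^2}{n}\). So if we do the same example with n=10, what should we expect? What about n=25?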
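As a rough guide to what the simulations should show, here is the arithmetic for the three sample sizes used in this post. It assumes the standard result for independent draws with variance \(\sigma^2\), and uses \(\sigma^2=25\) as in the code:

\[E\left[\frac{\sum_{i=1}^n(x_i-\bar{x})^2}{n}\right]=\frac{n-1}{n}\sigma^2\]

\[n=5:\ \frac{4}{5}\cdot 25=20 \qquad n=10:\ \frac{9}{10}\cdot 25=22.5 \qquad n=25:\ \frac{24}{25}\cdot 25=24\]

The n=5 simulation above gave 19.9774, which is close to the predicted 20.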

#### n=10 chunk
n=10
biasVar = function(x){ sum((x-mean(x))^2)/length(x) }
biasV = replicate(5000,biasVar(rnorm(n,mean=5,sd=sqrt(25))))
unbiasedV = replicate(5000,var(rnorm(n,mean=5,sd=sqrt(25))))
mean(biasV)

## [1] 22.54333

mean(unbiasedV)

## [1] 25.46702

#### n=25 chunk

n=25
biasV = replicate(5000,biasVar(rnorm(n,mean=5,sd=sqrt(25))))
unbiasedV = replicate(5000,var(rnorm(n,mean=5,sd=sqrt(25))))
mean(biasV)

## [1] 24.01788

mean(unbiasedV)

## [1] 24.99065

You can copy the “n=10” chunk and run it in R with different sample sizes to explore the idea further. There are mathematical proofs of this result that we can discuss later.
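Rather than copying the chunk by hand, the exploration could be sketched as one loop over several sample sizes. This is only a minimal sketch: the helper name simulate_means, its reps and sigma2 arguments, the extra sample size of 100, and the fixed seed are assumptions added for illustration, and the "theory" column assumes the \(\frac{n-1}{n}\sigma^2\) expectation for the biased estimator.

```r
# Illustrative sketch; simulate_means, reps and sigma2 are hypothetical names.
set.seed(1)  # fixed seed for repeatability (not part of the original code)

# Biased variance estimate: sum of squared deviations divided by n
biasVar = function(x){ sum((x-mean(x))^2)/length(x) }

# For one sample size n, average many biased and unbiased estimates
simulate_means = function(n, reps = 5000, sigma2 = 25){
  biased   = replicate(reps, biasVar(rnorm(n, mean = 5, sd = sqrt(sigma2))))
  unbiased = replicate(reps, var(rnorm(n, mean = 5, sd = sqrt(sigma2))))
  c(n = n,
    biased = mean(biased),
    unbiased = mean(unbiased),
    theory = (n - 1) / n * sigma2)  # assumed expectation of the biased estimator
}

# One row per sample size
results = t(sapply(c(5, 10, 25, 100), simulate_means))
print(round(results, 2))
```

In each row, the "biased" column should sit near the "theory" column and the "unbiased" column near 25, with the gap between them shrinking as n grows.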