We are going to do some analysis that requires confidence intervals. With your group, work out the best analysis and visualization. Make a two-slide presentation with a plot on one slide and your numerical analysis output and interpretation on the second.
On page 258, Akritas introduces the Charpy impact test, which measures the amount of energy a metal absorbs during fracture. This video demonstrates the technique; NIST still uses this method[1][2].
We will use the Akritas data and some equipment validation data from NIST, which also has an interesting plot of these measurements.
Your employer wants to sell a tool for this type of test to the industry. The NIST HH evaluation data show the permissible manufacturing standards based on 187 different evaluations using their test specimens. We will largely be using the mean and std_dev columns for our work.
# Charpy impact measurements (the Akritas data)
x = c(4.9, 3.38, 3.32, 2.38, 3.14, 2.97, 3.87, 3.39, 2.97, 3.45, 3.35, 4.34, 3.54, 2.46, 4.38, 2.92)
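# As a quick check, base R's t.test gives a 95% t-based confidence
# interval for the mean of this sample:
t.test(x, conf.level = 0.95)$conf.int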
# NIST HH verification-specimen evaluation data (187 evaluations)
nistHH = read.csv("https://www.nist.gov/sites/default/files/documents/mml/acmd/structural_materials/HH-105-48877.csv", stringsAsFactors = FALSE, skip = 2)
#nistSH = read.csv("https://www.nist.gov/sites/default/files/documents/mml/acmd/structural_materials/SH-31-46485.csv",stringsAsFactors = FALSE, skip=2)
#nistLL = read.csv("https://www.nist.gov/sites/default/files/documents/mml/acmd/structural_materials/LL-118-54385.csv",stringsAsFactors = FALSE,skip=2)
# Intercept-only model: the fitted intercept is the sample mean of std_dev
nistHH.lm = lm(std_dev ~ 1, data = nistHH)
# This will provide a confidence interval on the mean
confint(nistHH.lm)
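# For an intercept-only model this is identical to the one-sample t
# interval, so t.test serves as a cross-check:
t.test(nistHH$std_dev)$conf.int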
# or we could use predict; for an intercept-only model any one-row newdata works
predict(nistHH.lm, newdata = data.frame(1), interval = "confidence")
# Now we want to calculate an interval for the median
#install.packages("BSDA")
library(BSDA)
# See page 267 of Akritas for a description.
SIGN.test(nistHH$std_dev,alternative = "two.sided",conf.level=.95)
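# A sketch of the idea behind the sign test: an (approximately) 95%
# order-statistic interval for the median, with the indices taken from
# the Binomial(n, 1/2) distribution
n = length(nistHH$std_dev)
k = qbinom(0.025, n, 0.5)
sort(nistHH$std_dev)[c(k, n - k + 1)]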
### For an extra push ####
## We could use the bootstrap to provide a confidence interval for the mean
# https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading24.pdf
# Percentile bootstrap: take quantiles of the resampled means
# (call set.seed first if you want reproducible resamples)
bs5000 = replicate(5000, mean(sample(nistHH$std_dev, length(nistHH$std_dev), replace = TRUE)))
quantile(bs5000, c(.025, .975))
### A more accurate method, called the empirical bootstrap, resamples the deviations from the sample mean
bs5000_dev = replicate(5000, mean(nistHH$std_dev) - mean(sample(nistHH$std_dev, length(nistHH$std_dev), replace = TRUE)))
mean(nistHH$std_dev) + quantile(bs5000_dev, c(.025, .975))
### And do the same for the median
bs5000_dev_median = replicate(5000, median(nistHH$std_dev) - median(sample(nistHH$std_dev, length(nistHH$std_dev), replace = TRUE)))
median(nistHH$std_dev) + quantile(bs5000_dev_median, c(.025, .975))
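# The boot package automates both intervals; a sketch ("perc" is the
# percentile method, "basic" is the empirical bootstrap):
library(boot)
bt = boot(nistHH$std_dev, function(d, i) mean(d[i]), R = 5000)
boot.ci(bt, type = c("perc", "basic"))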
From the homework we have some Rexburg housing market data. The data were collected on homes for sale in Madison County as of January 2011, recording information on each listing such as price, size of the home, and style. Open the data file MadisonCountyRealEstate.
Your construction company is looking for new cities in which to expand. It is not financially feasible to enter a market unless the company can sell houses for over $80 per square foot. They would like you to analyze the Rexburg data to decide whether the market is worth it, and they mentioned that they are not in the market to build houses larger than 4,200 ft² or smaller than 2,000 ft².
library(readr)
library(ggplot2)
Madison = read.csv(file = "https://raw.githubusercontent.com/byuistats/data/master/MadisonCountyRealEstate/MadisonCountyRealEstate.csv", header = TRUE, stringsAsFactors = FALSE)
# Restrict to the company's stated range: 2,000 to 4,200 square feet
Madison.subset = subset(Madison, SQFT >= 2000 & SQFT <= 4200)
qplot(data = Madison.subset, x = SQFT, y = ListPrice) + geom_smooth(method = "lm")  # price vs. size with a fitted line
qplot(data = Madison.subset, x = ListPrice/SQFT)  # distribution of price per square foot
qplot(data = Madison.subset, y = ListPrice/SQFT, x = SQFT)  # price per square foot vs. size
madison.lm1 = lm(ListPrice ~ SQFT, data = Madison.subset)  # price as a function of size
madison.lm2 = lm(ListPrice/SQFT ~ 1, data = Madison.subset)  # intercept-only: mean price per square foot
confint(madison.lm1)
confint(madison.lm2)
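To tie the analysis back to the $80 threshold, one option (a sketch; it tests the same mean that madison.lm2 models) is a one-sided t-test:
# Is the mean list price per square foot greater than $80?
t.test(Madison.subset$ListPrice / Madison.subset$SQFT, mu = 80, alternative = "greater")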