Regression Practice

Introduction

In this assignment, you will practice regression analysis including:

  • Plotting bivariate data with a regression line
  • Calculating and interpreting the correlation coefficient, r
  • Fitting a linear regression analysis
  • Verifying if a linear model is model is adequate:
    • Checking for linearity (scatterplot)
    • Checking for constant variance (plot(lm_output, which=1))
    • Checking for normality of residuals (qqPlot(lm_output$residuals))
library(tidyverse)
library(mosaic)
library(rio)
library(car)

Car Prices and Mileage

You are interested in purchasing an all wheel drive Acura MDX for those slick Rexburg winters. You found what you think is a good deal for on a low-mileage 2020 model but you’d like to be sure. You go on Autotrader.com and randomly select 23 Acura MDX’s and collect Price and Mileage information.

Load the data and use R to answer the questions below.

cars <- import('https://github.com/byuistats/Math221D_Cannon/raw/master/Data/acuraMDX_price_vs_mileage.csv')

Before you begin:

What is the response/dependent variable?

What is the explanatory variable

What do you think is the nature of the relationship between the two?

How strong do you think the relationship is?

Plot the Data and calculate r

Does the relationship look linear?

What is the correlation coefficient, r?

What does this r show?

Fit a Linear Regression Model

#lm_output <- lm()

Add the regression line to your chart:

# Sometimes you have to run the whole chunk with plot() and the abline() together:

#plot()
#abline(lm_output$coefficients)

What is the slope of the regression line, and what does it mean?

What is the intercept and what does it mean?

What is your p-value?

What is your conclusion?

What is the confidence interval for the slope?

Interpret the confidence interval.

Check Model Requirements

Check the normality of the residuals:

Check for constant variance (Residual by Predicted plot):

#plot(lm_output, which = 1)

Lastly, the car you’re interested in buying has around 100,000 miles and costs $11,200. Could this be considered a good deal? Why?

Manatee Deaths and Motorboat Registrations in Florida

Florida is a fabulous place for experiencing wildlife and recreation. Unfortunately, sometimes those two activities conflict.

Researchers collected over 30 years of data about water craft registrations (motor and non-motor boats) and manatee deaths. The goal of the research is to evaluate the relationship between boat registrations and manatee deaths.

Before you begin:

What is the response/dependent variable?

What is the explanatory variable

What do you think is the nature of the relationship between the two?

How strong do you think the relationship is?

Load the data:

manatees <- import('https://github.com/byuistats/Math221D_Cannon/raw/master/Data/manatees.csv')

Plot the Data and calculate r

Does the relationship look linear?

What is the correlation coefficient, r?

What does this r show?

Fit a Linear Regression Model

Add the regression line to your chart:

# Sometimes you have to do the plot() and the abline() in one chunk and run the whole thing:

What is the slope of the regression line, and what does it mean?

What is the intercept and what does it mean?

What is your p-value?

What is your conclusion?

What is the confidence interval for the slope?

Interpret the confidence interval:

Check Model Requirements

Check the normality of the residuals:

Check for constant variance (Residual by Predicted plot):

MCAT Score and GPA

The MCAT is an entrance exam for medical schools. It seems likely that there is a relationship between your undergraduate GPA and how well you do on the MCAT.

GPA and MCAT score data were collected on 55 prospective medical students.

Before you begin:

What is the response/dependent variable?

What is the explanatory variable

What do you think is the nature of the relationship between the two?

How strong do you think the relationship is?

Load the data:

mcat <- import('https://github.com/byuistats/Math221D_Cannon/raw/master/Data/mcat_gpa.csv')

Plot the Data and calculate r

Does the relationship look linear?

What is the correlation coefficient, r?

What does this r show?

Fit a Linear Regression Model

Add the regression line to your chart:

# Sometimes you have to do the plot() and the abline() in one chunk and run the whole thing:

What is the slope of the regression line, and what does it mean?

What is the intercept and what does it mean?

What is your p-value?

What is your conclusion?

What is the confidence interval for the slope?

Interpret the confidence interval:

Check Model Requirements

Check the normality of the residuals:

Check for constant variance (Residual by Predicted plot):