Regression Practice

Introduction

In this assignment, you will practice regression analysis including:

  • Plotting bivariate data with a regression line
  • Calculating and interpreting the correlation coefficient, r
  • Fitting a linear regression model
  • Verifying that a linear model is adequate:
    • Checking for linearity (scatterplot)
    • Checking for constant variance (plot(lm_output, which=1))
    • Checking for normality of residuals (qqPlot(lm_output$residuals))
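
Load the packages used in this assignment:

library(tidyverse)  # ggplot2 for plotting
library(mosaic)     # statistics helper functions
library(rio)        # import() for reading data files
library(car)        # qqPlot() for checking normality of residuals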

Car Prices and Mileage

You are interested in purchasing an all-wheel-drive Acura MDX for those slick Rexburg winters. You found what you think is a good deal on a low-mileage 2020 model, but you’d like to be sure. You go to Autotrader.com, randomly select 23 Acura MDXs, and record the Price and Mileage of each.

Load the data and use R to answer the questions below.

cars <- import('https://github.com/byuistats/Math221D_Cannon/raw/master/Data/acuraMDX_price_vs_mileage.csv')

QUESTION: What is the response/dependent variable?
ANSWER:

QUESTION: What is the explanatory variable?
ANSWER:

QUESTION: What do you think is the nature of the relationship between the two? (Positive or Negative?)
ANSWER:

QUESTION: How strong do you think the relationship is?
ANSWER:

Plot the Data and Calculate r

Use ggplot() to create a scatter plot with the regression line, then calculate r:

# geom_smooth(method="lm")
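
As a rough illustration of this step, the sketch below draws a scatterplot with a regression line and computes r. It uses simulated data, not the Acura file: the data frame `sim` and its columns `x` and `y` are invented for this example, so substitute your own data set and column names.

```r
library(ggplot2)

# Simulated data for illustration only (not the assignment data)
set.seed(1)
sim <- data.frame(x = runif(30, min = 0, max = 100))
sim$y <- 50 - 0.3 * sim$x + rnorm(30, sd = 5)

# Scatterplot with the least-squares line overlaid
p <- ggplot(sim, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
print(p)

# Correlation coefficient r between the two variables
r <- cor(sim$x, sim$y)
r
```

The explanatory variable goes on the x-axis and the response on the y-axis, which matches the order used later in the model formula.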

QUESTION: Does the relationship look linear?
ANSWER:

QUESTION: What is the correlation coefficient, r?
ANSWER:

QUESTION: What does r show?
ANSWER:

Fit a Linear Regression Model

#lm_output <- lm()
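
The questions in this part can be answered from a fitted model object. A minimal sketch on simulated data follows; `sim`, `x`, `y`, and `fit` are placeholder names chosen for this example, not names from the assignment data.

```r
# Simulated data for illustration only (not the assignment data)
set.seed(1)
sim <- data.frame(x = runif(30, min = 0, max = 100))
sim$y <- 50 - 0.3 * sim$x + rnorm(30, sd = 5)

# Fit the model; the formula reads response ~ explanatory
fit <- lm(y ~ x, data = sim)

summary(fit)                 # coefficient table with standard errors and p-values
coef(fit)                    # intercept and slope estimates
confint(fit, level = 0.95)   # 95% confidence intervals for intercept and slope
```

In the `summary()` output, the row named after the explanatory variable describes the slope, and the `(Intercept)` row describes the intercept.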

QUESTION: What is the slope of the regression line, and what does it mean?
ANSWER:

QUESTION: What is the intercept and what does it mean?
ANSWER:

QUESTION: What is the p-value?
ANSWER:

QUESTION: State your conclusion in context of the research question:
ANSWER:

QUESTION: What is the confidence interval for the slope?
ANSWER:

QUESTION: Explain the confidence interval in context of the research question:
ANSWER:

Check Model Requirements

Check the normality of the residuals:

Check for constant variance (Residual by Predicted plot):

#plot(lm_output, which = 1)
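
Both checks can be run on any fitted `lm` object. The sketch below is one way to do so on simulated data; `sim` and `fit` are made-up names, and `qqPlot()` is assumed to come from the car package loaded at the top of the assignment.

```r
library(car)

# Simulated data for illustration only (not the assignment data)
set.seed(1)
sim <- data.frame(x = runif(30, min = 0, max = 100))
sim$y <- 50 - 0.3 * sim$x + rnorm(30, sd = 5)
fit <- lm(y ~ x, data = sim)

# Residuals vs. fitted values: look for a patternless band of roughly even width
plot(fit, which = 1)

# Normal Q-Q plot of the residuals: points should stay close to the reference line
qqPlot(fit$residuals)
```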

QUESTION: Do the test requirements appear to be satisfied? Why?
ANSWER:

Good Price?

Lastly, the car you’re interested in buying has around 100,000 miles and costs $11,200. Could this be considered a good deal? Why?
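
One possible way to approach a question like this is to compare the asking price with the value the fitted line predicts at that mileage. The sketch below shows only the mechanics, on simulated data; `sim`, `x`, `y`, and the values 60 and 25 are invented for illustration and are not the Acura numbers.

```r
# Simulated data for illustration only (not the assignment data)
set.seed(1)
sim <- data.frame(x = runif(30, min = 0, max = 100))
sim$y <- 50 - 0.3 * sim$x + rnorm(30, sd = 5)
fit <- lm(y ~ x, data = sim)

# Predicted y at a hypothetical x value of 60
new_obs <- data.frame(x = 60)
predicted <- predict(fit, newdata = new_obs)
predicted

# Hypothetical observed value at that x; a negative difference means the
# observation falls below the regression line
observed <- 25
observed - predicted
```

The column name in `new_obs` has to match the explanatory variable used in the model formula, otherwise `predict()` cannot find it.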

Manatee Deaths and Motorboat Registrations in Florida

Florida is a fabulous place for experiencing wildlife and recreation. Unfortunately, sometimes those two activities conflict.

Researchers collected over 30 years of data on watercraft registrations (motorized and non-motorized boats) and manatee deaths. The goal of the research is to evaluate the relationship between boat registrations and manatee deaths.

Load the data and use R to answer the questions below.

manatees <- import('https://github.com/byuistats/Math221D_Cannon/raw/master/Data/manatees.csv')

QUESTION: What is the response/dependent variable?
ANSWER:

QUESTION: What is the explanatory variable?
ANSWER:

QUESTION: What do you think is the nature of the relationship between the two? (Positive or Negative?)
ANSWER:

QUESTION: How strong do you think the relationship is?
ANSWER:

Plot the Data and Calculate r

Use ggplot() to create a scatter plot with the regression line, then calculate r:

# geom_smooth(method="lm")
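
When interpreting r, it can help to see how it relates to the fitted line. The sketch below checks three properties on simulated data: r has the same sign as the slope, r squared equals the R-squared of a simple linear regression, and r does not change when a variable is rescaled. The names `sim`, `x`, and `y` are placeholders, not columns of the manatee data.

```r
# Simulated data for illustration only (not the assignment data)
set.seed(2)
sim <- data.frame(x = rnorm(40, mean = 500, sd = 100))
sim$y <- 10 + 0.08 * sim$x + rnorm(40, sd = 6)

r <- cor(sim$x, sim$y)
fit <- lm(y ~ x, data = sim)

# r and the slope always share a sign
sign(r) == sign(coef(fit)[["x"]])

# In simple linear regression, r squared equals R-squared
all.equal(r^2, summary(fit)$r.squared)

# r has no units: dividing x by 1000 leaves it unchanged
all.equal(r, cor(sim$x / 1000, sim$y))
```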

QUESTION: Does the relationship look linear?
ANSWER:

QUESTION: What is the correlation coefficient, r?
ANSWER:

QUESTION: What does r show?
ANSWER:

Fit a Linear Regression Model

#lm_output <- lm()
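
The p-value for the slope can be read from the coefficient table of the model summary. As a sketch of where that number comes from, the code below recomputes it from the slope estimate and its standard error using a two-sided t test. The data are simulated, and `sim`, `x`, `y`, and `fit` are placeholder names.

```r
# Simulated data for illustration only (not the assignment data)
set.seed(2)
sim <- data.frame(x = rnorm(40, mean = 500, sd = 100))
sim$y <- 10 + 0.08 * sim$x + rnorm(40, sd = 6)
fit <- lm(y ~ x, data = sim)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients
slope_row <- coefs["x", ]
slope_row

# Test statistic for H0: slope = 0, and its two-sided p-value
t_stat <- slope_row[["Estimate"]] / slope_row[["Std. Error"]]
p_val <- 2 * pt(-abs(t_stat), df = df.residual(fit))

# Matches the p-value reported in the table
all.equal(p_val, slope_row[["Pr(>|t|)"]])
```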

QUESTION: What is the slope of the regression line, and what does it mean?
ANSWER:

QUESTION: What is the intercept and what does it mean?
ANSWER:

QUESTION: What is the p-value?
ANSWER:

QUESTION: State your conclusion in context of the research question:
ANSWER:

QUESTION: What is the confidence interval for the slope?
ANSWER:

QUESTION: Explain the confidence interval in context of the research question:
ANSWER:

Check Model Requirements

Check the normality of the residuals:

Check for constant variance (Residual by Predicted plot):

#plot(lm_output, which = 1)

QUESTION: Do the test requirements appear to be satisfied? Why?
ANSWER:

MCAT Score and GPA

The MCAT is an entrance exam for medical schools. It seems likely that there is a relationship between your undergraduate GPA and how well you do on the MCAT.

GPA and MCAT score data were collected on 55 prospective medical students.

Load the data and respond to the questions below:

mcat <- import('https://github.com/byuistats/Math221D_Cannon/raw/master/Data/mcat_gpa.csv')

QUESTION: What is the response/dependent variable?
ANSWER:

QUESTION: What is the explanatory variable?
ANSWER:

QUESTION: What do you think is the nature of the relationship between the two? (Positive or Negative?)
ANSWER:

QUESTION: How strong do you think the relationship is?
ANSWER:

Plot the Data and Calculate r

Use ggplot() to create a scatter plot with the regression line, then calculate r:

# geom_smooth(method="lm")

QUESTION: Does the relationship look linear?
ANSWER:

QUESTION: What is the correlation coefficient, r?
ANSWER:

QUESTION: What does r show?
ANSWER:

Fit a Linear Regression Model
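
To see what the confidence interval for the slope is built from, the sketch below computes it by hand as estimate plus or minus a t critical value times the standard error, then compares the result with `confint()`. The data are simulated, and `sim`, `x`, `y`, and `fit` are placeholder names rather than columns of the MCAT data.

```r
# Simulated data for illustration only (not the assignment data)
set.seed(3)
sim <- data.frame(x = runif(40, min = 2, max = 4))
sim$y <- 20 + 4 * sim$x + rnorm(40, sd = 3)
fit <- lm(y ~ x, data = sim)

# Slope estimate and its standard error from the coefficient table
est <- coef(summary(fit))["x", "Estimate"]
se  <- coef(summary(fit))["x", "Std. Error"]

# Critical value for a 95% interval with n - 2 degrees of freedom
t_star <- qt(0.975, df = df.residual(fit))

manual_ci <- c(est - t_star * se, est + t_star * se)
manual_ci

# Built-in interval for comparison
confint(fit, "x", level = 0.95)
```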

QUESTION: What is the slope of the regression line, and what does it mean?
ANSWER:

QUESTION: What is the intercept and what does it mean?
ANSWER:

QUESTION: What is the p-value?
ANSWER:

QUESTION: State your conclusion in context of the research question:
ANSWER:

QUESTION: What is the confidence interval for the slope?
ANSWER:

QUESTION: Explain the confidence interval in context of the research question:
ANSWER:

Check Model Requirements

Check the normality of the residuals:

Check for constant variance (Residual by Predicted plot):
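
It can be easier to judge a residual plot after seeing what a violation looks like. The sketch below simulates two data sets, one with constant error variance and one where the spread grows with x, and plots the residuals of each side by side. Everything here (`good`, `bad`, `fit_good`, `fit_bad`) is made up for illustration and has nothing to do with the MCAT data.

```r
# Simulated data for illustration only (not the assignment data)
set.seed(4)
x <- runif(100, min = 1, max = 10)
good <- data.frame(x = x, y = 2 + 3 * x + rnorm(100, sd = 2))        # constant spread
bad  <- data.frame(x = x, y = 2 + 3 * x + rnorm(100, sd = 0.8 * x))  # spread grows with x

fit_good <- lm(y ~ x, data = good)
fit_bad  <- lm(y ~ x, data = bad)

# Side-by-side residuals vs. fitted plots
par(mfrow = c(1, 2))
plot(fit_good, which = 1)   # roughly even band
plot(fit_bad, which = 1)    # fan shape that widens to the right
par(mfrow = c(1, 1))
```

A fan or megaphone shape in the second plot is the usual visual sign that the constant variance requirement is in doubt.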

QUESTION: Do the test requirements appear to be satisfied? Why?
ANSWER: