Here are the summaries for each lesson in unit 4. Reviewing these key points from each lesson will help you in your preparation for the exam.
Show/Hide SummariesCreating scatterplots of bivariate data allows us to visualize the data by helping us understand its shape (linear or nonlinear), direction (positive, negative, or neither), and strength (strong, moderate, or weak).
The correlation coefficient (\(r\)) is a number between \(-1\) and \(1\) that tells us the direction and strength of the linear association between two variables. A positive \(r\) corresponds to a positive association while a negative \(r\) corresponds to a negative association. A value of \(r\) closer to \(-1\) or \(1\) indicates a stronger association than a value of \(r\) closer to zero.
In statistics, we write the linear regression equation as \(\hat Y=b_0+b_1X\) where \(b_0\) is the Y-intercept of the line and \(b_1\) is the slope of the line. The values of \(b_0\) and \(b_1\) are calculated using software.
Linear regression allows us to predict values of \(Y\) for a given \(X\). This is done by first calculating the coefficients \(b_0\) and \(b_1\) and then plugging in the desired value of \(X\) and solving for \(Y\).
The independent (or explanatory) variable (\(X\)) is the variable which is
not affected by what happens to the other variable. The
dependent (or response) variable (\(Y\)) is the variable which
is affected by what happens to the other variable. For example,
in the correlation between number of powerboats and number of manatee
deaths, the number of deaths is affected by the number of powerboats in
the water, but not the other way around. So, we would assign \(X\) to represent the number of powerboats
and \(Y\) to represent the number of
manatee deaths.
The unknown true linear regression line is \(Y=\beta_0+\beta_1X\) where \(\beta_0\) is the true y-intercept of the line and \(\beta_1\) is the true slope of the line.
A residual is the difference between the observed value of \(Y\) for a given \(X\) and the predicted value of \(Y\) on the regression line for the same \(X\). It can be expressed as: \[ Residual = Y - \hat Y = Y - (b_0 + b_1 X) \]
To check all the requirements for bivariate inference you will need to create a scatterplot of \(X\) and \(Y\), a residual plot, and a histogram of the residuals.
We conduct a hypothesis test on bivariate data to know if there is a linear relationship between the two variables. To determine this, we test the slope (\(\beta_1\)) on whether or not it equals zero. The appropriate hypotheses for this test are: \[ \begin{array}{1cl} H_0: & \beta_1=0 \\ H_a: & \beta_1\ne0 \end{array} \]
For bivariate inference we use software to calculate the sample
coefficients, residuals, test statistic, \(P\)-value, and confidence intervals of the
true linear regression coefficients.
Copyright © 2020 Brigham Young University-Idaho. All rights reserved.