Logistic Regression

Regression for a qualitative binary response variable $(Y_i = 0$ or $1)$. The explanatory variables can be either quantitative or qualitative.

Simple Logistic Regression Model

Regression for a qualitative binary response variable $(Y_i = 0$ or $1)$ using a single (typically quantitative) explanatory variable.

Overview

The probability that $Y_i = 1$ given the observed value of $x_i$ is called $\pi_i$ and is modeled by the equation

Math Code

$$
  P(Y_i = 1|\, x_i) = \frac{e^{\beta_0 + \beta_1 x_i}}{1+e^{\beta_0 + \beta_1 x_i}} = \pi_i
$$

$P($ The “P” stands for “Probability that…” $Y_i$ The response variable. The “i” denotes that this is the y-value for individual “i”, where “i” is 1, 2, 3,… and so on up to $n$, the sample size. $= 1$ Equals 1… This states that we are assuming that the probability that the response variable $Y_i$ is a 1 for the current individual. $| x_i)$ Given $x_i$… in other words, the “|” says “given” and $x_i$ means the x-value of the current individual. $=$ Equals sign. $\displaystyle\frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}$ The logistic regression equation where $e=2.71828...$ is the “natural constant” number and $\beta_0$ is the y-intercept and $\beta_1$ is teh slope. $= \pi_i$ The $\pi_i$ stands for the probability of individual $i$ having a y-value equal to 1 given their $x_i$ value. It is the short hand notation for $P(Y_i = 1 |x_i)$. (It is NOT the number 3.14…)

The coefficents $\beta_0$ and $\beta_1$ are difficult to interpret directly. Typicall $e^{\beta_0}$ and $e^{\beta_1}$ are interpreted instead. The value of $e^{\beta_0}$ or $e^{\beta_1}$ denotes the relative change in the odds that $Y_i=1$. The odds that $Y_i=1$ are $\frac{\pi_i}{1-\pi_i}$.

Examples: challenger | mouse

R Instructions

Console Help Command: ?glm()

Perform a Logistic Regression

YourGlmName This is some name you come up with that will become the R object that stores the results of your logistic regression glm() command. <- This is the “left arrow” assignment operator that stores the results of your glm() code into YourGlmName. glm( glm( is an R function that stands for “General Linear Model”. It works in a similar way that the lm( function works except that it requires a family= option to be specified at the end of the command. Y Y is your binary response variable. It must consist of only 0’s and 1’s. Since TRUE’s = 1’s and FALSE’s = 0’s in R, Y could be a logical statement like (Price > 100) or (Animal == “Cat”) if your Y-variable wasn’t currently coded as 0’s and 1’s. ~ The tilde symbol ~ is used to tell R that Y should be treated as a function of the explanatory variable X. X, X is the explanatory variable (typically quantitative) that will be used to explain the probability that the response variable Y is a 1. data = NameOfYourDataset,
NameOfYourDataset is the name of the dataset that contains Y and X. In other words, one column of your dataset would be called Y and another column would be called X. family=binomial) The family=binomial command tells the glm( function to perform a logistic regression. It turns out that glm can perform many different types of regressions, but we only study it as a tool to perform a logistic regression in this course.
summary(YourGlmName) The summary command allows you to print the results of your logistic regression that were previously saved in YourGlmName.

Example output from a regression. Hover each piece to learn more.

Call:
glm(formula = am ~ disp, family = binomial, data = mtcars) This is simply a statement of your original glm(…) “call” that you made when performing your regression. It allows you to verify that you ran what you thought you ran in the glm(…).

Deviance Residuals: Deviance residuals are a measure of how far the fitted probability for $\pi_i$ has differed from the actual outcome of $Y_i$ in terms of the log of the fitted probability space. (This is a fairly complicated idea.)
Min -1.5651 “min” gives the value of the residual that is furthest below the regression line. Ideally, the magnitude of this value would be about equal to the magnitude of the largest positive residual (the max) because the hope is that the residuals are normally distributed around the line.	1Q -0.6648 “1Q” gives the first quartile of the residuals, which will always be negative, and ideally would be about equal in magnitude to the third quartile.	Median -0.2460 “Median” gives the median of the residuals, which would ideally would be about equal to zero. Note that because the regression line is the least squares line, the mean of the residuals will ALWAYS be zero, so it is never included in the output summary. This particular median value of -0.2460 is a little smaller than zero than we would hope for and suggests a right skew in the data because the mean (0) is greater than the median (-0.2460) witnessing the residuals are right skewed. This can also be seen in the maximum being much larger in magnitude than the minimum.	3Q 0.7276 “3Q” gives the third quartile of the residuals, which would ideally would be about equal in magnitude to the first quartile. In this case, it is pretty close, which helps us see that the first quartile of residuals on either side of the line is behaving fairly normally.	Max 2.2691 “Max” gives the maximum positive residuals, which would ideally would be about equal in magnitude to the minimum residual. In this case, it is much larger than the minimum, which helps us see that the residuals are likely right skewed.

Coefficients: Notice that in your glm(…) you used only $Y$ and $X$. You did type out any coefficients, i.e., the $\beta_0$ or $\beta_1$ of the regression model. These coefficients are estimated by the glm(…) function and displayed in this part of the output along with standard errors, t-values, and p-values.
	Estimate To learn more about the “Estimates” of the “Coefficients” see the “Explanation” tab, “Estimating the Model Parameters” section for details.	Std. Error To learn more about the “Standard Errors” of the “Coefficients” see the “Explanation” tab, “Inference for the Model Parameters” section.	z value The test statistic is a regular old z-score. It is most reliable when the sample size is “large.” It is a measurement of the number of standard errors the estimate is from 0.	Pr(>\|z\|) This is the p-value, the probability of observing a test statistic more extreme than Z.
(Intercept) This always says “Intercept” for any glm(…) you run in R. That is because R always assumes there is a y-intercept for your regression function.	2.630849 This is the estimate of the y-intercept, $\beta_0$. It is called $b_0$. It is the value of the log of the odds that $Y_i=1$ when $x_i$ is zero. Remember to use $e^{b_0}$ to interpret this values actual effect on the odds.	1.050170 This is the standard error of $b_0$. It tells you how much $b_0$ varies from sample to sample. The closer to zero, the better.	2.505 The test statistic for testing the hypothesis that $\beta_0 = 0$.	0.01224 This is the p-value of the test of the hypothesis that $\beta_0 = 0$. It measures the probability of observing a z-score as extreme as the one observed. To compute it yourself in R, use `pnorm(-abs(your z-value))*2`.	* This is called a “star”. One star means significant at the 0.1 level of $\alpha$.
disp This is always the name of your X-variable in your glm(Y ~ X, …).	-0.014604 This is the estimate of the slope, $\beta_1$. It is called $b_1$. It is the change in the log of the odds that $Y_i = 1$ as X is increased by 1 unit. Remember to use $e^{b_1}$ to compute the actual effect on the odds.	0.005168 This is the standard error of $b_1$. It tells you how much $b_1$ varies from sample to sample. The closer to zero, the better.	-2.826 This is the test statistic for testing the hypothesis that $\beta_1 = 0$.	0.00471 This is the p-value of the test of the hypothesis that $\beta_1 = 0$. To compute it yourself in R, use `pnorm(-abs(your z-value))*2`	** This is called a “star”. Three stars means significant at the 0.001 level of $\alpha$.

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ’*’ 0.05 ‘.’ 0.1 ‘ ’ 1 These “codes” explain what significance level the p-value is smaller than based on how many “stars” * the p-value is labeled with in the Coefficients table above.

(Dispersion parameter for binomial family taken to be 1) This is a simplifying assumption of the logistic regression. Overdispersion is a common problem with logistic regression data, but is typically ignored. Unless you become an expert in statistics, this is not something you need to worry about.

Null Deviance: The deviance of the null model. This is the model that excludes any information from the x-variable, i.e., $\beta_1=0$.

43.230

on 31 degrees of freedom The residual degrees of freedom. The higher this number, the more reliable the p-values will be from the logistic regression.

Residual deviance: The sum of log of the squared residuals. Essentially the resulting statistic of a goodness of fit test measuring how well the data works with the logistic regression model. Using pchisq(residual deviance, df residual deviance, lower.tail=FALSE) gives the p-value for this goodness of fit test. However, the residual deviance only follows a chi-squared distribution with df residual deviance when there are many repeated x-values, and all x-values have at least a few replicates. 29.732 This can be calculated by sum(log(myglm$res^2)). on 30 degrees of freedom This is $n-p$ where $n$ is the sample size and $p$ is the number of parameters in the regression model. In this case, there is a sample size of 32 and two parameters, $\beta_0$ and $\beta_1$, so 32-2 = 30.

AIC: As stated in the R Help file for ?glm, “A version of Akaike’s An Information Criterion…” The AIC is useful for comparing different models for the same Y-variable. The glm model with the lowest AIC (which can go negative) is the best model.

33.732 The AIC for this particular model is 33.732. So if a different model (using the same Y-variable as this model) can get a lower AIC, it is a better model.

Number of Fisher Scoring iterations: If you have taken a class in Numerical Analysis, this tells you how many iterations of the maximization algorithm were required before converging to the “Estimates” of the parameters $\beta_0$ and $\beta_1$ found in the summary.

5 This implementation of glm required 5 Fisher Scoring iterations to converge. Fewer iterations hints that the model is a better fit than when many iterations are required.

Diagnose the Goodness-of-Fit

There are two ways to check the goodness of fit of a logistic regression model.

Option 1: Hosmer-Lemeshow Goodness-of-Fit Test (Most Common)

To check the goodness of fit of a logistic regression model where there are few or no replicated $x$-values use the Hosmer-Lemeshow Test.

library(ResourceSelection) This loads the ResourceSelection R package so that you can access the hoslem.test() function. You may need to run the code: install.packages(“ResourceSelection”) first.
hoslem.test( This R function performs the Hosmer-Lemeshow Goodness of Fit Test. See the “Explanation” file to learn about this test. YourGlmName YourGlmName is the name of your glm(…) code that you created previously. $y, ALWAYS type a “y” here. This gives you the actual binary (0,1) y-values of your logistic regression. The goodness of fit test will compare these actual values to your predicted probabilities for each value in order to see if the model is a “good fit.” YourGlmName YourGlmName is the name you used to save the results of your glm(…) code. $fitted, ALWAYS type “fitted” here. This gives you the fitted probabilities $\pi_i$ of your logistic regression. g=10) The “g=10” is the default option for the value of g. The g is the number of groups to run the goodness of fit test on. Just leave it at 10 unless you are told to do otherwise. Ask your teacher for more information if you are interested.

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  myglm$y, myglm$fitted
## X-squared = 5.7327, df = 8, p-value = 0.6771

Note that the null hypothesis of the goodness-of-fit test is that “the logistic regression is a good fit.” So we actually don’t want to “reject the null” in this case. So a large p-value here means our logistic regression fits the data satisfactorily. A small p-value implies a poor fit and the results of the logistic regression should not be fully trusted.

Option 2: Deviance Goodness-of-fit Test (Less Common)

In some cases, there are many replicated $x$-values for all x-values, i.e., each value of x is repeated more than 50 times. Though this is rare, it is good to use the deviance goodness-of-fit test whenever this happens.

pchisq( The pchisq command allows you to compute p-values from the chi-squared distribution. residual deviance, The residual deviance is shown at the bottom of the output of your summary(YourGlmName) and should be typed in here as a number like 25.3. df for residual deviance, The df for the residual deviance is also shown at the bottom of the output of your summary(YourGlmName). lower.tail=FALSE) This command ensures you find the probability of the chi-squared distribution being as extreme or more extreme than the observed value of residual deviance.

## 
## Call:
## glm(formula = am ~ disp, family = binomial, data = mtcars)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5651  -0.6648  -0.2460   0.7276   2.2691  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  2.630849   1.050170   2.505  0.01224 * 
## disp        -0.014604   0.005168  -2.826  0.00471 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.230  on 31  degrees of freedom
## Residual deviance: 29.732  on 30  degrees of freedom
## AIC: 33.732
## 
## Number of Fisher Scoring iterations: 5

## [1] 0.479439

The null hypothesis of the goodness-of-fit test is that the logistic regression is a good fit of the data. So a large p-value (like 0.479) is good because it allows us to trust the results of our logistic regression. When the p-value becomes very small, we must “reject the null” and conclude a poor fit, which implies that we should not trust the results of the logistic regression.

Plot the Regression

plot( The plot function allows us to draw a scatterplot where the y-axis is only 0’s or 1’s and the x-axis is quantitative. Y Y should be some logical statement like Fail > 0 or sex == “B” or height < 68. In other words, Y needs to be a collection of 0’s and 1’s. ~ The tilde is read “Y on X.” X, X is some quantitative variable. data= Tell the plot which data set to use. X and Y are columns of that data set. YourDataSet The name of your data set. ) Closing parenthesis for plot(…) function.
curve( A function in R that draws a “curve” on a plot. Using add=TRUE puts the curve onto the current plot. exp( This allows you to compute e to the power of something. So exp(1) = e, exp(2) = e^2 and so on. $b_0$ + This is the “Intercept” estimate from your logistic regression summary output. $b_1$ This is the slope estimate from your logistic regression summary output. *x The curve function demands you ALWAYS use a lower-case “x” in this function. ) Closing parenthesis. / The division symbol. (1 + Required for the formula. exp($b_0$ + $b_1$*x) This needs to match exactly the first version of this statement. ), Closing parenthesis for the denominator. add = TRUE) This makes the curve be added to the current plot. If it is left out, the curve will be drawn in a new plot.

ggplot( The ggplot(…) function is used to create a basic ggplot frame. data=YourDataSetName, Use this to specify the name of your data set. aes( The aes(…) function stands for “aesthetics” and tells the ggplot which variables to match up with the x-axis and y-axis of the graph as well as other visual things like the type of plotting characters, size of plotting characters, color, fill, and so on. x=X, Use x=nameOfYourXvariable to declare the x-axis of your graph. y=Y Use y=nameOfYourYvariable to declare the y-axis of your graph. If your y-variable is a logical expression, like height>60 then you must use y=as.numeric(height>60). ) Closing parenthesis for aes(…) function. ) Closing parenthesis for ggplot(…) function. + The plus sign adds a new layer to the ggplot.
geom_point( Tells the plot to add the physical geometry of “points” to the plot. ) Closing parenthesis for the geom_point() function. + Add another layer to the plot.
geom_smooth( Add a smoothing line to the graph. method=“glm”, This adds a general linear model to the graph. method.args = list(family=“binomial”), This tells the method=“glm” to choose specifically the “binomial” model, otherwise known as the “logistic regression” model. se=FALSE Turn off the displaying of the confidence band around the logistic regression. You can turn this on if you know what it means. ) Closing parenthesis for the geom_smooth() function. + Add another layer to the ggplot.
theme_bw() Give the graph a basic black and white theme. Other themes are possible, see ?theme_ in your Console.

## `geom_smooth()` using formula = 'y ~ x'

Predict Probabilities

To predict the probability that $Y_i=1$ for a given $x$-value, use the code

predict( The predict() function allows us to use the regression model that was obtained from glm() to predict the probability that $Y_i = 1$ for a given $X_i$. YourGlmName, YourGlmName is the name of the object you created when you performed your logistic regression using glm(). newdata = The newdata = command allows you to specify the x-values for which you want to obtain predicted probabilities that $Y_i=1$. data.frame(xVariableName = someNumericValue), The xVariableName is the same one you used in your glm(y ~ x, …) statement for “x”. Input any desired value for the “someNumericValue” spot. Then, the predict code uses the logistic regression model equation to calculate a predicted probability that $Y_i = 1$ for the given $x_i$ value that you specify. type = “response”) The type = “response” options specifies that you want predicted probabilities. There are other options available. See ?predict.glm for details.

myglm <- glm(am ~ disp, data = mtcars, family = binomial)
predict(myglm, newdata = data.frame(disp = 200), type = "response")

##         1 
## 0.4280016

Explanation

Simple Logistic Regression is used when

the response variable is binary $(Y_i=0$ or $1)$, and
there is a single explanatory variable $X$ that is typically quantitative but could be qualitative (if $X$ is binary or ordinal).

The Model

Since $Y_i$ is binary (can only be 0 or 1) the model focuses on describing the probability that $Y_i=1$ for a given scenario. The probability that $Y_i = 1$ given the observed value of $x_i$ is called $\pi_i$ and is modeled by the equation

\[ P(Y_i = 1|\, x_i) = \frac{e^{\beta_0 + \beta_1 x_i}}{1+e^{\beta_0 + \beta_1 x_i}} = \pi_i \]

The assumption is that for certain values of $X$ the probability that $Y_i=1$ is higher than for other values of $X$.

Interpretation

This model for $\pi_i$ comes from modeling the log of the odds that $Y_i=1$ using a linear regression, i.e., \[ \log\underbrace{\left(\frac{\pi_i}{1-\pi_i}\right)}_{\text{Odds for}\ Y_i=1} = \underbrace{\beta_0 + \beta_1 x_i}_{\text{linear regression}} \] Beginning to solve this equation for $\pi_i$ leads to the intermediate, but important result that \[ \underbrace{\frac{\pi_i}{1-\pi_i}}_{\text{Odds for}\ Y_i=1} = e^{\overbrace{\beta_0 + \beta_1 x_i}^{\text{linear regression}}} = e^{\beta_0}e^{\beta_1 x_i} \] Thus, while the coefficients $\beta_0$ and $\beta_1$ are difficult to interpret directly, $e^{\beta_0}$ and $e^{\beta_1}$ have a valuable interpretation. The value of $e^{\beta_0}$ is interpreted as the odds for $Y_i=1$ when $x_i = 0$. It may not be possible for a given model to have $x_i=0$, in which case $e^{\beta_0}$ has no interpretation. The value of $e^{\beta_1}$ denotes the proportional change in the odds that $Y_i=1$ for every one unit increase in $x_i$.

Notice that solving the last equation for $\pi_i$ results in the logistic regression model presented at the beginning of this page.

Hypothesis Testing

Similar to linear regression, the hypothesis that \[ H_0: \beta_1 = 0 \\ H_a: \beta_1 \neq 0 \] can be tested with a logistic regression. If $\beta_1 = 0$, then there is no relationship between $x_i$ and the log of the odds that $Y_i = 1$. In other words, $x_i$ is not useful in predicting the probability that $Y_i = 1$. If $\beta_1 \neq 0$, then there is information in $x_i$ that can be utilized to predict the probability that $Y_i = 1$, i.e., the logistic regression is meaningful.

Checking Model Assumptions

The model assumptions are not as clear in logistic regression as they are in linear regression. For our purposes we will focus only on considering the goodness of fit of the logistic regression model. If the model appears to fit the data well, then it will be assumed to be appropriate.

Deviance Goodness of Fit Test

If there are replicated values of each $x_i$, then the deviance goodness of fit test tests the hypotheses \[ H_0: \pi_i = \frac{e^{\beta_0 + \beta_1 x_i}}{1+e^{\beta_0 + \beta_1 x_i}} \] \[ H_a: \pi_i \neq \frac{e^{\beta_0 + \beta_1 x_i}}{1+e^{\beta_0 + \beta_1 x_i}} \]

Hosmer-Lemeshow Goodness of Fit Test

If there are very few or no replicated values of each $x_i$, then the Hosmer-Lemeshow goodness of fit test can be used to test these same hypotheses. In each case, the null assumes that logistic regression is a good fit for the data while the alternative is that logistic regression is not a good fit.

Prediction

One of the great uses of Logistic Regression is that it provides an estimate of the probability that $Y_i=1$ for a given value of $x_i$. This probability is often referred to as the risk that $Y_i=1$ for a certain individual. For example, if $Y_i=1$ implies a person has a disease, then $\pi_i=P(Y_i=1)$ represents the risk of individual $i$ having the disease based on their value of $x_i$, perhaps a measure of their cholesterol or some other predictor of the disease.

Multiple Logistic Regression Model

Logistic regression for multiple explanatory variables that can either be quantitative or qualitative or a mixture of the two.

Overview

Select a model to see interpretation details, an example, and R Code help.

\[ P(Y_i = 1|\, x_i) = \frac{e^{\beta_0 + \beta_1 x_i}}{1+e^{\beta_0 + \beta_1 x_i}} = \pi_i \]

The Simple Logistic Regression model uses a single x-variable once: $X_i$.

Parameter	Effect
$\beta_0$	Y-intercept of the Model. Only interpreted by computing $e^{\beta_0}$, which gives the “baseline odds” of the model. Technically only meaningful for when $x_i=0$ is a reasonable value.
$\beta_1$	Slope term of the model. Interpreted by computing $e^{\beta_1}$, which gives the multiplicative change in the odds for each 1 unit increase in x. Say, $e^{\beta_1} = 1.2$, then the odds are 1.2 times what they were before whenever x is increased by 1 unit. That’s a twenty percent increase in odds.

\[ P(Y_i = 1| X_i) = \frac{e^{\beta_0 + \beta_1 X_i + \beta_2 X_i^2}}{1 + e^{\beta_0 + \beta_1 X_{i } + \beta_2 X_{i}^2}} \]

The Quadratic Logistic Regression model uses the same $X$-variable twice, once with a $\beta_1 X_i$ slope term and once with a $\beta_2 X_i^2$ quadratic term.

Parameter	Effect
$\beta_0$	Y-intercept of the Model. Interpreted by computing $e^{\beta_0}$, which gives the baseline odds of a success. Only interpreteable when $X_i=0$ is meaningful.
$\beta_1$	Controls the x-position of the center point of the “vertex” of the quadratic logistic model by $\frac{-\beta_1}{2\cdot\beta_2}$. Not directly interpretable, even with $e^{\beta_1}$, as the effect of $\beta_1$ is not independent of $\beta_2$.
$\beta_2$	Controls the concavity and “steepness” of the Model: negative values result in a logistic model with a maximum point on the curve, positive values result in a minimum point on the curve; large values imply “steeper” curves and low values imply “flatter” curves. Also involved in the position of the vertex, see $\beta_1$’s explanation. Not directly interpretable.

An Example

Using the airquality data set, we run the following “quadratic” logistic regression. Pay careful attention to how the mathematical model for $P(Y_i=1 | X_i) = \ldots$ is translated to R-Code inside of glm(...) by using the log(Odds) model instead of the $P(Y_i = 1 | X_i)$ model.

\[ \underbrace{\log\left(\overbrace{\frac{\pi_i}{1-\pi_i}}^{\text{Odds}\ Y_i = 1}\right)}_\text{Temp>80} \underbrace{=}_{\sim} \overbrace{\beta_0}^{\text{y-int}} + \overbrace{\beta_1}^{\stackrel{\text{slope}}{\text{term}}} \underbrace{X_{i}}_\text{Month} \underbrace{+}_{+} \overbrace{\beta_2}^{\stackrel{\text{quadratic}}{\text{term}}} \underbrace{X_{i}^2}_\text{I(Month^2)} \] Then, by using the statement in glm(..., family=binomial) the model that is put into the glm is translated back to the logistic model for you. Note that whether we look at the log(Odds) model or the $P(Y_i=1)$ model, the values of each $\beta$ are the same.

glm.quad <- A name we made up for our “quadratic” logistic regression. glm( R function glm used to perform generalized linear regressions, among which the logistic regression model is a specific example. But glm can do much more. Temp > 75 Y-variable, should be 0’s and 1’s. In this case the Temp > 75 statement is translated into TRUE (1) and FALSE (0) values. It may be preferred to use a mutate(y = ifelse(Temp > 75, 1, 0)) statement prior to running the glm to create a column of 0’s and 1’s. ~ The tilde ~ is what glm(…) uses to state the regression equation $Y_i = ...$ for the log of the odds linear equation. Notice that the ~ is not followed by $\beta_0 + \beta_1$ like $Y_i = ...$. Instead, $X_{i}$ (Month in this case) is the first term following ~. This is because the $\beta$’s are going to be estimated by the glm(…). These “Estimates” can be found using summary(glmObject) and looking at the Estimates column in the output. Month $X_{i}$, should be quantitative. + The plus + is used between each term in the model. Note that only the x-variables are included in the glm(…) from the $\text{log of the odds} = ...$ model. No beta’s are included. I(Month^2) $X_{i}^2$, where the function I(…) protects the squaring of Month from how glm(…) would otherwise interpret that statement. The I(…) function must be used anytime you raise an x-variable to a power in the glm(…) statement. , data=airquality This is the data set we are using for the regression. , family=binomial This declares that a logistic regression will be performed instead of a linear regression. )
Closing parenthsis for the glm(…) function.
Press Enter to run the code. … Click to View Output.

glm.quad <- glm(Temp > 75 ~ Month + I(Month^2), data=airquality, family=binomial)
emphasize.strong.cols(1)
pander(summary(glm.quad)$coefficients, )

	Estimate	Std. Error	z value	Pr(>\|z\|)
(Intercept)	-47.99	7.733	-6.206	5.442e-10
Month	13.87	2.23	6.22	4.978e-10
I(Month^2)	-0.9452	0.1543	-6.124	9.13e-10

The estimates shown in the summary output table above approximate the $\beta$’s in the log of the odds logistic regression model:

$\beta_0$ is estimated by the (Intercept) value of -47.99,
$\beta_1$ is estimated by the Month value of 13.87, and
$\beta_2$ is estimated by the I(Month^2) value of -0.9452.

Because the estimate of the $\beta_2$ term is negative (-0.9452), this parabola will “open down” (concave). This tells us that probabilitiy of the high temperature exceeding 75 degrees F will increase to a point, then decrease again. The vertex of this parabola will be at $-b_1/(2b_2) = -(13.87)/(2\cdot (-0.9452)) = 7.337072$ months, which tells us that the highest average temperature will occur around mid July (7.34 months to be exact). The y-intercept is -47.99, which would be an odds of nearly zero, $e^{-47.99}\approx 1.44e-21$, that the temperature would exceed 75 degrees F if it were possible for the month to be “month zero.” Since this is not possible, the y-intercept is not meaningful for this model.

Note that interpreting either $\beta_1$ or $\beta_2$ by themselves is quite difficult because they both work with together with $X_{i}$.

\[ \log\left(\overbrace{\frac{\pi_i}{1-\pi_i}}^\text{odds of Temp >75}\right) \approx \overbrace{-47.99}^\text{y-int} + \overbrace{13.87}^{\stackrel{\text{slope}}{\text{term}}} X_{i} + \overbrace{-0.9452}^{\stackrel{\text{quadratic}}{\text{term}}} X_{i}^2 \]

The regression function is drawn as follows. Be sure to look at the “Code” to understand how this graph was created using the ideas in the equation above.

Using Base R

plot(Temp>75 ~ Month, data=airquality, col="skyblue", pch=21, bg="gray83", main="Quadratic Model using airquality data set", cex.main=1)

#get the "Estimates" automatically:
b <- coef(glm.quad)
# Then b will have 3 numbers stored inside:
# b[1] is the estimate of beta_0: -47.99
# b[2] is the estimate of beta_1: 13.87
# b[3] is the estimate of beta_2: -0.9452
curve(exp(b[1] + b[2]*x + b[3]*x^2)/(1+exp(b[1] + b[2]*x + b[3]*x^2)), col="skyblue", lwd=2, add=TRUE)

Using ggplot2

#get the "Estimates" automatically:
b <- coef(glm.quad)
# Then b will have 3 estimates:
# b[1] is the estimate of beta_0: -47.99
# b[2] is the estimate of beta_1: 13.87
# b[3] is the estimate of beta_2: -0.9452

ggplot(airquality, aes(y=ifelse(Temp>75,1,0), x=Month)) +
  geom_point(pch=21, bg="gray83", color="skyblue", alpha=0.1, size=6) +
  #geom_smooth(method="lm", se=F, formula = y ~ poly(x, 2)) + #easy way, but the more involved manual way using stat_function (see below) is more dynamic.
  stat_function(fun = function(x) exp(b[1] + b[2]*x + b[3]*x^2)/(1+exp(b[1] + b[2]*x + b[3]*x^2)), color="skyblue") +
  labs(title="Quadratic Model using airquality data set")

\[ Y_i = \overbrace{\underbrace{\beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3}_{E\{Y_i\}}}^\text{Cubic Model} + \epsilon_i \]

The Cubic model uses the same $X$-variable thrice, once with a $\beta_1 X_i$ term, once with a $\beta_2 X_i^2$ term, and once with a $\beta_3 X_i^3$ term. The $X_i^3$ term is called the “cubic” term.

Parameter	Effect
$\beta_0$	Y-intercept of the Model.
$\beta_1$	No clear interpretation, but could be called the “base slope coefficient” and contributes to the position of the inflection points of the cubic function.
$\beta_2$	No clear interpretation, but it also contributes to the location of the inflection points.
$\beta_3$	This is the coefficient of the cubic term. No clear interpretation, but it determines the concavity of the model by its sign.

An Example

Using the CO2 data set, we run the following “cubic” regression.

\[ \underbrace{Y_i}_\text{uptake} \underbrace{=}_{\sim} \overbrace{\beta_0}^{\text{y-int}} + \overbrace{\beta_1}^{\stackrel{\text{slope}}{\text{term}}} \underbrace{X_{i}}_\text{conc} + \overbrace{\beta_2}^{\stackrel{\text{quadratic}}{\text{term}}} \underbrace{X_{i}^2}_\text{I(conc^2)} + \overbrace{\beta_3}^{\stackrel{\text{cubic}}{\text{term}}} \underbrace{X_{i}^3}_\text{I(conc^3)} + \epsilon_i \]

lm.cubic <- A name we made up for our “cubic” regression. lm( R function lm used to perform linear regressions in R. The lm stands for “linear model”. uptake Y-variable, should be quantitative. ~ The tilde ~ is what lm(…) uses to state the regression equation $Y_i = ...$. Notice that the ~ is not followed by $\beta_0 + \beta_1$ like $Y_i = ...$. Instead, $X_i$ is the first term following ~. This is because the $\beta$’s are going to be estimated by the lm(…). These estimates can be found using summary(lmObject). conc $X_{i}$, should be quantitative. + The plus + is used between each term in the model. Note that only the x-variables are included in the lm(…) from the $Y_i = ...$ model. No beta’s are included. I(conc^2) $X_{i}^2$, where the function I(…) protects the squaring of conc from how lm(…) would otherwise interpret that statement. The I(…) function must be used anytime you raise an x-variable to a power in the lm(…) statement. + The plus + is used between each term in the model. Note that only the x-variables are included in the lm(…) from the $Y_i = ...$ model. No beta’s are included. I(conc^3) $X_{i}^3$, where the function I(…) protects the cubing of conc from how lm(…) would otherwise interpret that statement. The I(…) function must be used anytime you raise an x-variable to a power in the lm(…) statement. , data=CO2 This is the data set we are using for the regression. )
Closing parenthsis for the lm(…) function.
Press Enter to run the code. … Click to View Output.

lm.cubic <- lm(uptake ~ conc + I(conc^2) + I(conc^3), data=CO2)
pander(summary(lm.cubic)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	-1.483	5.043	-0.2941	0.7694
conc	0.1814	0.0416	4.36	3.83e-05
I(conc^2)	-0.0003063	9.067e-05	-3.378	0.00113
I(conc^3)	1.601e-07	5.512e-08	2.905	0.004745

The estimates shown above approximate the $\beta$’s in the regression model: $\beta_0$ is estimated by the (Intercept) value of -1.483, $\beta_1$ is estimated by the conc value of 0.1814, $\beta_2$ is estimated by the I(conc^2) value of -0.0003063, and $\beta_3$ is estimated by the I(conc^3) value of 1.601e-07, which translates to 0.0000001601.

Because the estimate of the $\beta_3$ term is positive, this cubic model will “open up”. In other words, as the function moves from left to right, it will go off to positive infinity (up). If the term would have been negative, then the function would head to negative infinity (down) instead.

\[ \hat{Y}_i = \overbrace{-1.483}^\text{y-int} + \overbrace{0.1814}^{\stackrel{\text{slope}}{\text{term}}} X_{i} + \overbrace{-0.0003063}^{\stackrel{\text{quadratic}}{\text{term}}} X_{i}^2 + \overbrace{1.601e-07}^{\stackrel{\text{cubic}}{\text{term}}} X_{i}^3 \]

The regression function is drawn as follows. Be sure to look at the “Code” to understand how this graph was created using the ideas in the equation above.

Using Base R

plot(uptake ~ conc, data=CO2, col="skyblue", pch=21, bg="gray83", main="Cubic Model using CO2 data set", cex.main=1)

#get the "Estimates" automatically:
b <- coef(lm.cubic)
# Then b will have 4 estimates:
# b[1] is the estimate of beta_0: -1.483
# b[2] is the estimate of beta_1: 0.1814
# b[3] is the estimate of beta_2: -0.0003063
# b[4] is the estimate of beta_3: 1.601e-07
curve(b[1] + b[2]*x + b[3]*x^2 + b[4]*x^3, col="skyblue", lwd=2, add=TRUE)

Using ggplot2

#get the "Estimates" automatically:
b <- coef(lm.cubic)
# Then b will have 4 estimates:
# b[1] is the estimate of beta_0: -1.483
# b[2] is the estimate of beta_1: 0.1814
# b[3] is the estimate of beta_2: -0.0003063
# b[4] is the estimate of beta_3: 1.601e-07

ggplot(CO2, aes(y=uptake, x=conc)) +
  geom_point(pch=21, bg="gray83", color="skyblue") +
  #geom_smooth(method="lm", se=F, formula = y ~ poly(x, 3)) + #easy way, but the more involved manual way using stat_function (see below) is more dynamic.
  stat_function(fun = function(x) b[1] + b[2]*x + b[3]*x^2 + b[4]*x^3, color="skyblue") +
  labs(title="Cubic Model using CO2 data set")

It should be stated, that the cubic function is not the best fit for this data. However, it is a lot better than just a simple line, or a quadratic model, as shown below.

plot(uptake ~ conc, data=CO2, col="skyblue", pch=21, bg="gray83", main="Cubic Model using CO2 data set", cex.main=1)

#get the "Estimates" automatically:
b <- coef(lm.cubic)
# Then b will have 4 estimates:
# b[1] is the estimate of beta_0: -1.483
# b[2] is the estimate of beta_1: 0.1814
# b[3] is the estimate of beta_2: -0.0003063
# b[4] is the estimate of beta_3: 1.601e-07
curve(b[1] + b[2]*x + b[3]*x^2 + b[4]*x^3, col="skyblue", lwd=2, add=TRUE)
b <- coef(lm(uptake ~ conc + I(conc^2), data=CO2))
curve(b[1] + b[2]*x + b[3]*x^2, col="firebrick", lwd=2, add=TRUE)
b <- coef(lm(uptake ~ conc, data=CO2))
curve(b[1] + b[2]*x, col="orange", lwd=2, add=TRUE)

\[ Y_i = \overbrace{\underbrace{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i}}_{E\{Y_i\}}}^\text{Two-lines Model} + \epsilon_i \]

\[ X_{2i} = \left\{\begin{array}{ll} 1, & \text{Group B} \\ 0, & \text{Group A} \end{array}\right. \]

The so called “two-lines” model uses a quantitative $X_{1i}$ variable and a 0,1 indicator variable $X_{2i}$. It is a basic example of how a “dummy variable” or “indicator variable” can be used to turn qualitative variables into quantitative terms. In this case, the indicator variable $X_{2i}$, which is either 0 or 1, produces two separate lines: one line for Group A, and one line for Group B.

Parameter	Effect
$\beta_0$	Y-intercept of the Model.
$\beta_1$	Controls the slope of the “base-line” of the model, the “Group 0” line.
$\beta_2$	Controls the change in y-intercept for the second line in the model as compared to the y-intercept of the “base-line” line.
$\beta_3$	Called the “interaction” term. Controls the change in the slope for the second line in the model as compared to the slope of the “base-line” line.

An Example

Using the mtcars data set, we run the following “two-lines” regression. Note that am has only 0 or 1 values: View(mtcars).

\[ \underbrace{Y_i}_\text{mpg} \underbrace{=}_{\sim} \overbrace{\beta_0}^{\stackrel{\text{y-int}}{\text{baseline}}} + \overbrace{\beta_1}^{\stackrel{\text{slope}}{\text{baseline}}} \underbrace{X_{1i}}_\text{qsec} + \overbrace{\beta_2}^{\stackrel{\text{change in}}{\text{y-int}}} \underbrace{X_{2i}}_\text{am} + \overbrace{\beta_3}^{\stackrel{\text{change in}}{\text{slope}}} \underbrace{X_{1i}X_{2i}}_\text{qsec:am} + \epsilon_i \]

lm.2lines <- A name we made up for our “two-lines” regression. lm( R function lm used to perform linear regressions in R. The lm stands for “linear model”. mpg Y-variable, should be quantitative. ~ The tilde ~ is what lm(…) uses to state the regression equation $Y_i = ...$. Notice that the ~ is not followed by $\beta_0 + \beta_1$ like $Y_i = ...$. Instead, $X_{1i}$ is the first term following ~. This is because $\beta$’s are going to be estimated by the lm(…). These estimates can be found using summary(lmObject). qsec $X_{1i}$, should be quantitative. + The plus + is used between each term in the model. Note that only the x-variables are included in the lm(…) from the $Y_i = ...$ model. No beta’s are included. am $X_{2i}$, an indicator or 0,1 variable. This term allows the y-intercept of the two lines to differ. + The plus + is used between each term in the model. Note that only the x-variables are included in the lm(…) from the $Y_i = ...$ model. No beta’s are included. qsec:am $X_{1i}X_{2i}$ the interaction term. This allows the slopes of the two lines to differ. , data=mtcars This is the data set we are using for the regression. )
Closing parenthsis for the lm(…) function.
Press Enter to run the code. … Click to View Output.

lm.2lines <- lm(mpg ~ qsec + am + qsec:am, data=mtcars)
pander(summary(lm.2lines)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	-9.01	8.218	-1.096	0.2823
qsec	1.439	0.45	3.197	0.003432
am	-14.51	12.48	-1.163	0.2548
qsec:am	1.321	0.7017	1.883	0.07012

The estimates shown above approximate the $\beta$’s in the regression model: $\beta_0$ is estimated by the (Intercept), $\beta_1$ is estimated by the qsec value of 1.439, $\beta_2$ is estimated by the am value of -14.51, and $\beta_3$ is estimated by the qsec:am value of 1.321.

This gives two separate equations of lines.

Automatic Transmission (am==0, $X_{2i} = 0$) Line

\[ \hat{Y}_i = \overbrace{-9.01}^{\stackrel{\text{y-int}}{\text{baseline}}} + \overbrace{1.439}^{\stackrel{\text{slope}}{\text{baseline}}} X_{1i} \]

Manual Transmission (am==1 , $X_{2i} = 1$) Line

\[ \hat{Y}_i = \underbrace{(\overbrace{-9.01}^{\stackrel{\text{y-int}}{\text{baseline}}} + \overbrace{-14.51}^{\stackrel{\text{change in}}{\text{y-int}}})}_{\stackrel{\text{y-intercept}}{-23.52}} + \underbrace{(\overbrace{1.439}^{\stackrel{\text{slope}}{\text{baseline}}} +\overbrace{1.321}^{\stackrel{\text{change in}}{\text{slope}}})}_{\stackrel{\text{slope}}{2.76}} X_{1i} \]

These lines are drawn as follows. Be sure to look at the “Code” to understand how this graph was created using the ideas in the two equations above.

Using Base R

plot(mpg ~ qsec, data=mtcars, col=c("skyblue","orange")[as.factor(am)], pch=21, bg="gray83", main="Two-lines Model using mtcars data set", cex.main=1)

legend("topleft", legend=c("Baseline (am==0)", "Changed-line (am==1)"), bty="n", lty=1, col=c("skyblue","orange"), cex=0.8)

#get the "Estimates" automatically:
b <- coef(lm.2lines)
# Then b will have 4 estimates:
# b[1] is the estimate of beta_0: -9.0099
# b[2] is the estimate of beta_1:  1.4385
# b[3] is the estimate of beta_2: -14.5107
# b[4] is the estimate of beta_3: 1.3214
curve(b[1] + b[2]*x, col="skyblue", lwd=2, add=TRUE)  #baseline (in blue)
curve((b[1] + b[3]) + (b[2] + b[4])*x, col="orange", lwd=2, add=TRUE) #changed line (in orange)

Using ggplot2

#get the "Estimates" automatically:
b <- coef(lm.2lines)
# Then b will have 4 estimates:
# b[1] is the estimate of beta_0: -9.0099
# b[2] is the estimate of beta_1:  1.4385
# b[3] is the estimate of beta_2: -14.5107
# b[4] is the estimate of beta_3: 1.3214

ggplot(mtcars, aes(y=mpg, x=qsec, color=factor(am))) +
  geom_point(pch=21, bg="gray83") +
  #geom_smooth(method="lm", se=F) + #easy way, but only draws the full interaction model. The manual way using stat_function (see below) is more involved, but more dynamic.
  stat_function(fun = function(x) b[1] + b[2]*x, color="skyblue") + #am==0 line
  stat_function(fun = function(x) (b[1]+b[3]) + (b[2]+b[4])*x,color="orange") + #am==1 line 
  scale_color_manual(name="Transmission (am)", values=c("skyblue","orange")) +
  labs(title="Two-lines Model using mtcars data set")

\[ Y_i = \overbrace{\underbrace{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i}X_{2i}}_{E\{Y_i\}}}^\text{3D Model} + \epsilon_i \]

The so called “3D” regression model uses two different quantitative x-variables, an $X_{1i}$ and an $X_{2i}$. Unlike the two-lines model where $X_{2i}$ could only be a 0 or a 1, this $X_{2i}$ variable is quantitative, and can take on any quantitative value.

Parameter	Effect
$\beta_0$	Y-intercept of the Model
$\beta_1$	Slope of the line in the $X_1$ direction.
$\beta_2$	Slope of the line in the $X_2$ direction.
$\beta_3$	Interaction term that allows the model, which is a plane in three-dimensional space, to “bend”. If this term is zero, then the regression surface is just a flat plane.

An Example

Here is what a 3D regression looks like when there is no interaction term. The two x-variables of Month and Temp are being used to predict the y-variable of Ozone.

\[ \underbrace{Y_i}_\text{Ozone} \underbrace{=}_{\sim} \overbrace{\beta_0}^{\stackrel{\text{y-int}}{\text{baseline}}} + \overbrace{\beta_1}^{\stackrel{\text{slope}}{\text{baseline}}} \underbrace{X_{1i}}_\text{Temp} + \overbrace{\beta_2}^{\stackrel{\text{change in}}{\text{y-int}}} \underbrace{X_{2i}}_\text{Month} + \epsilon_i \]

air_lm <- lm(Ozone ~ Temp + Month, data= airquality)
pander(air_lm$coefficients)

(Intercept)	Temp	Month
-139.6	2.659	-3.522

Notice how the slope, $\beta_1$, in the “Temp” direction is estimated to be 2.659 and the slope in the “Month” direction, $\beta_2$, is estimated to be -3.522. Also, the y-intercept, $\beta_0$, is estimated to be -139.6.

## Hint: library(car) has a scatterplot 3d function which is simple to use
#  but the code should only be run in your console, not knit.

## library(car)
## scatter3d(Y ~ X1 + X2, data=yourdata)



## To embed the 3d-scatterplot inside of your html document is harder.
#library(plotly)
#library(reshape2)

#Perform the multiple regression
air_lm <- lm(Ozone ~ Temp + Month, data= airquality)

#Graph Resolution (more important for more complex shapes)
graph_reso <- 0.5

#Setup Axis
axis_x <- seq(min(airquality$Temp), max(airquality$Temp), by = graph_reso)
axis_y <- seq(min(airquality$Month), max(airquality$Month), by = graph_reso)

#Sample points
air_surface <- expand.grid(Temp = axis_x, Month = axis_y, KEEP.OUT.ATTRS=F)
air_surface$Z <- predict.lm(air_lm, newdata = air_surface)
air_surface <- acast(air_surface, Month ~ Temp, value.var = "Z") #y ~ x

#Create scatterplot
plot_ly(airquality, 
        x = ~Temp, 
        y = ~Month, 
        z = ~Ozone,
        text = rownames(airquality), 
        type = "scatter3d", 
        mode = "markers") %>%
  add_trace(z = air_surface,
            x = axis_x,
            y = axis_y,
            type = "surface")

Here is a second view of this same regression with what is called a contour plot, contour map, or density plot.

mycolorpalette <- colorRampPalette(c("skyblue2", "orange"))
filled.contour(x=axis_x, y=axis_y, z=matrix(air_surface$Z, length(axis_x), length(axis_y)), col=mycolorpalette(26))

Including the Interaction Term

Here is what a 3D regression looks like when the interaction term is present. The two x-variables of Month and Temp are being used to predict the y-variable of Ozone.

air_lm <- lm(Ozone ~ Temp + Month + Temp:Month, data= airquality)
pander(air_lm$coefficients)

(Intercept)	Temp	Month	Temp:Month
-3.915	0.77	-23.01	0.2678

Notice how all coefficient estimates have changed. The y-intercept, $\beta_0$ is now estimated to be $-3.915$. The slope term, $\beta_1$, in the Temp-direction is estimated as $0.77$, while the slope term, $\beta_2$, in the Month-direction is estimated to be $-23.01$. This change in estimated coefficiets is due to the presence of the interaction term’s coefficient, $\beta_3$, which is estimated to be $0.2678$. As you should notice in the graphic, the interaction model allows the “slopes” in each direction to change, creating a “curved” surface for the regression surface instead of a flat surface.

#Perform the multiple regression
air_lm <- lm(Ozone ~ Temp + Month + Temp:Month, data= airquality)

#Graph Resolution (more important for more complex shapes)
graph_reso <- 0.5

#Setup Axis
axis_x <- seq(min(airquality$Temp), max(airquality$Temp), by = graph_reso)
axis_y <- seq(min(airquality$Month), max(airquality$Month), by = graph_reso)

#Sample points
air_surface <- expand.grid(Temp = axis_x, Month = axis_y, KEEP.OUT.ATTRS=F)
air_surface <- air_surface %>% mutate(Z=predict.lm(air_lm, newdata = air_surface))
air_surface <- acast(air_surface, Month ~ Temp, value.var = "Z") #y ~ x

#Create scatterplot
plot_ly(airquality, 
        x = ~Temp, 
        y = ~Month, 
        z = ~Ozone,
        text = rownames(airquality), 
        type = "scatter3d", 
        mode = "markers") %>%
  add_trace(z = air_surface,
            x = axis_x,
            y = axis_y,
            type = "surface")

And here is that same plot as a contour plot.

air_surface <- expand.grid(Temp = axis_x, Month = axis_y, KEEP.OUT.ATTRS=F)
air_surface$Z <- predict.lm(air_lm, newdata = air_surface)
mycolorpalette <- colorRampPalette(c("skyblue2", "orange"))
filled.contour(x=axis_x, y=axis_y, z=matrix(air_surface$Z, length(axis_x), length(axis_y)), col=mycolorpalette(27))

\[ Y_i = \overbrace{\underbrace{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_{p-1}X_{p-1,i}}_{E\{Y_i\}}}^\text{"High Dimensional Models"} + \epsilon_i \]

The so called “HD”, or “High Dimensional”, regression model uses three or more different quantitative x-variables, an $X_{1i}$, an $X_{2i}$, and at least an $X_{3i}$, but could use many, many other variables as well. Unlike the 3D model where the final regression could be shown as either a contour plot or a 3D-graphic, the high dimensional model exists in 4 or more dimensions. Thus, it is impossible to graph this model in its full form. Further, it isn’t really even possible to “mentally connect” with this type of model is it exists beyond what our 3D minds can really comprehend.

Parameter	Effect
$\beta_0$	Y-intercept of the Model
$\beta_1$	Slope of the line in the $X_1$ direction.
$\beta_2$	Slope of the line in the $X_2$ direction.
$...$	Slopes in other directions depending on how many other variables are included in the model.
$\beta_{p-1}$	Final term in the model where there are $p$ total $\beta$’s. The reason for the $p-1$ on the last term is because we started with $\beta_0$ for the first term, leaving $\beta_{p-1}$ as the last term.

An Example

Suppose we used three x-variables of Wind, Temp, and Solar.R to predict the y-variable of Ozone.

\[ \underbrace{Y_i}_\text{Ozone} \underbrace{=}_{\sim} \overbrace{\beta_0}^{\stackrel{\text{y-int}}{\text{baseline}}} + \overbrace{\beta_1}^{\stackrel{\text{slope in}}{\text{Wind Direction}}} \underbrace{X_{1i}}_\text{Wind} + \overbrace{\beta_2}^{\stackrel{\text{slope in}}{\text{Temp Direction}}} \underbrace{X_{2i}}_\text{Temp} + \overbrace{\beta_3}^{\stackrel{\text{slope in}}{\text{Solar.R Direction}}} \underbrace{X_{3i}}_\text{Solar.R} + \epsilon_i \]

air_lm <- lm(Ozone ~ Wind + Temp + Solar.R, data= airquality)
pander(air_lm$coefficients)

(Intercept)	Wind	Temp	Solar.R
-64.34	-3.334	1.652	0.05982

Notice how the slope, $\beta_1$, in the “Wind” direction is estimated to be -3.334. The slope in the “Temp” direction, $\beta_2$, is estimated to be 1.652. The slope in the “Solar.R” direction, $\beta_3$, is estimated to be 0.05982. Also, the y-intercept, $\beta_0$, is estimated to be -64.34.

Visualizing this model is not really possible in its full form. However, we can draw the regression from three different angles or vantage points. This is a limited view of the full regression model, but at least provides some visual understanding. To do this, we draw $Y$ against each $X$-variable in separate scatterplots, one for each $X$-variable used in our model.

b <- coef(air_lm)

par(mfrow=c(1,3))

  plot(Ozone ~ Wind, data=airquality)
  curve(b[1] + b[2]*x + b[3]*79 + b[4]*205, add=TRUE, col="skyblue")
  # The x-variable of this plot is "Wind"
  # The values of Temp=79 and Solar.R=205 are fixed at some interesting value,
  # in this case, their respective medians.

  plot(Ozone ~ Temp, data=airquality)
  curve(b[1] + b[2]*9.7 + b[3]*x + b[4]*205, add=TRUE, col="orange")
  # The x-variable of this plot is "Temp"
  # The values of Wind=9.7 and Solar.R=205 are fixed at some interesting value,
  # in this case, their respective medians.
  
  plot(Ozone ~ Solar.R, data=airquality)
  curve(b[1] + b[2]*9.7 + b[3]*79 + b[4]*x, add=TRUE, col="firebrick")

  # The x-variable of this plot is "Solar.R"
  # The values of Wind = 9.7 and Temp=79 are fixed at some interesting value,
  # in this case, their respective medians.

The coefficient $\beta_j$ is interpreted as the change in the expected value of $Y$ for a unit increase in $X_{j}$, holding all other variables constant, for $j=1,\ldots,p-1$. However, this interpretation breaks down when higher order terms (like $X^2$) or interaction terms (like $X1:X2$) are included in the model.

See the Explanation tab for details about possible hypotheses here.

The probability that $Y_i = 1$ given the observed data $(x_{i1},\ldots,x_{ip})$ is called $\pi_i$ and is modeled by the equation

\[ P(Y_i = 1|\, x_{i1},\ldots,x_{ip}) = \frac{e^{\beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip}}}{1+e^{\beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} }} = \pi_i \]

The coefficents $\beta_0,\beta_1,\ldots,\beta_p$ are difficult to interpret directly. Typically $e^{\beta_k}$ for $k=0,1,\ldots,p$ is interpreted instead. The value of $e^{\beta_k}$ denotes the relative change in the odds that $Y_i=1$. The odds that $Y_i=1$ are $\frac{\pi_i}{1-\pi_i}$.

Examples: GSS

R Instructions

Console Help Command: ?glm()

Perform the Logistic Regression

To perform a logistic regression in R use the commands

YourGlmName This is some name you come up with that will become the R object that stores the results of your logistic regression glm() command. <- This is the “left arrow” assignment operator that stores the results of your glm() code into YourGlmName. glm( glm( is an R function that stands for “General Linear Model”. It works in a similar way that the lm( function works except that it requires a family= option to be specified at the end of the command. Y Y is your binary response variable. It must consist of only 0’s and 1’s. Since TRUE’s = 1’s and FALSE’s = 0’s in R, Y could be a logical statement like (Price > 100) or (Animal == “Cat”) if your Y-variable wasn’t currently coded as 0’s and 1’s. ~ The tilde symbol ~ is used to tell R that Y should be treated as a function of the explanatory variable X. X1 X1 is the first explanatory variable (typically quantitative) that will be used to explain the probability that the response variable Y is a 1. * The times symbol allows a shortcut for writing X1 + X2 + X1:X2 = X1*X2. X2 X2 is second the explanatory variable either quantitative or qualitative that will be used to explain the probability that the response variable Y is a 1. …, In theory, you could have many other explanatory variables, interaction terms, or even squared, cubed, or other transformations of terms added to this model. data = NameOfYourDataset,
NameOfYourDataset is the name of the dataset that contains Y and X. In other words, one column of your dataset would be called Y and another column would be called X. family=binomial) The family=binomial command tells the glm( function to perform a logistic regression. It turns out that glm can perform many different types of regressions, but we only study it as a tool to perform a logistic regression in this course.
summary(YourGlmName) The summary command allows you to print the results of your logistic regression that were previously saved in YourGlmName.

Example output from a regression. Hover each piece to learn more.

Call:
glm(formula = weight > 100 ~ Time*Diet, family = binomial, data = ChickWeight) This is simply a statement of your original glm(…) “call” that you made when performing your regression. It allows you to verify that you ran what you thought you ran in the glm(…).

Deviance Residuals: Deviance residuals are a measure of how far the fitted probability for $\pi_i$ has differed from the actual outcome of $Y_i$ in terms of the log of the fitted probability space. (This is a fairly complicated idea.)
Min -2.9446 “min” gives the value of the residual that is furthest below the regression line. Ideally, the magnitude of this value would be about equal to the magnitude of the largest positive residual (the max) because the hope is that the residuals are normally distributed around the line.	1Q -0.2364 “1Q” gives the first quartile of the residuals, which will always be negative, and ideally would be about equal in magnitude to the third quartile.	Median 0.0000 “Median” gives the median of the residuals, which would ideally would be about equal to zero. Note that because the regression line is the least squares line, the mean of the residuals will ALWAYS be zero, so it is never included in the output summary.	3Q 0.2776 “3Q” gives the third quartile of the residuals, which would ideally would be about equal in magnitude to the first quartile. In this case, it is pretty close, which helps us see that the first quartile of residuals on either side of the line is behaving fairly normally.	Max 1.9968 “Max” gives the maximum positive residuals, which would ideally would be about equal in magnitude to the minimum residual. In this case, it is much larger than the minimum, which helps us see that the residuals are likely right skewed.

Coefficients: Notice that in your glm(…) you used only $Y$ and $X$. You did type out any coefficients, i.e., the $\beta_0$ or $\beta_1$ of the regression model. These coefficients are estimated by the glm(…) function and displayed in this part of the output along with standard errors, t-values, and p-values.
	Estimate To learn more about the “Estimates” of the “Coefficients” see the “Explanation” tab, “Estimating the Model Parameters” section for details.	Std. Error To learn more about the “Standard Errors” of the “Coefficients” see the “Explanation” tab, “Inference for the Model Parameters” section.	z value The Z-score testing the hypothesis that $\beta_j=0$	Pr(>\|z\|) The p-value for the corresponding Z-score.
(Intercept) This always says “Intercept” for any glm(…) you run in R. That is because R always assumes there is a y-intercept for your regression function.	-4.97603 This is the estimate of the y-intercept, $\beta_0$. It is called $b_0$. It is the average y-value when X is zero.	0.65316 This is the standard error of $b_0$. It tells you how much $b_0$ varies from sample to sample. The closer to zero, the better.	-7.618 Z-score for $b_0$.	2.57e-14 This is the p-value of the test of the hypothesis that $\beta_0 = 0$. It measures the probability of observing a Z-value as extreme as the one observed. To compute it yourself in R, use `pnorm(-abs(your z-value))*2`.	*** This is called a “star”. Three stars means significant at the 0.01 level of $\alpha$.
Time This is always the name of your X-variable in your glm(Y ~ X, …).	0.39111 This is the estimate of the slope, $\beta_1$. It is called $b_1$. It is the change in the average y-value as X is increased by 1 unit.	0.04979 This is the standard error of $b_1$. It tells you how much $b_1$ varies from sample to sample. The closer to zero, the better.	7.854 Z-score for $b_1$.	4.02e-15 This is the p-value of the test of the hypothesis that $\beta_1 = 0$. To compute it yourself in R, use `pt(-abs(your t-value), df of your regression)*2`	*** This is called a “star”. Three stars means significant at the 0.01 level of $\alpha$.
Diet2 This is always the name of your X-variable in your glm(Y ~ X, …).	0.58283 This is the estimate of the change in the intercept, $\beta_2$. It is called $b_2$.	1.04061 This is the standard error of $b_2$. It tells you how much $b_2$ varies from sample to sample. The closer to zero, the better.	0.560 Z-score for $b_2$.	0.5754 This is the p-value of the test of the hypothesis that $\beta_2 = 0$. To compute it yourself in R, use `pt(-abs(your t-value), df of your regression)*2`
Diet3 This is always the name of your X-variable in your glm(Y ~ X, …).	-8.44476 This is the estimate of the change in the intercept, $\beta_3$. It is called $b_3$.	4.32324 This is the standard error of $b_3$. It tells you how much $b_3$ varies from sample to sample. The closer to zero, the better.	-1.953 Z-score for $b_3$.	0.0508 This is the p-value of the test of the hypothesis that $\beta_3 = 0$. To compute it yourself in R, use `pt(-abs(your t-value), df of your regression)*2`	. This is called a period. It means significant at the 0.1 level of $\alpha$.
Diet4 This is always the name of your X-variable in your glm(Y ~ X, …).	-67.41995 This is the estimate of the change in the intercept, $\beta_4$. It is called $b_4$.	3768.10023 This is the standard error of $b_4$. It tells you how much $b_4$ varies from sample to sample. The closer to zero, the better.	-0.018 Z-score for $b_4$.	0.9857 This is the p-value of the test of the hypothesis that $\beta_2 = 0$. To compute it yourself in R, use `pt(-abs(your t-value), df of your regression)*2`
Time:Diet2 This is always the name of your X-variable in your glm(Y ~ X, …).	0.02391 This is the estimate of the change in the slope, $\beta_5$. It is called $b_5$.	0.08679 This is the standard error of $b_5$. It tells you how much $b_5$ varies from sample to sample. The closer to zero, the better.	0.275 Z-score for $b_5$.	0.7830 This is the p-value of the test of the hypothesis that $\beta_2 = 0$. To compute it yourself in R, use `pt(-abs(your t-value), df of your regression)*2`
Time:Diet3 This is always the name of your X-variable in your glm(Y ~ X, …).	1.20955 This is the estimate of the change in the slope, $\beta_6$. It is called $b_6$.	0.51249 This is the standard error of $b_6$. It tells you how much $b_6$ varies from sample to sample. The closer to zero, the better.	2.360 Z-score for $b_6$.	0.0183 This is the p-value of the test of the hypothesis that $\beta_2 = 0$. To compute it yourself in R, use `pt(-abs(your t-value), df of your regression)*2`	* This is called a “star”. One star means significant at the 0.05 level of $\alpha$.
Time:Diet4 This is always the name of your X-variable in your glm(Y ~ X, …).	8.83168 This is the estimate of the change in the slope, $\beta_7$. It is called $b_7$.	471.01259 This is the standard error of $b_7$. It tells you how much $b_7$ varies from sample to sample. The closer to zero, the better.	0.019 Z-score for $b_7$.	0.9850 This is the p-value of the test of the hypothesis that $\beta_2 = 0$. To compute it yourself in R, use `pt(-abs(your t-value), df of your regression)*2`

---

(Dispersion parameter for binomial family taken to be 1) A simplifying assumption of the logistic regression. This can be changed if you know what you are doing. See ?summary.glm and look at the dispersion option for details. Unless you are pursuing an advanced degree in statistics, it is not recommended that you explore this option.

Null Deviance: The deviance of the null model.

800.44 In this case, the null deviance is 800.44

on 577 degrees of freedom The residual degrees of freedom for the null model.

Residual deviance: The test statistic from the chi-squared goodness of fit test of the logistic regression.

256.30 This is computed by the sum(log(myglm$res^2)). As long as this value is similar in size to the degrees of freedom, then the model is a good fit.

on 570 degrees of freedom This is $n-p$ where $n$ is the sample size and $p$ is the number of parameters in the regression model. In this case, there is a sample size of 578 and five parameters, $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, $\beta_5$,$\beta_6$, and $\beta_7$ so 578-8 = 570.

AIC: A form of the Akiake Information Criterion. Smaller values of the AIC indicate a better fitting model.

272.3 If a model for the same Y-variable can be found with an AIC value lower than 272.3, then it is a better fit of the data.

20 This implementation of glm required 20 Fisher Scoring iterations to converge. Fewer iterations hints that the model is a better fit than when many iterations are required.

Diagnose the Goodness-of-Fit

There are two ways to check the goodness of fit of a logistic regression model.

Option 1: Hosmer-Lemeshow Goodness-of-Fit Test

To check the goodness of fit of a logistic regression model where there are few or no replicated $x$-values use the Hosmer-Lemeshow Test.

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  myglm$y, myglm$fitted
## X-squared = 18.212, df = 8, p-value = 0.0197

Option 2: Deviance Goodness-of-fit Test

In some cases, there are many replicated $x$-values for all x-values. Though this is rare, it is good to use the deviance goodness-of-fit test whenever this happens.

myglm <- glm(weight>100 ~ Time*Diet, data = ChickWeight, family = binomial)
summary(myglm)

## 
## Call:
## glm(formula = weight > 100 ~ Time * Diet, family = binomial, 
##     data = ChickWeight)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9446  -0.2364   0.0000   0.2776   1.9968  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -4.97603    0.65316  -7.618 2.57e-14 ***
## Time           0.39111    0.04979   7.854 4.02e-15 ***
## Diet2          0.58283    1.04061   0.560   0.5754    
## Diet3         -8.44476    4.32324  -1.953   0.0508 .  
## Diet4        -67.41995 3768.10023  -0.018   0.9857    
## Time:Diet2     0.02391    0.08679   0.275   0.7830    
## Time:Diet3     1.20955    0.51249   2.360   0.0183 *  
## Time:Diet4     8.83168  471.01259   0.019   0.9850    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 800.44  on 577  degrees of freedom
## Residual deviance: 256.30  on 570  degrees of freedom
## AIC: 272.3
## 
## Number of Fisher Scoring iterations: 20

pchisq(256.30, 570, lower.tail = TRUE)

## [1] 6.585816e-33

Predict Probabilities

To predict the probability that $Y_i=1$ for a given $x$-value, use the code

predict( The predict() function allows us to use the regression model that was obtained from glm() to predict the probability that $Y_i = 1$ for a given $X_i$. YourGlmName, YourGlmName is the name of the object you created when you performed your logistic regression using glm(). newdata = The newdata = command allows you to specify the x-values for which you want to obtain predicted probabilities that $Y_i=1$. NewDataFrame, Typically, NewDataFrame is created in real time using the data.frame( X1 = c(Value 1, Value 2, …), X2 = c(Value 1, Value 2, …), …) command. You should see the GSS example file for an example of how to use this function. type = “response”) The type = “response” options specifies that you want predicted probabilities. There are other options available. See ?predict.glm for details.

##         1 
## 0.7314498

Plot the Regression

b <- coef(myglm) This stores the estimated coefficients from the regression into the “b” vector.
palette( The palette() function allows you to specify the colors R chooses for the plot. c(“SomeColor”,“DifferentColor”,…) Specify as many colors for the palette as you wish. R will choose a color for each group that you specify later on in your plot(…) code. )) Closing parentheses.
plot( Create the binary scatterplot for the logistic regression. Y Note that Y must be binary (0,1) values. If it is not, use a logical statement to make it binary, like height>60 or sex==“B”. ~ The formula operator in R. X1, The first x-variable in your glm code. data = YourDataSet, Specify the name of your data set. pch = 16) Select the type of plotting characters to use for the plot.
curve( Use the curve function to add the logistic regression curve to the plot. exp( The exp(…) function computes e^(stuff) in R and stands for the “exponential function.” (b[1]+ The first coefficient found in b is the y-intercept. b[2] The second coefficient found in b is the slope term. *x) The curve(…) function requires that you call the x-variable “x” although you can change this behavior using xname=“SomeOtherName” if you want. / The logistic model is e^(stuff) / (1 + e^(stuff)). (1+exp(b[1]+b[2]x)) The denominator of the logistic regression. Be careful to group the entire denominator (1+exp(stuff)). col = palette()[1], Pull the first color from the color palette for the first curve. add = TRUE) Add the curve to the current plot.
curve(exp((b[1]+b[3])+(b[2]+b[4])x) Note how the numerator of this second logistic curve uses and adjusted y-intercept (b[1]+b[3]) and an adjusted slope (b[2]+b[4])x. / Begin the denominator. (1+exp((b[1]+b[3])+(b[2]+b[4])x)) The denominator consists of (1+exp(stuff)). , Begin optional parameters. col = palette()[2], Set the color of this second curve to the second color in the palette. add = TRUE) Add the curve to the current plot.
legend( The legend function adds a legend to the current plot. “topright”, Typically the legend is placed in the top-right corner of the graph. But it could be placed “top”, “topleft”, “left”, “bottomleft”, “bottom”, “center”, “right”, or “bottomright”. legend = Why you have to write legend again, no one knows, but you do. c(“Lable 1”, “Label 2”), These are the values that will appear in the legend, one entry on each line of the legend. col = palette(), This specifies the colors of the symbols in the legend. First color goes with “Label 1”, and second goes with “Label 2” and so on. lty = 1, Specifies that solid lines should be used in the legend. If you wanted dots instead, use pch=16. If you wanted dashed lines, use lty=2. Many other options exist. bty = Specifies the “box type” (bty) that should be drawn around the legend. ‘n’) By placing a “no” option (‘n’) here, no box is drawn around the legend.

ggplot(data=YourDataName, Initialize a ggplot. aes( Declare the aesthetics of the graph. x=X1, Set the x-axis to be your X1 variable. y=Y Set the y-axis to be your Y-variable. ) Closing parenthesis for aes(…) function. ) Closing parenthesis for ggplot(…) function. + Add a layer to the ggplot.
geom_point( Add dots to the ggplot. ) Closing parenthesis for the geom_point() function. + Add a layer to the ggplot.
geom_smooth(method=“glm”, method.args=list(family=“binomial”), se=FALSE, Add the logistic regression curve to the plot. aes(color=“X2) Create a different logistic regression curve for each group found in X2, and give each curve a different color. ) Closing parenthesis for the geom_smooth() function. + Add a layer to the plot.
theme_bw() Add a black and white them to the plot.

ggplot(ChickWeight, aes(Time, as.numeric(weight>100))) +
  geom_point() +
  stat_smooth(aes(color = Diet), method = "glm", method.args = list(family = "binomial"), se = FALSE) +
  theme_bw() +
  labs(y = "weight>100")

## `geom_smooth()` using formula = 'y ~ x'

Explanation

Multiple Logistic Regression is used when

the response variable is binary $(Y_i=0$ or $1)$, and
there are multiple explanatory variables $X_1,\ldots,X_p$ that can be either quantitative or qualitative.

The Model

Very little changes in multiple logistic regression from Simple Logistic Regression. The probability that $Y_i = 1$ given the observed data $(x_{i1},\ldots,x_{ip})$ is called $\pi_i$ and is modeled by the expanded equation

\[ P(Y_i = 1|\, x_{i1},\ldots,x_{ip}) = \frac{e^{\beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip}}}{1+e^{\beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} }} = \pi_i \]

The assumption is that for certain combinations of $X_1,\ldots,X_p$ the probability that $Y_i=1$ is higher than for other combinations.

Interpretation

The model for $\pi_i$ comes from modeling the log of the odds that $Y_i=1$ using a linear regression, i.e., \[ \log\underbrace{\left(\frac{\pi_i}{1-\pi_i}\right)}_{\text{Odds for}\ Y_i=1} = \underbrace{\beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip}}_{\text{linear regression}} \] Beginning to solve this equation for $\pi_i$ leads to the intermediate, but important result that \[ \underbrace{\frac{\pi_i}{1-\pi_i}}_{\text{Odds for}\ Y_i=1} = e^{\overbrace{\beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip}}^{\text{liear regression}}} = e^{\beta_0}e^{\beta_1 x_{i1}}\cdots e^{\beta_p x_{ip}} \] As in Simple Linear Regression, the values of $e^{\beta_0}$, $e^{\beta_1}$, $\ldots$, $e^{\beta_p}$ are interpreted as the proportional change in odds for $Y_i=1$ when a given $x$-variable experiences a unit change, all other variables being held constant.

Checking the Model Assumptions

Diagnostics are the same in multiple logistic regression as they are in simple logistic regression.

Prediction

The idea behind prediction in multiple logistic regression is the same as in simple logistic regression. The only difference is that more than one explanatory variable is used to make the prediction of the risk that $Y_i=1$.

Coefficients: Notice that in your glm(…) you used only \(Y\) and \(X\). You did type out any coefficients, i.e., the \(\beta_0\) or \(\beta_1\) of the regression model. These coefficients are estimated by the glm(…) function and displayed in this part of the output along with standard errors, t-values, and p-values.
	Estimate To learn more about the “Estimates” of the “Coefficients” see the “Explanation” tab, “Estimating the Model Parameters” section for details.	Std. Error To learn more about the “Standard Errors” of the “Coefficients” see the “Explanation” tab, “Inference for the Model Parameters” section.	z value The test statistic is a regular old z-score. It is most reliable when the sample size is “large.” It is a measurement of the number of standard errors the estimate is from 0.	Pr(>\|z\|) This is the p-value, the probability of observing a test statistic more extreme than Z.
(Intercept) This always says “Intercept” for any glm(…) you run in R. That is because R always assumes there is a y-intercept for your regression function.	2.630849 This is the estimate of the y-intercept, \(\beta_0\). It is called \(b_0\). It is the value of the log of the odds that \(Y_i=1\) when \(x_i\) is zero. Remember to use \(e^{b_0}\) to interpret this values actual effect on the odds.	1.050170 This is the standard error of \(b_0\). It tells you how much \(b_0\) varies from sample to sample. The closer to zero, the better.	2.505 The test statistic for testing the hypothesis that \(\beta_0 = 0\).	0.01224 This is the p-value of the test of the hypothesis that \(\beta_0 = 0\). It measures the probability of observing a z-score as extreme as the one observed. To compute it yourself in R, use `pnorm(-abs(your z-value))*2`.	* This is called a “star”. One star means significant at the 0.1 level of \(\alpha\).
disp This is always the name of your X-variable in your glm(Y ~ X, …).	-0.014604 This is the estimate of the slope, \(\beta_1\). It is called \(b_1\). It is the change in the log of the odds that \(Y_i = 1\) as X is increased by 1 unit. Remember to use \(e^{b_1}\) to compute the actual effect on the odds.	0.005168 This is the standard error of \(b_1\). It tells you how much \(b_1\) varies from sample to sample. The closer to zero, the better.	-2.826 This is the test statistic for testing the hypothesis that \(\beta_1 = 0\).	0.00471 This is the p-value of the test of the hypothesis that \(\beta_1 = 0\). To compute it yourself in R, use `pnorm(-abs(your z-value))*2`	** This is called a “star”. Three stars means significant at the 0.001 level of \(\alpha\).

Parameter	Effect
\(\beta_0\)	Y-intercept of the Model. Only interpreted by computing \(e^{\beta_0}\), which gives the “baseline odds” of the model. Technically only meaningful for when \(x_i=0\) is a reasonable value.
\(\beta_1\)	Slope term of the model. Interpreted by computing \(e^{\beta_1}\), which gives the multiplicative change in the odds for each 1 unit increase in x. Say, \(e^{\beta_1} = 1.2\), then the odds are 1.2 times what they were before whenever x is increased by 1 unit. That’s a twenty percent increase in odds.

Parameter	Effect
\(\beta_0\)	Y-intercept of the Model. Interpreted by computing \(e^{\beta_0}\), which gives the baseline odds of a success. Only interpreteable when \(X_i=0\) is meaningful.
\(\beta_1\)	Controls the x-position of the center point of the “vertex” of the quadratic logistic model by \(\frac{-\beta_1}{2\cdot\beta_2}\). Not directly interpretable, even with \(e^{\beta_1}\), as the effect of \(\beta_1\) is not independent of \(\beta_2\).
\(\beta_2\)	Controls the concavity and “steepness” of the Model: negative values result in a logistic model with a maximum point on the curve, positive values result in a minimum point on the curve; large values imply “steeper” curves and low values imply “flatter” curves. Also involved in the position of the vertex, see \(\beta_1\)’s explanation. Not directly interpretable.

Parameter	Effect
\(\beta_0\)	Y-intercept of the Model
\(\beta_1\)	Slope of the line in the \(X_1\) direction.
\(\beta_2\)	Slope of the line in the \(X_2\) direction.
\(\beta_3\)	Interaction term that allows the model, which is a plane in three-dimensional space, to “bend”. If this term is zero, then the regression surface is just a flat plane.

Parameter	Effect
\(\beta_0\)	Y-intercept of the Model
\(\beta_1\)	Slope of the line in the \(X_1\) direction.
\(\beta_2\)	Slope of the line in the \(X_2\) direction.
\(...\)	Slopes in other directions depending on how many other variables are included in the model.
\(\beta_{p-1}\)	Final term in the model where there are \(p\) total \(\beta\)’s. The reason for the \(p-1\) on the last term is because we started with \(\beta_0\) for the first term, leaving \(\beta_{p-1}\) as the last term.