How to set up and analyze hypothesis tests for 1-sample proportions
How to create confidence intervals for 1-sample proportions
How to set up and analyze hypothesis tests for 2-sample proportions
How to create confidence intervals for the difference between 2-sample proportions
One-Sample Proportions
Hypothesis Tests
By now, the framework of hypothesis testing should start to feel familiar.
Set up null and alternative hypotheses in terms of a population parameter
Use data to calculate a test statistic
Calculate a P-value (the probability of getting a test statistic as extreme or more extreme than the one I observed if the null hypothesis were true)
Compare the P-value to the significance level, $\alpha$
State your conclusion in context of the problem
The process is no different for proportions. We want to use data to draw a conclusion about the unknown population proportion, $p$.
Left Handedness Among Visual Arts Majors
There is a belief in a link between creativity and left-handedness. Left-handed people make up about 10% of the population. Let’s test to see if BYU-I Visual Arts majors have a higher proportion of left-handed people than the general population.
Define our null and alternative hypotheses:
$$H_0: p = 0.10$$
$$H_a: p > 0.10$$
Note: In statistics we typically use Greek letters to designate population parameters. For proportions we use the Latin letter $p$, because $\pi$ was already taken.
We collect a random sample of 77 Visual Arts majors and find that 10 of them are left-handed. The sample proportion is:
$$\hat{p} = \frac{x}{n} = \frac{10}{77} \approx 0.130$$
Even though our sample proportion is bigger than 0.10, we understand sampling variability and want to see if it is statistically significantly higher than 0.10.
The prop.test() function behaves very much like the t.test() function. We have to input the number of successes x, the sample size n, the null hypothesized p, and the alternative hypothesis:
prop.test(x = 10, n = 77, p = 0.10, alternative = "greater")
1-sample proportions test with continuity correction
data: 10 out of 77, null probability 0.1
X-squared = 0.46753, df = 1, p-value = 0.2471
alternative hypothesis: true p is greater than 0.1
95 percent confidence interval:
0.07423612 1.00000000
sample estimates:
p
0.1298701
Conclusion: Because the p-value, 0.2471, is greater than $\alpha$, we fail to reject the null hypothesis. We have insufficient evidence to suggest that BYU-I Visual Arts majors have a higher proportion of left-handed people than the general population.
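To see where the X-squared and p-value in the output come from, here is a minimal sketch that recomputes them by hand. It assumes the continuity-corrected normal approximation (subtracting 0.5 from the distance between the observed and expected counts); the variable names are illustrative only.

```r
# Recompute the continuity-corrected statistic reported by prop.test()
x  <- 10     # left-handed students in the sample
n  <- 77     # sample size
p0 <- 0.10   # null hypothesized proportion

# Shrink |x - n*p0| by 0.5 (continuity correction) before standardizing
z <- (abs(x - n * p0) - 0.5) / sqrt(n * p0 * (1 - p0))
x_squared <- z^2

# One-sided p-value for alternative = "greater"
# (this shortcut is valid here because the sample proportion exceeds p0)
p_value <- pnorm(z, lower.tail = FALSE)

round(c(X_squared = x_squared, p_value = p_value), 4)
```

The two values should agree with the X-squared and p-value lines in the prop.test() output above.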
Checking Requirements
Recall that we must check that our sample size is large enough to trust the p-value. To do this, we check that the expected numbers of successes and failures for the given sample size are both at least 10:
$$np \ge 10 \quad \text{and} \quad n(1-p) \ge 10$$
QUESTION: Are both $np$ and $n(1-p)$ at least 10? ANSWER:
It looks like we don’t have enough data to assume that the distribution of $\hat{p}$ is normal. Our p-value may not be appropriate.
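The requirement check for this example can be sketched in a few lines of R. This assumes the expected counts are computed with the null hypothesized proportion, 0.10; the variable names are illustrative.

```r
n  <- 77     # sample size
p0 <- 0.10   # null hypothesized proportion

expected_successes <- n * p0        # 77 * 0.10
expected_failures  <- n * (1 - p0)  # 77 * 0.90

# Both comparisons must be TRUE for the normal approximation to be trusted
expected_successes >= 10
expected_failures >= 10
```

The expected number of successes is 7.7, so the first comparison fails even though the expected number of failures is comfortably large.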
Confidence Intervals
We can use prop.test() to calculate confidence intervals as well.
Recall: Confidence intervals do not depend on null and alternative hypotheses so we omit that information in the prop.test() function.
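A minimal sketch of that call for the left-handedness data, assuming the default 95% confidence level and the default two-sided alternative (the object name ci_test is illustrative):

```r
# Omit p and alternative: only the counts are needed for the interval
ci_test <- prop.test(x = 10, n = 77)

# Extract just the confidence interval from the result
ci_test$conf.int
```

Note that prop.test() builds this interval by inverting the score test with a continuity correction, so it will differ slightly from the hand-calculated interval based on $\hat{p} \pm z^* \cdot SE$.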
Explanation: I am 95% confident that the true proportion of left-handed Visual Arts majors is between 0.067 and 0.230.
Your Turn: Trump Support in Idaho
In the 2024 presidential election, Donald Trump won 67% of the vote in Idaho. After months in office, you would like to see if support for Trump has decreased from his share of the vote. You sample 772 registered voters and get 566 responses. Of the 566 responses, 364 say they approve of President Trump.
Perform a hypothesis test that tests if the presidential approval is less than his share of the vote:
Question: What is the value of the test statistic for this test? Answer:
Question: What is the P-Value? Answer:
Question: State your conclusion in context of this problem: Answer:
QUESTION: Can we trust the P-value? (Check $np$ and $n(1-p)$)
ANSWER:
Create a Confidence Interval for Idaho’s Presidential Approval:
QUESTION: Explain your confidence interval in context of the research question: ANSWER:
2-Sample Proportion Tests
Two-sample proportion tests are used to compare proportions between two independent groups. Here we discuss these tests and provide examples of their application in R.
The hypotheses should not be surprising:
$$H_0: p_1 = p_2$$
$$H_a: p_1 \ne p_2 \quad (\text{or } p_1 < p_2, \text{ or } p_1 > p_2)$$
where $p_1$ represents the unknown population proportion for group 1 and $p_2$ represents the unknown population proportion for group 2.
Two Sample Proportion Test
The R code is a slight modification of the prop.test() code for the one-sample proportion test. We simply supply 2 values for x and 2 values for n. Generically, this looks like:
prop.test(x = c(x1, x2), n = c(n1, n2), alternative = "")
where you must specify whether you think the proportion for group 1 is “less” than, “greater” than, or not equal to (“two.sided”) the proportion for group 2.
Example 1: Voting Behaviour by Gender
Suppose we want to test if women are more likely to identify as Democrat than men. We sample 250 men and 250 women and measure their political affiliation. We find that 80 men identify as Democrat and 102 females identify as Democrat.
Just as with the two-sample t-test for means, we must define a reference group. In this example, we will use females as the reference group so that our alternative will be relative to that group.
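We will use the hypotheses:
$$H_0: p_{\text{female}} = p_{\text{male}}$$
$$H_a: p_{\text{female}} > p_{\text{male}}$$
prop.test(x = c(102, 80), n = c(250, 250), alternative = "greater")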
2-sample test for equality of proportions with continuity correction
data: c(102, 80) out of c(250, 250)
X-squared = 3.8099, df = 1, p-value = 0.02548
alternative hypothesis: greater
95 percent confidence interval:
0.01350993 1.00000000
sample estimates:
prop 1 prop 2
0.408 0.320
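As a check on the output above, the following sketch recomputes the test statistic by hand. It assumes the pooled-proportion standard error and a continuity correction of 0.5 spread across both samples; variable names are illustrative.

```r
x <- c(102, 80)    # Democrats among females (group 1) and males (group 2)
n <- c(250, 250)   # sample sizes

p_hat    <- x / n              # sample proportions: 0.408 and 0.320
p_pooled <- sum(x) / sum(n)    # pooled proportion under H0: p1 = p2
delta    <- p_hat[1] - p_hat[2]

# Continuity correction on the proportion scale (assumed to be the full 0.5 here)
correction <- 0.5 * sum(1 / n)
se_pooled  <- sqrt(p_pooled * (1 - p_pooled) * sum(1 / n))

z <- (abs(delta) - correction) / se_pooled
x_squared <- z^2

# One-sided p-value for alternative = "greater" (valid because delta > 0)
p_value <- pnorm(z, lower.tail = FALSE)

round(c(X_squared = x_squared, p_value = p_value), 4)
```

The results should agree with the X-squared and p-value reported by prop.test() for this example.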
We can also create a confidence interval for the difference:
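A minimal sketch of the call, assuming the default 95% confidence level and dropping the alternative argument so that the interval is two-sided (the object name diff_test is illustrative):

```r
# Same counts as the hypothesis test, with females as group 1
diff_test <- prop.test(x = c(102, 80), n = c(250, 250))

# Confidence interval for p1 - p2 (females minus males)
diff_test$conf.int
```

Because females are listed first, the interval describes the female proportion minus the male proportion.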
Confidence intervals for differences can contain positive and negative values. In this example, a negative number would indicate that females are less likely than males to be Democrat, and a positive number means they are more likely.
Our confidence interval is just above zero on the lower end. We are 95% confident that the proportion of females who identify as Democrat is between 0.0 and 17.6 percentage points higher than the proportion of males.
Test Requirements
Just as with 1-sample proportion tests, we must validate that we have a large enough sample size to ensure that $\hat{p}$ is approximately normally distributed. When we have 2 samples, however, we must check both $\hat{p}$’s. For both hypothesis testing and confidence intervals we check:
Requirements for Hypothesis Testing and Confidence Intervals
$$n_1\hat{p}_1 \ge 10, \quad n_1(1-\hat{p}_1) \ge 10$$
$$n_2\hat{p}_2 \ge 10, \quad n_2(1-\hat{p}_2) \ge 10$$
An easy R calculator to check this is:
# All must be true:
x1 <- 102
n1 <- 250
phat1 <- x1 / n1
n1 * phat1 >= 10
[1] TRUE
n1 * (1 - phat1) >= 10
[1] TRUE
x2 <- 80
n2 <- 250
phat2 <- x2 / n2
n2 * phat2 >= 10
[1] TRUE
n2 * (1 - phat2) >= 10
[1] TRUE
Soccer Popularity on the Rise?
Soccer is becoming much more popular in the United States. We would like to test if this is being driven by demographic shifts in the population where the younger generation is more likely to favor soccer.
A researcher samples 524 individuals under 40 and 655 individuals older than 40 and asks what their preferred sport is. Of the 524 respondents under 40, 44 identified soccer as their favorite sport. Of the 655 respondents over 40, 27 identified soccer as their favorite sport.
Perform a 2-sample proportion test to determine if significantly more younger people identify soccer as their favorite sport.
QUESTION: State your Null and Alternative Hypotheses:
QUESTION: Perform the appropriate analysis:
prop.test()
QUESTION: Are the requirements for the hypothesis test and confidence interval satisfied?
# All must be true (fill in x1, n1, x2, and n2 before running):
x1 <-
n1 <-
phat1 <- x1 / n1
n1 * phat1 >= 10
n1 * (1 - phat1) >= 10
x2 <-
n2 <-
phat2 <- x2 / n2
n2 * phat2 >= 10
n2 * (1 - phat2) >= 10
ANSWER:
Create and interpret the confidence interval for the difference in the proportions.
QUESTION: Explain your confidence interval in context of the research question: ANSWER: