One and Two Sample Proportion Tests

Introduction

In this section, we will learn:

How to set up and analyze hypothesis tests for 1-sample proportions
How to create confidence intervals for 1-sample proportions
How to set up and analyze hypothesis tests for 2-sample proportions
How to create confidence intervals for the difference between 2-sample proportions

One-Sample Proportions

Hypothesis Tests

By now, the framework of hypothesis testing should start to feel familiar.

Set up null and alternative hypotheses in terms of a population parameter
Use data to calculate a test statistic
Calculate a P-value (the probability of getting a test statistic as extreme or more extreme than the one observed, if the null hypothesis were true)
Compare the P-value to $α$
State your conclusion in the context of the problem

The process is no different for proportions. We want to use data to draw a conclusion about the unknown population proportion, $p$ .

Left Handedness Among Visual Arts Majors

There is a belief in a link between creativity and left-handedness. Left-handed people make up about 10% of the population. Let’s test to see if BYU-I Visual Arts majors have a higher proportion of left-handed people than the general population.

Define our null and alternative hypotheses:

$H_{0} : p = 0.10$ $H_{a} : p > 0.10$ $α = 0.05$

Note: In stats we typically use Greek letters to designate population parameters. For proportions we use the Latin letter, $p$ , because $π$ was already taken.

We collect a random sample of 77 Visual Arts majors and find that 10 of them are left handed. The sample proportion is:

$\hat{p} = \frac{X}{N} = \frac{10}{77} = 0.1298701$

Even though our sample proportion is bigger than 0.10, we understand sampling variability and want to see if it is statistically significantly higher than 0.10.
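
To get a feel for that sampling variability, one option is to simulate many samples of 77 students under the null hypothesis and see how often a result at least as large as ours shows up. The sketch below is illustrative only: the seed, the number of simulated samples, and the variable names are assumptions, not part of the original analysis.

```r
set.seed(42)     # arbitrary seed, chosen only for reproducibility
n  <- 77         # sample size from the example
p0 <- 0.10       # proportion assumed under the null hypothesis

# Simulate the number of left-handed students in 10,000 samples of size n,
# assuming the null hypothesis is true
sim_x <- rbinom(10000, size = n, prob = p0)

# Share of simulated samples with 10 or more left-handed students
sim_tail <- mean(sim_x >= 10)

# Exact binomial probability of 10 or more left-handed students out of 77
exact_tail <- pbinom(9, size = n, prob = p0, lower.tail = FALSE)

c(simulated = sim_tail, exact = exact_tail)
```

If a count as large as 10 out of 77 turns up often in the simulated samples, the observed proportion is not unusual under the null hypothesis.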

The prop.test() function behaves very much like the t.test() function. We have to input $X$ , $N$ , the null hypothesized $p$ , and the alternative hypothesis:

prop.test(x = 10, n = 77, p=0.10, alternative="greater")


    1-sample proportions test with continuity correction

data:  10 out of 77, null probability 0.1
X-squared = 0.46753, df = 1, p-value = 0.2471
alternative hypothesis: true p is greater than 0.1
95 percent confidence interval:
 0.07423612 1.00000000
sample estimates:
        p 
0.1298701

Conclusion: Because the p-value, 0.2471, is greater than $α = 0.05$ , we fail to reject the null hypothesis. We have insufficient evidence to suggest that BYU-I Visual Arts majors have a higher proportion of left-handed people than the general population.
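
The X-squared value that prop.test() reports is the square of a z statistic, so the output above can be reproduced by hand. The following is a sketch that assumes the continuity correction subtracts 0.5 from the absolute difference between the observed and expected counts; the variable names are ours.

```r
x  <- 10      # observed number of left-handed students
n  <- 77      # sample size
p0 <- 0.10    # null hypothesized proportion

# z statistic with the 0.5 continuity correction applied to the count difference
# (the sample proportion is above p0 here, so z is positive)
z_cc <- (abs(x - n * p0) - 0.5) / sqrt(n * p0 * (1 - p0))

z_cc^2                             # compare with X-squared = 0.46753
pnorm(z_cc, lower.tail = FALSE)    # compare with p-value = 0.2471

# The same quantities can be pulled out of the prop.test() result directly
res <- prop.test(x = x, n = n, p = p0, alternative = "greater")
res$statistic
res$p.value
res$p.value < 0.05                 # FALSE here, so we fail to reject H0
```

Storing the result in an object such as res makes it easy to compare the p-value with $α$ in code rather than by reading the printed output.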

Checking Requirements

Recall that we must check that the sample size is large enough to trust our p-value. To do this, we check that the expected numbers of successes and failures under the null hypothesis are both at least 10:

$n p = 77 \times 0.10 = 7.7$

$n (1 - p) = 77 \times 0.90 = 69.3$

QUESTION: Are both $n p$ and $n (1 - p)$ at least 10?
ANSWER:

Because $n p = 7.7$ is less than 10, we don’t have enough data to assume that the distribution of $\hat{p}$ is approximately normal. Our p-value may not be reliable.
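
The same check can be scripted. The sketch below also runs binom.test(), base R's exact binomial test, as one possible fallback when the expected counts are too small for the normal approximation; using it here is our suggestion rather than part of the original analysis.

```r
n  <- 77
p0 <- 0.10

expected_successes <- n * p0         # 7.7
expected_failures  <- n * (1 - p0)   # 69.3

# Both must be TRUE for the normal approximation to be reasonable
expected_successes >= 10
expected_failures  >= 10

# Exact binomial test of the same hypotheses (no normal approximation needed)
exact_p <- binom.test(x = 10, n = n, p = p0, alternative = "greater")$p.value
exact_p
```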

Confidence Intervals

We can use prop.test() to calculate confidence intervals as well.

Recall: Confidence intervals do not depend on null and alternative hypotheses so we omit that information in the prop.test() function.

prop.test(x=10, n = 77)$conf.int

[1] 0.06738384 0.23042038
attr(,"conf.level")
[1] 0.95

Explanation: I am 95% confident that the true proportion of left-handed Visual Arts majors is between 0.067 and 0.230.
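
prop.test() defaults to a 95% interval. Its conf.level argument changes the confidence level; the 90% level below is just an illustrative choice.

```r
# Default 95% interval, as above
ci95 <- prop.test(x = 10, n = 77)$conf.int

# 90% interval: a lower confidence level gives a narrower interval
ci90 <- prop.test(x = 10, n = 77, conf.level = 0.90)$conf.int

ci95
ci90

# Compare the widths of the two intervals
diff(as.numeric(ci90)) < diff(as.numeric(ci95))
```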

Your Turn: Trump Support in Idaho

In the 2024 presidential election, Donald Trump won 67% of the vote in Idaho. After months in office, you would like to see if support for Trump has decreased from his share of the vote. You sample 772 registered voters and get 566 responses. Of the 566 responses, 364 say they approve of President Trump.

Perform a hypothesis test that tests if the presidential approval is less than his share of the vote:

$H_{0} : p =$

$H_{a} : p$

$α =$

Question: What is the value of the test statistic for this test?
Answer:

Question: What is the P-Value?
Answer:

Question: State your conclusion in context of this problem:
Answer:

QUESTION: Can we trust the P-value? (Check $n p$ and $n (1 - p)$ )

ANSWER:

Create a $(1 - α)$ Confidence Interval for Idaho’s Presidential Approval:

QUESTION: Explain your confidence interval in context of the research question:
ANSWER:

2-Sample Proportion Tests

Two-sample proportion tests are used to compare proportions between two independent groups. Here we discuss these tests and provide examples of their application in R.

The hypothesis test should not be surprising:

$H_{0} : p_{1} = p_{2}$ $H_{a} : p_{1} (<, >, \neq) p_{2}$

where $p_{1}$ represents the unknown population proportion for group 1 and $p_{2}$ represents the unknown population proportion for group 2.

Two Sample Proportion Test

The R code is a slight modification of the prop.test() call for the one-sample proportion test. We simply supply 2 values for x and 2 values for n. Generically, this looks like:

prop.test(x=c(x1, x2), n=c(n1, n2), alternative = "")

where you must specify whether you think the proportion for group 1 is “less” or “greater” than the proportion for group 2, or “two.sided” for not equal to.

Example 1: Voting Behaviour by Gender

Suppose we want to test if women are more likely to identify as Democrat than men. We sample 250 men and 250 women and measure their political affiliation. We find that 80 men identify as Democrat and 102 females identify as Democrat.

Just as with the two-sample t-test for means, we must define a reference group. In this example, we will use females as the reference group so that our alternative will be relative to that group.

$H_{0} : p_{\text{femaleDem}} = p_{\text{maleDem}}$ $H_{a} : p_{\text{femaleDem}} > p_{\text{maleDem}}$

We will use $α = 0.05$

prop.test(x = c(102, 80), n = c(250, 250), alternative = "greater")


    2-sample test for equality of proportions with continuity correction

data:  c(102, 80) out of c(250, 250)
X-squared = 3.8099, df = 1, p-value = 0.02548
alternative hypothesis: greater
95 percent confidence interval:
 0.01350993 1.00000000
sample estimates:
prop 1 prop 2 
 0.408  0.320

Conclusion: Because the p-value, 0.02548, is less than $α = 0.05$ , we reject the null hypothesis. We have sufficient evidence to suggest that women are more likely than men to identify as Democrat.

We can also create a confidence interval for the difference:

prop.test(x = c(102, 80), n = c(250, 250))$conf.int

[1] 5.904086e-06 1.759941e-01
attr(,"conf.level")
[1] 0.95

Confidence intervals for differences can include positive and negative values. In this example, a negative difference would indicate that females are less likely than males to identify as Democrat, and a positive difference would indicate that they are more likely.

Our confidence interval is just above zero on the lower end. We are 95% confident that the proportion of females who identify as Democrat is between about 0 and 17.6 percentage points higher than the proportion of males.
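
For readers curious where these endpoints come from, the sketch below rebuilds the interval by hand. It assumes the interval uses the unpooled standard error of the difference plus a continuity correction of 0.5 × (1/n1 + 1/n2), which reproduces the prop.test() output for this example; the variable names are ours.

```r
x1 <- 102; n1 <- 250   # females identifying as Democrat
x2 <- 80;  n2 <- 250   # males identifying as Democrat

phat1 <- x1 / n1
phat2 <- x2 / n2
delta <- phat1 - phat2                   # estimated difference, 0.088

# Unpooled standard error of the difference in sample proportions
se <- sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)

# Margin of error: z * SE plus the continuity correction (an assumption
# about what prop.test() does; it matches the output for these data)
z_star <- qnorm(0.975)                   # critical value for 95% confidence
margin <- z_star * se + 0.5 * (1 / n1 + 1 / n2)

manual_ci <- c(lower = delta - margin, upper = delta + margin)
manual_ci
```

Because the lower endpoint sits so close to zero, small changes in the data or in the correction used could move the interval across zero, so the result deserves a cautious interpretation.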

Test Requirements

Just as with 1-sample proportion tests, we must validate that we have a large enough sample size to ensure that $\hat{p}$ is approximately normally distributed. When we have 2 samples, however, we must check both $\hat{p}$ ’s. For both hypothesis testing and confidence intervals we check:

Requirements for Hypothesis Testing and Confidence Intervals

$n_{1} {\hat{p}}_{1} \geq 10$ $n_{1} (1 - {\hat{p}}_{1}) \geq 10$ $n_{2} {\hat{p}}_{2} \geq 10$ $n_{2} (1 - {\hat{p}}_{2}) \geq 10$

An easy R calculator to check this is:

# All must be true:

x1 <- 102
n1 <- 250
phat1 <- x1/n1

n1*phat1 >= 10

[1] TRUE

n1*(1-phat1) >=10

[1] TRUE

x2 <- 80
n2 <- 250
phat2 <- x2 / n2

n2*phat2 >= 10

[1] TRUE

n2*(1-phat2) >=10

[1] TRUE
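
Since $n \hat{p}$ is just the observed number of successes and $n (1 - \hat{p})$ the observed number of failures, the four checks can be wrapped in a small helper. The function name check_counts is hypothetical, introduced here only for illustration.

```r
# Hypothetical helper: TRUE when every group has at least `min_count`
# successes and at least `min_count` failures
check_counts <- function(x, n, min_count = 10) {
  successes <- x        # same as n * phat
  failures  <- n - x    # same as n * (1 - phat)
  all(successes >= min_count, failures >= min_count)
}

# Both groups from the voting example pass
check_counts(x = c(102, 80), n = c(250, 250))

# A made-up second group with only 8 successes fails
check_counts(x = c(102, 8), n = c(250, 250))
```

Working with the counts directly avoids the rounding that can creep in when a proportion is multiplied back by the sample size.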

Soccer Popularity on the Rise?

Soccer is becoming much more popular in the United States. We would like to test if this is being driven by demographic shifts in the population where the younger generation is more likely to favor soccer.

A researcher samples 524 individuals under 40 and 655 individuals older than 40 and asks what their preferred sport is. Of the 524 respondents under 40, 44 identified soccer as their favorite sport. Of the 655 respondents over 40, 27 identified soccer as their favorite sport.

Perform a 2-sample proportion test to determine if significantly more younger people identify soccer as their favorite sport.

QUESTION: State your Null and Alternative Hypotheses:

$H_{0} :$

$H_{a} :$

$α =$

QUESTION: Perform the appropriate analysis:

prop.test()


QUESTION: Are the requirements for the hypothesis test and confidence interval satisfied?

# All must be true:

x1 <- 
n1 <- 
phat1 <- x1/n1

n1*phat1 >= 10

[1] FALSE

n1*(1-phat1) >=10

[1] FALSE

x2 <- 
n2 <- 
phat2 <- x2 / n2

n2*phat2 >= 10

[1] FALSE

n2*(1-phat2) >=10

[1] FALSE

ANSWER:

Create and interpret the confidence interval for the difference in the proportions.

QUESTION: Explain your confidence interval in context of the research question:
ANSWER: