In statistics, one-sample proportion tests compare a sample proportion or percentage to a hypothesized value. These tests are useful when dealing with categorical data. Here we discuss these tests and provide examples of their application in R.
The hypothesis test should look very familiar:
\[H_0: p = p_0\]\[H_a: p\; (<,>,\neq) \: p_0\] where \(p_0\) is some hypothesized value for the population proportion.
So far, we have been using Greek letters to represent population parameters. We deviate from that now because the Greek letter for p, \(\pi\), already has a long-established meaning in mathematics. In the hypothesis definition above, \(p\) represents the population proportion.
One Sample Proportion Test
The one sample proportion test is used to determine whether the proportion of successes based on a single sample differs significantly from a hypothesized value.
Example 1: One Sample Proportion Test
Suppose we want to test whether the proportion of students who passed an exam is significantly less than 0.75. We have a sample of 100 students, of which 72 passed.
\[\hat{p} = \frac{X}{N} = \frac{72}{100} = .72\]
If we want to test whether this is significantly less than 75%, we can use prop.test(), which is very similar to t.test(). Instead of putting in a sample mean, \(\bar{x}\), with a hypothesized \(\mu\), we put in \(X\) and \(N\) and a hypothesized \(p\). Setting the alternative and confidence level works the same way as in t.test().
Confidence intervals for proportions can also be obtained just as with t.test().
# One Sample Proportion Test Example
# Hypothesized proportion: 0.75
# Sample size: 100
# Number of successes: 72
prop.test(x = 72, n = 100, p = 0.75, alternative = "less", conf.level = .9)
1-sample proportions test with continuity correction
data: 72 out of 100, null probability 0.75
X-squared = 0.33333, df = 1, p-value = 0.2819
alternative hypothesis: true p is less than 0.75
90 percent confidence interval:
0.0000000 0.7782396
sample estimates:
p
0.72
prop.test(x = 72, n = 100, conf.level = .9)$conf.int
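The X-squared statistic and p-value in the prop.test() output above can be reproduced by hand, which may help show what the function is doing. The following is a minimal sketch, assuming the default continuity correction of 0.5 that prop.test() applies for a single proportion; the variable names (p0, yates, z, x_squared, p_value) are chosen here for illustration only.

```r
# Reproduce the one-sample proportion test from Example 1 by hand
x  <- 72     # number of successes
n  <- 100    # sample size
p0 <- 0.75   # hypothesized proportion
p_hat <- x / n

# Continuity correction (assumption: 0.5, capped at |x - n*p0|)
yates <- min(0.5, abs(x - n * p0))

# Signed z statistic; its square is the X-squared value reported by prop.test()
z <- sign(p_hat - p0) * (abs(x - n * p0) - yates) / sqrt(n * p0 * (1 - p0))
x_squared <- z^2

# Lower-tail p-value, matching alternative = "less"
p_value <- pnorm(z)

x_squared
p_value
```

Note that the standard deviation in the denominator uses the hypothesized value 0.75 rather than the sample proportion 0.72, because the test statistic is computed under the null hypothesis.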
Recall that the sampling distribution for \(\hat{p}\) is approximately normal when the sample has at least 10 expected “successes” and at least 10 expected “failures”. We check this using the following requirements.
Hypothesis Test Requirements:
\[np \geq 10\]\[n(1-p) \geq 10\]
We use \(p\) (the hypothesized value), not \(\hat{p}\), for hypothesis testing because a hypothesis test is always carried out under the assumption that the null hypothesis is true. Confidence intervals, on the other hand, make no such assumption.
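The same distinction shows up in the standard error of \(\hat{p}\). The sketch below uses the numbers from Example 1 and assumes the usual normal-approximation formulas; prop.test() itself builds its interval with a score-based method, so this illustrates the idea rather than the function's internals. The names se_null and se_hat are made up for this illustration.

```r
n     <- 100
p0    <- 0.75       # hypothesized proportion (assumed true under H0)
p_hat <- 72 / 100   # sample proportion

# Standard error under the null hypothesis: built from p0
se_null <- sqrt(p0 * (1 - p0) / n)        # about 0.0433

# Standard error for a confidence interval: built from p_hat
se_hat <- sqrt(p_hat * (1 - p_hat) / n)   # about 0.0449

c(se_null = se_null, se_hat = se_hat)
```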
To see if the calculated confidence interval is appropriate, we use \(\hat{p}\).
Confidence Interval Requirements:
\[n\hat{p} \geq 10\]\[n(1-\hat{p}) \geq 10\]
Can we trust the p-value and confidence interval?
You can use a calculator for this, or simply use R as a calculator:
x <- 72
n <- 100
p_hat <- x/n
p <- .75
# For Hypothesis Testing:
n*p >= 10
[1] TRUE
n*(1-p) >=10
[1] TRUE
# For Confidence Intervals:
n*p_hat >= 10
[1] TRUE
n*(1-p_hat) >=10
[1] TRUE
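If these checks need to be repeated for several problems, they can be wrapped in a small function. The following is one possible sketch; check_counts() is a hypothetical helper written for this illustration, not a function from base R or any package, and the threshold of 10 is taken from the requirements above.

```r
# Hypothetical helper: checks the expected success and failure counts
# against a threshold (10 by default, as in the requirements above)
check_counts <- function(n, prop, threshold = 10) {
  c(successes = n * prop >= threshold,
    failures  = n * (1 - prop) >= threshold)
}

x  <- 72
n  <- 100
p0 <- 0.75

# Hypothesis test requirements: use the hypothesized proportion
check_counts(n, p0)

# Confidence interval requirements: use the sample proportion
check_counts(n, x / n)
```

Passing the proportion as an argument keeps the choice between \(p\) and \(\hat{p}\) explicit at each call.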
Example 2: Handedness
Suppose that 11% of people in the United States are left-handed. A researcher wants to know if visual arts majors are significantly more likely to be left-handed. She samples 250 visual arts majors and finds that 36 are left-handed.
Perform a one-sample proportion test to see if visual arts majors are significantly more likely to be left-handed than the general population.
State the null and alternative hypotheses and your significance level.
\[H_0: p = \]\[H_a: p \]\[\alpha = \]
Question: What is the value of the test statistic for this test? Answer:
Question: What is the p-value? Answer:
Question: State your conclusion in context of this problem: Answer:
Make a \((1-\alpha)\) level confidence interval for the true population proportion.
Question: Interpret the confidence interval in context of the question: Answer:
Question: Are the test requirements for the normality of \(\hat{p}\) satisfied? Answer:
Question: Are the requirements for a confidence interval for \(p\) satisfied? Answer: