Lesson 12: Inference for the Mean of Differences (Two Dependent Samples)

Lesson Outcomes
Example of Paired Data: Pre- and Post-test Scores
Hypothesis Tests
Confidence Intervals
- Mountain Pine Beetle Attacks
- Sleep Inducing Drugs
Summary
Navigation

Lesson Outcomes

By the end of this lesson, you should be able to do the following:

Recognize when a mean of differences (two dependent samples) inferential procedure is appropriate
Create numerical and graphical summaries of the data
Perform a hypothesis test for the mean of differences (two dependent samples) using the following steps:
1. State the null and alternative hypotheses
2. Calculate the test statistic, degrees of freedom, and P-value of the test using software
3. Assess statistical significance in order to state the appropriate conclusion for the hypothesis test
4. Check the requirements for the hypothesis test
Create a confidence interval for the mean of differences (two dependent samples) using the following steps:
1. Calculate a confidence interval using software
2. Interpret the confidence interval
3. Check the requirements of the confidence interval

Example of Paired Data: Pre- and Post-test Scores

In education, it is very common for researchers to conduct studies in which they administer a pre-test, provide some instruction, and then give a post-test. The difference between the post- and pre-test scores is a measure of the student’s progress. In this case, it would not make much sense to only look at the mean score on the pre-test and compare it to the mean score on the post-test.

A study like this uses a matched-pairs design; we also say that it has dependent samples. Matched-pairs (or paired-data) designs typically involve only one population, and a pair of observations is taken on each individual selected for the sample. In the context of the educational study, the two observations are a student’s scores on (1) the pre-test and (2) the post-test. If a student is selected to take the pre-test (i.e., to be part of group 1), that student is automatically selected to take the post-test (i.e., to be part of group 2).

There is a lot of merit in subtracting the individual scores and looking at the mean gain. The researchers are not really interested in the students’ knowledge before the instruction; the pre-test score serves as a baseline for measuring how much was gained during the instruction. Looking at the difference removes the effect of each student’s individual ability and measures learning during the unit.

To analyze the data, the researchers first find the difference in the post- and pre-test scores. At that point, the data have been reduced to a list of numbers (representing the increase in scores). Now, the researchers can conduct inference on the mean of these values. In other words, they can do a hypothesis test for the mean of the difference in the post- and pre-test scores.

A hypothesis test for two means with paired data (dependent samples) is conducted in the same way as a hypothesis test for a single mean with $\sigma$ unknown. The only difference is that each pair of observations must be subtracted before you start any computations. Once you have the differences, you apply the one-sample procedures you have already learned. So there is nothing new to learn in order to run a hypothesis test or compute a confidence interval for two means with paired data; we will simply use a different sheet in the Math221 Statistics Toolbox, one that calculates the differences automatically.
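
As a reminder of what the one-sample procedure looks like when it is applied to the differences, the test statistic can be written as follows. This is a sketch that assumes the null hypothesis states a mean difference of zero, which is the usual choice but is not required; $\bar d$, $s_d$, and $n$ are the mean, standard deviation, and number of the differences.

$$t = \frac{\bar d - 0}{s_d / \sqrt{n}}, \qquad df = n - 1$$

Note that $n$ counts the pairs, not the total number of individual measurements.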

We will first explore an application of pre- and post-testing in a weight loss study.

Hypothesis Tests

Mahon’s Weight Loss Study

Background

Annie Mahon and other researchers in Wayne Campbell’s nutrition lab studied the weight loss of $n=27$ middle-aged women who consumed a prescribed low-calorie diet. The women’s weights were recorded (in kilograms) at the beginning of the study and after the nine-week diet period. The data are given in the file Mahon.xlsx. An excerpt of the data is given below.

Subject	Pre	Post
1	62.5	56.1
2	88.8	80.2
3	74.7	70.8
$\vdots$	$\vdots$	$\vdots$
26	76.3	73.8
27	82.1	77.9

Notice the structure of the data. The weight of each subject was measured before the study and at the conclusion of the study. Each person provided a pre-study weight and a post-study weight. Stated differently, the pre-study weights and the post-study weights are paired. For each row of data, both of these numbers came from the same person. When we collect two observations of the same measurement on each subject, we call it paired data. Sometimes paired data are called dependent samples.

Answer the following question:

The researchers measured the initial weights of the women prior to the study, even though they were not particularly interested in this value. What was the purpose of measuring the pre-study weights?

Subject	Post	Pre	Difference
1	56.1	62.5	56.1 $-$ 62.5 = -6.4
2	80.2	88.8	80.2 $-$ 88.8 = -8.6
3	70.8	74.7	70.8 $-$ 74.7 = -3.9
$\vdots$	$\vdots$	$\vdots$	$\vdots$
26	73.8	76.3	73.8 $-$ 76.3 = -2.5
27	77.9	82.1	77.9 $-$ 82.1 = -4.2
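
The subtraction in the table above can be sketched in a few lines of Python. This is only an illustration: it uses the five rows shown in the excerpt (subjects 1, 2, 3, 26, and 27), not the full Mahon.xlsx file, and the variable names are made up for the example.

```python
# Pre- and post-study weights (kg) for the five subjects shown in the excerpt.
# The remaining subjects are elided in the lesson, so this is NOT the full data set.
excerpt = {
    1: (62.5, 56.1),
    2: (88.8, 80.2),
    3: (74.7, 70.8),
    26: (76.3, 73.8),
    27: (82.1, 77.9),
}

# Difference = Post - Pre, so a negative value means the subject lost weight.
# Rounding to one decimal place removes floating-point noise.
differences = {
    subject: round(post - pre, 1)
    for subject, (pre, post) in excerpt.items()
}

for subject, diff in differences.items():
    print(f"Subject {subject}: {diff:+.1f} kg")
```

Because the difference is computed as Post $-$ Pre, weight loss shows up as a negative number, matching the sign convention in the table.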


Mean:	$\bar d = -435.535$
Standard Deviation:	$s_d = 17.082$
Sample Size:	$n = 170$

Subject	Control (no drug), hours	L-Hyoscyamine, hours	Difference (L-Hyoscyamine $-$ Control), hours
1	0.6	1.3	0.7
2	3	1.4	-1.6
3	4.7	4.5	-0.2
4	5.5	4.3	-1.2
5	6.2	6.1	-0.1
6	3.2	6.6	3.4
7	2.5	6.2	3.7
8	2.8	3.6	0.8
9	1.1	1.1	0
10	2.9	4.9	2
11	-	6.3	-


Mean:	$\bar d = 0.75$ hours
Standard Deviation:	$s_d = 1.79$ hours
Sample Size:	$n = 10$
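
The summary values above can be reproduced from the table with Python's standard library. The sketch below makes a few assumptions that are not stated in the lesson: subject 11 is left out because the control value is missing, the null value for the mean difference is taken to be 0, and the confidence level is taken to be 95%, with the critical value $t^* = 2.262$ for 9 degrees of freedom copied from a standard t table.

```python
import math
import statistics

# Values for subjects 1-10 from the table (hours).
# Subject 11 is omitted because it has no control measurement.
control = [0.6, 3.0, 4.7, 5.5, 6.2, 3.2, 2.5, 2.8, 1.1, 2.9]
drug = [1.3, 1.4, 4.5, 4.3, 6.1, 6.6, 6.2, 3.6, 1.1, 4.9]

# Step 1: reduce each pair to a single difference (drug minus control).
differences = [d - c for c, d in zip(control, drug)]

# Step 2: one-sample summaries of the differences.
n = len(differences)
d_bar = statistics.mean(differences)   # sample mean of the differences
s_d = statistics.stdev(differences)    # sample standard deviation (n - 1 divisor)
std_error = s_d / math.sqrt(n)

# Step 3: t statistic, assuming a null mean difference of 0.
t_stat = (d_bar - 0) / std_error
df = n - 1

# Step 4: 95% confidence interval (assumed level).
# T_STAR is the t-table critical value for df = 9, hard-coded for this sketch.
T_STAR = 2.262
margin = T_STAR * std_error
ci = (d_bar - margin, d_bar + margin)

print(f"n = {n}, mean = {d_bar:.2f}, sd = {s_d:.2f}")
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The P-value is not computed here because it requires the t distribution, which the standard library does not provide; in practice it would come from statistical software.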
