This document uses the Angell dataset from library(car) to determine whether mobility differs between the East and the West among U.S. cities (around 1950).
First, because this file demonstrates the Wilcoxon Rank Sum Test, we need to reduce the data to two groups, East and West. We will do this by combining S with the Northeast (coded E in region) into E, and combining MW with W into W. We will use library(tidyverse) and the function recode to do this. Notice how the dataset is modified by the recode command in the code below.
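Angell2 <- Angell %>%
  # region has levels E, MW, S, W, so NE="E" matches nothing; only S and MW are renamed
  mutate(area = recode(region, S="E", NE="E", MW="W"))
# alternatively we could have used mapvalues() from plyr:
# Angell2 <- Angell %>%
#   mutate(area = mapvalues(region,
#                           from = c("S", "MW"),
#                           to = c("E", "W")))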
rownames(Angell2) <- rownames(Angell)
pander(Angell)
City | moral | hetero | mobility | region |
---|---|---|---|---|
Rochester | 19 | 20.6 | 15 | E |
Syracuse | 17 | 15.6 | 20.2 | E |
Worcester | 16.4 | 22.1 | 13.6 | E |
Erie | 16.2 | 14 | 14.8 | E |
Milwaukee | 15.8 | 17.4 | 17.6 | MW |
Bridgeport | 15.3 | 27.9 | 17.5 | E |
Buffalo | 15.2 | 22.3 | 14.7 | E |
Dayton | 14.3 | 23.7 | 23.8 | MW |
Reading | 14.2 | 10.6 | 19.4 | E |
Des_Moines | 14.1 | 12.7 | 31.9 | MW |
Cleveland | 14 | 39.7 | 18.6 | MW |
Denver | 13.9 | 13 | 34.5 | W |
Peoria | 13.8 | 10.7 | 35.1 | MW |
Wichita | 13.6 | 11.9 | 42.7 | MW |
Trenton | 13 | 32.5 | 15.8 | E |
Grand_Rapids | 12.8 | 15.7 | 24.2 | MW |
Toledo | 12.7 | 19.2 | 21.6 | MW |
San_Diego | 12.5 | 15.9 | 49.8 | W |
Baltimore | 12 | 45.8 | 12.1 | E |
South_Bend | 11.8 | 17.9 | 27.4 | MW |
Akron | 11.3 | 20.4 | 22.1 | MW |
Detroit | 11.1 | 38.3 | 19.5 | MW |
Tacoma | 10.9 | 17.8 | 31.2 | W |
Flint | 9.8 | 19.3 | 32.2 | MW |
Spokane | 9.6 | 12.3 | 38.9 | W |
Seattle | 9 | 23.9 | 34.2 | W |
Indianapolis | 8.8 | 29.2 | 23.1 | MW |
Columbus | 8 | 27.4 | 25 | MW |
Portland_Oregon | 7.2 | 16.4 | 35.8 | W |
Richmond | 10.4 | 65.3 | 24.9 | S |
Houston | 10.2 | 49 | 36.1 | S |
Fort_Worth | 10.2 | 30.5 | 36.8 | S |
Oklahoma_City | 9.7 | 20.7 | 47.2 | S |
Chattanooga | 9.3 | 57.7 | 27.2 | S |
Nashville | 8.6 | 57.4 | 25.4 | S |
Birmingham | 8.2 | 83.1 | 25.9 | S |
Dallas | 8 | 36.8 | 37.8 | S |
Louisville | 7.7 | 31.5 | 19.4 | S |
Jacksonville | 6 | 73.7 | 27.7 | S |
Memphis | 5.4 | 84.5 | 26.7 | S |
Tulsa | 5.3 | 23.8 | 44.9 | S |
Miami | 5.1 | 50.2 | 41.8 | S |
Atlanta | 4.2 | 70.6 | 32.6 | S |
pander(Angell2)
City | moral | hetero | mobility | region | area |
---|---|---|---|---|---|
Rochester | 19 | 20.6 | 15 | E | E |
Syracuse | 17 | 15.6 | 20.2 | E | E |
Worcester | 16.4 | 22.1 | 13.6 | E | E |
Erie | 16.2 | 14 | 14.8 | E | E |
Milwaukee | 15.8 | 17.4 | 17.6 | MW | W |
Bridgeport | 15.3 | 27.9 | 17.5 | E | E |
Buffalo | 15.2 | 22.3 | 14.7 | E | E |
Dayton | 14.3 | 23.7 | 23.8 | MW | W |
Reading | 14.2 | 10.6 | 19.4 | E | E |
Des_Moines | 14.1 | 12.7 | 31.9 | MW | W |
Cleveland | 14 | 39.7 | 18.6 | MW | W |
Denver | 13.9 | 13 | 34.5 | W | W |
Peoria | 13.8 | 10.7 | 35.1 | MW | W |
Wichita | 13.6 | 11.9 | 42.7 | MW | W |
Trenton | 13 | 32.5 | 15.8 | E | E |
Grand_Rapids | 12.8 | 15.7 | 24.2 | MW | W |
Toledo | 12.7 | 19.2 | 21.6 | MW | W |
San_Diego | 12.5 | 15.9 | 49.8 | W | W |
Baltimore | 12 | 45.8 | 12.1 | E | E |
South_Bend | 11.8 | 17.9 | 27.4 | MW | W |
Akron | 11.3 | 20.4 | 22.1 | MW | W |
Detroit | 11.1 | 38.3 | 19.5 | MW | W |
Tacoma | 10.9 | 17.8 | 31.2 | W | W |
Flint | 9.8 | 19.3 | 32.2 | MW | W |
Spokane | 9.6 | 12.3 | 38.9 | W | W |
Seattle | 9 | 23.9 | 34.2 | W | W |
Indianapolis | 8.8 | 29.2 | 23.1 | MW | W |
Columbus | 8 | 27.4 | 25 | MW | W |
Portland_Oregon | 7.2 | 16.4 | 35.8 | W | W |
Richmond | 10.4 | 65.3 | 24.9 | S | E |
Houston | 10.2 | 49 | 36.1 | S | E |
Fort_Worth | 10.2 | 30.5 | 36.8 | S | E |
Oklahoma_City | 9.7 | 20.7 | 47.2 | S | E |
Chattanooga | 9.3 | 57.7 | 27.2 | S | E |
Nashville | 8.6 | 57.4 | 25.4 | S | E |
Birmingham | 8.2 | 83.1 | 25.9 | S | E |
Dallas | 8 | 36.8 | 37.8 | S | E |
Louisville | 7.7 | 31.5 | 19.4 | S | E |
Jacksonville | 6 | 73.7 | 27.7 | S | E |
Memphis | 5.4 | 84.5 | 26.7 | S | E |
Tulsa | 5.3 | 23.8 | 44.9 | S | E |
Miami | 5.1 | 50.2 | 41.8 | S | E |
Atlanta | 4.2 | 70.6 | 32.6 | S | E |
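The regrouping shown in the two tables above can be checked without the original data frame. The following is a minimal sketch, not the code used for this document: it rebuilds only the region column from the city counts in the table (9 E, 14 MW, 14 S, 6 W) and collapses the levels with base R instead of recode. The names region, area, group_check, and group_sizes are illustrative.

```r
# Region codes in the order E, MW, S, W, repeated by the number of
# cities in each region as counted from the table above (9, 14, 14, 6).
region <- factor(rep(c("E", "MW", "S", "W"), times = c(9, 14, 14, 6)),
                 levels = c("E", "MW", "S", "W"))

# Base-R alternative to recode(): assigning a named list to levels()
# merges the listed old levels into each new level.
area <- region
levels(area) <- list(E = c("E", "S"), W = c("MW", "W"))

# Cross-tabulate old against new codes to confirm the mapping.
group_check <- table(region, area)
print(group_check)

# Group sizes that the rank sum test will compare.
group_sizes <- table(area)
print(group_sizes)
```

The cross-tabulation should show every S city under E and every MW city under W, which gives 23 Eastern and 20 Western cities.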
Now we can compare the East and West with respect to their mobility scores.
boxplot(mobility ~ area, data=Angell2,
        names=c("Eastern Cities","Western Cities"),
        ylab="Mobility Score", col='gray', boxwex=.25,
        main="Geographic Mobility of U.S. Cities, 1950",
        xlab="Cities in the Western U.S. Show Higher Mobility")
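A numeric summary can accompany the boxplot. The sketch below is an illustration rather than part of the original analysis: it copies the mobility scores from the Angell2 table into two vectors (east and west are assumed names) so that it runs without the car package, then computes the median of each group.

```r
# Mobility scores copied from the Angell2 table above, split by area.
east <- c(15, 20.2, 13.6, 14.8, 17.5, 14.7, 19.4, 15.8, 12.1,
          24.9, 36.1, 36.8, 47.2, 27.2, 25.4, 25.9, 37.8, 19.4,
          27.7, 26.7, 44.9, 41.8, 32.6)
west <- c(17.6, 23.8, 31.9, 18.6, 34.5, 35.1, 42.7, 24.2, 21.6, 49.8,
          27.4, 22.1, 19.5, 31.2, 32.2, 38.9, 34.2, 23.1, 25, 35.8)

# Long-format data frame with one row per city.
scores <- data.frame(
  mobility = c(east, west),
  area = factor(rep(c("E", "W"), times = c(length(east), length(west))))
)

# Median mobility score within each area.
group_medians <- tapply(scores$mobility, scores$area, median)
print(group_medians)

# Difference in sample medians, West minus East.
median_shift <- unname(group_medians["W"] - group_medians["E"])
print(median_shift)
```

With these values the sample medians are 25.4 for the East and 29.3 for the West, a difference of 3.9 points.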
It appears there may be a slight shift in medians, with the West being higher. Since the distributions are similarly shaped (slightly right skewed), a formal test of the hypotheses \[
H_0: \text{difference in medians} = 0
\] \[
H_a: \text{difference in medians} \neq 0
\] can be performed. Using a Wilcoxon Rank Sum Test (with the normal approximation and continuity correction, because ties in the data rule out an exact p-value), we obtain a test statistic of \(W = 181\) and a p-value of \(0.2376\). There is insufficient evidence to reject the null hypothesis. The difference in medians seen in the boxplot above is consistent with random sampling variation, so these data do not show that median mobility scores differ between the East and West.
The R code that produced the Wilcoxon test results reported above is shown below, followed by its output.
wilcox.test(mobility ~ area, data=Angell2)
## Warning in wilcox.test.default(x = c(15, 20.2, 13.6, 14.8, 17.5, 14.7,
## 19.4, : cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: mobility by area
## W = 181, p-value = 0.2376
## alternative hypothesis: true location shift is not equal to 0
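To see where \(W = 181\) and the p-value of \(0.2376\) come from, the statistic can be rebuilt from ranks. This is a sketch under stated assumptions: the vectors east and west are copied by hand from the Angell2 table, and the variable names are illustrative. It follows the standard rank sum calculation with a tie-corrected normal approximation and a continuity correction, then compares the result with wilcox.test.

```r
# Mobility scores copied from the Angell2 table above, split by area.
east <- c(15, 20.2, 13.6, 14.8, 17.5, 14.7, 19.4, 15.8, 12.1,
          24.9, 36.1, 36.8, 47.2, 27.2, 25.4, 25.9, 37.8, 19.4,
          27.7, 26.7, 44.9, 41.8, 32.6)
west <- c(17.6, 23.8, 31.9, 18.6, 34.5, 35.1, 42.7, 24.2, 21.6, 49.8,
          27.4, 22.1, 19.5, 31.2, 32.2, 38.9, 34.2, 23.1, 25, 35.8)

n1 <- length(east)
n2 <- length(west)
n  <- n1 + n2

# Rank all scores together; tied scores share their average rank.
r <- rank(c(east, west))

# W is the rank sum of the first group minus its smallest possible value.
W <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Standard deviation of W under H0, reduced slightly for tied ranks.
tie_sizes <- table(r)
sigma <- sqrt(n1 * n2 / 12 *
              ((n + 1) - sum(tie_sizes^3 - tie_sizes) / (n * (n - 1))))

# Continuity-corrected z statistic and two-sided p-value.
centered <- W - n1 * n2 / 2
z <- (centered - sign(centered) * 0.5) / sigma
p_value <- 2 * pnorm(-abs(z))

print(c(W = W, z = z, p_value = p_value))

# Cross-check against the built-in test (exact = FALSE avoids the ties warning).
check <- wilcox.test(east, west, exact = FALSE)
print(check)
```

The only tied scores are the two Eastern cities with a mobility of 19.4, so the tie correction changes the variance very little; the warning in the output above arises because any tie prevents an exact p-value.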