This document uses the Angell dataset from library(car) to determine whether mobility scores differ between cities in the East and the West of the U.S. (around 1950).

First, because this file demonstrates the Wilcoxon Rank Sum Test, we need to reduce the data to two groups, East and West. We will do this by combining S and E into E and combining MW and W into W. We will use library(tidyverse) and the function recode to do this. Notice how the dataset is modified by the recode command in the code below.

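# region has the levels E, MW, S, and W (see the table below);
# levels not named in recode (E and W) are kept as they are
Angell2 <- Angell %>%
  mutate(area = recode(region, S="E", MW="W"))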

# alternatively we could have used mapvalues from library(plyr):
# Angell2 <- Angell %>%
#    mutate(area = plyr::mapvalues(region,
#                                  from = c("S", "MW"),
#                                  to = c("E", "W")))

rownames(Angell2) <- rownames(Angell)
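
A quick way to confirm that a recode like this behaved as intended is to cross-tabulate the original region against the new area. The sketch below is illustrative only: it uses six rows copied from the table in this document rather than the full dataset, and it swaps in a base-R lookup vector (the hypothetical name `lookup`) in place of recode so that it runs without extra packages.

```r
# Six rows copied from the Angell table (not the full dataset)
cities <- data.frame(
  mobility  = c(15.0, 17.6, 34.5, 24.9, 23.8, 36.1),
  region    = factor(c("E", "MW", "W", "S", "MW", "S")),
  row.names = c("Rochester", "Milwaukee", "Denver",
                "Richmond", "Dayton", "Houston")
)

# Hypothetical lookup vector: original region -> two-group area
lookup <- c(E = "E", S = "E", MW = "W", W = "W")
cities$area <- factor(unname(lookup[as.character(cities$region)]))

# Each region (row) should have counts in exactly one area (column)
check <- table(cities$region, cities$area)
print(check)
```

On the real data, the same check would be `table(Angell2$region, Angell2$area)`.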

Original Angell Data

pander(Angell)
  moral hetero mobility region
Rochester 19 20.6 15 E
Syracuse 17 15.6 20.2 E
Worcester 16.4 22.1 13.6 E
Erie 16.2 14 14.8 E
Milwaukee 15.8 17.4 17.6 MW
Bridgeport 15.3 27.9 17.5 E
Buffalo 15.2 22.3 14.7 E
Dayton 14.3 23.7 23.8 MW
Reading 14.2 10.6 19.4 E
Des_Moines 14.1 12.7 31.9 MW
Cleveland 14 39.7 18.6 MW
Denver 13.9 13 34.5 W
Peoria 13.8 10.7 35.1 MW
Wichita 13.6 11.9 42.7 MW
Trenton 13 32.5 15.8 E
Grand_Rapids 12.8 15.7 24.2 MW
Toledo 12.7 19.2 21.6 MW
San_Diego 12.5 15.9 49.8 W
Baltimore 12 45.8 12.1 E
South_Bend 11.8 17.9 27.4 MW
Akron 11.3 20.4 22.1 MW
Detroit 11.1 38.3 19.5 MW
Tacoma 10.9 17.8 31.2 W
Flint 9.8 19.3 32.2 MW
Spokane 9.6 12.3 38.9 W
Seattle 9 23.9 34.2 W
Indianapolis 8.8 29.2 23.1 MW
Columbus 8 27.4 25 MW
Portland_Oregon 7.2 16.4 35.8 W
Richmond 10.4 65.3 24.9 S
Houston 10.2 49 36.1 S
Fort_Worth 10.2 30.5 36.8 S
Oklahoma_City 9.7 20.7 47.2 S
Chattanooga 9.3 57.7 27.2 S
Nashville 8.6 57.4 25.4 S
Birmingham 8.2 83.1 25.9 S
Dallas 8 36.8 37.8 S
Louisville 7.7 31.5 19.4 S
Jacksonville 6 73.7 27.7 S
Memphis 5.4 84.5 26.7 S
Tulsa 5.3 23.8 44.9 S
Miami 5.1 50.2 41.8 S
Atlanta 4.2 70.6 32.6 S

Modified Angell Data

pander(Angell2)
  moral hetero mobility region area
Rochester 19 20.6 15 E E
Syracuse 17 15.6 20.2 E E
Worcester 16.4 22.1 13.6 E E
Erie 16.2 14 14.8 E E
Milwaukee 15.8 17.4 17.6 MW W
Bridgeport 15.3 27.9 17.5 E E
Buffalo 15.2 22.3 14.7 E E
Dayton 14.3 23.7 23.8 MW W
Reading 14.2 10.6 19.4 E E
Des_Moines 14.1 12.7 31.9 MW W
Cleveland 14 39.7 18.6 MW W
Denver 13.9 13 34.5 W W
Peoria 13.8 10.7 35.1 MW W
Wichita 13.6 11.9 42.7 MW W
Trenton 13 32.5 15.8 E E
Grand_Rapids 12.8 15.7 24.2 MW W
Toledo 12.7 19.2 21.6 MW W
San_Diego 12.5 15.9 49.8 W W
Baltimore 12 45.8 12.1 E E
South_Bend 11.8 17.9 27.4 MW W
Akron 11.3 20.4 22.1 MW W
Detroit 11.1 38.3 19.5 MW W
Tacoma 10.9 17.8 31.2 W W
Flint 9.8 19.3 32.2 MW W
Spokane 9.6 12.3 38.9 W W
Seattle 9 23.9 34.2 W W
Indianapolis 8.8 29.2 23.1 MW W
Columbus 8 27.4 25 MW W
Portland_Oregon 7.2 16.4 35.8 W W
Richmond 10.4 65.3 24.9 S E
Houston 10.2 49 36.1 S E
Fort_Worth 10.2 30.5 36.8 S E
Oklahoma_City 9.7 20.7 47.2 S E
Chattanooga 9.3 57.7 27.2 S E
Nashville 8.6 57.4 25.4 S E
Birmingham 8.2 83.1 25.9 S E
Dallas 8 36.8 37.8 S E
Louisville 7.7 31.5 19.4 S E
Jacksonville 6 73.7 27.7 S E
Memphis 5.4 84.5 26.7 S E
Tulsa 5.3 23.8 44.9 S E
Miami 5.1 50.2 41.8 S E
Atlanta 4.2 70.6 32.6 S E

Now we can compare the East and West with respect to their mobility scores.

boxplot(mobility ~ area, data=Angell2,
        names=c("Eastern Cities","Western Cities"),
        ylab="Mobility Score",
        xlab="Cities in the Western U.S. Show Higher Mobility",
        main="Geographic Mobility of U.S. Cities, 1950",
        col="gray", boxwex=.25)

The boxplot suggests a slight upward shift in the median mobility score for the West. Since the distributions are similarly shaped (slightly right skewed), a formal test of the hypotheses \[ H_0: \text{difference in medians} = 0 \] \[ H_a: \text{difference in medians} \neq 0 \] can be performed. Using a Wilcoxon Rank Sum Test (with the normal approximation and continuity correction, because ties in the data rule out an exact p-value), we obtain a test statistic of \(W = 181\) and a p-value of \(0.2376\). There is insufficient evidence to reject the null hypothesis. The difference in medians shown in the boxplot above is consistent with random sampling variation, so we cannot conclude that typical (median) mobility scores differ between the East and the West.
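
To make the test statistic less of a black box, the sketch below recomputes the group medians and W by hand in base R. The two vectors are transcribed from the mobility column of the table above (an assumption of this sketch; with library(car) loaded they could be taken from Angell2 instead), and the names `east`, `west`, and `r` are illustrative.

```r
# Mobility scores transcribed from the table above, split by recoded area
east <- c(15, 20.2, 13.6, 14.8, 17.5, 14.7, 19.4, 15.8, 12.1,  # region E
          24.9, 36.1, 36.8, 47.2, 27.2, 25.4, 25.9, 37.8, 19.4,
          27.7, 26.7, 44.9, 41.8, 32.6)                        # region S
west <- c(17.6, 23.8, 31.9, 18.6, 35.1, 42.7, 24.2, 21.6, 27.4,
          22.1, 19.5, 32.2, 23.1, 25,                          # region MW
          34.5, 49.8, 31.2, 38.9, 34.2, 35.8)                  # region W

# Sample medians of the two groups
med_east <- median(east)
med_west <- median(west)

# Rank all 43 scores together; tied values share an average rank
r  <- rank(c(east, west))
n1 <- length(east)

# W as reported by wilcox.test(): the rank sum of the first group
# minus the smallest value that rank sum could take
W <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

print(c(median_east = med_east, median_west = med_west, W = W))
```

With these values the sample medians are 25.4 (East) and 29.3 (West), and W matches the reported 181. Equivalently, W counts the East/West pairs in which the Eastern city has the higher score.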


Appendix

The R code that produced the Wilcoxon Test results reported above is shown below, followed by its output.

wilcox.test(mobility ~ area, data=Angell2)
## Warning in wilcox.test.default(x = c(15, 20.2, 13.6, 14.8, 17.5, 14.7,
## 19.4, : cannot compute exact p-value with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  mobility by area
## W = 181, p-value = 0.2376
## alternative hypothesis: true location shift is not equal to 0
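
The "continuity correction" in the output refers to the normal approximation used when ties prevent an exact p-value. As a sketch of that calculation, the code below rebuilds the p-value from the reported W and the group sizes. It assumes 23 Eastern and 20 Western cities and a single tied pair (the score 19.4 appears twice), all read off the table above; the variable names are illustrative.

```r
W  <- 181          # test statistic reported above
n1 <- 23           # cities with area E (assumed from the table)
n2 <- 20           # cities with area W (assumed from the table)
N  <- n1 + n2
tie_sizes <- c(2)  # one group of tied scores: 19.4 occurs twice

# Mean and tie-corrected standard deviation of W under the null hypothesis
mu    <- n1 * n2 / 2
sigma <- sqrt(n1 * n2 / 12 *
              ((N + 1) - sum(tie_sizes^3 - tie_sizes) / (N * (N - 1))))

# Continuity correction: move W half a unit toward its null mean
z <- (W - mu - sign(W - mu) * 0.5) / sigma

# Two-sided p-value from the standard normal distribution
p <- 2 * pnorm(-abs(z))
print(round(p, 4))
```

The result rounds to the same p-value of 0.2376 shown in the output.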