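library(mosaicData)  # assumed source of the RailTrail dataset
library(DT)          # assumed source of datatable()
library(car)         # assumed source of qqPlot()
TrailSS <- subset(RailTrail, spring == 1 | summer == 1)
TrailSpring <- subset(RailTrail, spring == 1)
TrailSummer <- subset(RailTrail, summer == 1)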

Background

The Pioneer Valley Planning Commission (PVPC) collected data north of Chestnut Street in Florence, MA for ninety days from April 5, 2005 to November 15, 2005. The data is contained in the RailTrail dataset.

The PVPC wants to know if there are more rail trail users in the spring than in the summer, so it can decide whether extra advertisement is needed for summer trail users. We therefore test whether \(\mu_1 - \mu_2\), the difference between the population mean volume of users in spring (\(\mu_1\)) and in summer (\(\mu_2\)), is different from zero.

Formally, the null and alternative hypotheses are written as

\[ H_0: \mu_1 - \mu_2 = 0 \] \[ H_a: \mu_1 - \mu_2 \neq 0 \]

The level of significance will be set at \[ \alpha = 0.05 \]

datatable(TrailSS, options=list(lengthMenu = c(10,30)))

Analysis

This boxplot shows the volume of trail users, with spring data on the left and summer on the right. The number of people on the trail looks very similar in the two seasons, with a slightly higher volume of users in the summer.

boxplot(TrailSpring$volume, TrailSummer$volume, xlab= "Spring and Summer", ylab= "Volume of Trail Users", main= "Volume of Trail Users in \n the Spring vs. Summer", col= "steelblue2")
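The visual impression from the boxplot can be checked against per-season summary statistics. The following is a minimal sketch, assuming RailTrail comes from the mosaicData package; the names `trail_ss`, `season`, and `season_summary` are introduced here for illustration and are not part of the original report.

```r
library(mosaicData)  # assumption: RailTrail is taken from the mosaicData package

# Keep only spring and summer days and label each row with its season
trail_ss <- subset(RailTrail, spring == 1 | summer == 1)
trail_ss$season <- ifelse(trail_ss$spring == 1, "Spring", "Summer")

# Sample size, mean, median and standard deviation of volume per season
season_summary <- do.call(rbind, lapply(
  split(trail_ss$volume, trail_ss$season),
  function(v) data.frame(n = length(v), mean = mean(v),
                         median = median(v), sd = sd(v))
))
season_summary
```

Comparing the two rows of `season_summary` shows how far apart the sample means and medians are relative to the spread within each season.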

To test the null and alternative hypotheses stated above, we will use an independent samples t-test to find out whether the number of people using the trail differs between spring and summer.
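For reference, the test statistic can be written out. This sketch assumes the unequal-variance (Welch) form, which is the default in R's t.test and is consistent with the non-integer degrees of freedom reported in the results. The symbols \(\bar{x}_i\), \(s_i^2\), and \(n_i\) (sample mean, sample variance, and sample size for season \(i\), with 1 = spring and 2 = summer) are introduced here and do not appear in the original report.

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \] \[ df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \]

Under \(H_0\), \(t\) approximately follows a t distribution with \(df\) degrees of freedom, and the two-sided p-value is the probability of a value at least as extreme as \(|t|\) in either tail.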

Before we do this, we must check an assumption of the test: the volume data for spring and for summer should each be approximately normally distributed. To check this we will use Q-Q plots.

qqPlot(TrailSpring$volume, main= "Spring Trail Users")

qqPlot(TrailSummer$volume, main= "Summer Trail Users")

We can see that the data is approximately normal. There are a few outliers in the spring data, but the majority of the points are between the dotted lines. We will continue with the independent samples t-test.

Test Statistic   df       P-value   Alternative Hypothesis
-1.7878          61.935   0.07871   two.sided

There is insufficient evidence to reject the null hypothesis \((p = 0.079 > \alpha = 0.05)\).
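The report does not show the call that produced the results table. A minimal sketch of how such a test could be run follows, assuming base R's t.test with its default unequal-variance (Welch) setting and assuming RailTrail comes from the mosaicData package; the variable names `spring_volume`, `summer_volume`, and `result` are illustrative.

```r
library(mosaicData)  # assumption: RailTrail is taken from the mosaicData package

# Daily trail volume for each season
spring_volume <- subset(RailTrail, spring == 1)$volume
summer_volume <- subset(RailTrail, summer == 1)$volume

# Two-sided Welch two-sample t-test of H0: mu_1 - mu_2 = 0
# (var.equal = FALSE is the default, so the variances are not pooled)
result <- t.test(spring_volume, summer_volume,
                 alternative = "two.sided", mu = 0, conf.level = 0.95)

result$statistic  # test statistic
result$parameter  # degrees of freedom
result$p.value    # p-value to compare with the significance level
```

With spring passed first, a negative test statistic means the spring sample mean is below the summer sample mean.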

Interpretation

The boxplot suggested that approximately the same number of people use the trail in the spring and the summer, and the independent samples t-test is consistent with that impression. We failed to reject the null hypothesis, so the data do not show a statistically significant difference in the number of people using the trail in the spring versus the summer. On this evidence, the PVPC does not need to worry about extra advertisement.