2018 Yearly Analysis
Preparing Data for Analysis:
In order to prepare our data for analysis, we are going to load all our data in the form of lists. We are going to do the same by providing the paths to all the data and binding them together using rbind()
function. The data for the financial year is now stored in the the variable Trips
.
library("readxl")
library("ggplot2")
library("dygraphs")
setwd("D:/Case Study")
<- vector()
paths2018 <- c(paths2018, paste(getwd(),"/Yearly_Data/2018/Quarterly/Divvy_Trips_2018_Q1.xlsx", sep = ""))
paths2018 <- c(paths2018, paste(getwd(),"/Yearly_Data/2018/Quarterly/Divvy_Trips_2018_Q2.xlsx", sep = ""))
paths2018 <- c(paths2018, paste(getwd(),"/Yearly_Data/2018/Quarterly/Divvy_Trips_2018_Q3.xlsx", sep = ""))
paths2018 <- c(paths2018, paste(getwd(),"/Yearly_Data/2018/Quarterly/Divvy_Trips_2018_Q4.xlsx", sep = ""))
paths2018 <- c(paths2018)
AllPaths <- list()
Trips for(i in AllPaths)
{for(Mypath in i)
{<- rbind(Trips,read_excel(Mypath, col_types = c("guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","numeric")))
Trips
} }
Summary of Data:
summary(Trips)
## trip_id start_time end_time
## Min. :17536702 Min. :2018-01-01 00:12:00 Min. :2018-01-01 00:17:23
## 1st Qu.:18468202 1st Qu.:2018-05-17 10:41:59 1st Qu.:2018-05-17 11:02:33
## Median :19392132 Median :2018-07-08 15:51:59 Median :2018-07-08 16:27:38
## Mean :19489448 Mean :2018-07-08 10:20:20 Mean :2018-07-08 10:43:39
## 3rd Qu.:20289018 3rd Qu.:2018-08-21 17:12:03 3rd Qu.:2018-08-21 17:29:25
## Max. :21742442 Max. :2018-12-31 23:59:18 Max. :2019-01-06 02:36:16
##
## bikeid tripduration from_station_id from_station_name
## Min. : 1 Min. : 61 Min. : 2.0 Length:3126981
## 1st Qu.:1785 1st Qu.: 392 1st Qu.: 76.0 Class :character
## Median :3604 Median : 666 Median :165.0 Mode :character
## Mean :3489 Mean : 1399 Mean :189.2
## 3rd Qu.:5230 3rd Qu.: 1197 3rd Qu.:283.0
## Max. :6471 Max. :14336400 Max. :664.0
##
## to_station_id to_station_name usertype gender
## Min. : 2.0 Length:3126981 Length:3126981 Length:3126981
## 1st Qu.: 76.0 Class :character Class :character Class :character
## Median :165.0 Mode :character Mode :character Mode :character
## Mean :189.8
## 3rd Qu.:284.0
## Max. :664.0
##
## birthyear
## Min. :1895
## 1st Qu.:1977
## Median :1986
## Mean :1983
## 3rd Qu.:1991
## Max. :2005
## NA's :477663
Analysis Metrics:
- Customer-Sub Ratio
- Gender Demographic
- Month by Month
Customer-Sub Ratio:
The Customer-Sub ratio seems to start plateauing around the 80 - 20 % mark. Which looking at the industry standard is a healthy Customer-Sub ratio.
<- Trips$usertype
UserTypeCol <- 0
Subs <- 0
Customers
for(i in UserTypeCol)
if(i == "Subscriber")
<- Subs + 1 else
Subs <- Customers + 1
Customers pie(c(Subs,Customers),label = c(paste("Subscribers = ", round(Subs*100/(Subs + Customers), 2), "%"), paste("Customers = ", round(Customers*100/(Subs + Customers), 2), "%")))
Gender Demographic
The Gender Demographic seems to be a recurring problem every year. It would really be in the bike rental service’s best interest to advertise extensively to increase the sex ratio and further increase profits.
ggplot(data = Trips) +
geom_bar(mapping = aes(x = gender, fill = usertype), stat = "count")
Month by Month:
Before Looking at each month of the year, we need to change our date formats to enable us to use those metrics. Using the as.Date()
method, we create a new column named month
recording the month the trip started in.
$month <- format(as.Date(Trips$start_time), "%m") Trips
We can observe, the same pattern appears in this year’s analysis as well. People tend to make more trips using the bike rental service during the summer season which is consistent with the findings shown in the previous years discussing the possibility of advertising for summer season in order to drive profits.
<- as.data.frame(table(Trips$month))
xy2 $Var1 <- factor(xy2$Var1, levels = xy2$Var1[order(xy2$Freq)])
xy2
ggplot(xy2,aes(x=Var1, y = Freq, fill = Freq)) +
scale_fill_gradient(low = "yellow", high = "red") +
geom_bar(width=0.7, stat = "identity") +
coord_flip()