2019 Yearly Analysis
Preparing Data for Analysis:
In order to prepare our data for analysis, we are going to load all our data in the form of lists. We are going to do the same by providing the paths to all the data and binding them together using rbind()
function. The data for the financial year is now stored in the the variable Trips
.
library("readxl")
library("ggplot2")
library("dygraphs")
setwd("D:/Case Study")
<- vector()
paths2019 <- c(paths2019, paste(getwd(),"/Yearly_Data/2019/Quarterly/Divvy_Trips_2019_Q1.xlsx", sep = ""))
paths2019 <- c(paths2019, paste(getwd(),"/Yearly_Data/2019/Quarterly/Divvy_Trips_2019_Q2.xlsx", sep = ""))
paths2019 <- c(paths2019, paste(getwd(),"/Yearly_Data/2019/Quarterly/Divvy_Trips_2019_Q3.xlsx", sep = ""))
paths2019 <- c(paths2019, paste(getwd(),"/Yearly_Data/2019/Quarterly/Divvy_Trips_2019_Q4.xlsx", sep = ""))
paths2019 <- c(paths2019)
AllPaths <- list()
Trips for(i in AllPaths)
{for(Mypath in i)
{ <- rbind(Trips,read_excel(Mypath, col_types = c("guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","numeric")))
Trips
} }
Summary of Data:
summary(Trips)
## trip_id start_time end_time
## Min. :21742443 Min. :2019-01-01 00:04:37 Min. :2019-01-01 00:11:07
## 1st Qu.:22681250 1st Qu.:2019-05-16 07:32:01 1st Qu.:2019-05-16 07:45:38
## Median :23673970 Median :2019-07-10 21:21:24 Median :2019-07-10 21:47:31
## Mean :23737432 Mean :2019-07-10 01:14:10 Mean :2019-07-10 01:38:11
## 3rd Qu.:24515363 3rd Qu.:2019-08-21 22:30:55 3rd Qu.:2019-08-21 23:05:58
## Max. :25962904 Max. :2019-12-31 23:57:17 Max. :2020-01-21 13:54:35
##
## bikeid tripduration from_station_id from_station_name
## Min. : 1 Min. : 61 Min. : 1.0 Length:3166273
## 1st Qu.:1730 1st Qu.: 405 1st Qu.: 77.0 Class :character
## Median :3457 Median : 698 Median :174.0 Mode :character
## Mean :3387 Mean : 1441 Mean :201.3
## 3rd Qu.:5061 3rd Qu.: 1264 3rd Qu.:289.0
## Max. :6946 Max. :10628400 Max. :673.0
##
## to_station_id to_station_name usertype gender
## Min. : 1.0 Length:3166273 Length:3166273 Length:3166273
## 1st Qu.: 77.0 Class :character Class :character Class :character
## Median :174.0 Mode :character Mode :character Mode :character
## Mean :202.1
## 3rd Qu.:290.0
## Max. :673.0
##
## birthyear
## Min. :1759
## 1st Qu.:1979
## Median :1987
## Mean :1984
## 3rd Qu.:1992
## Max. :2014
## NA's :434079
Analysis Metrics:
- Customer-Sub Ratio
- Gender Demographic
- Day by Day
Customer-Sub Ratio:
Here we observe a minor drop in Customer-Sub Ratio. The Customer-Sub ratio seems to start plateauing around the 80 - 20 % mark. Which looking at the industry standard is a healthy Customer-Sub ratio.
<- Trips$usertype
UserTypeCol <- 0
Subs <- 0
Customers
for(i in UserTypeCol)
if(i == "Subscriber")
<- Subs + 1 else
Subs <- Customers + 1
Customers pie(c(Subs,Customers),label = c(paste("Subscribers = ", round(Subs*100/(Subs + Customers), 2), "%"), paste("Customers = ", round(Customers*100/(Subs + Customers), 2), "%")))
<- c(59.14, 77.41, 80, 81.69, 77.88)
Sub_Percentage <- c(2015, 2016, 2017, 2018, 2019)
Years
ggplot(mapping = aes(x=Years, y=Sub_Percentage)) +
geom_area( fill="#69b3a2", alpha=0.4) +
geom_line(color="#69b3a2", size=2) +
geom_point(size=3, color="#69b3a2")
Gender Demographic:
ggplot(data = Trips) +
geom_bar(mapping = aes(x = gender, fill = usertype), stat = "count")
Day by Day:
Preparing data for day by day analysis: Here, we use the as.Date()
method to create a new column named day
storing the day the trip started. This allows us to analyze frequency of trips made on each day during the Financial Year 2019.
$day <- format(as.Date(Trips$start_time), "%A") Trips
As we can observe, Customers are way more likely to use the bike rental service on weekends whereas subscribers are way more likely to use it on weekdays. This indicates that subscribers may be using the service for their commute to work/school/etc., whereas Customers are using the service as a recreational activity. Keeping this in mind we can provide weekend subscription plans for customers who just want to use the service for as a recreational activity. This would help in further increasing the Subscriber to Customer ratio.
ggplot(data = Trips) +
geom_bar(mapping = aes(x = day, fill = day), stat = "count") +
coord_flip() +
facet_wrap(~usertype)