2019 Yearly Analysis

Preparing Data for Analysis:

In order to prepare our data for analysis, we are going to load all our data in the form of lists. We are going to do the same by providing the paths to all the data and binding them together using rbind() function. The data for the financial year is now stored in the the variable Trips.

library("readxl")
library("ggplot2")
library("dygraphs")
setwd("D:/Case Study")
paths2019 <- vector()
paths2019 <- c(paths2019, paste(getwd(),"/Yearly_Data/2019/Quarterly/Divvy_Trips_2019_Q1.xlsx", sep = ""))
paths2019 <- c(paths2019, paste(getwd(),"/Yearly_Data/2019/Quarterly/Divvy_Trips_2019_Q2.xlsx", sep = ""))
paths2019 <- c(paths2019, paste(getwd(),"/Yearly_Data/2019/Quarterly/Divvy_Trips_2019_Q3.xlsx", sep = ""))
paths2019 <- c(paths2019, paste(getwd(),"/Yearly_Data/2019/Quarterly/Divvy_Trips_2019_Q4.xlsx", sep = ""))
AllPaths <- c(paths2019)
Trips <- list()
for(i in AllPaths)
{
  for(Mypath in i)
  { 
    Trips <- rbind(Trips,read_excel(Mypath, col_types = c("guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","numeric")))
  }
}

Summary of Data:

summary(Trips)
##     trip_id           start_time                     end_time                  
##  Min.   :21742443   Min.   :2019-01-01 00:04:37   Min.   :2019-01-01 00:11:07  
##  1st Qu.:22681250   1st Qu.:2019-05-16 07:32:01   1st Qu.:2019-05-16 07:45:38  
##  Median :23673970   Median :2019-07-10 21:21:24   Median :2019-07-10 21:47:31  
##  Mean   :23737432   Mean   :2019-07-10 01:14:10   Mean   :2019-07-10 01:38:11  
##  3rd Qu.:24515363   3rd Qu.:2019-08-21 22:30:55   3rd Qu.:2019-08-21 23:05:58  
##  Max.   :25962904   Max.   :2019-12-31 23:57:17   Max.   :2020-01-21 13:54:35  
##                                                                                
##      bikeid      tripduration      from_station_id from_station_name 
##  Min.   :   1   Min.   :      61   Min.   :  1.0   Length:3166273    
##  1st Qu.:1730   1st Qu.:     405   1st Qu.: 77.0   Class :character  
##  Median :3457   Median :     698   Median :174.0   Mode  :character  
##  Mean   :3387   Mean   :    1441   Mean   :201.3                     
##  3rd Qu.:5061   3rd Qu.:    1264   3rd Qu.:289.0                     
##  Max.   :6946   Max.   :10628400   Max.   :673.0                     
##                                                                      
##  to_station_id   to_station_name      usertype            gender         
##  Min.   :  1.0   Length:3166273     Length:3166273     Length:3166273    
##  1st Qu.: 77.0   Class :character   Class :character   Class :character  
##  Median :174.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :202.1                                                           
##  3rd Qu.:290.0                                                           
##  Max.   :673.0                                                           
##                                                                          
##    birthyear     
##  Min.   :1759    
##  1st Qu.:1979    
##  Median :1987    
##  Mean   :1984    
##  3rd Qu.:1992    
##  Max.   :2014    
##  NA's   :434079

Analysis Metrics:

  • Customer-Sub Ratio
  • Gender Demographic
  • Day by Day

Customer-Sub Ratio:

Here we observe a minor drop in Customer-Sub Ratio. The Customer-Sub ratio seems to start plateauing around the 80 - 20 % mark. Which looking at the industry standard is a healthy Customer-Sub ratio.

UserTypeCol <- Trips$usertype
Subs <- 0
Customers <- 0

for(i in UserTypeCol)
  if(i == "Subscriber")
    Subs <- Subs + 1 else
      Customers <- Customers + 1
pie(c(Subs,Customers),label = c(paste("Subscribers = ", round(Subs*100/(Subs + Customers), 2), "%"), paste("Customers = ", round(Customers*100/(Subs + Customers), 2), "%")))

Sub_Percentage <- c(59.14, 77.41, 80, 81.69, 77.88)
Years <- c(2015, 2016, 2017, 2018, 2019)

ggplot(mapping = aes(x=Years, y=Sub_Percentage)) +
  geom_area( fill="#69b3a2", alpha=0.4) +
    geom_line(color="#69b3a2", size=2) +
      geom_point(size=3, color="#69b3a2")

Gender Demographic:

ggplot(data = Trips) +
  geom_bar(mapping = aes(x = gender, fill = usertype), stat = "count")

Day by Day:

Preparing data for day by day analysis: Here, we use the as.Date() method to create a new column named day storing the day the trip started. This allows us to analyze frequency of trips made on each day during the Financial Year 2019.

  Trips$day <- format(as.Date(Trips$start_time), "%A")

As we can observe, Customers are way more likely to use the bike rental service on weekends whereas subscribers are way more likely to use it on weekdays. This indicates that subscribers may be using the service for their commute to work/school/etc., whereas Customers are using the service as a recreational activity. Keeping this in mind we can provide weekend subscription plans for customers who just want to use the service for as a recreational activity. This would help in further increasing the Subscriber to Customer ratio.

ggplot(data = Trips) + 
  geom_bar(mapping = aes(x = day, fill = day), stat = "count") +
    coord_flip() +    
      facet_wrap(~usertype)