2015 Yearly Analysis

The 2015 data serves as the benchmark for our analysis, since 2015 was the company's first year in business.

Preparing Data for Analysis:

To prepare the data, we read each Excel workbook and stack the results row by row with the rbind function, so that all trips end up in a single table.

First, we collect the paths to all of the 2015 data files and load their contents into a variable named Trips.

library("readxl")
library("ggplot2")
library("dygraphs")
setwd("D:/Case Study")
paths2015 <- vector()
paths2015 <- c(paths2015, paste(getwd(),"/Yearly_Data/2015/Quarterly/Divvy_Trips_2015_07.xlsx", sep = ""))
paths2015 <- c(paths2015, paste(getwd(),"/Yearly_Data/2015/Quarterly/Divvy_Trips_2015_07.xlsx", sep = ""))
paths2015 <- c(paths2015, paste(getwd(),"/Yearly_Data/2015/Quarterly/Divvy_Trips_2015_07.xlsx", sep = ""))
paths2015 <- c(paths2015, paste(getwd(),"/Yearly_Data/2015/Quarterly/Divvy_Trips_2015_07.xlsx", sep = ""))
paths2015 <- c(paths2015, paste(getwd(),"/Yearly_Data/2015/Quarterly/Divvy_Trips_2015_07.xlsx", sep = ""))
paths2015 <- c(paths2015, paste(getwd(),"/Yearly_Data/2015/Quarterly/Divvy_Trips_2015_07.xlsx", sep = ""))
AllPaths <- c(paths2015)
# Let readxl guess the first eleven column types and force the twelfth (birthyear) to numeric.
ColTypes <- c(rep("guess", 11), "numeric")
# Read every workbook, then bind all rows into one data frame in a single call.
Trips <- do.call(rbind, lapply(AllPaths, read_excel, col_types = ColTypes))
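
The row-binding step described above can be sketched on toy data. This is only an illustration: the data frames q1 and q2 are hypothetical stand-ins for what read_excel would return, and the column values are made up. Binding once with do.call avoids copying the growing table on every loop iteration.

```r
# Hypothetical stand-ins for the data frames returned by read_excel().
q1 <- data.frame(trip_id = c(1, 2), tripduration = c(300, 620))
q2 <- data.frame(trip_id = c(3), tripduration = c(450))

# Collect the pieces in a list, then bind all rows in one call.
pieces <- list(q1, q2)
combined <- do.call(rbind, pieces)

print(combined)
```

All pieces must share the same column names for rbind to line them up.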

Summary of Data:

summary(Trips)
##     trip_id         start_time          end_time             bikeid    
##  Min.   :5943501   Length:3202266     Length:3202266     Min.   :   1  
##  1st Qu.:6113124   Class :character   Class :character   1st Qu.:1420  
##  Median :6280859   Mode  :character   Mode  :character   Median :2848  
##  Mean   :6279267                                         Mean   :2664  
##  3rd Qu.:6446639                                         3rd Qu.:3914  
##  Max.   :6611093                                         Max.   :4835  
##                                                                        
##   tripduration   from_station_id from_station_name  to_station_id  
##  Min.   :   60   Min.   :  2.0   Length:3202266     Min.   :  2.0  
##  1st Qu.:  498   1st Qu.: 73.0   Class :character   1st Qu.: 72.0  
##  Median :  871   Median :157.0   Mode  :character   Median :157.0  
##  Mean   : 1186   Mean   :173.9                      Mean   :174.3  
##  3rd Qu.: 1382   3rd Qu.:268.0                      3rd Qu.:268.0  
##  Max.   :85851   Max.   :511.0                      Max.   :511.0  
##                                                                    
##  to_station_name      usertype            gender            birthyear      
##  Length:3202266     Length:3202266     Length:3202266     Min.   :1899     
##  Class :character   Class :character   Class :character   1st Qu.:1975     
##  Mode  :character   Mode  :character   Mode  :character   Median :1983     
##                                                           Mean   :1980     
##                                                           3rd Qu.:1988     
##                                                           Max.   :1999     
##                                                           NA's   :1308480

Analysis Metrics:

  • Customer-Sub Ratio
  • Gender Demographic
  • Trip Duration

Customer-Sub Ratio:

The Customer-Sub Ratio metric helps us understand how likely a customer is to subscribe to the bike rental service. This information is crucial for making data-driven decisions and predictions.

The figures collected here will serve as a benchmark for measuring performance in the coming years.

UserTypeCol <- Trips$usertype
# Count trips by user type; every non-subscriber trip is counted as a customer trip.
Subs <- sum(UserTypeCol == "Subscriber")
Customers <- sum(UserTypeCol != "Subscriber")
Total <- Subs + Customers
# Label each slice with its percentage share of all trips.
pie(c(Subs, Customers),
    labels = c(paste0("Subscribers = ", round(Subs * 100 / Total, 2), "%"),
               paste0("Customers = ", round(Customers * 100 / Total, 2), "%")))
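
To keep the ratio as a number that can be compared across years, rather than reading it off the chart, a minimal sketch along these lines may help. The usertype vector is hypothetical toy data standing in for Trips$usertype, and the name subs_per_customer is chosen only for illustration.

```r
# Hypothetical toy data standing in for Trips$usertype.
usertype <- c("Subscriber", "Customer", "Subscriber", "Subscriber", "Customer")

# Trips per user type and their percentage share of all trips.
counts <- table(usertype)
shares <- round(100 * prop.table(counts), 2)

# Subscriber trips per customer trip.
subs_per_customer <- counts[["Subscriber"]] / counts[["Customer"]]

print(shares)
print(subs_per_customer)
```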

Gender Demographic:

Understanding the gender demographic is key: it shows who our target audience is and whether our advertising is failing to reach a specific gender. Here, the customer base is male-dominated, which could indicate a lack of advertising aimed at female customers.

ggplot(data = Trips) +
  geom_bar(mapping = aes(x = gender, fill = usertype), stat = "count")
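
The split shown in the bar chart can also be expressed as numbers. The sketch below uses a hypothetical toy data frame in place of Trips, so the values are illustrative only; it assumes missing genders are recorded as NA.

```r
# Hypothetical toy data standing in for the gender and usertype columns of Trips.
toy <- data.frame(
  gender   = c("Male", "Male", "Female", "Male", "Female", NA),
  usertype = c("Subscriber", "Subscriber", "Subscriber",
               "Customer", "Subscriber", "Customer")
)

# Trip counts by gender and user type, keeping missing genders visible.
gender_counts <- table(toy$gender, toy$usertype, useNA = "ifany")

# Percentage share of each gender among trips with a recorded gender.
gender_share <- round(100 * prop.table(table(toy$gender)), 1)

print(gender_counts)
print(gender_share)
```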