2015 Yearly Analysis
The 2015 data serves as the benchmark for our analysis, since 2015 was the company's first year in business.
Preparing Data for Analysis:
To prepare our data, we load all the data stored in the Excel sheets and combine it row by row using the rbind function.
First, we collect the paths to all of our data files for the year 2015, and then load the data into a variable named Trips.
library("readxl")
library("ggplot2")
library("dygraphs")
setwd("D:/Case Study")
# Collect the paths to the 2015 data files.
# NOTE: as given, all six entries reference the same file; adjust the file names so that each 2015 data file is listed once.
paths2015 <- vector()
paths2015 <- c(paths2015, paste(getwd(),"/Yearly_Data/2015/Quarterly/Divvy_Trips_2015_07.xlsx", sep = ""))
paths2015 <- c(paths2015, paste(getwd(),"/Yearly_Data/2015/Quarterly/Divvy_Trips_2015_07.xlsx", sep = ""))
paths2015 <- c(paths2015, paste(getwd(),"/Yearly_Data/2015/Quarterly/Divvy_Trips_2015_07.xlsx", sep = ""))
paths2015 <- c(paths2015, paste(getwd(),"/Yearly_Data/2015/Quarterly/Divvy_Trips_2015_07.xlsx", sep = ""))
paths2015 <- c(paths2015, paste(getwd(),"/Yearly_Data/2015/Quarterly/Divvy_Trips_2015_07.xlsx", sep = ""))
paths2015 <- c(paths2015, paste(getwd(),"/Yearly_Data/2015/Quarterly/Divvy_Trips_2015_07.xlsx", sep = ""))
AllPaths <- c(paths2015)
# Read each file and append its rows to Trips.
# The first 11 column types are guessed; the last column (birthyear) is read as numeric.
Trips <- list()
for(i in AllPaths)
{for(Mypath in i)
{Trips <- rbind(Trips,read_excel(Mypath, col_types = c(rep("guess", 11), "numeric")))
} }
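Calling rbind inside a loop copies the accumulated rows on every iteration, which can become slow for large files. A minimal sketch of an alternative, assuming every file yields a data frame with identical columns, is to read all files into a list and bind them once. Here `read_one` is a hypothetical stand-in for read_excel and the path names are made up, so the example runs without the spreadsheets:

```r
# Hypothetical stand-in for read_excel(): returns a small data frame per "path".
read_one <- function(path) {
  data.frame(trip_id = seq_len(3), source = path, stringsAsFactors = FALSE)
}

# Illustrative path names only; these are not the real data files.
paths <- c("part_a.xlsx", "part_b.xlsx")

# Read every file into a list, then bind all rows in a single call.
parts <- lapply(paths, read_one)
trips_demo <- do.call(rbind, parts)

nrow(trips_demo)
```

With the real data, `read_one` would be replaced by the read_excel call used above, with the same col_types argument.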
Summary of Data:
summary(Trips)
##  trip_id           start_time         end_time           bikeid
##  Min.   :5943501   Length:3202266     Length:3202266     Min.   :   1
##  1st Qu.:6113124   Class :character   Class :character   1st Qu.:1420
##  Median :6280859   Mode  :character   Mode  :character   Median :2848
##  Mean   :6279267                                         Mean   :2664
##  3rd Qu.:6446639                                         3rd Qu.:3914
##  Max.   :6611093                                         Max.   :4835
##
##  tripduration    from_station_id   from_station_name   to_station_id
##  Min.   :   60   Min.   :  2.0     Length:3202266      Min.   :  2.0
##  1st Qu.:  498   1st Qu.: 73.0     Class :character    1st Qu.: 72.0
##  Median :  871   Median :157.0     Mode  :character    Median :157.0
##  Mean   : 1186   Mean   :173.9                         Mean   :174.3
##  3rd Qu.: 1382   3rd Qu.:268.0                         3rd Qu.:268.0
##  Max.   :85851   Max.   :511.0                         Max.   :511.0
##
##  to_station_name    usertype           gender             birthyear
##  Length:3202266     Length:3202266     Length:3202266     Min.   :1899
##  Class :character   Class :character   Class :character   1st Qu.:1975
##  Mode  :character   Mode  :character   Mode  :character   Median :1983
##                                                           Mean   :1980
##                                                           3rd Qu.:1988
##                                                           Max.   :1999
##                                                           NA's   :1308480
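The summary reports 1308480 missing birthyear values out of 3202266 rows. As a quick worked check, the share of rows without a birth year can be computed directly from those two figures (the variable names are illustrative):

```r
# Figures copied from the summary() output above.
total_rows   <- 3202266
missing_year <- 1308480

# Percentage of trips with no recorded birth year.
missing_pct <- round(100 * missing_year / total_rows, 2)
missing_pct
```

This comes to roughly 41% of all rows, which is worth keeping in mind for any analysis that relies on birthyear. On the loaded data itself, `colSums(is.na(Trips))` would report the missing count for every column.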
Analysis Metrics:
- Customer-Sub Ratio
- Gender Demographic
- Trip Duration
Customer-Sub Ratio:
The Customer-Sub Ratio compares how many trips are made by subscribers with how many are made by customers, which helps us understand how likely a customer is to subscribe to the bike rental service. This information is crucial for making data-driven decisions and predictions.
The ratio measured here will serve as a performance benchmark for the upcoming years.
# Count trips made by subscribers; every other user type is counted as a customer.
UserTypeCol <- Trips$usertype
Subs <- 0
Customers <- 0
for(i in UserTypeCol) {
if(i == "Subscriber")
Subs <- Subs + 1 else
Customers <- Customers + 1
}
# Pie chart labelled with each group's percentage share of all trips.
pie(c(Subs,Customers),labels = c(paste("Subscribers = ", round(Subs*100/(Subs + Customers), 2), "%"), paste("Customers = ", round(Customers*100/(Subs + Customers), 2), "%")))
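The same counts can be obtained without an explicit loop. The sketch below is one possible vectorized version using table and prop.table; it uses a small made-up vector in place of Trips$usertype, so the numbers are illustrative only:

```r
# Toy stand-in for Trips$usertype; the values are invented for illustration.
user_type <- c("Subscriber", "Customer", "Subscriber", "Subscriber", "Customer")

# Count each user type, then convert the counts to percentages.
counts <- table(user_type)
shares <- round(100 * prop.table(counts), 2)

subs_pct <- as.numeric(shares["Subscriber"])
cust_pct <- as.numeric(shares["Customer"])
```

Unlike the loop, which counts every value other than "Subscriber" as a customer, table lists each distinct user type separately, so any unexpected category in the column would show up in `counts`.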
Gender Demographic:
Understanding the gender demographic is key: it shows us who our target audience is and whether our advertising is failing to reach a specific gender. As the chart shows, the customer base is male-dominated, which could indicate a lack of advertising aimed at female customers.
ggplot(data = Trips) +
geom_bar(mapping = aes(x = gender, fill = usertype), stat = "count")
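The bar chart shows counts, so it can help to back the observation with numbers. A minimal sketch, assuming missing genders are stored as NA in the loaded data (an assumption, since the summary does not show this), cross-tabulates gender against user type on a small made-up data frame:

```r
# Toy data standing in for Trips; the values are invented for illustration.
demo <- data.frame(
  gender   = c("Male", "Female", "Male", NA, "Male", NA),
  usertype = c("Subscriber", "Subscriber", "Subscriber",
               "Customer", "Subscriber", "Customer"),
  stringsAsFactors = FALSE
)

# Cross-tabulate gender by user type, keeping missing genders visible.
gender_by_type <- table(demo$gender, demo$usertype, useNA = "ifany")

# Share of each gender among trips where a gender is recorded.
gender_share <- round(100 * prop.table(table(demo$gender)), 1)
```

Applied to Trips, `gender_by_type` would show how much of the gender column is missing for each user type, and `gender_share` would put a percentage on the male and female shares seen in the chart.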