Difference between a Cyclistic member and a regular user – Across the previous 12 months in Chicago

INTRODUCTION

The Cyclistic bike share analysis is a project based on a fictional company in Chicago. The goal of this project is to maximize the number of annual memberships, which would lead to the growth of the company. To achieve this, the team wants to design a new marketing strategy to convert casual riders into annual members. The company launched in 2016 and has since grown a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time.

Goal: convert casual riders into annual members of the bike-share product. To tackle this, the following key steps are followed:

a. Ask

b. Prepare

c. Process

d. Analyse

e. Share

f. Act

ASK

Business task: Grow the company by increasing the number of members through converting casual riders to annual members. This means exploring how casual users use Cyclistic bikes differently from annual members.

Key stakeholders: the Director of Marketing, the Manager, the Marketing Analytics Team and the Executive Team.

PREPARE

The data used is the Divvy datasets, which have been made available by Motivate International Inc. under this license. Cyclistic is a fictional company, hence this public data is used to explore the different types of riders and their behaviours. Riders’ personally identifiable information has been removed from the data due to data privacy policies. Viewing the data in a spreadsheet for a first quick glance shows that the datasets have 13 features, namely: ride_id, rideable_type, started_at, ended_at, start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng and member_casual.

Using Python for data cleaning

The data used is the previous 12 months of data, that is, from June 2020 to May 2021. Python is used to check for errors and duplicates in the data and to combine the individual csv files into a single data frame (GIANT CSV FILE😊). This can also be done in R by stacking the csvs on top of each other with rbind(). Note that in Tableau, individual csv files that have the same columns can be combined as a union to form one table. Visualizations can also be done in Python and R; I will explore those options in other blog posts. Since the data did not need colossal cleaning and manipulation, Tableau is used to prepare insights and to share them.
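As a rough illustration of the combining step, here is a minimal Python sketch that stacks csv files sharing the same header into one table, using only the standard library. The function name `combine_csvs` and the file names in the note below are hypothetical; this is not the exact script used for this project.

```python
import csv


def combine_csvs(paths):
    """Stack CSV files with identical headers into one list of row dicts."""
    header = None
    rows = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            if header is None:
                # The first file defines the expected columns.
                header = reader.fieldnames
            elif reader.fieldnames != header:
                # Refuse to stack files whose columns do not line up.
                raise ValueError(f"{path} has different columns: {reader.fieldnames}")
            rows.extend(reader)
    return header, rows
```

For example, `combine_csvs(["june_2020.csv", "july_2020.csv"])` (hypothetical file names) would return the shared header and every row from both files, which is roughly what rbind() in R or a union in Tableau does.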

PROCESS

In this step, Tableau is used for both visualizing and manipulating the data. To address the objective, the following questions were asked:

  1. What are the average riding times for both users?
  2. Which month records the highest number of rides for both users?
  3. What happens on weekdays, in terms of number of rides taken by both users?
  4. At what times do both types of riders start their rides?
  5. What is the percentage of members to casual riders?

Checking for duplicates in the dataset before visualizing:

IF { FIXED [Ride_Id], [Started_At], [Ended_At], 
[start_station_name], [end_station_name]: (COUNT([Union]))}>1 THEN 'DUPLICATES'
ELSE 'ALL GOOD' END
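The same duplicate check can be sketched in Python for readers who prefer to do it before loading the data into Tableau. This is only an approximation of the calculated field above: it assumes the rows are dictionaries keyed by the dataset's column names, and `flag_duplicates` is a hypothetical helper name.

```python
from collections import Counter

# Same fields as the FIXED level-of-detail expression above.
KEY_COLUMNS = ("ride_id", "started_at", "ended_at",
               "start_station_name", "end_station_name")


def flag_duplicates(rows):
    """Label each row 'DUPLICATES' if its key occurs more than once, else 'ALL GOOD'."""
    keys = [tuple(row[column] for column in KEY_COLUMNS) for row in rows]
    counts = Counter(keys)
    return ["DUPLICATES" if counts[key] > 1 else "ALL GOOD" for key in keys]
```

Every row gets its own label, so a ride that appears twice yields two 'DUPLICATES' labels, just as the Tableau field marks each repeated record.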

Finding the number of trips or rides

COUNT([Ride_Id])

What percentage of these rides is taken by members? First, find the ride ids that belong to just member users:

IF [Member_Casual] = 'member' THEN [Ride_Id] END

Number of trips taken by members

COUNT([members_ride_id])

Percentage of rides taken by members

[number_rides_by_members]/[number_of_trips]
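The percentage calculation is simple enough to check by hand or in a few lines of Python. The sketch below assumes each row is a dictionary with the member_casual column; `member_share` is a hypothetical helper, and the variable names simply mirror the calculated fields above.

```python
def member_share(rows):
    """Return the fraction of rides taken by members (0.0 for an empty table)."""
    number_of_trips = len(rows)
    if number_of_trips == 0:
        return 0.0
    number_rides_by_members = sum(
        1 for row in rows if row["member_casual"] == "member"
    )
    return number_rides_by_members / number_of_trips
```

As a worked example with the August figures reported in the Analyse stage, 332,700 member rides out of 622,361 total rides is a share of about 53%.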

The same is done for casual rides: in the first calculated field, replace member with casual.

Average riding time

The calculated field below gives the riding time.

DATETIME([Ended_At] - [Started_At])

To calculate the average ride length, first create the field below and name it ride_length_copy

[Ended_At] - [Started_At]

Change the format of the calculated field above to a custom date format using hh:mm:ss

DATETIME(AVG( [ride_length_copy]))
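For comparison, the ride length and its average can be sketched in Python as below. The timestamp layout in `TIMESTAMP_FORMAT` is an assumption about how started_at and ended_at are stored, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed layout of the started_at / ended_at columns.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def ride_length(started_at, ended_at):
    """Ride length as a timedelta: ended_at minus started_at."""
    start = datetime.strptime(started_at, TIMESTAMP_FORMAT)
    end = datetime.strptime(ended_at, TIMESTAMP_FORMAT)
    return end - start


def average_ride_length(rows):
    """Mean ride length over a non-empty list of rows."""
    lengths = [ride_length(row["started_at"], row["ended_at"]) for row in rows]
    return sum(lengths, timedelta()) / len(lengths)


def format_hms(delta):
    """Format a non-negative timedelta as hh:mm:ss."""
    hours, remainder = divmod(int(delta.total_seconds()), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

A 10 minute ride and a 20 minute ride average out to `00:15:00`. Rows where ended_at is earlier than started_at would produce negative lengths, so they are worth filtering out before averaging.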

Extracting the month from the start time

DATENAME('month', [Started_At])

Extracting the weekdays from the start time

DATENAME('weekday',  [Started_At])
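A Python equivalent of the two DATENAME fields might look like the sketch below. The name tuples are spelled out so the result does not depend on the machine's locale; the timestamp format and the function name `month_and_weekday` are assumptions made for illustration.

```python
from datetime import datetime

MONTH_NAMES = ("January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December")
# datetime.weekday() numbers the days from Monday (0) to Sunday (6).
WEEKDAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")


def month_and_weekday(started_at, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the month name and weekday name of a start timestamp."""
    moment = datetime.strptime(started_at, fmt)
    return MONTH_NAMES[moment.month - 1], WEEKDAY_NAMES[moment.weekday()]
```

For instance, a ride that started at `2020-08-01 09:30:00` falls in August, on a Saturday.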

ANALYSE

This is based on the questions asked in the Process stage.

What are the average riding times for both users?

For the average ride length by each type of customer, filter by rider type. For example, for casual rides, drag member_casual to the filter pane and choose casual.

[Charts: average ride length for TOTAL, MEMBER and CASUAL]

Which month records the highest number of rides for both users?

This uses the month and the number of trips. Remember that the data runs from June 2020 to May 2021.

[Charts: number of rides by month for TOTAL, MEMBER and CASUAL]

Bike rides peak around the summer months, with August recording the highest number of rides at 622,361: 332,700 from members and 289,661 from casual riders.

What happens on weekdays, in terms of number of rides taken by both users?

[Charts: number of rides by day of the week for TOTAL, MEMBER and CASUAL]

The line represents the trend of trips taken from Sunday to Saturday.

Saturday recorded the highest number of bike rides in the week. Saturday falls on the weekend, when people might take bike rides for leisure. The number of rides grows as the week progresses towards the weekend, that is, from Monday to Saturday.

Casual users tend to take more bike rides than members on the weekends and fewer on the weekdays.

Could it be that some members are using the bikes to commute to work?
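One way to probe the commute question further would be to compare how much of each rider type's activity falls on the weekend. The sketch below is an assumption about how that could be done in Python, not part of the Tableau workflow; `weekend_share_by_type` is a hypothetical helper and the timestamp format is assumed.

```python
from collections import Counter
from datetime import datetime


def weekend_share_by_type(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Fraction of each rider type's rides that start on Saturday or Sunday."""
    totals = Counter()
    weekend = Counter()
    for row in rows:
        rider = row["member_casual"]
        day = datetime.strptime(row["started_at"], fmt).weekday()
        totals[rider] += 1
        if day >= 5:  # 5 is Saturday, 6 is Sunday
            weekend[rider] += 1
    return {rider: weekend[rider] / totals[rider] for rider in totals}
```

A clearly lower weekend share for members than for casual riders would be consistent with the commuting idea, although it would not prove it on its own.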

 

What is the percentage of members to casual riders?

[Charts: percentage of rides for TOTAL, MEMBER and CASUAL]

At what times do both types of riders start their rides?

Members ride as early as 05:59:59 am, compared to casual riders, who start as early as 06:00:46 am. The table breakdowns below show this by month and day.

Most of the months have rides starting as early as just after midnight. Some people do appreciate a midnight ride tour of the city after all.

SHARE

In this step, the findings from the Analyse stage are visualized and brought together.

I decided to make a fun visualization to present the findings of the analysis. Click here to interact with the dashboard.

The heatmap in the above dashboard shows the number of trips completed on each day of the week, from Sunday to Saturday, in each month. The line chart and the heatmap currently show the total for both user types. Once the parameter is changed to either user type, the visualization updates to represent that user type.

Saturdays in August recorded the highest number of rides for both users.
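The numbers behind a heatmap like this are just trip counts per month and day of the week. A minimal Python sketch of that aggregation is shown below, assuming rows with started_at and member_casual columns; the optional `rider_type` argument loosely imitates the dashboard parameter, and the function name and timestamp format are hypothetical.

```python
from collections import Counter
from datetime import datetime


def trips_by_month_and_weekday(rows, rider_type=None, fmt="%Y-%m-%d %H:%M:%S"):
    """Count trips per (year-month, weekday index), optionally for one rider type.

    Weekday index follows datetime.weekday(): Monday is 0 and Sunday is 6.
    """
    counts = Counter()
    for row in rows:
        if rider_type is not None and row["member_casual"] != rider_type:
            continue
        moment = datetime.strptime(row["started_at"], fmt)
        counts[(moment.strftime("%Y-%m"), moment.weekday())] += 1
    return counts
```

With the full dataset, `counts.most_common(1)` would give the busiest month and weekday combination, which can be compared against the Saturdays in August highlighted by the dashboard.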

[Dashboard views: TOTAL, MEMBER and CASUAL]

 

ACT

Conclusion

In this step, recommendations are provided based on the findings from the Analyse stage.

  1. Weekends are for leisure. Casual riders tend to ride more on the weekend. Providing promotions or discounts would encourage these users to subscribe or become members in order to take advantage of them.
  2. Summer is for vacation, visiting family, unwinding from work and having a great time. To maximize engagement of both types of users during the summer, members should be given a boost to ride more through incentives, and casual riders should be offered packages that enable them to subscribe and enjoy more of the services offered by Cyclistic.
  3. Rides are for exercising and clearing the mind. User statistics could be a way to capture both casual users and members. The start times from the Analyse stage show early risers, who may be working out or unwinding. Providing an app that shows users their ride statistics, including health benefits and calories burnt, would be a good way to engage both types of users.

This Project is based on the Google Data Analytics Professional Certificate.