Analysis of BayWheels Public Bikeshare Data on Google Cloud Platform

The analysis is performed with SQL on BigQuery and AI Platform to answer the following questions regarding BayWheels:

Initial EDA

Feature Selection

First, selected features of the bikeshare_trips data was extracted, modified as new fields, and stored as a view named initial. Obvious data cleaning was also carried out. On the pricing page of the Lyft Bay Wheels website, it mentions that single rides are charged \$ 2 for one ride up to 30 minutes. To filter data entry errors, duration_sec > 0 was used.

Based on the query result for part 1 question 1, we have found that the average ride duration for a customer is 538% longer than that by the subscriber. However, this result is dubious, as we can seen from the our initial query, the longer rides are mostly made by customers, and are unrepresentative of regular rides. The rides ranges from a few days to a ride longer than 6 months. Clearly, rides longer than a single day represent trips when the user has not returned the bikes to any stations.

In the following query we attempt to find a duration of usage that is representative of usage by most users, since averaging duration would lead to significant skew by longer usetime. The query result shows that the number of trips above 12 to 13 hours of usage cease to represent a countable portion of the data set (rounded to zero).

Due to the above reason, we modify the initial query to the following for a more robust data set that is representative of most users.

This question is relatively easy to answer. From the initial query, we can find trips that are not in the weekends, start stations are different from end stations, within rush hours, and should be a subscriber. Here we begin by assuming that "commuter trips" suggests that the user travels on this route daily during the weekday, and thus should be a subscriber. It is possible that there are users who commute that are not subscribers, but the number of trips should be very little, as the "customers" contributes to much less trips.

In fact, without eliminating the trips that are contributed by customers, the top ranking trips are all subscriber trips because the count for customer trips are too low to be in the top 5.

What then, are the top trips or areas of bike use by customers?

It looks like the top trips during the weekday by customers are connecting trips from other methods of travel, notably by ferry. Could it be that "commuter trips" made by customers, despite being relative rare, are mostly made by users living across the bay traveling in for work? Let us investigate.

Based on the query above, we find that there are indeed some a small percentage of customer trips amongst these trips that are made in the morning and in the afternoon. However, we note that customer trips does peak in the afternoon, suggesting increased usage after business hours. Whereas we can see that most of the subscriber trips are made during the morning and afternoon period, validating our previous assumption.

In conclusion, the 5 most popular commuter trips are in this following table, where most of the count are contibuted by actual commuting trips.

On the other hand, the 5 most popular commuter trips made by customers are in the following table, where commuting trips represents most likely a small portion of the trips.

In the above plot we observe the top 10 trips made by subscribers and customers, and their individual counts.

What are your recommendations for offers?

To make reasonable recommendation, we must first understand the revenue steam generated as a result of the rides made by each subscriber type. Based on the pricing page on BayWheels, the customers generate revenue on a per ride basis, and subsciber generate a base revenue of \$15 per month, \\$149 per year, \$5 per year, \\$10 per day based on their subscription plan.

Given our limited data without user information, estimation will be of limited use. However, we can calculate on average how much revenue each customer is generating per ride, and on average how much the subscribers are generating per ride in addition to their flat fees. Depending on what membership these subscribers have (monthly, annual, bikeshare for all, access pass), the true revenue generated per subscriber can vary greatly. We also assume that BayWheels generate more revenue on average on their subscribers.

We see that the customers generate on average \$7.55 per ride, whereas usual members generate only \\$0.07 per ride. This tell us very little about the true revenue generated by subscribers, but gives a true average of the revenue generated per ride for customers. In general adjusting pricing should be a last resort as it either reduce our revenue, or drives away users. Without further investigation with more data and user information, pricing adjustment will more likely lead to these situations. Nonetheless, it gives us a good sense of how much revenue is generated.

Another direction to tackle this problem can be found in the other table bikeshare table bigquery-public-data.san_francisco.bikeshare_status, which has records of the number of bikes available at each station overtime. With the following query we find the stations that are most often empty during commute hours.

Of these most often empty bike stations, we can identify those that are most often used with a JOIN query on our initial query.

In the above bar chart we identify the top 10 most often empty stations that are also most used, based on data in bigquery-public-data.san_francisco.bikeshare_status. If we compare this chart with the top subscriber commuter trips shown below, we observe that only Market at Sansome and Market at 10th are not trips in the top 10 popular start stations. Most likely there are times when users want to use the bikes when there are none for these stations.

The reccomendations for the offers made here is thus the following:

To divert customers who are not on their commute trips during commute hours, set up small stations close to the most popular commute start stations such as San Francisco Caltrain (Townsend at 4th) and San Francisco Caltrain 2 (330 Townsend). Offer limited time deals to some customers when they look for bikes at these start stations if they divert to using the small stations near by. Since there customers might not be on a commute trip, they are more likely to make the effort to get to these stations and receive the deals. On the other hand subscribers will have more bikes to use at the popular sations for their timely commute trips. This plan serves as a "stress releif" for high density areas during rush hours. Despite the up front cost and the reduced revenue due to deals, the service will ultimately generate more revenue when it accomodates more users.