Divvy Stations Vs. Their Neighbors

How different is each station from its 5 nearest neighbors?

Why?

Find stations where the trip counts are influenced more by the routes it's on than by its neighborhood.
Find areas near the Divvy service boundaries that would be good expansion points.
Find the lowest-traffic stations in high-traffic neighborhoods.

Overview

For every station, find the five closest stations.
Compute an "Expected" value for the number of daily trips for the station, based on the nearest neighbors.
Compute an "Unexpected Score", which is the percent difference between the Expected and actual number of trips per day.
Blue circles mean the station had more trips than you'd expect given its 5 neighbors; brown circles mean it had fewer.
A big brown circle means the station was well below its neighbors.
A small circle means the station was typical for its neighborhood, not that it had few trips.
More details below.

Details

Cleaning and Normalization
- Removed trips with average speed greater than 30mph.
- Weighted every trip as a fraction of trips that day. (A trip on a summer day with 9,000 trips total counts for less than a trip on a snowy day with 300 trips total).
- Divided every weighted trip total per station by the number of days the station was online. This prevents a station from beating its neighbors just by being online longer.
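The three steps above can be sketched in Python roughly as follows. This is an illustration, not the script from the repo: the trip record layout (`station`, `day`, `miles`, `hours`), the per-trip distance estimate, and the function name `normalized_trip_counts` are all assumptions.

```python
from collections import Counter, defaultdict

MAX_MPH = 30.0  # speed cutoff from the cleaning step


def normalized_trip_counts(trips, days_online):
    """Return {station: day-weighted departures per day online}.

    trips: iterable of dicts with hypothetical keys
           'station', 'day', 'miles', 'hours'.
    days_online: {station: number of days the station was in service}.
    """
    # 1. Drop trips whose average speed is implausibly high.
    kept = [t for t in trips
            if t["hours"] > 0 and t["miles"] / t["hours"] <= MAX_MPH]

    # 2. Weight each trip as a fraction of all kept trips on the same day.
    trips_per_day = Counter(t["day"] for t in kept)
    weighted = defaultdict(float)
    for t in kept:
        weighted[t["station"]] += 1.0 / trips_per_day[t["day"]]

    # 3. Divide by days online so long-lived stations get no head start.
    return {s: total / days_online[s] for s, total in weighted.items()}
```

With this weighting, each day contributes a total of 1.0 across all stations, so a station's score is roughly its average share of the day's trips.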
Finding Nearest Neighbors
- Committed a cardinal sin and treated lat/long coordinates as rectangular. Distance = sqrt(dx^2 + dy^2), where dx and dy are the differences in longitude and latitude between two stations.
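A minimal sketch of the neighbor search as described, using raw latitude/longitude differences as if they were flat x/y coordinates. The `stations` mapping and the function name `nearest_neighbors` are made up for illustration. At Chicago's latitude a degree of longitude covers roughly three quarters of the ground distance of a degree of latitude, so this shortcut overstates east-west distances relative to north-south ones.

```python
import math


def nearest_neighbors(stations, k=5):
    """Return {station_id: [(neighbor_id, distance), ...]} for the k closest.

    stations: {station_id: (lat, lon)} in degrees. Distances come out in
    "degrees", treating lat/long as a flat grid.
    """
    result = {}
    for sid, (lat, lon) in stations.items():
        dists = [
            (other, math.hypot(lat - olat, lon - olon))
            for other, (olat, olon) in stations.items()
            if other != sid
        ]
        dists.sort(key=lambda pair: pair[1])  # closest first
        result[sid] = dists[:k]
    return result
```

The brute-force loop compares every pair of stations, which is fine for a few hundred stations.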
Computing Expected value from neighbors
- Computed a weighted average of the normalized trip counts of the neighbors. Weighting was by inverse distance: a station 4 blocks away has half the influence of one 2 blocks away (1/4 vs. 1/2).
- The displayed values count each trip at the station it departs from. I also calculated the results for arrivals, but the final picture looks about the same.
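The inverse-distance weighting can be sketched as below. The function name `expected_trips` and the sample numbers are illustrative only.

```python
def expected_trips(neighbors):
    """Inverse-distance weighted average of neighbor trip counts.

    neighbors: list of (distance, normalized_trip_count) pairs, with
    every distance greater than zero.
    """
    weights = [1.0 / dist for dist, _ in neighbors]
    weighted_sum = sum(w * count for w, (_, count) in zip(weights, neighbors))
    return weighted_sum / sum(weights)


# Two neighbors: 2 blocks away with 10 trips/day, 4 blocks away with 40.
# Weights are 1/2 and 1/4, so the nearer station counts twice as much.
print(expected_trips([(2.0, 10.0), (4.0, 40.0)]))  # 20.0
```

A plain average of the two neighbors would give 25; the weighting pulls the expectation toward the closer station.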
Visualization
- Uses Google Maps with the bike layer turned on. (All the green lines in the map are Google's bike lane data.)
- Circle size is proportional to ((actual - expected) / expected) trip counts (with all normalization described above).
- Committed another cardinal sin and used the above metric as the radius, not the area of the circle. So bigger circles look bigger than they should.
- Also, the brown circles are effectively on a different scale than the blue. A high-count station can be 500% of its expected value, but a low-count station can be at most 100% below the expectation.
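One common fix for the radius-versus-area problem is to take the square root of the metric before using it as a radius, so that the circle's area tracks the value. The sketch below is a suggested alternative, not what the map currently does; `radius_for_area` and its `scale` parameter are hypothetical names.

```python
import math


def radius_for_area(score, scale=1.0):
    """Radius whose circle area is proportional to |score|.

    score: signed unexpected score (the sign only picks the color).
    scale: arbitrary factor for converting to map units.
    """
    return scale * math.sqrt(abs(score))


# A score 4x larger gets a radius 2x larger, hence an area 4x larger.
small = radius_for_area(0.25)
large = radius_for_area(1.0)
print(large / small)  # 2.0
```

This only addresses the radius issue; the blue and brown circles would still be on different scales.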

References

The code to compute the data used for circle sizes:
- Bitbucket Repo: https://bitbucket.org/pgroves/divvy-station-outliers
Data with intermediate computations (computed by the Python script in the Bitbucket repo): http://divvy.petergroves.com/day_normalized.json
Google Maps
- Basic map: https://developers.google.com/maps/documentation/javascript/examples/map-simple
- Add the bike lanes: https://developers.google.com/maps/documentation/javascript/examples/layer-bicycling
- Draw circles: https://developers.google.com/maps/documentation/javascript/examples/circle-simple
Previous example of a project where we left out each data point in turn, built a geospatial model with the rest of the data, and then compared the model's prediction with the point we left out:
- Minsker, B. S., Groves, P., and Beckmann, D. Optimizing Long Term Monitoring at a BP Site Using Multi-Objective Optimization, American Society of Civil Engineers (ASCE) Environmental & Water Resources Institute (EWRI) World Water & Environmental Resources Congress 2005 and Related Symposia, Anchorage, AK, 2005.
More about me: http://petergroves.com