Divvy Stations Vs. Their Neighbors
How different is each station from its 5 nearest neighbors?
Why?
- Find stations whose trip counts are influenced more by the
routes they are on than by their neighborhood.
- Find areas near the Divvy service boundaries that would be good expansion
points.
- Find the lowest-traffic stations in high-traffic neighborhoods.
Overview
- For every station, find the five closest stations.
- Compute an "Expected" value for the number of daily trips for the station,
based on the nearest neighbors.
- Compute an "Unexpected Score", which is the percent difference
between the Expected and actual number of trips per day.
- A blue circle means the station had more trips than you'd expect
given its 5 neighbors; a brown circle means it had fewer.
- A big brown circle means the station was well below its neighbors.
- A small circle means the station was normal for its neighborhood, not
that it had few trips.
- More details below.
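As a rough illustration of the Unexpected Score, here is a minimal sketch. It assumes the percent difference is taken relative to the Expected value; the function name `unexpected_score` is made up for this example and is not taken from the project's code.

```python
def unexpected_score(actual, expected):
    """Percent difference of actual trips per day from the expected value.

    Assumption: the difference is measured relative to `expected`.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    return 100.0 * (actual - expected) / expected


# A station with 12 trips/day where its neighbors suggest 8 is 50% above.
print(unexpected_score(12.0, 8.0))  # 50.0
# A station with 2 trips/day where its neighbors suggest 8 is 75% below.
print(unexpected_score(2.0, 8.0))   # -75.0
```

A positive score would map to a blue circle and a negative one to a brown circle.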
Details
- Cleaning and Normalization
- Removed trips with an average speed greater than 30 mph.
- Weighted every trip as a fraction of trips that day. (A trip on a
summer day with 9,000 trips total counts for less than a trip on a snowy day
with 300 trips total).
- Divided every weighted trip total per station by the number of days the
station was online. This prevents a station from beating its neighbors just
by being online longer.
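The cleaning and normalization steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual script: the record keys (`day`, `from_station`, `miles`, `hours`), the function name, and the choice to compute each day's total after the speed filter are all hypothetical, and the sample data is invented.

```python
from collections import Counter, defaultdict

MAX_MPH = 30.0  # speed cutoff described in the text


def normalized_departures(trips, days_online):
    """Return {station: weighted departures per online day}.

    trips: iterable of dicts with hypothetical keys
           'day', 'from_station', 'miles', 'hours'.
    days_online: {station: number of days the station was online}.
    """
    # Drop trips whose average speed exceeds the cutoff.
    # Assumption: zero-duration trips are dropped as well.
    kept = [t for t in trips
            if t["hours"] > 0 and t["miles"] / t["hours"] <= MAX_MPH]

    # System-wide trip count per day, so each trip can be weighted
    # as a fraction of that day's total.
    per_day = Counter(t["day"] for t in kept)

    totals = defaultdict(float)
    for t in kept:
        totals[t["from_station"]] += 1.0 / per_day[t["day"]]

    # Divide by days online so older stations get no head start.
    return {s: w / days_online[s] for s, w in totals.items()}


# Invented sample data: one busy day, one quiet day.
sample_trips = [
    {"day": "busy", "from_station": "A", "miles": 2.0, "hours": 0.25},
    {"day": "busy", "from_station": "B", "miles": 1.0, "hours": 0.20},
    {"day": "quiet", "from_station": "A", "miles": 1.0, "hours": 0.25},
    {"day": "quiet", "from_station": "B", "miles": 40.0, "hours": 0.5},  # 80 mph, dropped
]
print(normalized_departures(sample_trips, {"A": 2, "B": 1}))
```

In the sample, station A's trip on the quiet day counts for a full 1.0 while its trip on the busy day counts for 0.5, which is the weighting effect described above.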
- Finding Nearest Neighbors
- Committed a cardinal sin and treated lat/long coordinates as
rectangular. Distance = sqrt(x^2 + y^2), where x and y are the differences
in longitude and latitude.
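A minimal sketch of that neighbor search, treating latitude and longitude as flat x/y coordinates exactly as described. The function name and the sample coordinates are invented for illustration. At Chicago's latitude (about 41.9 degrees north) a degree of longitude covers roughly three quarters of the distance of a degree of latitude, so this shortcut overstates east-west distances relative to north-south ones.

```python
import math


def nearest_neighbors(stations, k=5):
    """Return {name: [(distance, neighbor), ...]} with the k closest stations.

    stations: {name: (lat, lon)}.
    Distances are in raw degrees, using the flat-plane shortcut.
    """
    result = {}
    for name, (lat, lon) in stations.items():
        dists = sorted(
            (math.hypot(lat - other_lat, lon - other_lon), other)
            for other, (other_lat, other_lon) in stations.items()
            if other != name
        )
        result[name] = dists[:k]
    return result


# Invented coordinates near downtown Chicago.
sample_stations = {
    "A": (41.880, -87.630),
    "B": (41.881, -87.630),
    "C": (41.880, -87.633),
    "D": (41.890, -87.650),
}
neighbors = nearest_neighbors(sample_stations, k=2)
print([name for _, name in neighbors["A"]])  # ['B', 'C']
```

This brute-force version compares every pair of stations, which is fine for a few hundred stations.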
- Computing Expected value from neighbors
- Computed a weighted average of the normalized trip counts of the
neighbors. Weighting was by inverse distance: a station 4 blocks away
would have half the influence of one 2 blocks away (1/4 vs. 1/2).
- The displayed values count a trip toward the station it departs
from. I also calculated the results for arrivals, but the
final picture looks about the same.
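The inverse-distance weighting can be sketched like this, using the 2-blocks vs. 4-blocks example from the text. The function name and the input shapes are assumptions made for this illustration.

```python
def expected_trips(neighbors, trips_per_day):
    """Inverse-distance weighted average of the neighbors' trip counts.

    neighbors: list of (distance, station) pairs, all distances > 0.
    trips_per_day: {station: normalized trips per day}.
    """
    weights = [(1.0 / dist, station) for dist, station in neighbors]
    total_weight = sum(w for w, _ in weights)
    return sum(w * trips_per_day[s] for w, s in weights) / total_weight


# One neighbor 2 blocks away (weight 1/2), one 4 blocks away (weight 1/4).
sample_neighbors = [(2.0, "near"), (4.0, "far")]
sample_counts = {"near": 30.0, "far": 12.0}
print(expected_trips(sample_neighbors, sample_counts))  # 24.0
```

The result (24) sits closer to the near station's 30 than to the far station's 12, because the near station carries twice the weight.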
- Visualization
- Uses Google Maps with the bike layer turned on. (All the green lines in
the map are Google's bike lane data.)
- Circle size is proportional to ((actual - expected) / expected) trip
counts (with all normalization described above).
- Committed another cardinal sin and used the above metric as the radius,
not the area of the circle. So bigger circles look bigger than they should.
- Also, the brown circles are effectively on a different scale than
the blue. A high-count station can be 500% of its expected value, but a
low-count station can be at most 100% below the expectation.
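To make the radius-versus-area problem concrete, here is a small sketch comparing the two scalings. Both function names and the `scale` parameter are hypothetical; the square-root version is the usual fix, not something the map actually does.

```python
import math


def radius_linear(score, scale=1.0):
    """Radius proportional to the metric (what the map does)."""
    return scale * abs(score)


def radius_area(score, scale=1.0):
    """Radius chosen so the circle's AREA is proportional to the metric."""
    return scale * math.sqrt(abs(score))


def area(radius):
    return math.pi * radius ** 2


# Compare a station with 4x the score of another.
for fn in (radius_linear, radius_area):
    ratio = area(fn(4.0)) / area(fn(1.0))
    print(fn.__name__, round(ratio, 6))
```

With the linear radius, a score 4 times larger draws a circle with 16 times the area; with the square-root radius, the area is 4 times larger, matching the score.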
References
- The code to compute the data used for circle sizes:
- Data with intermediate computations (computed by the Python script in
the Bitbucket repo):
http://divvy.petergroves.com/day_normalized.json
- Google Maps
- A previous project that used the same idea: leave out each data point
in turn, build a geospatial model from the rest of the data, and then
compare the model's prediction with the held-out point:
- Minsker, B. S., Groves, P., and Beckmann, D.
Optimizing Long Term Monitoring at a BP Site Using Multi-Objective Optimization.
American Society of Civil Engineers (ASCE) Environmental & Water Resources Institute (EWRI) World Water & Environmental Resources Congress
2005 and Related Symposia, Anchorage, AK, 2005.
- More about me: http://petergroves.com