Divvy data exploration project

A first glance at the Divvy data

authors: Peter Carbonetto, Gao Wang

Here, we take a brief look at the 2016 trip and station data released by Divvy, Chicago's bike-share system.

We begin by loading the data.table package, as well as some additional R functions written for this project.

In [1]:
library(data.table)
source("../code/functions.R")

Reading the data

The function read.divvy.data, defined in ../code/functions.R, reads in the trip and station data from the Divvy CSV files. It uses fread from the data.table package, which is much faster than read.table for files of this size. It also prepares the data for analysis, for example by converting the departure dates and times into forms that are easier to work with.

In [2]:
divvy <- read.divvy.data()
Reading station data from ../data/Divvy_Stations_2016_Q4.csv.
Reading trip data from ../data/Divvy_Trips_2016_Q1.csv.
Reading trip data from ../data/Divvy_Trips_2016_04.csv.
Reading trip data from ../data/Divvy_Trips_2016_05.csv.
Reading trip data from ../data/Divvy_Trips_2016_06.csv.
Reading trip data from ../data/Divvy_Trips_2016_Q3.csv.
Reading trip data from ../data/Divvy_Trips_2016_Q4.csv.
Preparing Divvy data for analysis in R.
Converting dates and times.
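To make the "converting dates and times" step more concrete, here is a minimal sketch of how start times could be turned into the start.week, start.day and start.hour columns that appear in the trip table below. This is a hypothetical helper (prepare.trip.times), not the project's read.divvy.data; in particular, the raw time format "month/day/year hour:minute" is an assumption, and the two input rows simply mirror the first trips printed below.

```r
# Hypothetical sketch of the date/time preparation step. This is NOT the
# project's read.divvy.data() from ../code/functions.R, and the raw start-time
# format ("3/31/2016 23:53") is an assumption.
prepare.trip.times <- function (trips) {
  # Parse the raw start times; a fixed time zone avoids daylight-saving issues.
  t  <- as.POSIXct(trips$starttime, format = "%m/%d/%Y %H:%M", tz = "UTC")
  lt <- as.POSIXlt(t)

  # Locale-independent day names, indexed by wday (0 = Sunday).
  day.names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")

  trips$starttime  <- t
  trips$start.week <- as.integer(format(t, "%U"))   # week of year, weeks start on Sunday
  trips$start.day  <- factor(day.names[lt$wday + 1], levels = day.names)
  trips$start.hour <- lt$hour                       # hour of day, 0-23
  trips
}

# Two rows in the assumed raw format.
raw <- read.csv(text = "trip_id,starttime
9080551,3/31/2016 23:53
9080550,3/31/2016 23:46", stringsAsFactors = FALSE)
trips <- prepare.trip.times(raw)
print(trips, row.names = FALSE)
```

Computing the day name from wday rather than with weekdays() keeps the labels in English regardless of the system locale.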

A first glance at the Divvy data

We have data on 581 Divvy stations across the city.

In [3]:
print(head(divvy$stations),row.names = FALSE)
                       name latitude longitude dpcapacity online_date
        2112 W Peterson Ave    41.99    -87.68         15   5/12/2015
              63rd St Beach    41.78    -87.58         23   4/20/2015
          900 W Harrison St    41.87    -87.65         19    8/6/2013
 Aberdeen St & Jackson Blvd    41.88    -87.65         15   6/21/2013
    Aberdeen St & Monroe St    41.88    -87.66         19   6/26/2013
   Ada St & Washington Blvd    41.88    -87.66         15  10/10/2013

In [4]:
nrow(divvy$stations)
581
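The online_date values in the station table are printed as month/day/year text. Assuming the column is indeed stored as character strings in that form (the printout suggests so, but this is not confirmed here), a sketch of converting them to R Date objects, for example to count stations by the year they came online, might look like this. The three dates are copied from the table above.

```r
# Online dates copied from the station table above; assumes "month/day/year" text.
online <- c("5/12/2015", "4/20/2015", "8/6/2013")

# Convert to Date objects so they can be sorted, compared and summarized.
d <- as.Date(online, format = "%m/%d/%Y")
print(d)

# Count stations by the year they came online.
years <- format(d, "%Y")
print(table(years))
```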

We also have data on the more than 3.5 million trips taken on Divvy bikes in 2016.

In [5]:
print(head(divvy$trips),row.names = FALSE)
 trip_id           starttime bikeid tripduration from_station_id
 9080551 2016-03-31 23:53:00    155          841             344
 9080550 2016-03-31 23:46:00   4831          649             128
 9080549 2016-03-31 23:42:00   4232          210             350
 9080548 2016-03-31 23:37:00   3464         1045             303
 9080547 2016-03-31 23:33:00   1750          202             334
 9080546 2016-03-31 23:31:00   4302          638              67
             from_station_name to_station_id               to_station_name
 Ravenswood Ave & Lawrence Ave           458      Broadway & Thorndale Ave
       Damen Ave & Chicago Ave           213        Leavitt St & North Ave
     Ashland Ave & Chicago Ave           210     Ashland Ave & Division St
       Broadway & Cornelia Ave           458      Broadway & Thorndale Ave
   Lake Shore Dr & Belmont Ave           329 Lake Shore Dr & Diversey Pkwy
 Sheffield Ave & Fullerton Ave           304       Broadway & Waveland Ave
   usertype gender birthyear start.week start.day start.hour
 Subscriber   Male      1986         13  Thursday         23
 Subscriber   Male      1980         13  Thursday         23
 Subscriber   Male      1979         13  Thursday         23
 Subscriber   Male      1980         13  Thursday         23
 Subscriber   Male      1969         13  Thursday         23
 Subscriber   Male      1991         13  Thursday         23

In [6]:
nrow(divvy$trips)
3595383

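Out of all the Divvy stations in Chicago, the one on Navy Pier (near the corner of Streeter Dr and Grand Ave) had by far the most activity: 90,042 trips started there in 2016, roughly 76% more than at the second-busiest station, Lake Shore Dr & Monroe St.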

In [7]:
departures <- table(divvy$trips$from_station_name)
as.matrix(head(sort(departures,decreasing = TRUE)))
Streeter Dr & Grand Ave         90042
Lake Shore Dr & Monroe St       51090
Theater on the Lake             47927
Clinton St & Washington Blvd    47125
Lake Shore Dr & North Blvd      45754
Clinton St & Madison St         41744
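To put "by far" into numbers, the short calculation below uses only figures already printed in this notebook: the six departure counts above and the total trip count from nrow(divvy$trips). It expresses each station's departures as a percentage of all 2016 trips and compares the busiest station with the runner-up.

```r
# Departure counts copied from the output above.
top <- c("Streeter Dr & Grand Ave"      = 90042,
         "Lake Shore Dr & Monroe St"    = 51090,
         "Theater on the Lake"          = 47927,
         "Clinton St & Washington Blvd" = 47125,
         "Lake Shore Dr & North Blvd"   = 45754,
         "Clinton St & Madison St"      = 41744)

# Total number of trips, as reported by nrow(divvy$trips).
total.trips <- 3595383

# Percentage of all 2016 trips that started at each station.
pct <- round(100 * top / total.trips, 2)
print(pct)

# How many times more departures the top station had than the second.
ratio <- top[[1]] / top[[2]]
print(round(ratio, 3))
```

Even the busiest station accounts for only about 2.5% of all trips, which suggests that trips are spread across many stations.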

Divvy bikes at the University of Chicago

In subsequent analyses, we will also take a close look at the trip data for the main Divvy station on the University of Chicago campus, at University Ave & 57th St. In 2016, Divvy bikes were checked out from this station almost 8,000 times.

In [8]:
sum(divvy$trips$from_station_name == "University Ave & 57th St",na.rm = TRUE)
7944
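The na.rm = TRUE argument in the call above matters if any from_station_name values are missing: comparing NA with a string yields NA, and sum() over a vector containing NA returns NA. Whether the 2016 trip data actually contain missing station names is not checked here. The toy vector below (made-up values, not Divvy data) shows the difference, along with %in%, which treats NA as a non-match.

```r
# Made-up station names; NA stands in for a missing value.
from    <- c("University Ave & 57th St", NA,
             "Streeter Dr & Grand Ave", "University Ave & 57th St")
station <- "University Ave & 57th St"

n.plain <- sum(from == station)                 # NA, because NA == station is NA
n.narm  <- sum(from == station, na.rm = TRUE)   # NA values are dropped before summing
n.in    <- sum(from %in% station)               # %in% returns FALSE for NA
print(c(plain = n.plain, na.rm = n.narm, in.op = n.in))
```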

Session information

This is the version of Jupyter used to generate these results.

In [9]:
system("jupyter --version",intern = TRUE)
'4.3.0'

These are the versions of R and the packages that were used to generate these results.

In [10]:
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.5

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.10.4

loaded via a namespace (and not attached):
 [1] R6_2.2.0        magrittr_1.5    IRdisplay_0.4.4 pbdZMQ_0.2-5   
 [5] tools_3.3.2     crayon_1.3.2    uuid_0.1-2      stringi_1.1.2  
 [9] IRkernel_0.7    jsonlite_1.5    stringr_1.2.0   digest_0.6.12  
[13] repr_0.12.0     evaluate_0.10.1

© 2017 Peter Carbonetto & Gao Wang

Exported from analysis/first-glance.ipynb committed by Peter Carbonetto on Wed Mar 7 03:16:30 2018 revision 11, c2d196c