How much did you walk this year?

final

The visual above shows the steps I walked in 2017. So why do we need this? Well, a few reasons:

  1. It's my data and I want to explore it
  2. I wanted to see how I can export data from an iPhone
  3. I wanted to see if I can use R to generate something I had seen on a website

Now that we have gotten the obvious questions out of the way, let's see how to do this in R. But wait, I did use Inkscape a little (~1%), so it's not entirely in R.

Background:

The size of each circle shows the number of steps walked on a given day, with the days arranged by month. Since I am based on the east coast of the USA, the summer months show more activity than the winter months. Another interesting point is that throughout 2017 I lived in the same neighborhood, went to work at the same location, and took the same subway from the same station, so the steps are almost the same on working days.

Data from iPhone:

The question is how to get the steps data from your iPhone into R. The answer is to download an app called “QS”. The QS app is free and lets us export various health-related data from the iPhone to a .csv file. It also lets us choose between daily and hourly data.

If we scroll through the various options we come across distance, which should be toggled on. Then click “Create Table”. Once the app has collected the data, you can send it to your email. The data is sent as a .csv file.

Following is the image of the distance data downloaded from my iPhone:

Image of the distance walked file downloaded from iPhone

The “Steps” column is the only measurement column used for this visualization exercise; the “Finish” column supplies the dates. The size of each circle in our visual represents the number of steps walked that day.

Packages:

To create the visualization we will need the following R packages:

  • lubridate
  • dplyr
  • ggplot2

The lubridate package is used for date manipulation and dplyr for data cleaning and transformation. Finally, the plot is generated using functions from the ggplot2 package. All the packages can be installed using the install.packages() function in R.

install.packages(c("lubridate","ggplot2","dplyr"))

Load the packages in R:

To load all the installed packages we will use the library() function as follows:

library(lubridate)
library(dplyr)
library(ggplot2)

Data import and transformation:

The data downloaded from the iPhone is now ready to be imported into R. We will load it using the read.csv() function. We will also rename the columns using the colnames() function so that it is easier to see what each one represents. The following code imports the data and updates the column names:

steps <- read.csv("Health Data.csv", stringsAsFactors = FALSE)
colnames(steps) <- c("Start","Finish","Distance","Steps")
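
If you do not have an export at hand yet, the import step can be tried on a small stand-in file. The sketch below is only an illustration: the header names and the three rows are invented, and the timestamp layout (day-month-year hour:minute) is an assumption about what the export looks like.

```r
# Write a tiny stand-in for the exported file (all values are invented)
sample_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "Start,Finish,Distance (mi),Steps (count)",
  "01-02-2017 08:00,01-02-2017 09:00,0.25,520",
  "01-02-2017 09:00,01-02-2017 10:00,0.40,830",
  "02-02-2017 08:00,02-02-2017 09:00,0.10,210"
), sample_csv)

# Same two steps as in the article: read the file, then rename the columns
demo <- read.csv(sample_csv, stringsAsFactors = FALSE)
colnames(demo) <- c("Start", "Finish", "Distance", "Steps")

# Start and Finish are still plain text at this point
str(demo)
```

Checking the result with str() before going further makes it obvious that the timestamps have not been parsed yet.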

Notice that the Start and Finish columns in the iPhone data contain both a date and a time, which is a bit different from what we are used to. For the current visualization we only need the date from the Finish column, not the time. Hence, we use some useful functions from the lubridate package to extract only the date parts of the Finish column.

First we have to convert the Finish column, which is read in as text, to a date-time format. The dmy_hm() function from the lubridate package does this, provided the values are written in day-month-year hour:minute order.

steps$Finish <- dmy_hm(steps$Finish)
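
To see what dmy_hm() does, it can help to run it on a couple of hand-written values first. This is a minimal sketch; the two timestamps are made up and only assume the day-month-year hour:minute order that the function expects.

```r
library(lubridate)

# Hypothetical timestamps: 1 February and 15 July 2017
finish_raw <- c("01-02-2017 09:00", "15-07-2017 18:00")

# Parse the text into date-times (UTC unless a tz argument is given)
finish <- dmy_hm(finish_raw)
class(finish)

# Drop the time of day and keep only the calendar date
as_date(finish)

# Pull out the individual parts used later for grouping
year(finish)
month(finish)
day(finish)
```

Note that the first value is read as 1 February, not 2 January, because the day comes first.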

Next, we will use the mutate() function from the dplyr package to create some additional columns: mth, yr and dy.

# Add numeric month, year and day columns, then total the hourly steps per day
steps_grp <- steps %>%
  mutate(mth = month(Finish), yr = year(Finish), dy = day(Finish)) %>%
  group_by(yr, mth, dy) %>%
  summarise(Total_Steps = sum(Steps))

We need these additional columns because we want to group the data by year, month and day. Note that the file used for this visual contains data for the last 3 years, and we only need one year of data to generate our plot. Grouping the data this way allows us to pick one year (i.e. 2017) for our initial plot.

As shown in the code above, we first group our data using the group_by() function and then summarize it using the summarise() function from the dplyr package.

Note the use of the sum() function within summarise(): our data is hourly, so the steps for each day have to be added up. We can avoid this step entirely if we download the daily data instead of the hourly data. Finally, we save the summarised data as a new dataset called “steps_grp”. We have now successfully converted our hourly step data into daily data.
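
The hourly-to-daily roll-up is easy to check on a toy data frame. The sketch below uses three invented hourly readings (two on 1 February, one on 2 February); the column names follow the article, everything else is made up for illustration.

```r
library(dplyr)
library(lubridate)

# Invented hourly readings
hourly <- data.frame(
  Finish = dmy_hm(c("01-02-2017 09:00", "01-02-2017 10:00", "02-02-2017 09:00")),
  Steps  = c(500, 700, 300)
)

# One row per day, with the hourly steps added up
daily <- hourly %>%
  mutate(yr = year(Finish), mth = month(Finish), dy = day(Finish)) %>%
  group_by(yr, mth, dy) %>%
  summarise(Total_Steps = sum(Steps)) %>%
  ungroup()

daily
```

The two readings for 1 February collapse into a single row, so the result has one row per day rather than one per hour.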

Generating the plot:

The geom_point() function from the ggplot2 package is used to generate our plot as follows:

ggplot(filter(steps_grp, yr == 2017)) +
  geom_point(aes(x = mth, y = dy, size = Total_Steps, colour = factor(mth)), alpha = 0.25) +
  # months across the top of the plot, labelled Jan to Dec
  scale_x_continuous(expand = c(0, 0.5), breaks = 1:12, position = "top", labels = month.abb) +
  # day 1 at the top, day 31 at the bottom
  scale_y_reverse(breaks = seq(1, 31, by = 1)) +
  # circle area proportional to the number of steps
  scale_size_area(max_size = 20) +
  theme_minimal() +
  # hide the colour legend; guides(colour = FALSE) is deprecated in recent ggplot2
  guides(colour = "none") +
  labs(x = "Month", y = "Days", size = "Test Steps", title = "Steps in 2017",
       caption = "atmajitgohil.com", subtitle = "Iphone data") +
  theme(panel.background = element_rect(fill = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        plot.background = element_rect(fill = "black"),
        axis.text = element_text(colour = "white"),
        plot.title = element_text(colour = "white"),
        plot.subtitle = element_text(colour = "white"),
        plot.caption = element_text(colour = "white"),
        legend.background = element_rect(fill = "white"))

The above code will generate the following plot.

Plot showing the steps walked in 2017

The code for this visualization may seem complicated if you have never used the ggplot2 package. We will go over it step by step to make it easier to understand and replicate.

  1. The ggplot() function is used to assign the data for our plot, and the filter() function is used to keep only the 2017 data. Note that filter() is part of the dplyr package, but we can call it inside ggplot() to pass along only the data we need to plot.
  2. The geom_point() function is used to define the geometric shape of the plot, here points. The aes() function is used to define the aesthetics of the plot: the size of each point is set using the size attribute and its colour using the colour attribute. We want to size each point / circle based on the steps walked each day. Hence, the Total_Steps column in our data is assigned to the size attribute.
  3. All the remaining functions from the ggplot2 package add scales, labels and a theme to our plot. The best way to understand their use is to refer to the ggplot2 documentation.
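
The core of the steps above can be tried on its own before adding any styling. This is a stripped-down sketch on three invented rows; the data frame toy and its values are hypothetical, while the column names match the grouped data from the article.

```r
library(ggplot2)

# Three made-up days: two in January, one in February
toy <- data.frame(
  mth = c(1, 1, 2),
  dy = c(1, 2, 1),
  Total_Steps = c(2000, 8000, 4000)
)

# Data, one point layer with mapped size and colour, and the two key scales
p <- ggplot(toy) +
  geom_point(aes(x = mth, y = dy, size = Total_Steps, colour = factor(mth)), alpha = 0.25) +
  scale_size_area(max_size = 20) +   # circle area scales with Total_Steps
  scale_y_reverse()                  # day 1 at the top

p
```

Starting from a version like this and adding one layer at a time (labels, then the black theme) makes it easier to see what each function contributes.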

Now we have a plot which is close to finished. However, we would like to remove the legend from its current position in our preliminary plot and move it to the top of the plot. Hence, we will export the plot generated in R in .png format using the export button at the top of the plotting window. The exported plot can then be opened in Inkscape (open source editing software) for further beautification.
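
For readers who prefer to stay in R, a possible alternative to the export button and the manual legend move is sketched below. This is not what was done for the plot in this post: it swaps in theme(legend.position = "top") and ggsave() from ggplot2, and the stand-in plot, file name and dimensions are all assumptions chosen for illustration.

```r
library(ggplot2)

# Stand-in plot; in practice this would be the steps plot built earlier
p <- ggplot(data.frame(x = 1:3, y = 1:3, n = c(10, 20, 30))) +
  geom_point(aes(x = x, y = y, size = n)) +
  theme(legend.position = "top")   # place the legend above the panel

# Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "steps_2017.png")

# Save the plot; width and height are in inches
ggsave(out_file, plot = p, width = 8, height = 10, dpi = 300)
```

Saving from code has the advantage that the size and resolution are reproducible, whereas the export button depends on how large the plotting window happens to be.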
