pyramid plot

 Atmajit

Introduction:

In this post I will describe one of the easier ways to generate a pyramid plot. I first came across a pyramid plot on the New York Times website. The NY Times visualization used data from the American Cancer Society to show new cases of cancer in 2007. The visualization can be viewed here.

I have also seen the Census Bureau use pyramid plots to display the distribution of population by age. The FlowingData website used an animated pyramid plot to show the prevalence of obesity in the USA here.

Packages:

We need to install 2 packages. The plotrix package allows us to generate the pyramid plot. The tidyverse package is used for data manipulation, including generating additional columns. Data is rarely in the form we need it to be.

# install.packages(c("plotrix", "tidyverse"))  # should be run only once
library(plotrix)
library(tidyverse)

Data:

The data for the pyramid plot was downloaded from the American Cancer Society. Both files used in the code can be found:

Data Cleaning:

Once the two files are downloaded, we read the data using the read.delim( ) function. Note that I have used only a sample of the incidence data, since it is easier to display 9 cancer types than 55 different types of cancer in a single visualization.

I have retained all 55 categories of cancer types in the death.txt file to show how dplyr can be used to clean the data. Loading the tidyverse package also attaches dplyr, so we do not need to load the dplyr package separately.

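# Read the tab-delimited files, treating "n/a" entries as missing values
incidence = read.delim("incidence.txt", stringsAsFactors = FALSE, na.strings = "n/a")
death = read.delim("death.txt", stringsAsFactors = FALSE, na.strings = "n/a")
# Drop the second column, which is not used in the plot
incidence = incidence[, -2]
death = death[, -2]
# Replace missing values with 0
incidence[is.na(incidence)] = 0
death[is.na(death)] = 0
# Give both data frames the same column names
colnames(incidence) = c("type", "female", "male")
colnames(death) = c("type", "female", "male")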
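
To see what these steps do without the two downloaded files, here is a minimal sketch on a made-up tab-delimited snippet. The column names and numbers are invented for illustration and are not the American Cancer Society figures. I am also assuming that the second column of the real files is a combined total; only the read.delim( ) arguments and the cleaning steps mirror the code above.

```r
# Made-up tab-delimited text standing in for incidence.txt (hypothetical values)
raw_text = paste(
  "Type\tBoth\tFemale\tMale",
  "Breast\t100\t98\t2",
  "Ovary\t22\t22\tn/a",
  "Prostate\t200\tn/a\t200",
  sep = "\n"
)

# "n/a" entries become NA because of na.strings
toy = read.delim(text = raw_text, stringsAsFactors = FALSE, na.strings = "n/a")

toy = toy[, -2]                            # drop the second column
toy[is.na(toy)] = 0                        # replace missing values with 0
colnames(toy) = c("type", "female", "male")
print(toy)
```

Cancer types that apply to only one sex are the kind of rows where an "n/a" entry would show up, which is why the missing values are set to 0 before plotting.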

To clean the data we need to go a step further. From the death data we only need the rows for the 9 cancer types that appear in the incidence data. We get them by using the inner_join( ) function, which keeps only the rows whose type appears in both data frames. To learn more about this function, type ?inner_join in the R console.

Once the data is clean, we create additional columns using the mutate( ) function from the dplyr package. This rescales the counts to thousands so that the bars fit within the axis ranges used in the plot.

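# Keep only the cancer types present in both data frames
data = inner_join(incidence, death, by = c("type"))
# Rename the joined columns: in.* is incidence, de.* is deaths
colnames(data) = c("type", "in.female", "in.male", "de.female", "de.male")
# Rescale the counts to thousands
data = mutate(data, in.f = in.female / 1000,
                    in.m = in.male / 1000,
                    d.f = de.female / 1000,
                    d.m = de.male / 1000)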
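
Here is a small sketch of what inner_join( ) and mutate( ) do in this step, using two made-up data frames rather than the real files (all counts are hypothetical). inner_join( ) keeps only the rows whose type appears in both tables, and because both tables have female and male columns, dplyr appends .x and .y suffixes to them, which is why the columns are renamed afterwards.

```r
library(dplyr)

# Hypothetical counts, for illustration only
inc = data.frame(type = c("Breast", "Lung"),
                 female = c(180000, 98000),
                 male = c(2000, 115000),
                 stringsAsFactors = FALSE)
dth = data.frame(type = c("Breast", "Lung", "Stomach"),
                 female = c(40000, 70000, 4000),
                 male = c(500, 90000, 6000),
                 stringsAsFactors = FALSE)

# "Stomach" is only in dth, so it is dropped by the inner join
joined = inner_join(inc, dth, by = "type")
print(colnames(joined))

# Rename, then rescale two of the columns to thousands
colnames(joined) = c("type", "in.female", "in.male", "de.female", "de.male")
joined = mutate(joined, in.f = in.female / 1000,
                        d.f = de.female / 1000)
print(joined)
```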

Plot:

Finally, we generate the plot using the pyramid.plot( ) function of the plotrix package. Note that there are other packages that can also generate a pyramid plot.

The types variable is created to label the plot. The first two arguments of the pyramid.plot() function are the data to be plotted on the left side and the right side of the plot. The laxlab and raxlab arguments set the axis tick labels on the left and right sides. The gap argument controls the space between the left and right bars, where the category labels are drawn. We can play around with this argument to fit the labels well between the left and right plots.

We need to plot 2 sets of data for males and females. This is achieved by calling pyramid.plot() a second time with add=TRUE to overlay the number of deaths on the incidence data. We have also passed the space argument in the second call, which makes the death bars narrower than the incidence bars, to make the plot look similar to the one in the New York Times article.

# Labels for the rows of the pyramid, one per cancer type
types = c("Breast", "Esophagus", "Kidney", "Leukemia", "Liver", "Lung", "Lymphoma", "Ovary", "Pancreas", "Prostate")
# Incidence in thousands: females on the left, males on the right
pyramid.plot(data$in.f, data$in.m,
             laxlab = c(0, 50, 100, 150, 200, 250),
             raxlab = c(0, 50, 100, 150, 200),
             top.labels = c("Female", "Types of Cancer", "Male"), labels = types,
             gap = 25, labelcex = .8, unit = "Number in 000's",
             lxcol = "#edf8e9", rxcol = "#f2f0f7")
## [1] 5.1 4.1 4.1 2.1
# Deaths in thousands, overlaid on the incidence bars with add = TRUE
pyramid.plot(data$d.f, data$d.m,
             laxlab = c(0, 50, 100, 150, 200, 250),
             raxlab = c(0, 50, 100, 150, 200),
             top.labels = c("Female", "", "Male"), labels = types, space = 0.4,
             gap = 25, labelcex = 1, unit = "",
             lxcol = "#74c476", rxcol = "#9e9ac8", add = TRUE)

[Figure: pyramid plot]

## [1] 4 2 4 2

I have only explained the most essential arguments of the pyramid.plot() function. To learn more about the function, type ?pyramid.plot in the R console. The plot is missing a legend, which is essential. A legend can be added using the legend() function.
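
As a rough sketch of how a legend could be added, the example below draws a small pyramid plot from made-up numbers (hypothetical values in thousands, not the real data) and then calls legend() from base graphics. The legend position, wording and text size are my own choices for illustration and will probably need adjusting for the real plot.

```r
library(plotrix)

# Hypothetical values in thousands, for illustration only
types = c("Breast", "Lung", "Pancreas")
in.f = c(180, 98, 18)
in.m = c(2, 115, 19)
d.f = c(40, 70, 17)
d.m = c(0.5, 90, 17)

# Incidence bars first, then the narrower death bars on top
invisible(pyramid.plot(in.f, in.m, labels = types,
                       top.labels = c("Female", "Types of Cancer", "Male"),
                       gap = 25, unit = "Number in 000's",
                       lxcol = "#edf8e9", rxcol = "#f2f0f7"))
invisible(pyramid.plot(d.f, d.m, labels = types, space = 0.4, gap = 25,
                       top.labels = c("Female", "", "Male"), unit = "",
                       lxcol = "#74c476", rxcol = "#9e9ac8", add = TRUE))

# One legend entry per bar colour used above
lg = legend("bottomright",
            legend = c("New cases (female)", "Deaths (female)",
                       "New cases (male)", "Deaths (male)"),
            fill = c("#edf8e9", "#74c476", "#f2f0f7", "#9e9ac8"),
            bty = "n", cex = 0.7)
```

The same gap value is passed to both pyramid.plot() calls so that the overlaid bars line up with the first set.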

To make your plot look like the one in the NY Times, export the plot as a JPEG image and use your favorite editor to add text or labels. It is much easier to do this outside R.
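
One way to do the export from code is the jpeg() graphics device in base R. The sketch below is only an illustration: the file name, image size and the toy numbers are assumptions of mine, and the check with capabilities("jpeg") is there because not every R installation can write JPEG files.

```r
library(plotrix)

# Hypothetical output file in the session's temporary directory
out_file = file.path(tempdir(), "pyramidplot.jpg")

if (capabilities("jpeg")) {
  # Open the JPEG device, draw the plot, then close the device to write the file
  jpeg(out_file, width = 800, height = 600, quality = 95)
  invisible(pyramid.plot(c(180, 98, 18), c(2, 115, 19),   # made-up values
                         labels = c("Breast", "Lung", "Pancreas"),
                         top.labels = c("Female", "Types of Cancer", "Male"),
                         gap = 25, unit = "Number in 000's",
                         lxcol = "#74c476", rxcol = "#9e9ac8"))
  invisible(dev.off())
}

print(file.exists(out_file))
```

Everything drawn between jpeg() and dev.off() goes into the file, so the second pyramid.plot() call with add=TRUE and any legend() call would also need to sit inside that block.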

The following is the entire code used to generate the plot.

# install.packages(c("plotrix", "tidyverse"))  # should be run only once
library(plotrix)
library(tidyverse)

# Read and clean the two files
incidence = read.delim("incidence.txt", stringsAsFactors = FALSE, na.strings = "n/a")
death = read.delim("death.txt", stringsAsFactors = FALSE, na.strings = "n/a")
incidence = incidence[, -2]
death = death[, -2]
incidence[is.na(incidence)] = 0
death[is.na(death)] = 0
colnames(incidence) = c("type", "female", "male")
colnames(death) = c("type", "female", "male")

# Join on cancer type and rescale the counts to thousands
data = inner_join(incidence, death, by = c("type"))
colnames(data) = c("type", "in.female", "in.male", "de.female", "de.male")
data = mutate(data, in.f = in.female / 1000,
                    in.m = in.male / 1000,
                    d.f = de.female / 1000,
                    d.m = de.male / 1000)

types = c("Breast", "Esophagus", "Kidney", "Leukemia", "Liver", "Lung", "Lymphoma", "Ovary", "Pancreas", "Prostate")

# Incidence
pyramid.plot(data$in.f, data$in.m,
             laxlab = c(0, 50, 100, 150, 200, 250),
             raxlab = c(0, 50, 100, 150, 200),
             top.labels = c("Female", "Types of Cancer", "Male"), labels = types,
             gap = 25, labelcex = .8, unit = "Number in 000's",
             lxcol = "#edf8e9", rxcol = "#f2f0f7")

# Deaths, overlaid on the incidence bars
pyramid.plot(data$d.f, data$d.m,
             laxlab = c(0, 50, 100, 150, 200, 250),
             raxlab = c(0, 50, 100, 150, 200),
             top.labels = c("Female", "", "Male"), labels = types, space = 0.4,
             gap = 25, labelcex = 1, unit = "",
             lxcol = "#74c476", rxcol = "#9e9ac8", add = TRUE)
