Tile / Grid Map of India – P2

This post continues the previous post, where we saw the basic elements of constructing a tile map of India. In this post we will walk through the steps of this process listed below:


  1. Get the shapefile -> India shapefile
  2. Get the data -> literacy data
  3. Link your data to the shapefile -> done in R after a few data manipulation steps
  4. Use the geogrid package to create a rectangular or a hexagonal tile map

Get the shapefile:

We need a shapefile, and we need to import it into R. So let us go to the google drive page and download all the files. The folder contains 6 files that together make up the shapefile; the most relevant ones have the extensions .csv and .shp. Copy all 6 files into a folder called india (any folder name will do). Keep them together: the .shp file holds only the geometry, and R also reads companion files such as the .shx index and the .dbf attribute table from the same folder.
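Because the .shp file is only one part of a shapefile, a quick check that its companion files sit next to it in the india folder can save a confusing error later. The sketch below is plain base R; check_shapefile_parts() is a hypothetical helper, and it tests only for the three parts the ESRI shapefile format requires (.shp geometry, .shx index, .dbf attributes), not for everything else the download may contain.

```r
# Hypothetical helper: report which mandatory shapefile parts are missing.
# A shapefile needs at least .shp (geometry), .shx (index) and .dbf (attributes).
check_shapefile_parts <- function(shp_path) {
  base <- tools::file_path_sans_ext(shp_path)          # "india/IND_adm1"
  parts <- paste0(base, c(".shp", ".shx", ".dbf"))
  missing <- parts[!file.exists(parts)]
  if (length(missing) > 0) {
    warning("Missing shapefile parts: ",
            paste(basename(missing), collapse = ", "), call. = FALSE)
  }
  invisible(missing)                                   # character(0) when complete
}

# Usage with the folder layout used in this post:
# check_shapefile_parts("india/IND_adm1.shp")
```

If the function warns, copy the missing files from the download into the same folder before calling read_polygons().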

Now let us load all the libraries we will need:

library(ggplot2)
library(gridExtra)
library(viridis)
library(geogrid) # for the grid/tile map
library(dplyr)
library(stringr)

To load the shapefile in R, run the following:

original_shapes <- read_polygons("india/IND_adm1.shp") # read_polygons() is from geogrid; forward slashes work on every OS

This loads the shapefile into R as a spatial object. Its components (slots) are accessed with the name of the R object followed by the @ sign, as in original_shapes@data. To get a quick view of all the columns in the shapefile's attribute table, type the following in the R console:

head(original_shapes@data)

Get the data:

Even though the shapefile has data, it is only the data describing the shapes themselves (such as the state names). We need to plot literacy data for India, so we have to connect the two pieces: the shapefile in R and the literacy data. The data can be downloaded here.

The data is sourced from the Reserve Bank of India and contains the literacy rate for each census from 1951 to 2011 (the latest). Before we load the data in R, note that Telangana was created as a state after the 2011 census, so there is no data for it, while the shapefile includes all the states, Telangana among them. Also note that the state names in our 2011 data set do not exactly match the state names in the shapefile. They need to match, because the names are the key we use to link our data to the shapefile.

df <- read.csv("literacy_rate.csv", stringsAsFactors = FALSE) %>% select(region,X2011)

The line above imports the data and keeps only the 2011 column. For it to run, make sure the working directory is set to the folder that contains the CSV file (for example with setwd()).
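You may wonder why the 2011 column is called X2011 rather than 2011. read.csv() passes the header through make.names() by default (check.names = TRUE), and a syntactically valid R name cannot start with a digit, so an X is prepended. The sketch below reproduces this on a tiny temporary CSV whose layout is only assumed to resemble literacy_rate.csv; its values are left as NA because only the headers matter here.

```r
# Tiny stand-in for literacy_rate.csv (assumed layout: a region column
# followed by one column per census year). Values are NA on purpose.
tmp <- tempfile(fileext = ".csv")
writeLines(c("region,1951,2011",
             "KERALA,NA,NA",
             "GOA,NA,NA"), tmp)

# Default check.names = TRUE turns "1951" and "2011" into "X1951" and "X2011"
lit <- read.csv(tmp, stringsAsFactors = FALSE)
print(names(lit))

# Base R equivalent of select(region, X2011)
lit_2011 <- lit[, c("region", "X2011")]

# check.names = FALSE keeps the raw year headers instead
raw <- read.csv(tmp, stringsAsFactors = FALSE, check.names = FALSE)
print(names(raw))
```

With check.names = FALSE, the year columns must be referenced with backticks, as in raw$`2011`, which is one reason the default X prefix is convenient.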

Link your data to the shapefile:

Our data frame now contains 2 columns, region and X2011. We first convert all the region names to title case. Why? Quite often census data sets list state names in all uppercase while the shapefile uses title case, so as a general practice we make the state names consistent between the shapefile and the literacy data frame.
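To see why the case matters: %in% compares strings exactly, so "KERALA" does not match "Kerala". The sketch below uses a hypothetical base R helper, to_title(), that imitates str_to_title() for plain ASCII names; stringr's function is the one used in this post and handles more edge cases.

```r
# Hypothetical helper: lower-case everything, then upper-case the first letter
# of each word. "\\U" in a perl = TRUE replacement upper-cases the captured letter.
to_title <- function(x) {
  gsub("\\b([a-z])", "\\U\\1", tolower(x), perl = TRUE)
}

data_names  <- c("KERALA", "TAMIL NADU")   # all caps, as some data sets store them
shape_names <- c("Kerala", "Tamil Nadu")   # title case, as in a shapefile

print(data_names %in% shape_names)            # FALSE FALSE
print(to_title(data_names) %in% shape_names)  # TRUE TRUE
```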

df$region <- str_to_title(df$region) # str_to_title() is vectorised, so no sapply() is needed

The str_to_title() function from the stringr package is a handy way to convert the state names to title case. The following line checks, for each value in the NAME_1 column of the shapefile's data, whether it appears in the region column of our data frame.

original_shapes$NAME_1 %in% df$region

We will see 8 mismatches, i.e. 8 FALSE values: 8 state names in the shapefile that do not appear in our data frame. One of them is Telangana, which has no 2011 data; the other 7 are spelling differences. The following code fixes those 7:

df[which(df[,1]=="Andaman And Nicobar Islands"),1] <- "A & N Islands"
df[which(df[,1]=="Dadra And Nagar Haveli"),1] <- "D & N Haveli"
df[which(df[,1]=="Delhi"),1] <- "Delhi (UT)"
df[which(df[,1]=="Daman And Diu"),1] <- "Daman & Diu"
df[which(df[,1]=="Orissa"),1] <- "Odisha"
df[which(df[,1]=="Uttaranchal"),1] <- "Uttarakhand"
df[which(df[,1]=="Jammu And Kashmir"),1] <- "Jammu & Kashmir"

The code above may look intimidating, but each line does the following:

  • Find the row in our literacy data frame that holds a given name, e.g. Andaman And Nicobar Islands, and
  • Assign it the name used in the shapefile, e.g. A & N Islands.
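An alternative to one assignment per state is a single named lookup vector, which keeps all the spelling fixes in one place. This is a sketch, not the code used for the maps in this post; recode_regions() is a hypothetical helper, and the name pairs are the seven renames from the code above.

```r
# Named lookup vector: names are the spellings in the literacy data,
# values are the spellings used in the shapefile's NAME_1 column.
name_fixes <- c(
  "Andaman And Nicobar Islands" = "A & N Islands",
  "Dadra And Nagar Haveli"      = "D & N Haveli",
  "Delhi"                       = "Delhi (UT)",
  "Daman And Diu"               = "Daman & Diu",
  "Orissa"                      = "Odisha",
  "Uttaranchal"                 = "Uttarakhand",
  "Jammu And Kashmir"           = "Jammu & Kashmir"
)

# Hypothetical helper: replace names found in the lookup, keep the rest as-is
recode_regions <- function(regions, fixes) {
  hit <- regions %in% names(fixes)
  regions[hit] <- unname(fixes[regions[hit]])
  regions
}

print(recode_regions(c("Orissa", "Kerala", "Delhi", "Jammu And Kashmir"), name_fixes))
# "Odisha" "Kerala" "Delhi (UT)" "Jammu & Kashmir"
```

In this post's workflow it would stand in for the seven df[which(...)] lines as df$region <- recode_regions(df$region, name_fixes). Names not in the lookup, such as Kerala, pass through unchanged.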

If you explore the spatial data frame original_shapes@data carefully, you will see that for these states the names in its NAME_1 column differ from those in our literacy data.

Now if we run the following code again, every value should be TRUE except the one for Telangana, which has no data in our literacy data set.

original_shapes$NAME_1 %in% df$region
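Scanning a long vector of TRUE and FALSE values for the odd one out is tedious. setdiff() names the unmatched states directly, in either direction. A minimal sketch on a handful of names taken from this post; in the real workflow the two vectors would be as.character(original_shapes$NAME_1) and df$region.

```r
shape_names <- c("A & N Islands", "Delhi (UT)", "Odisha", "Telangana")
data_names  <- c("A & N Islands", "Delhi (UT)", "Odisha")

# Shapefile states with no row in the literacy data
missing_in_data <- setdiff(shape_names, data_names)
print(missing_in_data)     # "Telangana"

# Literacy-data names the shapefile does not recognise (typos, old names)
unknown_in_shape <- setdiff(data_names, shape_names)
print(unknown_in_shape)    # character(0)
```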

Finally, we can link the data using left_join() from the dplyr package.

colnames(df) <- c("NAME_1", "y_2011") # updated column name to be more clear

original_shapes@data <- left_join(original_shapes@data, df , by="NAME_1")

If we run head(original_shapes@data) again, we will see an additional column, y_2011. Note that left_join() does not raise an error when state names differ between the two data frames (for example in case); the unmatched rows simply get NA in y_2011, which is why we aligned the names first.
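The silent NA behaviour is easy to check for. The base R sketch below mimics the left join with match(), which keeps the shapefile's row order (row i of the attribute table must keep describing polygon i), and then lists the rows that found no partner. The tables are toy examples and the y_2011 numbers are placeholders, not literacy rates.

```r
# Toy attribute table of a shapefile; its row order must not change
shape_data <- data.frame(NAME_1 = c("Kerala", "Odisha", "Telangana"),
                         stringsAsFactors = FALSE)

# Toy literacy table: "ODISHA" has the wrong case, Telangana is absent.
# The y_2011 values are placeholders, not real literacy rates.
lit <- data.frame(NAME_1 = c("Kerala", "ODISHA"),
                  y_2011 = c(1, 2),
                  stringsAsFactors = FALSE)

# Order-preserving left join: for each shapefile row, the position of the
# same name in lit, or NA when there is none
idx <- match(shape_data$NAME_1, lit$NAME_1)
shape_data$y_2011 <- lit$y_2011[idx]

# No error was raised; the unmatched states simply carry NA
unmatched <- shape_data$NAME_1[is.na(shape_data$y_2011)]
print(unmatched)   # "Odisha" "Telangana"
```

After the real join, the same one-liner on original_shapes@data shows which states will appear uncoloured on the map.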

Use the geogrid package to create a rectangular or a hexagonal tile map:

Before we plot a tile map, let us plot the choropleth map of India:

[Figure: choropleth map of India]

This plot is generated using the following lines of code:

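# Convert a SpatialPolygonsDataFrame into a plain data frame for ggplot2
clean <- function(shape) {
 shape@data$id <- rownames(shape@data)                  # polygon id for each attribute row
 shape.points <- fortify(shape, region = "id")          # ggplot2: one row per polygon vertex
 shape.df <- merge(shape.points, shape@data, by = "id") # attach attributes to every vertex
 shape.df                                               # return the combined data frame explicitly
}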

result_df_raw <- clean(original_shapes)
rawplot <- ggplot(result_df_raw) +
 geom_polygon(aes(x = long, y = lat, fill = y_2011, group = group)) +
 coord_equal() +
 scale_fill_viridis() +
 guides(fill = "none") +
 theme_void()
rawplot
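What clean() does may be easier to see on toy data. fortify() returns one row per polygon vertex tagged with the polygon's id, and merge() then copies that polygon's attributes onto each of its vertices, which is what lets fill = y_2011 colour whole states. The sketch below imitates only the merge step on two hand-made triangles; the data frames are hypothetical stand-ins, not real fortify() output.

```r
# Stand-in for fortify() output: one row per vertex, two triangles
points <- data.frame(id    = c("1", "1", "1", "2", "2", "2"),
                     long  = c(0, 1, 0, 2, 3, 2),
                     lat   = c(0, 0, 1, 0, 0, 1),
                     group = c("1.1", "1.1", "1.1", "2.1", "2.1", "2.1"),
                     stringsAsFactors = FALSE)

# Stand-in for shape@data after clean() has added its id column
attrs <- data.frame(id = c("1", "2"),
                    NAME_1 = c("Kerala", "Odisha"),
                    stringsAsFactors = FALSE)

# Every vertex picks up the attributes of the polygon it belongs to
polys <- merge(points, attrs, by = "id")
print(polys)
```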

Now, to generate a tile map, use the following lines of code, adapted from the example code by the author of the geogrid package:

new_cells_reg <- calculate_grid(shape = original_shapes, grid_type = "regular", seed = 1)
resultreg <- assign_polygons(original_shapes, new_cells_reg)

result_df_reg <- clean(resultreg)

regplot <- ggplot(result_df_reg) +
 geom_polygon(aes(x = long, y = lat, fill = y_2011, group = group)) +
 geom_text(aes(V1, V2, label = ABB), size = 2, color = "white") +
 coord_equal() +
 scale_fill_viridis() +
 guides(fill = "none") +
 theme_void()

regplot

calculate_grid() from the geogrid package computes the grid of cells for the tile map. Its grid_type argument can be changed from "regular" to "hexagonal" to get a hexagon tile map instead of a rectangular one. assign_polygons(), also from geogrid, then assigns each state polygon in our shapefile to one of these cells.

Next, we use clean() to create a data frame, result_df_reg, from the result. Finally, we use the ggplot2 package to plot the transformed polygons: geom_polygon() draws the tiles and geom_text() adds the labels. The label = ABB argument in geom_text() labels each tile with the state's abbreviation, which assumes the data has an ABB column holding those abbreviations.

[Figure: Literacy rate of India, tile map of India]

What's Next:

This tile map is still incomplete. To bring it to a state where we can use it to display the data, we need to add colors that convey the meaning of the underlying data and a legend that explains what each color represents.
