Recreating a graph from The Economist
New year, new me(ans, medians and modes).
Here is one of the visualisations I made while taking a course on R. It’s a basic one: a recreation of a graph from The Economist that looks at the correlation between corruption perceptions (CPI) and the Human Development Index (HDI), built with ggplot2.
- Set working directory
setwd("~/Data Visualization Project")
- Load packages (tidyverse already attaches ggplot2, dplyr and readr, so one call is enough)
library(tidyverse)
- Import Economist data
df <- read_csv("Economist_Assignment_Data.csv")
- Preview data
head(df)
- Create first layer i.e. data source
layer1 <- ggplot(df,aes(x=CPI,y=HDI,color=Region))
- Create the second layer, the scatterplot itself, and set the shape and size of the points (check the ggplot2 cheat sheet for possible shapes; shape=1 is a hollow circle, so no extra shape scale is needed)
pl <- layer1 + geom_point(shape=1,size=4)
pl
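- Create a trendline. aes(group=1) fits a single line across all regions instead of one per Region, and formula=y~log(x) makes it a logarithmic curve rather than a straight line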
pl.trend <- pl + geom_smooth(aes(group=1),method='lm',formula=y~log(x),se=FALSE,color='red')
pl.trend
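To see what that formula is doing, here is a minimal sketch in base R. With method='lm' and formula=y~log(x), geom_smooth fits an ordinary linear model of y on log(x). The toy data frame below is made up for illustration (it is not the Economist data), and I generated HDI exactly from the log of CPI so the recovered coefficients are easy to check.

```r
# Hypothetical toy data, NOT the Economist dataset:
# HDI is constructed exactly as 0.3 + 0.25 * log(CPI)
toy <- data.frame(CPI = c(1, 2, 4, 8))
toy$HDI <- 0.3 + 0.25 * log(toy$CPI)

# The same kind of model geom_smooth(method='lm', formula=y~log(x)) fits
fit <- lm(HDI ~ log(CPI), data = toy)

# Intercept and slope on log(CPI)
print(coef(fit))

# Height of the fitted curve at a CPI value that is not in the data
pred <- predict(fit, newdata = data.frame(CPI = 5))
print(pred)
```

On real data the points won't sit exactly on the curve, but the idea is the same: the line is straight in log(CPI), which is why it bends when plotted against CPI.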
- Adding text labels to the points
pl.text <- pl.trend + geom_text(aes(label=Country))
pl.text
- This shows ALL the country names, which makes the plot unreadable. I didn't write the next part of the code myself; it labels only a selected subset of countries
pointsToLabel <- c("Russia", "Venezuela", "Iraq", "Myanmar", "Sudan", "Afghanistan", "Congo", "Greece", "Argentina", "Brazil",
"India", "Italy", "China", "South Africa", "Spain", "Botswana", "Cape Verde", "Bhutan", "Rwanda", "France", "United States", "Germany", "Britain", "Barbados", "Norway", "Japan", "New Zealand", "Singapore")
pl.country <- pl.trend + geom_text(aes(label = Country), color = "gray20", data = subset(df, Country %in% pointsToLabel),check_overlap = TRUE)
- The label aesthetic and the color are self-explanatory. The data argument restricts the labels to the countries in the list, and check_overlap = TRUE skips any label that would overlap one already drawn.
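The subsetting step is easier to follow on a small example. This is a sketch with a made-up four-row data frame (the names toy and toyLabels, and the numbers, are my own placeholders, not the course data), showing how %in% and subset() pick out the rows to label.

```r
# Hypothetical toy data, NOT the Economist dataset
toy <- data.frame(
  Country = c("Norway", "Chad", "Japan", "Peru"),
  CPI     = c(9.0, 2.0, 8.0, 3.4),
  HDI     = c(0.94, 0.33, 0.90, 0.73)
)

# Names we want labelled; "Atlantis" matches no row and is simply ignored
toyLabels <- c("Norway", "Japan", "Atlantis")

# %in% gives one TRUE/FALSE per row of toy
keep <- toy$Country %in% toyLabels
print(keep)

# subset() keeps only the rows where the condition is TRUE
labelled <- subset(toy, Country %in% toyLabels)
print(labelled)
```

Passing a data frame like labelled to the data argument of geom_text means that layer only sees those rows, while the points and trendline still use the full data.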
- Changing the y- and x-axis labels and breaks to match The Economist's.
pl.final <- pl.country +
  scale_x_continuous(name='Corruption Perceptions Index, 2011 (10=least corrupt)',limits=c(1,10),breaks=1:10) +
  scale_y_continuous(name='Human Development Index, 2011 (1=Best)',limits=c(0.2,1),breaks=c(0.2,0.4,0.6,0.8,1)) +
  ggtitle('Corruption and Human Development') +
  theme_bw()
pl.final
This was fun. Many mistakes were made and the help() command is a lifesaver, but the end result was worth it. Still a few things missing, but nothing a little Photoshop can’t fix. :)