R Visualizations – ggplot2 (PART-1)
Types of visualization available in ggplot2 and their implementation in R.
Plots can be grouped into 8 categories according to what they are meant to show:
- A) Correlation: Scatterplot, Scatterplot With Encircling, Jitter Plot, Counts Chart, Bubble Plot, Animated Bubble Plot, Marginal Histogram / Boxplot, Correlogram.
- B) Deviation: Diverging Bars, Diverging Lollipop Chart, Diverging Dot Plot, Area Chart.
- C) Ranking: Ordered Bar Chart, Lollipop Chart, Dot Plot, Slope Chart, Dumbbell Plot.
- D) Distribution: Histogram, Density Plot, Box Plot, Dot + Box Plot, Tufte Boxplot, Violin Plot, Population Pyramid.
- E) Composition: Waffle Chart, Pie Chart, Treemap, Bar Chart.
- F) Change: Time Series Plots, Stacked Area Chart, Calendar Heat Map, Slope Chart, Seasonal Plot.
- G) Groups: Dendrogram, Clusters.
- H) Spatial: Open Street Map, Google Road Map, Google Hybrid Map.
- Correlation
For Free, Demo classes Call: 8605110150
Registration Link: Click Here!
Correlation plots examine the relationship between two variables.
Scatterplot
The scatterplot is the most frequently used plot in data analysis. It shows the nature of the relationship between two variables.
library(ggplot2) # load the library
theme_set(theme_bw())
data("midwest", package = "ggplot2")
# midwest <- read.csv("Your dataset path") # alternative data source
# Scatterplot
sample <- ggplot(midwest, aes(x=area, y=poptotal)) + geom_point(aes(col=state, size=popdensity)) +
  geom_smooth(method="loess", se=FALSE) + xlim(c(0, 0.1)) + ylim(c(0, 500000)) +
  labs(subtitle="Area Vs Population", y="Population", x="Area", title="Scatterplot",
       caption = "Source: midwest")
plot(sample)
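One detail of the scatterplot above is worth a short sketch. xlim() and ylim() set scale limits, so rows outside the range are dropped before geom_smooth() fits its curve, whereas coord_cartesian() only zooms the view and keeps every row. The object names below (n_outside, base_plot, p_limits, p_zoom) are illustrative and not part of the original example.

```r
library(ggplot2)
data("midwest", package = "ggplot2")

# Number of rows that fall outside the limits used in the scatterplot
n_outside <- sum(midwest$area > 0.1 | midwest$poptotal > 500000)
n_outside

base_plot <- ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE)

# Scale limits: out-of-range rows are removed before the smoother is fitted
p_limits <- base_plot + xlim(c(0, 0.1)) + ylim(c(0, 500000))

# Coordinate zoom: all rows are kept, only the visible window changes
p_zoom <- base_plot + coord_cartesian(xlim = c(0, 0.1), ylim = c(0, 500000))
```

Printing p_limits and p_zoom side by side shows whether the excluded rows change the fitted curve inside the visible window.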
Scatterplot With Encircling
To draw attention to a specific group of points, you can encircle them with geom_encircle() from the ggalt package.
Pass geom_encircle() a new data frame that contains only the rows to be encircled. The expand argument pushes the curve outward so that it passes just outside the points. The color and size (thickness) of the curve can also be changed.
library(ggplot2)
library(ggalt)
data("midwest", package = "ggplot2")
# rows to encircle
midwest_select <- midwest[midwest$poptotal > 350000 &
  midwest$poptotal <= 500000 &
  midwest$area > 0.01 &
  midwest$area < 0.1, ]
ggplot(midwest, aes(x=area, y=poptotal)) + geom_point(aes(col=state, size=popdensity)) +
  geom_smooth(method="loess", se=FALSE) + xlim(c(0, 0.1)) + ylim(c(0, 500000)) +
  geom_encircle(aes(x=area, y=poptotal), data=midwest_select, color="red", size=2, expand=0.08) +
  labs(subtitle="Area Vs Population", y="Population", x="Area", title="Scatterplot + Encircle",
       caption="Source: midwest")
Jitter Plot
Plot city mileage (cty) against highway mileage (hwy).
# load package and data
library(ggplot2)
data(mpg, package="ggplot2")
theme_set(theme_bw())
sample <- ggplot(mpg, aes(cty, hwy))
sample + geom_point() + geom_smooth(method="lm", se=FALSE) +
  labs(subtitle="mpg: city vs highway mileage", y="hwy", x="cty", title="Scatterplot with overlapping points",
       caption="Source: mpg")
This scatterplot shows that city mileage (cty) and highway mileage (hwy) are well correlated. It also hides information: dim(mpg) reports 234 rows, yet far fewer points are visible because many of them overlap. geom_jitter() moves each point by a small random amount, bounded by width, so that overlapping points become visible.
dim(mpg)
library(ggplot2)
data(mpg, package="ggplot2")
theme_set(theme_bw())
sample <- ggplot(mpg, aes(cty, hwy))
sample + geom_jitter(width = .5, size=1) + labs(subtitle="mpg: city vs highway mileage",
  y="hwy", x="cty", title="Jittered Points")
Counts Chart
A counts chart is another way to deal with overlapping data points. The more points overlap at a location, the larger the circle drawn there.
library(ggplot2)
data(mpg, package="ggplot2")
theme_set(theme_bw())
sample <- ggplot(mpg, aes(cty, hwy))
sample + geom_count(col="tomato3", show.legend=FALSE) + labs(subtitle="mpg: city vs highway mileage",
  y="hwy", x="cty", title="Counts Plot")
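To see what the counts chart encodes, the overlap can be tallied by hand. The sketch below counts the rows that share each (cty, hwy) position and maps that count to point size. It approximates the idea behind geom_count() rather than reproducing its internals, and the names pair_counts and n are illustrative.

```r
library(ggplot2)
data(mpg, package = "ggplot2")

# Count how many rows share each (cty, hwy) combination;
# any fully populated column (here: model) can be counted
pair_counts <- aggregate(model ~ cty + hwy, data = mpg, FUN = length)
names(pair_counts)[3] <- "n"

nrow(mpg)          # total observations
nrow(pair_counts)  # distinct positions that a plain scatterplot can show

# Most heavily overlapped positions first
head(pair_counts[order(-pair_counts$n), ])

# Same picture as the counts chart, built from the explicit counts
p_counts <- ggplot(pair_counts, aes(cty, hwy)) +
  geom_point(aes(size = n), col = "tomato3")
```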
Bubble Plot
A bubble chart shows relationships within underlying groups, using a categorical variable for color and a further continuous variable for size.
Bubble charts suit four-dimensional data: two numeric variables on the X and Y axes, a categorical variable mapped to color and a numeric variable mapped to size.
library(ggplot2)
data(mpg, package="ggplot2")
sample_select <- mpg[mpg$manufacturer %in% c("audi", "ford", "honda", "hyundai"), ]
theme_set(theme_bw())
sample <- ggplot(sample_select, aes(displ, cty)) + labs(subtitle="mpg: Displacement vs City Mileage",
  title="Bubble chart")
sample + geom_jitter(aes(col=manufacturer, size=hwy)) + geom_smooth(aes(col=manufacturer),
  method="lm", se=FALSE)
Animated Bubble Chart
An animated bubble chart can be built with the gganimate package.
The code below uses the original gganimate interface: set aes(frame) to the column you want to animate over, build the rest of the plot as usual, and call gganimate() on the finished plot. gganimate 1.0.0 and later replace this interface with transition functions (for example transition_time()) and animate(), so this code needs an older release to run.
library(ggplot2)
library(gganimate)
library(gapminder)
theme_set(theme_bw())
sample <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, frame = year)) +
  geom_point() + geom_smooth(aes(group = year), method = "lm", show.legend = FALSE) +
  facet_wrap(~continent, scales = "free") + scale_x_log10()
gganimate(sample, interval=0.2)
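For readers on gganimate 1.0.0 or later, where the frame aesthetic and the gganimate() function are no longer available, a minimal sketch of a comparable animation follows. It assumes current gganimate and gapminder installations plus a renderer such as gifski. The smoothing layer and the free facet scales of the original are left out to keep the example small, and the name anim is illustrative.

```r
library(ggplot2)
library(gganimate)
library(gapminder)

anim <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop)) +
  geom_point(alpha = 0.6) +
  facet_wrap(~continent) +
  scale_x_log10() +
  labs(title = "Year: {frame_time}") +  # label filled in per frame
  transition_time(year)                 # animate over the year column

# Rendering is the slow step; uncomment to produce the animation
# animate(anim, fps = 5)
```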
Marginal Histogram / Boxplot
A marginal histogram shows the relationship and the distributions in the same chart: histograms of the X and Y variables are drawn in the margins of the scatterplot.
It is implemented with ggMarginal() from the ggExtra package. Set the type option to draw a marginal boxplot or density plot instead.
library(ggplot2)
library(ggExtra)
data(mpg, package="ggplot2")
theme_set(theme_bw())
sample_select <- mpg[mpg$hwy >= 35 & mpg$cty > 27, ]
sample <- ggplot(sample_select, aes(cty, hwy)) + geom_count() + geom_smooth(method="lm", se=FALSE)
ggMarginal(sample, type = "histogram", fill="transparent") # marginal histograms
ggMarginal(sample, type = "boxplot", fill="transparent") # marginal boxplots
Correlogram
A correlogram shows the correlation between multiple continuous variables in the same data frame. It is implemented with the ggcorrplot package.
library(ggplot2)
library(ggcorrplot)
data(mtcars)
corr_sample <- round(cor(mtcars), 1) # correlation matrix, rounded for labelling
ggcorrplot(corr_sample, hc.order = TRUE, type = "lower", lab = TRUE, lab_size = 3, method="circle",
  colors = c("tomato2", "white", "springgreen3"), title="Correlogram of mtcars",
  ggtheme=theme_bw)
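The matrix behind the correlogram can be inspected directly in base R. The sketch below shows why plotting only the lower triangle loses nothing (the matrix is symmetric with ones on the diagonal) and how to list the variables most strongly related to mpg. The names corr_full and strongest are illustrative.

```r
data(mtcars)

# Pearson correlation between every pair of columns
corr_full <- cor(mtcars)

# Symmetric matrix: the upper triangle repeats the lower one
isSymmetric(corr_full)

# Correlation of a single pair, before rounding
corr_full["mpg", "wt"]

# Variables ranked by absolute correlation with mpg (mpg itself excluded)
strongest <- sort(abs(corr_full["mpg", colnames(corr_full) != "mpg"]),
                  decreasing = TRUE)
head(strongest, 3)
```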
- Deviation
Diverging Bars
Diverging bars handle both negative and positive values. They are drawn with geom_bar(), which can produce both bar charts and count-based charts.
By default geom_bar() has stat set to count, so when you provide only an X variable it plots the number of rows at each value, much like a histogram.
To plot a bar chart of values you supply yourself, set stat="identity" and provide both x and y inside aes().
library(ggplot2)
theme_set(theme_bw())
data("mtcars")
mtcars$`car name` <- rownames(mtcars)
# z-score of mpg: distance from the mean in standard deviations
mtcars$sample_z <- round((mtcars$mpg - mean(mtcars$mpg))/sd(mtcars$mpg), 2)
mtcars$sample_type <- ifelse(mtcars$sample_z < 0, "below", "above")
mtcars <- mtcars[order(mtcars$sample_z), ] # sort by z-score
# convert to factor so the sorted order is kept in the plot
mtcars$`car name` <- factor(mtcars$`car name`, levels = mtcars$`car name`)
ggplot(mtcars, aes(x=`car name`, y=sample_z, label=sample_z)) +
  geom_bar(stat='identity', aes(fill=sample_type), width=.5) +
  scale_fill_manual(name="Mileage", labels = c("Above Average", "Below Average"),
    values = c("above"="#00ba38", "below"="#f8766d")) + labs(subtitle="Normalised mileage from 'mtcars'",
    title= "Diverging Bars") + coord_flip()
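The difference between the two stat settings can be seen on a small example. The sketch below uses a toy data frame (df, hypothetical values) and also shows geom_col(), which ggplot2 provides as shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Toy data with negative and positive values (hypothetical)
df <- data.frame(group = c("a", "b", "c"), value = c(-1.5, 0.5, 2))

# Default stat = "count": bar height is the number of rows per x value
p_count <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

# stat = "identity": bar height is taken from the y aesthetic
p_identity <- ggplot(df, aes(group, value)) + geom_bar(stat = "identity")

# geom_col() draws the same bars without the stat argument
p_col <- ggplot(df, aes(group, value)) + geom_col()

# Computed layer data: counts in one case, the supplied values in the other
layer_data(p_count)$count
layer_data(p_col)$y
```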
Diverging Lollipop Chart
A lollipop chart conveys the same information as a bar chart with a more modern look. It uses geom_point() and geom_segment() instead of geom_bar().
library(ggplot2)
theme_set(theme_bw())
# uses the `car name` and sample_z columns created in the Diverging Bars example
ggplot(mtcars, aes(x=`car name`, y=sample_z, label=sample_z)) +
  geom_point(stat='identity', fill="black", size=6) + geom_segment(aes(y = 0, x = `car name`,
    yend = sample_z, xend = `car name`), color = "black") + geom_text(color="white", size=2) +
  labs(title="Diverging Lollipop Chart", subtitle="Normalized mileage from 'mtcars': Lollipop") +
  ylim(-2.5, 2.5) + coord_flip()
Diverging Dot Plot
library(ggplot2)
theme_set(theme_bw())
ggplot(mtcars, aes(x=`car name`, y=sample_z, label=sample_z)) +
  geom_point(stat='identity', aes(col=sample_type), size=6) +
  scale_color_manual(name="Mileage", labels = c("Above Average", "Below Average"),
    values = c("above"="#00ba38", "below"="#f8766d")) + geom_text(color="white", size=2) +
  labs(title="Diverging Dot Plot", subtitle="Normalized mileage from 'mtcars': Dotplot") + ylim(-2.5, 2.5) +
  coord_flip()
Area Chart
An area chart shows how a metric changes relative to a baseline. The example below plots the month-over-month relative change in the personal savings rate (psavert). It is drawn with geom_area().
library(ggplot2)
# lubridate must be installed for lubridate::year()
data("economics", package = "ggplot2")
# relative change from the previous month (0 for the first month)
economics$sample_perc <- c(0, diff(economics$psavert)/economics$psavert[-length(economics$psavert)])
# one axis break and label per year
brks <- economics$date[seq(1, length(economics$date), 12)]
lbls <- lubridate::year(economics$date[seq(1, length(economics$date), 12)])
ggplot(economics[1:100, ], aes(date, sample_perc)) + geom_area() +
  scale_x_date(breaks=brks, labels=lbls) + theme(axis.text.x = element_text(angle=90)) +
  labs(title="Area Chart", subtitle = "Perc Returns for Personal Savings",
    y="% Returns for Personal savings", caption="Source: economics")
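The expression used for sample_perc in the area chart is easier to follow on a short vector. The sketch below applies the same formula to three made-up values: diff() gives the change between neighbours, and dividing by the series without its last element expresses each change relative to the previous value.

```r
# Toy series (hypothetical values, not taken from economics)
x <- c(10, 11, 9.9)

diff(x)            # change from one element to the next
x[-length(x)]      # the "previous value" for each change

# Relative change; the first element has no predecessor, so it is set to 0
pct_change <- c(0, diff(x) / x[-length(x)])
pct_change         # 0, 0.1, -0.1 (up to floating-point rounding)
```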
- Ranking
Ordered Bar Chart
library(ggplot2)
theme_set(theme_bw())
data(mpg, package="ggplot2")
# average city mileage per manufacturer
sample_cty_mpg <- aggregate(mpg$cty, by=list(mpg$manufacturer), FUN=mean)
colnames(sample_cty_mpg) <- c("make", "mileage")
sample_cty_mpg <- sample_cty_mpg[order(sample_cty_mpg$mileage), ] # sort by mileage
# convert to factor so the sorted order is kept in the plot
sample_cty_mpg$make <- factor(sample_cty_mpg$make, levels = sample_cty_mpg$make)
head(sample_cty_mpg, 4)
ggplot(sample_cty_mpg, aes(x=make, y=mileage)) + geom_bar(stat="identity", width=.5, fill="tomato3") +
  labs(title="Ordered Bar Chart", subtitle="Make Vs Avg. Mileage", caption="source: mpg") +
  theme(axis.text.x = element_text(angle=65, vjust=0.6))
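The ordered bar chart depends on the x variable being a factor whose levels follow the sorted mileage. As an alternative sketch, base R's reorder() can set that level order inside aes() without sorting the data frame first. The names avg_cty and p_ordered are illustrative.

```r
library(ggplot2)
data(mpg, package = "ggplot2")

# Average city mileage per manufacturer
avg_cty <- aggregate(cty ~ manufacturer, data = mpg, FUN = mean)

# reorder() returns a factor whose levels are sorted by the second argument
ordered_make <- reorder(avg_cty$manufacturer, avg_cty$cty)
levels(ordered_make)

p_ordered <- ggplot(avg_cty, aes(x = reorder(manufacturer, cty), y = cty)) +
  geom_col(width = 0.5, fill = "tomato3") +
  labs(x = "make", y = "mileage", title = "Ordered Bar Chart") +
  theme(axis.text.x = element_text(angle = 65, vjust = 0.6))
```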
Lollipop Chart
library(ggplot2)
theme_set(theme_bw())
ggplot(sample_cty_mpg, aes(x=make, y=mileage)) + geom_point(size=3) +
  geom_segment(aes(x=make, xend=make, y=0, yend=mileage)) +
  labs(title="Lollipop Chart", subtitle="Make Vs Avg. Mileage", caption="source: mpg") +
  theme(axis.text.x = element_text(angle=65, vjust=0.6))
Dot Plot
library(ggplot2)
library(scales)
theme_set(theme_classic())
ggplot(sample_cty_mpg, aes(x=make, y=mileage)) + geom_point(col="tomato2", size=3) +
  geom_segment(aes(x=make, xend=make, y=min(mileage),
    yend=max(mileage)), linetype="dashed", size=0.1) +
  labs(title="Dot Plot", subtitle="Make Vs Avg. Mileage", caption="source: mpg") + coord_flip()
Slope Chart
library(ggplot2)
library(scales)
theme_set(theme_classic())
data_f <- read.csv("https://raw.githubusercontent.com/selva86/datasets/master/gdppercap.csv")
colnames(data_f) <- c("continent", "1952", "1957")
left_label <- paste(data_f$continent, round(data_f$`1952`), sep=", ")
right_label <- paste(data_f$continent, round(data_f$`1957`), sep=", ")
# colour class: red if the value fell between the two years, green otherwise
data_f$class <- ifelse((data_f$`1957` - data_f$`1952`) < 0, "red", "green")
sample <- ggplot(data_f) + geom_segment(aes(x=1, xend=2, y=`1952`, yend=`1957`, col=class),
    size=.75, show.legend=FALSE) + geom_vline(xintercept=1, linetype="dashed", size=.1) +
  geom_vline(xintercept=2, linetype="dashed", size=.1) +
  scale_color_manual(labels = c("Up", "Down"), values = c("green"="#00ba38", "red"="#f8766d")) +
  labs(x="", y="Mean GdpPerCap") + xlim(.5, 2.5) + ylim(0, (1.1*(max(data_f$`1952`, data_f$`1957`))))
# labels at both ends of each line, then the two column titles
sample <- sample + geom_text(label=left_label, y=data_f$`1952`, x=rep(1, NROW(data_f)), hjust=1.1, size=3.5)
sample <- sample + geom_text(label=right_label, y=data_f$`1957`, x=rep(2, NROW(data_f)), hjust=-0.1, size=3.5)
sample <- sample + geom_text(label="Time 1", x=1, y=1.1*(max(data_f$`1952`, data_f$`1957`)), hjust=1.2, size=5)
sample <- sample + geom_text(label="Time 2", x=2, y=1.1*(max(data_f$`1952`, data_f$`1957`)), hjust=-0.1, size=5)
sample + theme(panel.background = element_blank(), panel.grid = element_blank(),
  axis.ticks = element_blank(), axis.text.x = element_blank(),
  panel.border = element_blank(), plot.margin = unit(c(1,2,1,2), "cm"))
Dumbbell Plot
library(ggplot2)
library(ggalt)
library(scales) # provides percent for the axis labels
theme_set(theme_classic())
health_sample <- read.csv("https://raw.githubusercontent.com/selva86/datasets/master/health.csv")
# convert to factor so the order of the rows is kept in the plot
health_sample$Area <- factor(health_sample$Area, levels=as.character(health_sample$Area))
gg_sample <- ggplot(health_sample, aes(x=pct_2013, xend=pct_2014, y=Area, group=Area)) +
  geom_dumbbell(color="#a3c4dc", size=0.75, point.colour.l="#0e668b") +
  scale_x_continuous(labels=percent) + labs(x=NULL, y=NULL, title="Dumbbell Chart",
    subtitle="Pct Change: 2013 vs 2014", caption="Source: https://github.com/hrbrmstr/ggalt") +
  theme(plot.title = element_text(hjust=0.5, face="bold"), plot.background=element_rect(fill="#f7f7f7"),
    panel.background=element_rect(fill="#f7f7f7"), panel.grid.minor=element_blank(),
    panel.grid.major.y=element_blank(), panel.grid.major.x=element_line(),
    axis.ticks=element_blank(), legend.position="top", panel.border=element_blank())
plot(gg_sample)
Sample Plot:-
- Sample_Numbers <- table(mtcars$cyl, mtcars$gear)
barplot(Sample_Numbers, main='Automobile cylinder number grouped by number of gears', col=c('red','orange','steelblue'), legend=rownames(Sample_Numbers), xlab='Number of Gears',
  ylab='count')
- hist(airquality$Temp, col='steelblue', main='Maximum Daily Temperature', xlab='Temperature (degrees Fahrenheit)')
- set.seed(143) # set the seed before any random draws so the result is reproducible
Sample_x <- rnorm(10, mean=rep(1:5, each=2), sd=0.7)
Sample_y <- rnorm(10, mean=rep(c(1,9), each=5), sd=0.1)
data <- data.frame(x=Sample_x, y=Sample_y)
data_Sample <- as.matrix(data)[sample(1:10), ] # shuffle the rows
heatmap(data_Sample)
- with(subset(airquality, Month==9), plot(Wind, Ozone, col='steelblue', pch=20, cex=1.5))
title('Wind and Ozone in NYC in September of 1973')
- sample_cars <- transform(mtcars, cyl=factor(cyl)) # treat cylinder count as a category
class(sample_cars$cyl)
boxplot(mpg~cyl, sample_cars, xlab='Number of Cylinders', ylab='miles per gallon', main='miles per gallon for varied cylinders in automobiles', cex.main=1.2)
- library(corrplot)
corr_sample <- cor(mtcars) # cor() needs numeric columns, so use mtcars rather than sample_cars
corrplot(corr_sample)
corrplot(corr_sample, method = 'number', type = "lower")
- library(dplyr)
library(ggplot2)
airquality %>%
  group_by(Day) %>%
  summarise(mean_wind = mean(Wind)) %>%
  ggplot() + geom_area(aes(x = Day, y = mean_wind)) +
  labs(title = "Area Chart of Average Wind per Day",
    subtitle = "using airquality data", y = "Mean Wind")
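To make clear what the last pipeline plots, the same summary can be computed in base R. The sketch below averages Wind over all months for each day of the month, which is the quantity mapped to the y axis above. The name daily_wind is illustrative.

```r
data(airquality)

# Mean wind speed for each day of the month, pooled across months
daily_wind <- aggregate(Wind ~ Day, data = airquality, FUN = mean)
names(daily_wind)[2] <- "mean_wind"

head(daily_wind)

# Cross-check one value against a direct calculation
mean(airquality$Wind[airquality$Day == 1])
```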
Author:-
Rahul Pund