1 Multivariate data visualization

1.1 ggplot2

ggplot2 is based on the grammar of graphics, the idea that every plot can be built from the same key components: a dataset, a set of aesthetic mappings (aes), and one or more layers.

  • Data: The dataset to be visualized, typically provided as a data frame.

  • Aesthetics (aes): Mapping of variables in the data to visual properties such as x and y coordinates, color, size, shape, and more.

  • Layer: Takes the mapped data and renders it as a visual representation. Every layer consists of three important parts:

    1. Geometries (geoms): The type of plot or shape to be drawn, such as points, lines, or bars.
    2. Statistical transformation (stat): May compute new variables from the data and affects which part of the data is displayed.
    3. Position adjustment: Primarily determines where each piece of data is drawn.
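
The three parts of a layer can be written out explicitly with ggplot2's layer() function. The following is a minimal sketch, assuming the mpg dataset that ships with ggplot2 (introduced in section 1.1.3); it builds the same scatterplot twice, once with the geom_point() shortcut and once with each part named.

```r
library(ggplot2)

# Shortcut: geom_point() fills in its default stat and position
p_short <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point()

# Explicit form: the same layer with all three parts named
p_explicit <- ggplot(mpg, aes(x = cty, y = hwy)) +
  layer(
    geom = "point",        # geometry: draw points
    stat = "identity",     # statistical transformation: use the data as-is
    position = "identity"  # position adjustment: leave points where they are
  )

p_explicit
```

In everyday use the geom_*() shortcuts are more convenient; the explicit form is mainly useful for seeing what each shortcut sets up.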



1.1.1 Installing ggplot2

install.packages("ggplot2")

1.1.2 Loading ggplot2

library(ggplot2)

1.1.3 Loading data: mpg

mpg includes information about the fuel economy of popular car models in 1999 and 2008, collected by the US Environmental Protection Agency (http://fueleconomy.gov).

data(mpg)
mpg
## # A tibble: 234 × 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
##  2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
##  3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
##  4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
##  5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
##  6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
##  7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
##  8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
##  9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
## 10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
## # ℹ 224 more rows

The variables are mostly self-explanatory:

  • cty and hwy record miles per gallon (mpg) for city and highway driving.

  • displ is the engine displacement in litres.

  • drv is the drivetrain: front wheel (f), rear wheel (r) or four wheel (4).

  • class is a categorical variable describing the “type” of car: two seater, SUV, compact, etc.
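
Before plotting, it can help to check the variables described above directly. A short sketch using only base R functions on the mpg data:

```r
library(ggplot2)  # provides the mpg dataset

# Compact overview of column names and types
str(mpg)

# Counts per level of the categorical variables
table(mpg$drv)
table(mpg$class)

# Minimum, quartiles, mean and maximum of the numeric variables
summary(mpg[, c("displ", "cty", "hwy")])
```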



1.1.4 2D scatterplots, aesthetic attributes and faceting

ggplot(mpg, aes(x=cty, y=hwy)) +
  # to create a 2D scatterplot
  geom_point()

# Map class to colour
ggplot(mpg, aes(cty, hwy, colour = class)) +
  geom_point()

# Facet to separate the data by the levels of year (rows) and drv (columns)
ggplot(mpg, aes(cty, hwy, colour = class)) +
  geom_point() +
  facet_grid(year ~ drv)
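
A common source of confusion is the difference between mapping an aesthetic to a variable and setting it to a fixed value. The sketch below illustrates both; the colour name, size, and alpha value are arbitrary choices for illustration.

```r
library(ggplot2)

# Mapping: colour and shape depend on drv, so they go inside aes() and get a legend
p_mapped <- ggplot(mpg, aes(cty, hwy)) +
  geom_point(aes(colour = drv, shape = drv))

# Setting: one fixed colour, size and transparency for every point, outside aes()
p_fixed <- ggplot(mpg, aes(cty, hwy)) +
  geom_point(colour = "steelblue", size = 2, alpha = 0.5)

p_mapped
p_fixed
```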

1.1.4.1 Exercise

  1. Explore the 3-way relationship between displ, hwy and class by using the above syntax.



1.1.5 Boxplots, histograms, 2D density and colors

ggplot(mpg, aes(drv, hwy)) + geom_boxplot(width=0.2)

ggplot(mpg, aes(displ)) + 
  geom_histogram(binwidth = 0.3) + 
  facet_wrap(~drv, ncol = 3)

ggplot(mpg, aes(cty, hwy)) + geom_density2d()

1.1.5.1 Change plot line colors

p <- ggplot(mpg, aes(drv, hwy, color = drv)) + geom_boxplot(width = 0.2)
p

Line colors can also be changed manually using the following functions:

  • scale_color_manual(): to use custom colors
  • scale_color_brewer(): to use color palettes from the RColorBrewer package
  • scale_color_grey(): to use grey color palettes

# Use custom color palettes
p + scale_color_manual(values = c("#999999", "#E69F00", "#56B4E9"))

# Use brewer color palettes
p + scale_color_brewer(palette = "Dark2")

# Use grey scale
p + scale_color_grey()

# In a histogram, color only changes the bar outlines
h <- ggplot(mpg, aes(displ, color = drv)) +
  geom_histogram(binwidth = 0.3) +
  facet_wrap(~drv, ncol = 3)
h + scale_color_brewer(palette = "Set3")

ggplot(mpg, aes(cty, hwy)) + geom_density2d(color = '#E69F00')

1.1.5.2 Change plot fill colors

p <- ggplot(mpg, aes(drv, hwy, fill = drv)) + geom_boxplot(width = 0.2)
p

Fill colors can also be changed manually using the following functions:

  • scale_fill_manual(): to use custom colors
  • scale_fill_brewer(): to use color palettes from the RColorBrewer package
  • scale_fill_grey(): to use grey color palettes

# Use custom color palettes
p + scale_fill_manual(values = c("#999999", "#E69F00", "#56B4E9"))

# Use brewer color palettes
p + scale_fill_brewer(palette = "Dark2")

# Use grey scale
p + scale_fill_grey()

h <- ggplot(mpg, aes(displ, fill = drv)) +
  geom_histogram(binwidth = 0.3) +
  facet_wrap(~drv, ncol = 3)
h + scale_fill_brewer(palette = "Set3")
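
When choosing colors by hand, it can be useful to look up the hex codes behind a Brewer palette and to name the values passed to scale_fill_manual(). The sketch below assumes the RColorBrewer package is available (it is normally installed as a dependency of ggplot2); the variable names pal and drv_cols are illustrative.

```r
library(ggplot2)

# Hex codes of the first three colors in the "Dark2" palette
pal <- RColorBrewer::brewer.pal(n = 3, name = "Dark2")
pal

# Naming the values ties each color to a level of drv, regardless of level order
drv_cols <- c("4" = pal[1], "f" = pal[2], "r" = pal[3])

ggplot(mpg, aes(drv, hwy, fill = drv)) +
  geom_boxplot(width = 0.2) +
  scale_fill_manual(values = drv_cols)

# Overview plot of all available Brewer palettes
RColorBrewer::display.brewer.all()
```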

1.1.5.3 Exercise

  1. Create boxplots/histograms using your favorite colors/color palettes.

You can find more about custom colors and color palettes at https://r-graph-gallery.com/ggplot2-color.html.



1.2 plotly

plotly is an R package for creating interactive, web-based visualizations, built on the plotly.js JavaScript library. It supports interactive scatter plots, bar charts, heatmaps, and 3D plots, among others.



1.2.1 Installing plotly

install.packages("plotly")

1.2.2 Loading plotly

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
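
With both packages loaded, an existing ggplot2 plot can be made interactive with plotly's ggplotly() function. A minimal sketch, reusing the scatterplot from section 1.1.4 (the object names p and fig_interactive are illustrative):

```r
library(ggplot2)
library(plotly)

# A static ggplot2 scatterplot
p <- ggplot(mpg, aes(cty, hwy, colour = class)) +
  geom_point()

# Convert it into an interactive plotly figure with hover, zoom and pan
fig_interactive <- ggplotly(p)
fig_interactive
```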



1.2.3 3D scatter plot

fig <- plot_ly(mpg, x = ~cty, y = ~hwy, z = ~displ,
               type = 'scatter3d', mode = 'markers', size = 0.75,
               color = ~drv, colors = c("#999999", "#E69F00", "#56B4E9"))

fig
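
By default the 3D axes are labelled with the variable names. A sketch of one way to add descriptive axis titles through layout(scene = ...) follows; the title strings and the fixed marker size are choices made here for illustration.

```r
library(ggplot2)  # for the mpg dataset
library(plotly)

fig <- plot_ly(
  mpg,
  x = ~cty, y = ~hwy, z = ~displ,
  color = ~drv,
  colors = c("#999999", "#E69F00", "#56B4E9"),
  type = "scatter3d", mode = "markers",
  marker = list(size = 3)  # fixed marker size (illustrative value)
)

# Axes of a 3D plot are configured under the "scene" element of the layout
fig <- plotly::layout(
  fig,
  scene = list(
    xaxis = list(title = "City mpg"),
    yaxis = list(title = "Highway mpg"),
    zaxis = list(title = "Displacement (litres)")
  )
)

fig
```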



1.2.4 3D line plot with custom line color

count <- 3000

# Build the spiral in one vectorised step instead of growing vectors in a loop
i <- seq_len(count)
r <- i * (count - i)

data <- data.frame(
  x = r * cos(i / 30),
  y = r * sin(i / 30),
  z = i,
  c = i  # value used to color the line
)

fig <- plot_ly(data, x = ~x, y = ~y, z = ~z, type = 'scatter3d', mode = 'lines',
        line = list(width = 4, color = ~c, colorscale = list(c(0,'#BA52ED'), c(1,'#FCB040'))))

fig



1.2.5 3D surface plot with kernel density estimation

kd <- with(mpg, MASS::kde2d(cty, hwy, n = 50))

# kde2d() stores x along the rows of z, while a plotly surface reads rows as y, so transpose z
fig <- plot_ly(x = kd$x, y = kd$y, z = t(kd$z)) %>% add_surface()

fig
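
To see what is being plotted, the object returned by MASS::kde2d() can be inspected directly. A short sketch, assuming the MASS package is installed (it ships with standard R distributions):

```r
library(ggplot2)  # for the mpg dataset

kd <- with(mpg, MASS::kde2d(cty, hwy, n = 50))

str(kd)      # a list with components x, y and z
dim(kd$z)    # one density estimate per grid point; rows follow x, columns follow y
range(kd$x)  # by default the grid spans the observed range of cty
range(kd$y)  # and the observed range of hwy
```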