Data Visualization with Lets-Plot

Published

May 1, 2020

Introduction

“The simple graph has brought more information to the data analyst’s mind than any other device.” — John Tukey

This chapter introduces data visualization with the Lets-Plot library. We briefly discuss the grammar of graphics, which is a useful paradigm for understanding the fundamentals of building graphs. Then we introduce the basics of Lets-Plot and provide resources for further development.

Prerequisites

Lets Plot Install

For a slightly more interesting introduction, let’s look at the penguins dataset from the palmerpenguins library. This dataset contains measurements for three species of penguins.

Caution: R has a package (ggplot2) that uses ggplot syntax, and another Python library, plotnine, uses the same syntax. If you don’t specify lets_plot when using ChatGPT, you may be led down the wrong code path. It is recommended that you only use the ChatGPT that was designed for this course.

First, load the data by calling load_penguins from the palmerpenguins library as follows:

import pandas as pd
import numpy as np
from lets_plot import *
from palmerpenguins import load_penguins

LetsPlot.setup_html()

data = load_penguins()

To explore the relationship between bill length and bill depth, we can start with a basic scatterplot.

# sometimes you have to run this cell twice to get the plot to show
(
    ggplot(data) 
    + geom_point(aes(x='bill_length_mm', y='bill_depth_mm'), color='blue') 
    + ggtitle("The relationship between bill length and bill depth")
)
[Figure: scatterplot titled "The relationship between bill length and bill depth"; x-axis: bill_length_mm; y-axis: bill_depth_mm]

When all species are lumped together in this scatterplot, there doesn’t appear to be much of a relationship between bill length and bill depth. We can improve our visualization by coloring the points based on species.

(
    ggplot(data) 
    + geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species')) 
    + ggtitle("The relationship between bill length and bill depth by species")
)
[Figure: scatterplot titled "The relationship between bill length and bill depth by species"; x-axis: bill_length_mm; y-axis: bill_depth_mm; color legend "species": Adelie, Gentoo, Chinstrap]

We can also easily change the shape and size of the points in a scatterplot.

(
    ggplot(data) 
    + geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species', shape='species', size='flipper_length_mm')) 
    + ggtitle("The relationship between bill length and bill depth by species with flipper length as size")
)
[Figure: scatterplot titled "The relationship between bill length and bill depth by species with flipper length as size"; x-axis: bill_length_mm; y-axis: bill_depth_mm; size legend "flipper_length_mm"; color and shape legend "species": Adelie, Gentoo, Chinstrap]

Now that’s going overboard! But hopefully the basic syntax for using lets_plot makes sense.

Making Lets-Plot More Presentable

In this section, we look at how to customize charts to be more informative and presentable. For example, raw column names are rarely good labels for an audience that is less familiar with the data than you are. We may also wish to highlight certain points or draw attention to particular areas of a graph.

To begin, let’s return to a reasonable visualization of the penguins data. We will start by naming our ggplot chart object “plot” and changing the X and Y axis labels.

The first way is to include the labs() function when building the plot. Its arguments are the names of the aesthetics (x, y, color, shape, size) and the label each one should display.

plot = (
    ggplot(data)
    + geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species', shape='species', size='flipper_length_mm'))
    + ggtitle("The relationship between bill length and bill depth by species with flipper length as size")
    + labs(x='Bill Length (mm)', y='Bill Depth (mm)', color='Species', shape='Species', size='Flipper Length (mm)')
)

plot
[Figure: scatterplot titled "The relationship between bill length and bill depth by species with flipper length as size"; x-axis: Bill Length (mm); y-axis: Bill Depth (mm); size legend "Flipper Length (mm)"; color and shape legend "Species": Adelie, Gentoo, Chinstrap]

The second way makes the same adjustments but modifies the chart “post hoc”. Start with a basic chart, “plot”, and use the + operator to add layers and modify titles.

plot = (
    ggplot(data)
    + geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species', shape='species', size='flipper_length_mm'))
    + ggtitle("The relationship between bill length and bill depth by species with flipper length as size")
)

plot = plot + labs(x='Bill Length (mm)', y='Bill Depth (mm)', color='Species', shape='Species', size='Flipper Length (mm)')
plot
[Figure: same scatterplot as above, with x-axis: Bill Length (mm); y-axis: Bill Depth (mm); size legend "Flipper Length (mm)"; color and shape legend "Species": Adelie, Gentoo, Chinstrap]


The two approaches above have the same outcome, but the second introduces a flexible lets_plot paradigm that extends to other chart additions and modifications. For example, if we want to add a horizontal reference line to “plot”, we can use the geom_hline() function.

plot = plot + geom_hline(yintercept=7, linetype='dotted', color='black')
plot
[Figure: the labeled scatterplot with a dotted horizontal reference line added; x-axis: Bill Length (mm); y-axis: Bill Depth (mm); size legend "Flipper Length (mm)"; color and shape legend "Species": Adelie, Gentoo, Chinstrap]

We can also add shapes such as rectangles and line segments using geom functions. geom_rect() draws a rectangle from a set of four coordinates (xmin, xmax, ymin, ymax). geom_segment() can be used to draw reference lines as well, but it still requires all four of its coordinates (x, y, xend, yend).

Here is a simple example first:

# Adds a shaded rectangle to a simple scatterplot
import pandas as pd
from lets_plot import *
LetsPlot.setup_html()

# Sample data with correct data types
data = {'x': [1.0, 2.0, 3.0, 4.0, 5.0], 'y': [5.0, 4.0, 3.0, 2.0, 1.0]}
df = pd.DataFrame(data)

# Ensure data types are correct
df['x'] = df['x'].astype(float)
df['y'] = df['y'].astype(float)

# Create scatter plot
plot = ggplot(df, aes('x', 'y')) + geom_point()

# Add a rectangle shape using direct numeric values
plot += geom_rect(xmin=2.0, xmax=4.0, ymin=1.0, ymax=3.0, fill='red', alpha=0.3)

# Display the plot
plot.show()
[Figure: scatterplot of the sample data with a shaded red rectangle; x-axis: x; y-axis: y]

Now here it is applied to our penguin data:

# Load the penguins dataset
penguins = load_penguins()

# Create scatter plot
plot = (
    ggplot(penguins)
    + geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species', shape='species', size='flipper_length_mm'))
    + ggtitle("The relationship between bill length and bill depth by species with flipper length as size")
)

# Add labels
plot = plot + labs(x='Bill Length (mm)', y='Bill Depth (mm)', color='Species', shape='Species', size='Flipper Length (mm)')

# Add a rectangle shape using direct numeric values
plot += geom_rect(xmin=33.0, xmax=45.0, ymin=16.0, ymax=22.0, fill='red', alpha=0.3)

# Display the plot
plot
[Figure: the labeled penguins scatterplot with a shaded red rectangle; x-axis: Bill Length (mm); y-axis: Bill Depth (mm); size legend "Flipper Length (mm)"; color and shape legend "Species": Adelie, Gentoo, Chinstrap]

The code above introduces geom_rect(). It will not be needed in every situation, but hopefully it gives a flavor of what can be done.

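Notice also that plot += geom_...() updates the plot variable. The + operator returns a new plot object with the extra layer rather than changing the original in place, so the result has to be assigned back (with = or +=) for the change to persist. There is no need to create a separately named chart object for each modification.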

Other geom_* Additions

There are several other useful “post hoc” graph modifications that we only mention in this chapter. For further exploration, see Lets-Plot Documentation.

Use geom_text() to add text annotations at specific locations. Update axes with scale_x_continuous() or scale_y_continuous(), which allow you to set axis breaks (and the gridlines drawn at them) and add units of measure like “%” or “$” to tick labels. geom_vline() and geom_hline() allow you to add vertical or horizontal reference lines. The possibilities are almost endless!

Resources

This page has introduced the basics of lets_plot, but we have only looked at a scatterplot. For links to further documentation and a whole gallery of lets_plot possibilities, see Lets-Plot Documentation.
