Data Visualization with Lets Plot

Published

May 1, 2020

Introduction

“The simple graph has brought more information to the data analyst’s mind than any other device.” — John Tukey

This chapter introudces data visualization with the Lets Plot library. We briefly discus the grammar of graphics which is a useful paradigm for understanding the fundamentals of building graphs. Then we introudce the basics of Lets Plot and provide resources for further development.

Prerequisites

Lets Plot Install

For a slightly more interesting introduction, let’s look at the penguins dataset built into the lets_plot data library. This dataset contains measurements on species of penguins.

::: {.callout-caution, collapse=“false”} #### Caution R has a package that uses ggplot and there is also another Python library plotnine that uses the same ggplot syntax. If you don’t specify lets_plot when using ChatGPT, you may be led down the wrong code path. It is recommended you only use the ChatGPT that was designed for this course. :::

First, load the data by calling load_penguins from the palmerpenguins library as follows:

import pandas as pd
import numpy as np
from lets_plot import *
from palmerpenguins import load_penguins

LetsPlot.setup_html()

data = load_penguins()

To explore the relationship between bill length and bill depth, we can start with a basic scatterplot.

# sometime you have to run this cell twice to get the plot to show
(
    ggplot(data) 
    + geom_point(aes(x='bill_length_mm', y='bill_depth_mm'), color='blue') 
    + ggtitle("The relationship between bill length and bill depth")
)

When all species are lumped together in this scatterplot it doesn’t look like there is much of a relationship between the bill length and bill depth. We can improve our visualization by coloring the points based on species.

(
    ggplot(data) 
    + geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species')) 
    + ggtitle("The relationship between bill length and bill depth by species")
)

We can also easily change the shape and size of the points in a scatterplot.

(
    ggplot(data) 
    + geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species', shape='species', size='flipper_length_mm')) 
    + ggtitle("The relationship between bill length and bill depth by species with flipper length as size")
)

Now that’s going overboard! But hopefully the basic syntax for using lets_plot makes sense.

Making Lets-Plot More Presentable

In this section, we look at how to customize charts to be more informative and presentable. For example, column names in a dataset are rarely a good idea to present to someone not as intimately familiar with the data as you. We may also wish to highlight certain points, or draw attention to areas on a graph.

To begin, let’s return to a reasonable visualization of the penguins data. We will start by naming our ggplot chart object “plot” and changing the X and Y axis labels.

The way is to include the labs function in the plot inputs. The arguments in this function are the axis names and their desired labels.

plot = (
    ggplot(data)
    + geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species', shape='species', size='flipper_length_mm'))
    + ggtitle("The relationship between bill length and bill depth by species with flipper length as size")
    + labs(x='Bill Length (mm)', y='Bill Depth (mm)', color='Species', shape='Species', size='Flipper Length (mm)')
)

plot

The next method makes the same adjustments but modifies the chart “post hoc”. Start with a very basic chart, “plot” and use the + operator to add layers and modify titles.

plot = (
    ggplot(data)
    + geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species', shape='species', size='flipper_length_mm'))
    + ggtitle("The relationship between bill length and bill depth by species with flipper length as size")
)

plot = plot + labs(x='Bill Length (mm)', y='Bill Depth (mm)', color='Species', shape='Species', size='Flipper Length (mm)')
plot


The 2 approaches above have the same outcome, but the latter example introduces a flexible lets_plot paradigm that can be extended to other chart additions and modifications. For example, if we want to add a reference line to “plot”, we can use the geom_hline() function.

plot = plot + geom_hline(yintercept=7, linetype='dotted', color='black')
plot

We can add several different shapes, including circles, lines or rectanges using the .add_shape() method. This method specifies what type of shape to add to the graph given a set of coordinates (x0, x1, y0, y1). “.add_shape()” can be used to draw reference lines as well, but still requires a all 4 coordinates.

Here is a simple example first:

# Adds a horizontal line
import pandas as pd
from lets_plot import *
LetsPlot.setup_html()

# Sample data with correct data types
data = {'x': [1.0, 2.0, 3.0, 4.0, 5.0], 'y': [5.0, 4.0, 3.0, 2.0, 1.0]}
df = pd.DataFrame(data)

# Ensure data types are correct
df['x'] = df['x'].astype(float)
df['y'] = df['y'].astype(float)

# Create scatter plot
plot = ggplot(df, aes('x', 'y')) + geom_point()

# Add a rectangle shape using direct numeric values
plot += geom_rect(xmin=2.0, xmax=4.0, ymin=1.0, ymax=3.0, fill='red', alpha=0.3)

# Display the plot
plot.show()

Now here is it applied to our penguin data:

# Load the penguins dataset
penguins = load_penguins()

# Create scatter plot
plot = (
    ggplot(penguins)
    + geom_point(aes(x='bill_length_mm', y='bill_depth_mm', color='species', shape='species', size='flipper_length_mm'))
    + ggtitle("The relationship between bill length and bill depth by species with flipper length as size")
)

# Add labels
plot = plot + labs(x='Bill Length (mm)', y='Bill Depth (mm)', color='Species', shape='Species', size='Flipper Length (mm)')

# Add a rectangle shape using direct numeric values
plot += geom_rect(xmin=33.0, xmax=45.0, ymin=16.0, ymax=22.0, fill='red', alpha=0.3)

# Display the plot
plot

The code above introduces the geom_rect features, it is not always necessary for every situation. But hopefully this gives a flavor of what can be done.

Notice also, that the geom_ methods actually update plot. No need to overwrite the original object or create a new chart object for each modification.

Other geom_*itions

There are several other useful “post hoc” graph modifications that we only mention in this chapter. For further exploration, see Lets-Plot Documentation.

Use geom_text() to add text annotations at specific locations. Update axes by using scale_x_continuous() or scale_y_continuous() which allows you to modify gridlines and add units of measure like “%” or “$”. geom_vline() and geom_hline() allow you to add vertical or horizontal reference lines. The possibilities are almost endless!

Resources

This page has introduced the basics of lets_plot, but we have only looked at a scatterplot. For links to further documentation and a whole gallery of lets_plot possibilities, see Lets-Plot Documentation.

Back to top