Data Visualization with seaborn
April 25, 2025
Graphs and charts let us explore and learn about the structure of the information we have in DataFrame
.
Good data visualizations make it easier to communicate our ideas and findings to other people.
A graphic should display as much information as it can, with the lowest possible cognitive strain to the viewer.
Strive for clarity.
Visualization is an iterative process.
category
or string
(object
in DataFrame
).float
.
astype('float')
to make a variable float
.For data visualization, integer
-type variables could be treated as either categorical or continuous, depending on the context of analysis.
If the values of an integer-type variable means an intensity or an order, the integer variable could be treated as continuous.
Otherwise, the integer variable is categorical.
We can use astype('int')
to make a variable int
.
From the plots with two or more variables, we want to see co-variation, the tendency for the values of two or more variables to vary together in a related way.
What type of co-variation occurs between variables?
A distribution of a variable
A relationship between two continuous variables (e.g., scatter plots, best fitting lines)
A time trend of a variable (e.g., line plots, best fitting lines and more)
Exploring how distributions, relationships, and time trends vary across categories can reveal important patterns and insights.
Please check out general tips on data visualization in the project page.
seaborn
seaborn
seaborn
is a Python data visualization library based on matplotlib
.
seaborn
Read the descriptions of variables in a DataFrame
if available.
Check the unit of an observation: Are all values in one single observation measured for:
DataFrame
longer/widerfloat
, int
, datetime64
, object
, category
).astype(DTYPE)
if needed.sns.histplot
, sns.scatterplot
)(x = , y = , hue = )
)FacetGrid(DATA, row = , col = ).map(GEOMETRIC_OBJECT, VARIABLES)
or col
/row
parameters in the function of geometric object)Pay attention to the unit of x
andy
axes.
Repeat the process until you are satisfied with the visualization output.
seaborn
DataFrame
s provided by the seaborn
library:titanic
and tips
DataFrames:sns.countplot()
data
: DataFrame.x
: Name of a categorical variable (column) in DataFramesns.countplot()
function to plot a bar charthue
: Name of a categorical variableWe can further break up the bars in the bar chart based on another categorical variable.
sns.histplot()
bins
: Number of binsbinwidth
: Width of each binsns.histplot()
function to plot a histogrambins
or binwidth
.
bins
/binwidth
bins
/binwidth
.sns.boxplot()
sns.boxplot()
function to plot a boxplot.sns.scatterplot()
and sns.lmplot()
x
: Name of a continuous variable on the horizontal axisy
: Name of a continuous variable on the vertical axisA scatter plot is used to display the relationship between two continuous variables.
We use sns.scatterplot()
function to plot a scatter plot.
To the scatter plot, we can add a hue
-VARIABLE
mapping to display how the relationship between two continuous variables varies by VARIABLE
.
Suppose we are interested in the following question:
sns.lmplot()
adds a line that fits well into the scattered points.alpha
helps address many data points on the same location.
alpha
to number between 0 and 1.
alpha = 0
: Full transparencyalpha = 1
: No transparencyTo the scatter plot, we can add a hue
-VARIABLE
mapping to display how the relationship between two continuous variables varies by VARIABLE
.
Using the fitted lines, let’s answer the following question:
sns.lineplot()
x
: Name of a continuous variable (often time variable) on the horizontal axisy
: Name of a continuous variable on the vertical axissns.lineplot()
draws a line by connecting the scattered points in order of the variable on the x-axis, so that it highlights exactly when changes occur.# New data
healthexp = (
sns.load_dataset("healthexp")
.sort_values(["Country", "Year"])
.query("Year <= 2020")
)
sns.lineplot(data = healthexp,
x = 'Year',
y = 'Life_Expectancy',
hue = 'Country')
.FacetGrid()
object with the data we will be using and define how it will be subset with the row
and col
arguments:.map()
method to run a plotting function on each of the subsets, passing along any necessary arguments.DataFrames
with seaborn
Let’s do Classwork 12!