Data Visualization with seaborn
April 25, 2025
Graphs and charts let us explore and learn about the structure of the information we have in DataFrame.
Good data visualizations make it easier to communicate our ideas and findings to other people.
A graphic should display as much information as it can, with the lowest possible cognitive strain to the viewer.
Strive for clarity.
Visualization is an iterative process.
category or string (object in DataFrame).float.
astype('float') to make a variable float.For data visualization, integer-type variables could be treated as either categorical or continuous, depending on the context of analysis.
If the values of an integer-type variable means an intensity or an order, the integer variable could be treated as continuous.
Otherwise, the integer variable is categorical.
We can use astype('int') to make a variable int.
From the plots with two or more variables, we want to see co-variation, the tendency for the values of two or more variables to vary together in a related way.
What type of co-variation occurs between variables?
A distribution of a variable
A relationship between two continuous variables (e.g., scatter plots, best fitting lines)
A time trend of a variable (e.g., line plots, best fitting lines and more)
Exploring how distributions, relationships, and time trends vary across categories can reveal important patterns and insights.
Please check out general tips on data visualization in the project page.
seabornseaborn
seaborn is a Python data visualization library based on matplotlib.
seabornRead the descriptions of variables in a DataFrame if available.
Check the unit of an observation: Are all values in one single observation measured for:
DataFrame longer/widerfloat, int, datetime64, object, category).astype(DTYPE) if needed.sns.histplot, sns.scatterplot)(x = , y = , hue = ))FacetGrid(DATA, row = , col = ).map(GEOMETRIC_OBJECT, VARIABLES) or col/row parameters in the function of geometric object)Pay attention to the unit of x andy axes.
Repeat the process until you are satisfied with the visualization output.
seabornDataFrames provided by the seaborn library:titanic and tips DataFrames:sns.countplot()data: DataFrame.x: Name of a categorical variable (column) in DataFramesns.countplot() function to plot a bar charthue: Name of a categorical variableWe can further break up the bars in the bar chart based on another categorical variable.
sns.histplot()bins: Number of binsbinwidth: Width of each binsns.histplot() function to plot a histogrambins or binwidth.
bins/binwidthbins/binwidth.sns.boxplot()sns.boxplot() function to plot a boxplot.sns.scatterplot() and sns.lmplot()x: Name of a continuous variable on the horizontal axisy: Name of a continuous variable on the vertical axisA scatter plot is used to display the relationship between two continuous variables.
We use sns.scatterplot() function to plot a scatter plot.
To the scatter plot, we can add a hue-VARIABLE mapping to display how the relationship between two continuous variables varies by VARIABLE.
Suppose we are interested in the following question:
sns.lmplot() adds a line that fits well into the scattered points.alpha helps address many data points on the same location.
alpha to number between 0 and 1.
alpha = 0: Full transparencyalpha = 1: No transparencyTo the scatter plot, we can add a hue-VARIABLE mapping to display how the relationship between two continuous variables varies by VARIABLE.
Using the fitted lines, let’s answer the following question:
sns.lineplot()x: Name of a continuous variable (often time variable) on the horizontal axisy: Name of a continuous variable on the vertical axissns.lineplot() draws a line by connecting the scattered points in order of the variable on the x-axis, so that it highlights exactly when changes occur.# New data
healthexp = (
sns.load_dataset("healthexp")
.sort_values(["Country", "Year"])
.query("Year <= 2020")
)
sns.lineplot(data = healthexp,
x = 'Year',
y = 'Life_Expectancy',
hue = 'Country').FacetGrid() object with the data we will be using and define how it will be subset with the row and col arguments:.map() method to run a plotting function on each of the subsets, passing along any necessary arguments.DataFrames with seabornLet’s do Classwork 12!