Data Visualization - Bar Charts
March 26, 2024
ggplot
BasicsLog Functions
Bar Charts
Position Adjustments
Statistical Transformation
log()
\(\log_{10}\,(\,100\,)\): the base \(10\) logarithm of \(100\) is \(2\), because \(10^{2} = 100\)
\(\log_{e}\,(\,x\,)\): the base \(e\) logarithm is called the natural log, where \(e = 2.718\cdots\) is the mathematical constant, the Euler’s number.
\(\log\,(\,x\,)\) or \(\ln\,(\,x\,)\): the natural log of \(x\) .
\(\log_{e}\,(\,7.389\cdots\,)\): the natural log of \(7.389\cdots\) is \(2\), because \(e^{2} = 7.389\cdots\).
log()
sale_df
contains data for residential property sales from September 2017 and August 2018 in NYC.
sale.price
, a property’s sales price.log()
1. We should consider using a logarithmic scale when percent change, or change in orders of magnitude, is more important than changes in absolute units.
\[\Delta \log(x) \,= \, \log(x_{1}) \,-\, \log(x_{0}) \approx\, \frac{x_{1} \,-\, x_{0}}{x_{0}} \,=\, \frac{\Delta\, x}{x_{0}}.\]
sale.price
of $10,000 means something very different across people with different income/wealth levels.log()
2. We should consider using a log scale when a variable is heavily skewed. - It can help visualize both small and large values effectively.
Many graphs, including bar charts, calculate new values to plot:
geom_bar()
, geom_histogram()
, and geom_freqpoly()
bin our data and then plot bin counts, the number of observations that fall in each bin.
geom_boxplot()
computes a summary of the distribution and then display a specially formatted box.
geom_smooth()
fits a model to our data and then plot predictions from the model.
Bar charts seem simple, but they are interesting because they reveal something subtle about plots.
Consider a basic bar chart, as drawn with geom_bar()
.
The following bar chart displays the total number of diamonds in the ggplot2::diamonds
data.frame, grouped by cut
.
The diamonds
data.frame comes in ggplot2
and contains information about ~54,000 diamonds, including the price
, carat
, color
, clarity
, and cut
of each diamond.
geom_bar()
bins our data and then plot bin counts, the number of observations that fall in each bin.The algorithm used to calculate new values for a graph is called a stat
, short for statistical transformation.
The figure below describes how this process works with geom_bar()
.
color
and fill
aestheticcolor
aesthetic, or, more usefully, fill
.color
and fill
aestheticstat
explicitly:
stat
.stat()
stat_summary()
.stat
.stat()
stat_summary()
.fill
aestheticfill
aesthetic to another variable.fill
aestheticstack
ing is performed automatically by the position adjustment specified by the position
argument.If we don’t want a stacked bar chart with counts, we can use one of two other position
options: fill
or dodge
.
position = "fill"
works like stacking, but makes each set of stacked bars the same height.
position = "dodge"
places overlapping objects directly beside one another.
position = "fill"
position = "dodge"
G
rammar of G
raphics (ggplot
)