Homework 1

Personal Website; ggplot Visualization

Author

Byeong-Hak Choe

Published

March 3, 2025

Modified

March 3, 2025

Direction

Please submit your Quarto Document for Part 2 in Homework 1 to Brightspace with the name below:
- danl-310-hw1-LASTNAME-FIRSTNAME.qmd
  ( e.g., danl-310-hw1-choe-byeonghak.qmd )
The due date is February 19, 2025, at 2:00 P.M.
Please send Byeong-Hak an email (bchoe@geneseo.edu) if you have any questions.

Descriptive Statistics

The following provides the descriptive statistics for each part of Homework 1:

Part 1. Personal Website

Decorate your website:

Replace YOUR NAME with your name in _quarto.yml and index.qmd.
Describe yourself in index.qmd.
Add the picture file (e.g., png) of your profile photo to the img directory. Then update the img/profile.png path in index.qmd accordingly.
Add the PDF file of your resumé to the website working directory on your laptop.
Correct links for your resumé, LinkedIn, email, and optionally social media.
Make sure that you do not have any broken links in your website.

Add a “ggplot Basics” blog post to your blog using a Quarto document.

In your “ggplot Basics” blog post, briefly explain the ggplot basics we discussed in Lecture 3, Lecture 4, and Classwork 4.
Choose a suitable image file as the thumbnail for the blog post.
A YAML header template for a blog post, including an image option, can be found below:

---
title: BLOG_TITLE
author: YOUR_NAME
date: 2025-02-14
categories: [tag_1, tag_2, tag_3] # tags for a blog post (e.g., python)
image: image.png 

execute: 
  warning: false
  message: false
  
toc: true
---

Use the 3-step git commands (git add ., git commit -m "...", and git push) to update your online website.

Part 2. `ggplot` visualization

Setup

This is the setup R code chunk for this Quarto document:

library(tidyverse)
library(datasets)
library(gapminder)
library(skimr)   # a better summary of data.frame
library(scales)  # scales for ggplot
library(ggthemes)  # additional ggplot themes
library(hrbrthemes) # additional ggplot themes and color palettes
library(lubridate)
library(ggridges)
library(DT)
theme_set(theme_minimal()) # setting the minimal theme for ggplot

Provide ggplot code to replicate the given figures.

Use the following data.frame for Questions 1, 2, and 3.

ncdc_temp <- read_csv(
  'https://bcdanl.github.io/data/ncdc_temp_cleaned.csv')

Question 1

Answer:

ggplot(ncdc_temp, aes(x = date, y = temperature)) +
  # Line plot of temperature, colored by location (linewidth replaces size for lines in ggplot2 3.4.0 and later).
  geom_line(aes(color = location), linewidth = 1) +
  # Points marking the first day of each quarter (Jan, Apr, Jul, Oct).
  geom_point(data = ncdc_temp |>
               filter(month %in% c("01", "04", "07", "10"),
                      day == 1)) +
  # x-axis labeled "month", with limits of Jan 1, 0000 to Jan 4, 0001 and labeled breaks at the start of each quarter.
  scale_x_date(name = "month",
               limits = c(ymd("0000-01-01"), ymd("0001-01-04")),
               breaks = c(ymd("0000-01-01"), ymd("0000-04-01"), ymd("0000-07-01"), ymd("0000-10-01"), ymd("0001-01-01")),
               labels = c("Jan", "Apr", "Jul", "Oct", "Jan"),
               expand = c(1/366, 0)) +
  # y-axis labeled "temperature (°F)", with limits of 19.9 to 107 and breaks every 20 units.
  scale_y_continuous(limits = c(19.9, 107),
                     breaks = seq(20, 100, by = 20),
                     name = "temperature (°F)") +
  # Center the legend title.
  theme(legend.title = element_text(hjust = 0.5))

Question 2

Answer:

p <- ggplot(ncdc_temp,
            aes(x = month, y = temperature))

# add a box plot with a light grey fill
p + geom_boxplot(fill = 'grey90') +
  # add labels for the x and y axes
  labs(x = "month",
       y = "mean temperature (°F)") +
  # apply the clean theme from ggthemes
  theme_clean()

Question 3

Use ggridges::geom_density_ridges() for Question 3.

Answer:

p <- ggplot(ncdc_temp,
            aes(x = temperature, y = month))

p +
  # Ridgeline plot: one smoothed temperature density per month.
  geom_density_ridges(
    scale = 3,              # height of the ridges relative to the spacing between them
    rel_min_height = 0.01,  # drop tails below 1% of the highest point
    bandwidth = 3.4,        # bandwidth of the density estimate
    fill = "#56B4E9",       # fill color of the ridges
    color = "white"         # outline color of the ridges
  ) +
  # Continuous x-axis.
  scale_x_continuous(
    name = "mean temperature (°F)",  # axis label
    expand = c(0, 0),                # no padding beyond the data range
    breaks = c(0, 25, 50, 75)        # break points
  ) +
  # Discrete y-axis with a label and extra room above the top ridge.
  scale_y_discrete(
    name = "month",
    expand = c(0, .2, 0, 2.6)        # add 0.2 units below and 2.6 units above
  ) +
  # Set the margin around the plot (top, right, bottom, left).
  theme(
    plot.margin = margin(3, 7, 3, 1.5)
  )

Question 4

Use datasets::mtcars for Question 4.
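
Before plotting, it can help to check the three mtcars variables that this question maps to the x-axis, the y-axis, and color. The following is a minimal sketch using only base R; the object name q4_vars is illustrative and not part of the homework.

```r
# Keep only the columns that the Question 4 plot maps to x, y, and color.
q4_vars <- datasets::mtcars[, c("disp", "mpg", "hp")]

# Quartiles and mean for each column.
summary(q4_vars)

# Minimum and maximum of each column, useful when judging axis and color ranges.
sapply(q4_vars, range)
```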

Answer:

m <- ggplot(data = mtcars,
            aes(x = disp, y = mpg, color = hp))

m + geom_point() + # add a scatter plot; color is mapped to the "hp" variable in ggplot()
  labs(x = "displacement (cu. in.)", y = "fuel efficiency (mpg)") + # add labels to the x and y axes
  scale_color_gradient() # use a continuous color gradient for "hp"

Question 5

Use the following data.frame for Question 5.

popgrowth_df <- read_csv(
  'https://bcdanl.github.io/data/popgrowth.csv')

Answer:

p <- ggplot(popgrowth_df, 
            aes(y = fct_reorder(state, popgrowth), 
                x = 100*popgrowth, 
                fill = region))
p + geom_col() + # Add the geom for the columns
  scale_x_continuous(
    limits = c(-.6, 37.5), expand = c(0, 0), # Set x axis limits and expansion
    labels = scales::percent_format(accuracy = 1, scale = 1), # Set percent labels for x axis
    name = "population growth, 2000 to 2010" # Set name for x axis
    ) +
  theme(legend.position = c(.67, .4), # Set legend position
        axis.text.y = element_text( size = 6, 
                                    margin = margin(t = 0, r = 0, b = 0, l = 0) )) # Adjust the size and margin for y axis text
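
The y aesthetic in this answer relies on fct_reorder() to sort the states by growth rate. Here is a minimal sketch of that reordering on a made-up three-row data frame; the toy data and its values are hypothetical and are not taken from popgrowth.csv.

```r
library(forcats)

# Hypothetical data: three made-up states and growth rates.
toy <- data.frame(
  state     = c("A", "B", "C"),
  popgrowth = c(0.10, 0.35, -0.01)
)

# Reorder the levels of state by ascending popgrowth.
ordered_state <- fct_reorder(toy$state, toy$popgrowth)

levels(ordered_state)
# "C" "A" "B"
```

Because ggplot places the first factor level at the bottom of a discrete y-axis, the state with the smallest growth ends up at the bottom of the bar chart.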

Question 6

Use the following data.frame for Question 6.

male_Aus <- read_csv(
  'https://bcdanl.github.io/data/aus_athletics_male.csv')

Answer:

# Define color and fill vectors for use in plot
colors <- c("#BD3828", rep("#808080", 4))
fills <- c("#BD3828D0", rep("#80808080", 4))

p <- ggplot(male_Aus, 
            aes(x=height, y=pcBfat, 
                shape = sport, 
                color = sport, 
                fill = sport))

# Add geom_point layer with custom size
p + geom_point(size = 3) +

# Set shape values for different sports
  scale_shape_manual(values = 21:25) +

# Set color values for different sports
  scale_color_manual(values = colors) +

# Set fill values for different sports
  scale_fill_manual(values = fills) +

# Set x and y axis labels
  labs(x = "height (cm)",
       y = "% body fat" )
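
The fills vector in this answer uses eight-digit hex codes, in which the last two digits encode opacity. As a small sketch, base R's grDevices::col2rgb() can decode them; the variable names highlight and muted are illustrative.

```r
# Decode the two fill colors from Question 6 into red, green, blue, and alpha (0-255).
highlight <- grDevices::col2rgb("#BD3828D0", alpha = TRUE)
muted     <- grDevices::col2rgb("#80808080", alpha = TRUE)

highlight["alpha", 1]  # 208
muted["alpha", 1]      # 128

# Opacity as a fraction of fully opaque.
round(highlight["alpha", 1] / 255, 2)  # 0.82
round(muted["alpha", 1] / 255, 2)      # 0.5
```

Shapes 21 to 25 are the point shapes that have both an outline and an interior, so color controls the outline while fill controls the semi-transparent interior.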

Question 7

Use the following data.frame for Question 7.

titanic <- read_csv(
  'https://bcdanl.github.io/data/titanic_cleaned.csv')

Answer:

p <- ggplot(titanic, aes(x = age, y = after_stat(count) ) ) 

# Add a density line plot for all passengers with transparent color, and fill legend with "all passengers"
p + geom_density(
    data = select(titanic, -gender), 
    aes(fill = "all passengers"),
    color = "transparent"
  ) + 
  # Add another density line plot for each gender with transparent color, and fill legend with gender
  geom_density(aes(fill = gender),
               bw = 2, 
               color = "transparent") +
  # Set the x-axis limits, name, and expand arguments
  scale_x_continuous(limits = c(0, 75), 
                     name = "passenger age (years)", 
                     expand = c(0, 0)) +
  # Set the y-axis limits, name, and expand arguments
  scale_y_continuous(limits = c(0, 26), 
                     name = "count", 
                     expand = c(0, 0)) +
  # Set the manual fill values, breaks, and labels for the legend
  scale_fill_manual(
    values = c("#b3b3b3a0", "#0072B2", "#D55E00"), 
    breaks = c("all passengers", "male", "female"),
    labels = c("all passengers  ", "males  ", "females"),
    name = NULL,
    guide = guide_legend(direction = "horizontal")
  ) +
  # Turn off clipping so that drawing can extend beyond the panel area
  coord_cartesian(clip = "off") +
  # Create separate density line plots for male and female passengers
  facet_wrap(~gender) +
  # Set the x-axis line to blank, increase the strip text size, and set the legend position and margin
  theme(
    axis.line.x = element_blank(),
    strip.text = element_text(size = 14, margin = margin(0, 0, 0.2, 0, "cm")),
    legend.position = "bottom",
    legend.justification = "right",
    legend.margin = margin(4.5, 0, 1.5, 0, "pt"),
    legend.spacing.x = grid::unit(4.5, "pt"),
    legend.spacing.y = grid::unit(0, "pt"),
    legend.box.spacing = grid::unit(0, "cm")
  )
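
Mapping y to after_stat(count) rescales each density curve by the number of observations behind it, so the area under a curve is approximately the group size rather than 1. The following is a minimal sketch on simulated ages; the toy data is hypothetical, and the sketch assumes a ggplot2 version that provides after_stat() (3.3.0 or later).

```r
library(ggplot2)

# Hypothetical data: 200 simulated ages.
set.seed(310)
toy <- data.frame(age = rnorm(200, mean = 30, sd = 10))

p_toy <- ggplot(toy, aes(x = age, y = after_stat(count))) +
  geom_density(bw = 2)

# layer_data() returns the values computed by the density stat.
d <- layer_data(p_toy)

# count is the density multiplied by the number of observations.
all.equal(d$count, d$density * d$n)
```

This is why the grey "all passengers" curve and the gender-specific curves can share one count axis in Question 7.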

Question 8

Use the following data.frame for Question 8.

cows_filtered <- read_csv(
  'https://bcdanl.github.io/data/cows_filtered.csv')

Answer:

p <- ggplot(cows_filtered, 
            aes(x = butterfat, 
                color = breed, 
                fill = breed))

# add a density line for each breed with some transparency
p + geom_density(alpha = .2) +

# set x-axis properties
  scale_x_continuous(
    expand = c(0, 0), # remove padding from axis limits
    labels = scales::percent_format(accuracy = 1, scale = 1), # format axis labels as whole-number percentages without rescaling the values
    name = "butterfat contents" # set axis label
  ) +

# set y-axis properties
  scale_y_continuous(limits = c(0, 1.99), 
                     expand = c(0, 0)) +

# set plot area properties
  coord_cartesian(clip = "off") + # allow density lines to extend beyond axis limits
  theme(axis.line.x = element_blank()) # remove x-axis line
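
The x-axis labels in this answer come from percent_format(accuracy = 1, scale = 1). Setting scale = 1 overrides the default scale of 100, so the values are labeled as they are instead of being multiplied by 100 first. A minimal sketch of what the labeller returns, using made-up input values:

```r
library(scales)

# Labeller as configured in Question 8: no rescaling, rounded to whole percents.
fmt <- percent_format(accuracy = 1, scale = 1)

fmt(c(4, 5.2, 12.7))
# "4%" "5%" "13%"
```

The same settings appear in Question 5, where the x aesthetic is already multiplied by 100 before it reaches the scale.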

Question 9

Provide your GitHub username.
