Classwork 9

ETL Process in R

Author

Byeong-Hak Choe

Published

October 29, 2025

Modified

October 31, 2025


Question 1

Consider the following three related data frames from Classwork 8 (Social Media Analytics):

  • df_survey: individual student responses about social media usage
  • df_platform: reference information about each social media platform
  • df_card: additional details or attributes linked to each platform

You will use these data frames in Question 1.

library(tidyverse)  # provides read_csv(), left_join(), and count()

df_platform <- 
  read_csv("https://bcdanl.github.io/data/platform_reference.csv")
df_card <- 
  read_csv("https://bcdanl.github.io/data/card_suit_rules.csv")
df_survey <- 
  read_csv("https://bcdanl.github.io/data/danl-101-survey-social-media-fall-2025.csv")

Part A

Write R code to create a data frame named df_joined that combines the three data frames (df_survey, df_platform, and df_card) using the left_join() function.
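
As a reminder of how two left_join() calls can be chained, here is a minimal sketch on made-up data. The data frames survey, platform, and card, and the key columns platform_id and suit, are hypothetical and are not the actual columns of the course data; inspect the real data frames to find their shared key columns.

```r
library(dplyr)

# Hypothetical toy tables; all names and values here are illustrative assumptions.
survey <- data.frame(
  student     = c("s1", "s2", "s3"),
  platform_id = c(1, 2, 2)
)
platform <- data.frame(
  platform_id = c(1, 2),
  platform    = c("Platform A", "Platform B"),
  suit        = c("hearts", "spades")
)
card <- data.frame(
  suit = c("hearts", "spades"),
  rule = c("rule A", "rule B")
)

# Each left_join() keeps every row of the table on its left
# and attaches the matching columns from the table on its right.
joined <- survey |>
  left_join(platform, by = "platform_id") |>
  left_join(card, by = "suit")

joined
```

Spelling out the by argument makes the key explicit; if it is omitted, left_join() joins on all columns the two tables have in common and prints a message naming them.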

Part B

Find out which platforms DANL 101 students use.

count(): Counting Occurrences of Each Category in a Categorical Variable
DATA.FRAME |> count(CATEGORICAL_VARIABLE)
  • The data transformation function count() calculates the frequency of each unique value in a categorical variable.
library(nycflights13)
flights |> count(origin)
  • flights |> count(origin) returns a data.frame with two variables, origin and n:
    • n: the number of occurrences of each unique value in the origin variable in the flights data.frame
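
The same pattern works on any categorical variable. The sketch below uses a small made-up data frame (toy and its platform values are hypothetical, not the course survey) and adds sort = TRUE, an argument of count() that orders the result from the most to the least frequent value.

```r
library(dplyr)

# Hypothetical toy data, not the course survey
toy <- data.frame(platform = c("A", "B", "A", "C", "A", "B"))

# One row per unique platform, with the frequency in column n,
# sorted so that the largest count comes first
toy_counts <- toy |> count(platform, sort = TRUE)

toy_counts
```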

Part C

Count how many students use each platform, and determine which platform is the most popular based on the number of users.

  • Check out the new data transformation function, count().

Part D

Calculate descriptive statistics (e.g., mean, standard deviation, and quartiles) of daily_time_min for each platform by using the skimr::skim() function.
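
One way to get per-group summaries from skim() is to group the data frame first. The sketch below assumes the skimr package is installed and uses a made-up data frame; toy and its values are hypothetical, and only the column name daily_time_min is taken from the question.

```r
library(dplyr)
library(skimr)

# Hypothetical toy data, not the course survey
toy <- data.frame(
  platform       = c("A", "A", "B", "B", "B"),
  daily_time_min = c(30, 60, 15, 45, 90)
)

# skim() respects group_by(), so it reports one summary row
# per platform for the selected variable
toy_skim <- toy |>
  group_by(platform) |>
  skim(daily_time_min)

toy_skim
```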


Question 2

Consider the two related data.frames, df_1 and df_2:

  • df_1
id  name    age
1   Alice   19
2   Bob     21
4   Olivia  20
  • df_2
id  major
1   Economics
2   Business Administration
3   Data Analytics

Which of the following R code options correctly joins the two related data.frames, df_1 and df_2, to produce the resulting data.frame shown below?

id  major                    name   age
1   Economics                Alice  19
2   Business Administration  Bob    21
3   Data Analytics           NA     NA
  1. df_1 |> left_join(df_2)
  2. df_2 |> left_join(df_1)
  3. Both 1 and 2
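
To check each option yourself, the two data.frames can be recreated from the tables above and both joins run side by side. This is a minimal sketch: the names option_1 and option_2 are introduced here for illustration, and by = "id" is added only to make the join key explicit (the options as written join on the shared id column automatically).

```r
library(dplyr)

# Recreate the two data.frames shown in the question
df_1 <- data.frame(
  id   = c(1, 2, 4),
  name = c("Alice", "Bob", "Olivia"),
  age  = c(19, 21, 20)
)
df_2 <- data.frame(
  id    = c(1, 2, 3),
  major = c("Economics", "Business Administration", "Data Analytics")
)

# Run both candidate joins and compare each result with the target table
option_1 <- df_1 |> left_join(df_2, by = "id")
option_2 <- df_2 |> left_join(df_1, by = "id")

option_1
option_2
```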



Discussion

Welcome to our Classwork 9 Discussion Board! 👋

This space is designed for you to engage with your classmates about the material covered in Classwork 9.

Whether you are looking to delve deeper into the content, share insights, or have questions about the content, this is the perfect place for you.

If you have any specific questions for Byeong-Hak (@bcdanl) or a classmate (@GitHub-Username) regarding the Classwork 9 materials, or need clarification on any points, don't hesitate to ask here.

Let’s collaborate and learn from each other!
