library(tidyverse)
library(skimr)
esg_info <- read_csv("https://bcdanl.github.io/data/stock_esg_simple.csv")Midterm Exam II
DANL 101: Introduction to Data Analytics
Section 3. Data Transformation and Visualization with R
Questions 13-19
For Questions 13-19, consider the following R packages and the data.frame, esg_info, containing individual company statistics for the Environmental, Social, and Governance (ESG) risk score in 2024.
- The
esg_infodata.frame is with 631 observations and 7 variables. - The first 5 observations in the
esg_infodata.frame are displayed below:
| Ticker | Company_Name | sector | total_esg |
|---|---|---|---|
| A | Agilent Technologies, Inc. | Industrials | 13.6 |
| AA | Alcoa Corporation | Industrials | 24.0 |
| AAL | American Airlines Group Inc. | Consumer Discretionary | 26.4 |
| AAP | Advance Auto Parts, Inc. | Consumer Discretionary | 11.5 |
| AAPL | Apple Inc | Technology | 17.2 |
| Ticker | Company_Name | sector | environmental | social | governance |
|---|---|---|---|---|---|
| A | Agilent Technologies, Inc. | Industrials | 1.1 | 6.4 | 6.1 |
| AA | Alcoa Corporation | Industrials | 13.8 | 5.9 | 4.3 |
| AAL | American Airlines Group Inc. | Consumer Discretionary | 9.9 | 11.6 | 4.8 |
| AAP | Advance Auto Parts, Inc. | Consumer Discretionary | 0.1 | 8.3 | 3.1 |
| AAPL | Apple Inc | Technology | 0.5 | 7.4 | 9.4 |
Description of Variables in esg_info:
Ticker: The stock symbol used to uniquely identify a publicly traded company on financial markets.Company_Name: The full name of the company corresponding to the ticker symbol.sector: The broader industry category to which the company belongs, such as Technology, Healthcare, or Financials.total_esg: The company’s overall Environmental, Social, and Governance (ESG) score. It reflects how well the company is performing in terms of sustainability and ethical impact.environmental: The company’s score related to environmental practices, such as energy efficiency, waste management, and carbon footprint.social: The company’s score related to social practices, including employee relations, diversity, community impact, and human rights.governance: The company’s score related to governance practices, like board structure, executive pay, and shareholder rights.
Interpreting ESG Data
The ESG data helps investors evaluate a company’s sustainability profile and exposure to long-term environmental, social, and governance risks. Here’s how to interpret each metric:
Total ESG Risk Score
- What it means: A composite score reflecting the company’s overall exposure to ESG-related risks.
- How to interpret:
- Higher score = higher risk → More vulnerable to ESG-related issues.
- Example: A company with total_ESG = 15 is considered to have lower ESG risk than one with total_ESG = 30.
- Higher score = higher risk → More vulnerable to ESG-related issues.
Environmental Risk Score
- What it measures: Exposure to environmental risks such as:
- Carbon emissions, waste management, climate change strategy
- Higher score → more environmental liabilities or poor sustainability measures.
- Carbon emissions, waste management, climate change strategy
Governance Risk Score
- What it measures: Exposure to governance-related risks, such as:
- Board structure and independence, executive compensation, shareholder rights, transparency and ethics
- Higher score suggests poor governance structures.
- Board structure and independence, executive compensation, shareholder rights, transparency and ethics
The followings are the summary of the esg_info data.frame, including descriptive statistics for each variable.
| Name | esg_info |
| Number of rows | 631 |
| Number of columns | 7 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | min | max | empty | n_unique |
|---|---|---|---|---|---|
| Ticker | 0 | 1 | 5 | 0 | 631 |
| Company_Name | 0 | 5 | 47 | 0 | 630 |
| sector | 0 | 6 | 22 | 0 | 12 |
Variable type: numeric
| skim_variable | n_missing | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|
| total_esg | 0 | 21.62 | 7.08 | 6.4 | 16.30 | 21.20 | 26.10 | 52.0 |
| environmental | 23 | 5.78 | 5.25 | 0.0 | 1.78 | 3.95 | 9.00 | 25.3 |
| social | 23 | 9.02 | 3.57 | 0.8 | 6.70 | 8.90 | 11.20 | 22.5 |
| governance | 23 | 6.83 | 2.40 | 2.4 | 5.20 | 6.30 | 7.93 | 19.4 |
Question 13
Write a code to produce the above summary for the esg_info data.frame, including descriptive statistics for each variable.
Answer: ______________________________________________
skim(esg_info)
Question 14
What code would you use to count the number of companies in each sector?
esg_info |> count(Sector)esg_info |> count(Company_Name)esg_info |> count(Company_Name, Sector)esg_info |> count(Sector, Company_Name)
Answer: a
Question 15
What is the median value of total_esg? Find this value from the summary of the esg_info data.frame.
Answer: ______________________________________________
Answer: 21.2
Question 16
- We are interested in companies whose overall ESG performance is relatively weak.
- To achieve this, we create a new data.frame, a new data.frame,
bottom25_esg, which includes only companies whosetotal_esgvalue is greater than or equal to the third quartile oftotal_esgvariable.
bottom25_esg <- esg_info |>
filter(total_esg >= ___BLANK___)- Using the summary of the
esg_infodata.frame, find condition correctly fills in the BLANK to complete the code above:
- 6.4
- 7.08
- 16.3
- 21.2
- 21.6
- 26.1
Answer: f
Question 17
- Additionally, we are interested in companies with excellent environmental or governance performance, despite having weak overall ESG scores.
- To achieve this, we create a new data.frame,
bottom25_esg_filtered, which includes only companies whoseenvironmentalrisk score is at most 3.95 or whosegovernancerisk score is at most 5.2.
bottom25_esg_filtered <- esg_info |>
filter(___BLANK___)- Which condition correctly fills in the BLANK to complete the code above?
environmental <= 3.95 | governance <= 5.2environmental <= 3.95 , governance <= 5.2environmental <= 3.95 & governance <= 5.2environmental <= 3.95 ! governance <= 5.2- Both (b) and (c)
Answer: a
Question 18
How would you describe the relationship between environmental and governance using the bottom25_esg_filtered data.frame?
- To identify leader companies, such as environmental leaders and governance leaders, some company names are added to such points in the plot.
- Note that it is NOT required to provide the code for adding these texts to the plot.
Complete the code by filling in the blanks (1)-(4).
ggplot(data = ___(1)___,
mapping = aes(x = ___(2)___,
y = ___(3)___)) +
geom_point(alpha = 0.5) +
___(4)___()
Blank (1)
bottom25_esg_filteredbottom25_esgesg_info
Answer: a
Blank (2)
environmentalsocialgovernancetotal_esg
Answer: a
Blank (3)
environmentalsocialgovernancetotal_esg
Answer: c
Blank (4)
geom_fitgeom_scatterplotgeom_smoothgeom_histogram
Answer: c
Environmental Leaders
Which companies qualify as environmental leaders — that is, companies with an environmental risk score below 2.5 whose plotted points lie below the gray ribbon in the figure?
Answer: ______________________________________________
JPMorgan Chase, PNC Financial, Bank of America, AbbVie, and Ionis Pharmaceuticals
Governance Leaders
Which companies qualify as governance leaders — that is, companies with an governance risk score below 10 whose plotted points lie below the gray ribbon in the figure?
Answer: ______________________________________________
Ionis Pharmaceuticals, HCA Healthcare, IAC, and Stericycle
Relationship
Describe the overall relationship between environmental and governance, in the given plot.
Overall, the governance risk score decreases as the environmental risk score increases up to about 10, after which it levels off and remains relatively constant.
This pattern suggests that among companies with weaker overall ESG performance, deterioration in environmental risk is associated with stronger governance up to a threshold, beyond which further changes in environmental risk have little additional correlation with governance risk.
Question 19
How would you describe how the distribution of total_esg varies by sector (sector) using the bottom25_esg_filtered data.frame?
- Note that the
sectorcategories are sorted by the median oftotal_esgin the plot.
Complete the code by filling in the blanks.
ggplot(data = ___(1)___,
mapping = aes(___(2)___,
y = ___(3)___)) +
___(4)___() +
labs(y = "Sector")
Blank (1)
bottom25_esg_filteredbottom25_esgesg_info
a
Blank (2)
x = total_esgy = total_esgx = environmentaly = environmentalx = socialy = socialx = governancey = governance
a
Blank (3)
fct_reorder(total_esg, sector)fct_reorder(environmental, total_esg)fct_reorder(total_esg, social)fct_reorder(governance, total_esg)fct_reorder(total_esg, controversy)fct_reorder(sector, total_esg)
f
Blank (4)
geom_bargeom_boxgeom_boxplotgeom_histogram
c
Question 20

For Question 20, you will use the following R packages and a data.frame named chess_top4, which contains information about chess games played by four of the world’s top online chess players during a special event called “Titled Tuesday” on chess.com. These games were played in a format where each player has 3 minutes to make all their moves, with 1 second added to their clock after each move. The data includes games from October 2022 to October 2024 played only among the following four players:
- Magnus Carlsen
- Hikaru Nakamura
- Alireza Firouzja
- Daniel Naroditsky
Note: Titled Tuesday is a weekly event held every Tuesday on chess.com, where titled chess players (such as Grandmasters and International Masters) compete in online tournaments.
library(tidyverse)
chess_top4 <- read_csv("https://bcdanl.github.io/data/chess_titled_tuesday.csv")The first 15 observations in the chess_top4 data.frame are displayed below:
| Date | White | Black | Result |
|---|---|---|---|
| 2022-10-11 | Hikaru Nakamura | Magnus Carlsen | White Wins |
| 2022-10-18 | Hikaru Nakamura | Alireza Firouzja | Draw |
| 2022-10-25 | Daniel Naroditsky | Alireza Firouzja | Draw |
| 2022-11-08 | Hikaru Nakamura | Alireza Firouzja | Black Wins |
| 2022-12-13 | Alireza Firouzja | Hikaru Nakamura | Draw |
| 2022-12-20 | Hikaru Nakamura | Alireza Firouzja | White Wins |
| 2022-12-20 | Magnus Carlsen | Hikaru Nakamura | Draw |
| 2023-01-03 | Daniel Naroditsky | Hikaru Nakamura | Black Wins |
| 2023-01-03 | Hikaru Nakamura | Magnus Carlsen | White Wins |
| 2023-01-24 | Hikaru Nakamura | Magnus Carlsen | White Wins |
| 2023-02-28 | Alireza Firouzja | Magnus Carlsen | Draw |
| 2023-02-28 | Hikaru Nakamura | Alireza Firouzja | Draw |
| 2023-02-28 | Hikaru Nakamura | Magnus Carlsen | Draw |
| 2023-02-28 | Magnus Carlsen | Hikaru Nakamura | Draw |
| 2023-03-14 | Hikaru Nakamura | Alireza Firouzja | White Wins |
- The
chess_top4data.frame contains 70 observations and 4 variables, representing 70 unique chess games.
Description of Variables in chess_top4:
Date: The date when the game was played.White: The name of the player who played with the white pieces.Black: The name of the player who played with the black pieces.Result: The outcome of the game, which can be one of the following:- “White Wins” (the player with the white pieces won the game)
- “Black Wins” (the player with the black pieces won the game)
- “Draw” (the game ended in a tie)
Question 20 is about a ggplot code to visualize how the distribution of Result varies among these top 4 chess players.
Part 1
Complete the code by filling in the blanks to replicate the given plot.
- The White player is displayed on the vertical axis.
- The Black player is labeled at the top of each panel.
ggplot(data = chess_top4,
mapping = aes(___(1)___,
fill = ___(2)___)) +
geom_bar(___(3)___) +
facet_wrap(___(4)___, ncol = 1) +
labs(x = "Proportion")
Blank (1)
x = Whitey = Whitex = Blacky = Blackx = Proportiony = Proportion
b
Blank (2)
chess_top4WhiteBlackResultcount
d
Blank (3)
position = "stack"position = "fill"position = "dodge"- Leaving (3) empty
- Both a and d
- Both b and d
- Both c and d
b
Blank (4)
White~WhiteBlack~BlackPlayer~Player- both a and b
- both c and d
- both e and f
- both b and f
- both d and f
d
Part 2
Complete the code by filling in the blanks to replicate the given plot.
- The White player is displayed on the vertical axis.
- The Black player is labeled at the top of each panel.
ggplot(data = chess_top4,
mapping = aes(___(1)___,
fill = ___(2)___)) +
geom_bar(___(3)___) +
facet_wrap(___(4)___, ncol = 1)
Blank (1)
x = Whitey = Whitex = Blacky = Blackx = county = count
b
Blank (2)
chess_top4WhiteBlackResultcount
d
Blank (3)
position = "stack"position = "fill"position = "dodge"- Leaving (3) empty
- Both a and d
- Both b and d
- Both c and d
e
Blank (4)
White~WhiteBlack~BlackPlayer~Player- both a and b
- both c and d
- both e and f
- both b and f
- both d and f
d
Part 3
Complete the code by filling in the blanks to replicate the given plot.
- The White player is displayed on the vertical axis.
- The Black player is labeled at the top of each panel.
ggplot(data = chess_top4,
mapping = aes(___(1)___,
fill = ___(2)___)) +
geom_bar(___(3)___) +
facet_wrap(___(4)___, ncol = 1)
Blank (1)
x = Whitey = Whitex = Blacky = Blackx = county = count
b
Blank (2)
chess_top4WhiteBlackResultcount
d
Blank (3)
position = "stack"position = "fill"position = "dodge"- Leaving (3) empty
- Both a and d
- Both b and d
- Both c and d
c
Blank (4)
White~WhiteBlack~BlackPlayer~Player- both a and b
- both c and d
- both e and f
- both b and f
- both d and f
d
Part 4 - Magnus Carlsen vs. Hikaru Nakamura in the Titled Tuesday
Who had more wins in the games where Magnus Carlsen played with the white pieces and Hikaru Nakamura played with the black pieces in the Titled Tuesday?
Answer: ______________________________________________
Magnus Carlsen
Who had more wins in the games where Hikaru Nakamura played with the white pieces and Magnus Carlsen played with the black pieces in the Titled Tuesday?
Answer: ______________________________________________
Hikaru Nakamura
Who won more games in the encounters between Magnus Carlsen and Hikaru Nakamura in the Titled Tuesday?
Answer: ______________________________________________
They had an equal number of wins.
Social Risk Score