import pandas as pd
url_2024 = "https://bcdanl.github.io/data/esg_proj_2024_data.csv"
esg_proj_2024_data = pd.read_csv(url_2024)
Unifying Environmental, Social, and Governance (ESG) Metrics with Financial Analysis
DANL 210 Project
Project
- Collect the following data from Yahoo! Finance:
- Environmental, Social, and Governance (ESG) data
- Stock market data
- Publish a webpage presenting your data analysis project on your personal website, hosted via GitHub.
- Your analysis should center around the agenda: “Unifying ESG Metrics with Financial Analysis”
- Present your work using a Jupyter Notebook.
- Due Date: May 16, 2025 (Friday) by 11:59 P.M.
Project Data
- Below are the esg_proj_2024_data DataFrame and the esg_proj_2025 DataFrame.
1. esg_proj_2024_data
The esg_proj_2024_data DataFrame provides a list of companies and associated information, including ESG scores.
Variable Description
- Symbol: a company’s ticker
- Company Name: a company name
- Sector: the sector a company belongs to
- Industry: the industry a company belongs to
- Country: the country a company belongs to
- Market Cap: a company’s market capitalization as of December 20, 2024 (Source: Nasdaq’s Stock Screener).
  - A company’s market capitalization is the value of the company that is traded on the stock market, calculated by multiplying the total number of shares by the present share price.
- IPO_Year: the year a company first went public by offering its shares to be traded on a stock exchange.
- total_ESG: The overall ESG (Environmental, Social, and Governance) risk score, summarizing the company’s exposure to ESG-related risks. A lower score indicates lower risk.
- Environmental: The company’s exposure to environmental risks (e.g., emissions, energy use, environmental policy).
- Social: The company’s exposure to social risks (e.g., labor practices, human rights, diversity, and customer relations).
- Governance: The company’s exposure to governance-related risks (e.g., board structure, executive pay, shareholder rights, transparency).
- Controversy: A score reflecting the severity of recent ESG-related controversies involving the company. Higher scores typically indicate greater or more serious controversies.
2. esg_proj_2025
The esg_proj_2025 DataFrame provides a list of companies and associated information.
url_2025 = "https://bcdanl.github.io/data/esg_proj_2025.csv"
esg_proj_2025 = pd.read_csv(url_2025)
Project Tasks
1. Data Collection
For data collection, include only the companies that are common to both the esg_proj_2024_data and esg_proj_2025 DataFrames.
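One way to restrict the sample to the common companies is an inner merge on the ticker. The sketch below uses two tiny made-up DataFrames in place of the real ones, and it assumes both identify companies by a Symbol column. Only esg_proj_2024_data has its columns documented here, so check the column names of esg_proj_2025 before relying on this.

```python
import pandas as pd

# Made-up stand-ins for esg_proj_2024_data and esg_proj_2025.
# Assumption: both DataFrames identify companies by a 'Symbol' column.
esg_2024 = pd.DataFrame({"Symbol": ["AAPL", "MSFT", "XOM"],
                         "total_ESG": [17.0, 15.0, 41.0]})
esg_2025 = pd.DataFrame({"Symbol": ["MSFT", "XOM", "NVDA"]})

# Keep only the tickers that appear in both DataFrames.
common = esg_2024.merge(esg_2025[["Symbol"]].drop_duplicates(),
                        on="Symbol", how="inner")
common_symbols = common["Symbol"].tolist()
print(common_symbols)
```

The resulting list of tickers can then drive the scraping loop, so that no page is visited for a company that will be dropped later.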
- Scraping web data falls into a legal gray area. In the U.S., scraping publicly available information is not illegal, but it is not always clearly allowed either.
- Most companies do not go after individuals for minor or non-commercial violations of their Terms of Service (ToS). Still, if the scraping causes harm, it can lead to legal trouble.
- Tips for Collecting Data from Yahoo! Finance:
  - Scrape at a reasonable and moderate rate. To avoid overloading servers, use time.sleep(random.uniform(5, y)) between page visits.
  - Explicit waits are not required, but they are helpful for ensuring elements load before scraping.
  - Be aware that some companies may not have data available for Environmental, Social, or Governance Risk Scores, or Controversy Level.
  - Consider starting with the following setup for Selenium web scraping:
# %%
# =============================================================================
# Setup libraries
# =============================================================================
import time
import random
import pandas as pd
import os
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# %%
# =============================================================================
# Setup working directory
# =============================================================================
wd_path = 'PATHNAME_OF_YOUR_WORKING_DIRECTORY'
os.chdir(wd_path)
# %%
# =============================================================================
# Setup WebDriver with options
# =============================================================================
options = Options()
options.add_argument("window-size=1400,1200") # Set window size
options.add_argument('--disable-blink-features=AutomationControlled') # Prevent detection of automation by disabling blink features
options.page_load_strategy = 'eager' # Load only essential content first, skipping non-critical resources
driver = webdriver.Chrome(options=options)
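The pause and explicit-wait tips above can be wrapped in two small helpers. This is only a sketch: the function names, the 10-second upper bound, and the timeout are illustrative choices, not values from the project instructions. The CSS selector is whatever you find by inspecting the page.

```python
import random
import time


def polite_pause(min_seconds=5.0, max_seconds=10.0, sleep=time.sleep):
    """Sleep for a random duration between page visits and return it.

    max_seconds=10.0 is an illustrative default, not a value from the
    project instructions; `sleep` is injectable so the helper can be tested.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def wait_for_element(driver, css_selector, timeout=10):
    """Explicit wait: block until the element is present, then return it.

    `css_selector` must come from inspecting the page; none is assumed here.
    """
    # Imported here so polite_pause() stays usable without selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
    )
```

Calling polite_pause() after each driver.get(...) keeps the request rate moderate. If the element never appears, wait_for_element() raises a TimeoutException, which is a convenient signal that a company has no data on that page.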
a. ESG Data
- For each company in the esg_proj_2025 DataFrame, employ the Python selenium library to gather ESG Risk Ratings, along with the Controversy Level, from the Sustainability section of each company’s webpage on Yahoo! Finance.
b. Historical Stock Data
- For each company found in both the esg_proj_2024_data and esg_proj_2025 DataFrames, employ the selenium library to retrieve:
  - Daily stock prices from January 1, 2024, to March 31, 2025
    - e.g., https://finance.yahoo.com/quote/MSFT/history/?p=MSFT&period1=1704067200&period2=1743446400
    - 1704067200 = January 1, 2024
    - 1743446400 = March 31, 2025
- Note: The GOOGLEFINANCE function in Google Sheets is freely available for retrieving current or historical stock data.
  - Although our course does not cover Google Sheets, you are welcome to use it to collect historical stock data if you prefer.
  - If you choose Google Sheets’ GOOGLEFINANCE() for collecting historical stock data, please share your Google Sheets with Prof. Choe.
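The period1 and period2 values in the example history URL above are Unix timestamps (seconds since January 1, 1970, UTC). The sketch below shows how such values relate to calendar dates and how the same URL pattern could be built for other tickers. The helper names to_epoch and history_url are made up for illustration. The URL pattern is copied from the MSFT example and is not guaranteed to stay stable.

```python
from datetime import datetime, timezone


def to_epoch(year, month, day):
    """Unix timestamp (seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def history_url(symbol, period1, period2):
    """Build a history URL following the pattern of the MSFT example."""
    return (f"https://finance.yahoo.com/quote/{symbol}/history/"
            f"?p={symbol}&period1={period1}&period2={period2}")


start = to_epoch(2024, 1, 1)   # 1704067200, as in the example URL
end = 1743446400               # taken from the example URL as-is
end_date = datetime.fromtimestamp(end, tz=timezone.utc).date()
print(start, end_date)
print(history_url("MSFT", start, end))
```

Decoding the end value confirms that it falls on March 31, 2025 in UTC, so reusing the two numbers from the example covers the required date range.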
2. Data Analysis
Below are the key components in the data analysis webpage.
Title: A clear and concise title that gives an idea of the project topics.
Introduction:
- Background: Provide context for the research questions, explaining why they are significant, relevant, or interesting.
- Statement of the Problem: Clearly articulate the specific problem or issue the project will address.
Data Collection: Use a Python script (*.py) to write the code, with comments, that retrieves ESG data and historical stock data using Python selenium.
- Do NOT provide your code for data collection in your webpage. You should submit your Python script for data collection to Brightspace.
Descriptive Statistics
- Provide both grouped and un-grouped descriptive statistics and distribution plots for the ESG data and the finance/accounting data.
- Optionally, provide correlation heat maps using corr() and seaborn.heatmap(). The Python code below creates a correlation heatmap.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Sample DataFrame with varied correlations
data = {
    'Revenue': [100, 200, 300, 400, 500],
    'Profit': [20, 40, 60, 80, 100],
    'n_Employee': [50, 45, 40, 35, 30],
    'n_Customer': [10, 11, 12, 13, 14]
}
# Create a DataFrame from the dictionary
df = pd.DataFrame(data)
# Calculate the correlation matrix of the DataFrame
corr = df.corr()
# Set up the matplotlib figure size
plt.figure(figsize=(8, 6))
# Generate a heatmap in seaborn:
# - 'corr' is the correlation matrix
# - 'annot=True' enables annotations inside the squares with the correlation values
# - 'cmap="coolwarm"' assigns a color map from cool to warm (blue to red)
# - 'fmt=".2f"' formats the annotations to two decimal places
# - 'linewidths=.5' adds lines between each cell
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)
# Title of the heatmap
plt.title('Correlation Heatmap with Varied Correlations')
# Display the heatmap
plt.show()
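For the grouped and un-grouped descriptive statistics requested above, a minimal sketch with pandas might look like the following. The rows are made up. The column names Sector and total_ESG follow the variable descriptions of esg_proj_2024_data.

```python
import pandas as pd

# Made-up rows; 'Sector' and 'total_ESG' follow the variable descriptions.
df = pd.DataFrame({
    "Sector": ["Technology", "Technology", "Energy", "Energy"],
    "total_ESG": [15.0, 17.0, 35.0, 41.0],
})

# Un-grouped: one summary for the whole column.
overall = df["total_ESG"].describe()

# Grouped: the same kind of summary within each sector.
by_sector = df.groupby("Sector")["total_ESG"].agg(["count", "mean", "median", "std"])

print(overall)
print(by_sector)
```

Showing the two tables side by side makes it easy to see whether an overall average hides large differences between sectors.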
- Exploratory Data Analysis:
- Explore the trend of ESG scores from 2024 to 2025.
- Additionally, list the questions you aim to answer.
- Address the questions through data visualization with seaborn (or lets-plot) and pandas methods and attributes.
- Significance of the Project:
- Explain its implications for real-world applications, business strategies, or public policy.
- References
- List all sources cited in the project.
- Leave a web address of the reference if that is from the web.
- Indicate if the code and the write-up are guided by generative AI, such as ChatGPT. There will be no penalties on using any generative AI.
- Clearly state if the code and the write-up result from collaboration with colleagues. There will be no penalties for collaboration, provided that the shared portions are clearly indicated.
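For the 2024-to-2025 ESG trend listed under Exploratory Data Analysis, one possible starting point is to align the two years by ticker and compute the change in each score. The values below are made up. The sketch also assumes the 2025 scores you scrape are stored in a column named total_ESG, which is a hypothetical choice.

```python
import pandas as pd

# Made-up scores; in the project the 2025 values would be the ones you scrape.
esg_2024 = pd.DataFrame({"Symbol": ["MSFT", "XOM"], "total_ESG": [15.0, 41.0]})
esg_2025 = pd.DataFrame({"Symbol": ["MSFT", "XOM"], "total_ESG": [14.0, 43.5]})

# Align the two years by ticker; the suffixes keep the columns apart.
both = esg_2024.merge(esg_2025, on="Symbol", suffixes=("_2024", "_2025"))

# Positive change = higher risk score in 2025 than in 2024.
both["total_ESG_change"] = both["total_ESG_2025"] - both["total_ESG_2024"]
print(both)
```

Because a lower score means lower risk, a negative change is an improvement.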
Interpreting ESG Data on Yahoo Finance 🧾
In Yahoo Finance, the ESG data helps investors evaluate a company’s sustainability profile and exposure to long-term environmental, social, and governance risks. Here’s how to interpret each metric:
🔢 Total ESG Risk Score
- What it means: A composite score reflecting the company’s overall exposure to ESG-related risks.
- How to interpret:
- Lower score = lower risk → Better ESG performance.
- Higher score = higher risk → More vulnerable to ESG-related issues.
- Example: A company with total_ESG = 15 is considered to have lower ESG risk than one with total_ESG = 30.
🌍 Environmental Risk Score
- What it measures: Exposure to environmental risks such as:
- Carbon emissions
- Energy efficiency
- Waste management
- Climate change strategy
- Interpretation:
- Lower score → better environmental practices.
- Higher score → more environmental liabilities or poor sustainability measures.
🏛 Governance Risk Score
- What it measures: Exposure to governance-related risks, such as:
- Board structure and independence
- Executive compensation
- Shareholder rights
- Transparency and ethics
- Interpretation:
- Lower score suggests better corporate oversight.
- Higher score suggests poor governance structures.
🚨 Controversy Level
- What it measures: Reflects recent ESG-related controversies involving the company.
- Scale: Usually ranges from 0 (no controversies) to 5 (severe and ongoing issues).
- Interpretation:
- Low score (0–1): Minimal or no controversies.
- High score (4–5): Major controversies — potential reputational or legal risks.
- Note: A company may have good ESG scores but still be flagged due to a high controversy score.
🧠 ESG Score Summary
Metric | Good Score | Bad Score |
---|---|---|
total_ESG | Low | High |
Environmental | Low | High |
Social | Low | High |
Governance | Low | High |
Controversy | 0–1 | 4–5 |
General Tips on Data Visualization 📈
Distribution
When describing the distribution of a variable, we are typically interested in several key characteristics:
- Center: The central tendency of the data, such as the mean or median, which indicates the typical or average value.
- Spread: How spread out the values are within the variable, as shown by the range and standard deviation.
- Common Values: Identifying frequent values and the mode.
- Rare Values: Recognizing unusual or infrequent values.
- Shape: The overall shape of the distribution, such as whether it is symmetric, skewed left or right, or has multiple peaks that suggest multiple groups.
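The characteristics listed above map onto a handful of pandas methods. The following sketch uses made-up values with one unusually large observation. The name total_ESG is only a label for the example.

```python
import pandas as pd

# Made-up values with one unusually large observation.
values = pd.Series([10, 12, 12, 13, 14, 15, 40], name="total_ESG")

summary = {
    "mean": values.mean(),                  # center
    "median": values.median(),              # center, robust to the outlier
    "std": values.std(),                    # spread
    "range": values.max() - values.min(),   # spread
    "mode": values.mode().tolist(),         # most common value(s)
    "skew": values.skew(),                  # > 0 suggests a longer right tail
}
print(summary)
```

Here the mean sits above the median and the skewness is positive, which is what a single large value on the right would produce.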

Relationship Between Two Variables
- Start with determining whether the two variables have a positive association, a negative association, or no association.
- E.g., A negative slope in the fitted line indicates that sales decrease as the price increases, while a positive slope would indicate that sales increase with price. A zero slope means that there is no relationship between sales and price; changes in price do not affect sales.

- Input on the x-axis; output on the y-axis
- By convention, the input (or predictor) variable is plotted on the x-axis, and the output (or response) variable on the y-axis.
- This helps visualize potential relationships—though it shows correlation, not necessarily causation.
- Correlation does not necessarily mean causation.
- When a question asks you to describe how the relationship varies by another categorical variable, examine both the direction of the slope (negative, positive, or none) from the fitted line and the steepness of the slope (steep or shallow).
- The slope of the fitted straight line is the rate at which the “y” variable (like grades) changes as the “x” variable (like study hours) changes. In simple terms, it shows how much one thing goes up or down when the other thing changes.
- For example, a comment such as, “The plot shows a negative relationship between sales and price” does not address how the relationship differs by brand.
- The focus is on the relationship, not the distribution.
- While adding a comment on the distribution of a single variable can be helpful, the question is primarily about the relationship between the two variables.
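To compare direction and steepness across categories with numbers rather than by eye, one option is to fit a straight line within each category and compare the slopes. The sketch below uses numpy.polyfit on made-up price and sales data for two hypothetical brands.

```python
import numpy as np
import pandas as pd

# Made-up price/sales data for two hypothetical brands.
df = pd.DataFrame({
    "brand": ["A"] * 4 + ["B"] * 4,
    "price": [1.0, 2.0, 3.0, 4.0] * 2,
    "sales": [10.0, 8.0, 6.0, 4.0,     # brand A: falls by 2 per unit of price
              10.0, 9.5, 9.0, 8.5],    # brand B: falls by 0.5 per unit of price
})

# Slope of the least-squares line of sales on price, within each brand.
slopes = {
    brand: np.polyfit(group["price"], group["sales"], deg=1)[0]
    for brand, group in df.groupby("brand")
}
print(slopes)
```

Both slopes are negative, but brand A's line is four times as steep. That difference is what a comment on how the relationship varies by brand should point out.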
Time Trend of a Variable
Here are some general tips for describing the time trend of a variable:
- Start with Identifying the Overall Trend
- Look for the general direction of the trend over time.
- Is it moving upward, downward, or remaining relatively constant?
- Note Patterns and Cycles
- Identify any repeating patterns, such as seasonal fluctuations (e.g., monthly or quarterly changes) or long-term cycles.
- These can reveal consistent influences that affect the variable over time.
- Highlight Any Significant Fluctuations
- Describe any sharp increases, decreases, or irregular spikes in the data.
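A rolling mean and daily returns are two simple tools for the points above: the first smooths the series to reveal the overall direction, and the second highlights sharp moves. The prices below are made up, and the 5 percent threshold is an arbitrary choice for illustration.

```python
import pandas as pd

# Made-up daily closing prices.
close = pd.Series(
    [100.0, 101.0, 102.0, 103.0, 90.0, 104.0, 105.0, 106.0],
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
    name="Close",
)

# Overall trend: a rolling mean smooths out day-to-day noise.
rolling_mean = close.rolling(window=3).mean()

# Fluctuations: flag days whose absolute daily return exceeds 5 percent
# (the threshold is an arbitrary choice for illustration).
daily_return = close.pct_change()
big_moves = daily_return[daily_return.abs() > 0.05]

print(rolling_mean.round(2))
print(big_moves.round(3))
```

In this example the flagged days are the drop on January 5 and the rebound on January 6, while the rolling mean still trends upward overall.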

Interpreting Visualization
- Be specific.
- Avoid vague statements. The examples below do not actually explain what the patterns are.
  - “The plot shows how the time trend of a stock price varies across sectors, with each sector having a unique best fitting line and scatter pattern”
  - “The trend shows the evolution of stock price in the market over time”
- Clearly describe what the pattern is and how it differs across categories.
- Add Narration:
- Connect the visualization to real-world phenomena and/or your idea that could help explain it, adding insight into what is happening.
Rubric
Project Write-up
Attribute | Very Deficient (1) | Somewhat Deficient (2) | Acceptable (3) | Very Good (4) | Outstanding (5) |
---|---|---|---|---|---|
1. Quality of research questions | • Not stated or very unclear • Entirely derivative • Anticipate no contribution | • Stated somewhat confusingly • Slightly interesting, but largely derivative • Anticipate minor contributions | • Stated explicitly • Somewhat interesting and creative • Anticipate limited contributions | • Stated explicitly and clearly • Clearly interesting and creative • Anticipate at least one good contribution | • Articulated very clearly • Highly interesting and creative • Anticipate several important contributions |
2. Quality of data visualization | • Very poorly visualized • Unclear • Unable to interpret figures | • Somewhat visualized • Somewhat unclear • Difficulty interpreting figures | • Mostly well visualized • Mostly clear • Acceptably interpretable | • Well organized • Well thought-out visualization • Almost all figures clearly interpretable | • Very well visualized • Outstanding visualization • All figures clearly interpretable |
3. Quality of exploratory data analysis | • Little or no critical thinking • Little or no understanding of data analytics concepts with Python | • Rudimentary critical thinking • Somewhat shaky understanding of data analytics concepts with Python | • Average critical thinking • Understanding of data analytics concepts with Python | • Mature critical thinking • Clear understanding of data analytics concepts with Python | • Sophisticated critical thinking • Superior understanding of data analytics concepts with Python |
4. Quality of business/economic analysis | • Little or no critical thinking • Little or no understanding of business/economic concepts | • Rudimentary critical thinking • Somewhat shaky understanding of business/economic concepts | • Average critical thinking • Understanding of business/economic concepts | • Mature critical thinking • Clear understanding of business/economic concepts | • Sophisticated critical thinking • Superior understanding of business/economic concepts |
5. Quality of writing | • Very poorly organized • Very difficult to read • Many typos and grammatical errors | • Somewhat disorganized • Somewhat difficult to read • Numerous typos and grammatical errors | • Mostly well organized • Mostly easy to read • Some typos and grammatical errors | • Well organized • Easy to read • Very few typos or grammatical errors | • Very well organized • Very easy to read • No typos or grammatical errors |
6. Quality of Jupyter Notebook usage | • Very poorly organized • Many redundant warning/error messages • Inappropriate code to produce outputs | • Somewhat disorganized • Numerous warning/error messages • Misses important code | • Mostly well organized • Some warning/error messages • Provides appropriate code | • Well organized • Very few warning/error messages • Provides advanced code | • Very well organized • No warning/error messages • Proposes highly advanced code |
Data Collection
Evaluation | Description | Criteria |
---|---|---|
1 (Very Deficient) | - Very poorly implemented - Data is unreliable. | - Poor web scraping practices with selenium, leading to unreliable or incorrect data from Yahoo Finance. - Inadequate use of pandas, resulting in poorly structured DataFrames. |
2 (Somewhat Deficient) | - Somewhat effective implementation - Data has minor reliability issues. | - Basic web scraping with selenium that sometimes fails to capture all relevant data accurately. - Basic use of pandas, but with occasional issues in data structuring. |
3 (Acceptable) | | - Effective web scraping with selenium, capturing most required data from Yahoo Finance. - Adequate use of pandas to structure data in a mostly logical format. |
4 (Very Good) | - Well-implemented and organized - Data is reliable. | - Thorough web scraping with selenium that consistently captures accurate and complete data from Yahoo Finance. - Skillful use of pandas for clear and logical data structuring. |
5 (Outstanding) | - Exceptionally implemented - Data is highly reliable. | - Expert web scraping with selenium, capturing detailed and accurate data from Yahoo Finance without fail. - Expert use of pandas to create exceptionally well-organized DataFrames that facilitate easy analysis. |