Unifying Environmental, Social, and Governance (ESG) Metrics with Financial Analysis

DANL 210 Project

Author: Byeong-Hak Choe

Affiliation: SUNY Geneseo

Published: April 3, 2025


Project

  1. Collect the following data from Yahoo! Finance:
    • Environmental, Social, and Governance (ESG) data
    • Stock market data
  2. Publish a webpage presenting your data analysis project on your personal website, hosted via GitHub.
    • Your analysis should center around the agenda: “Unifying ESG Metrics with Financial Analysis”
  • Present your work using a Jupyter Notebook.
  • Due Date: May 16, 2025 (Friday) by 11:59 P.M.


Project Data

  • Below are the esg_proj_2024_data and esg_proj_2025 DataFrames.

1. esg_proj_2024_data

  • The esg_proj_2024_data DataFrame provides a list of companies and associated information, including ESG scores.
import pandas as pd
url_2024 = "https://bcdanl.github.io/data/esg_proj_2024_data.csv"
esg_proj_2024_data = pd.read_csv(url_2024)


Variable Description

  • Symbol: a company’s ticker;
  • Company Name: a company name;
  • Sector: a sector a company belongs to;
  • Industry: an industry a company belongs to;
  • Country: a country a company belongs to;
  • Market Cap: a company’s market capitalization as of December 20, 2024 (Source: Nasdaq’s Stock Screener).
    • A company’s market capitalization is the value of the company that is traded on the stock market, calculated by multiplying the total number of shares by the present share price.
  • IPO_Year: the year a company first went public by offering its shares to be traded on a stock exchange.
  • total_ESG: The overall ESG (Environmental, Social, and Governance) risk score, summarizing the company’s exposure to ESG-related risks. A lower score indicates lower risk.
  • Environmental: The company’s exposure to environmental risks (e.g., emissions, energy use, environmental policy).
  • Social: The company’s exposure to social risks (e.g., labor practices, human rights, diversity, and customer relations).
  • Governance: The company’s exposure to governance-related risks (e.g., board structure, executive pay, shareholder rights, transparency).
  • Controversy: A score reflecting the severity of recent ESG-related controversies involving the company. Higher scores typically indicate greater or more serious controversies.
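The Market Cap definition above amounts to a single multiplication. A minimal sketch with made-up numbers (the share count and the price are hypothetical and do not describe any company in the dataset):

```python
def market_cap(shares_outstanding, share_price):
    """Market capitalization = total number of shares x current share price."""
    return shares_outstanding * share_price


# Hypothetical firm: 2 million shares trading at $50.00 each.
example_cap = market_cap(2_000_000, 50.0)
print(f"${example_cap:,.0f}")  # prints $100,000,000
```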



2. esg_proj_2025

  • The esg_proj_2025 DataFrame provides a list of companies and associated information.
url_2025 = "https://bcdanl.github.io/data/esg_proj_2025.csv"
esg_proj_2025 = pd.read_csv(url_2025)



Project Tasks

1. Data Collection

  • For data collection, consider the esg_proj_2025 DataFrame.

1. selenium

  • For each company in the esg_proj_2025 DataFrame, use the Python selenium library to gather the ESG Risk Ratings and the Controversy Level from the Sustainability section of the company’s page on Yahoo! Finance.
  • Tips for Collecting Data from Yahoo! Finance’s Sustainability Section:
    1. To avoid getting blocked for automated activity, use time.sleep(random.uniform(x, y)) between page visits.
    2. Explicit waits are not required, but they are helpful for ensuring elements load before scraping.
    3. Be aware that some companies may not have data available for Environmental, Social, or Governance Risk Scores, or Controversy Level.
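The three tips above could be sketched as small helpers like the following. This is only an illustration under stated assumptions: the URL pattern in `sustainability_url` should be confirmed in a browser, the function names are hypothetical, and the CSS selector passed to `wait_for_text` is something you must find yourself by inspecting the page (none is supplied here).

```python
import random
import time


def sustainability_url(symbol):
    # Assumed URL pattern for the Sustainability tab; confirm it in a browser.
    return f"https://finance.yahoo.com/quote/{symbol}/sustainability/"


def polite_pause(low=3.0, high=7.0, sleep=time.sleep):
    """Tip 1: wait a random number of seconds between page visits."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


def wait_for_text(driver, css_selector, timeout=10):
    """Tip 2: explicit wait; returns None if the element never appears."""
    # Imported here so the other helpers work without selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
    except TimeoutException:
        return None
    return element.text


def to_score(text):
    """Tip 3: turn scraped text into a float, or None when data is missing."""
    try:
        return float(text.strip())
    except (AttributeError, ValueError):
        return None
```

In a scraping loop, one would call `driver.get(sustainability_url(symbol))`, read each field with `wait_for_text`, convert it with `to_score`, and finish the iteration with `polite_pause()`. Returning `None` for absent fields keeps the loop running when a company has no ESG data.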

2. yfinance

  • For each company found in both the esg_proj_2024_data and esg_proj_2025 DataFrames, employ the yfinance library to retrieve:
    • Daily stock prices from January 1, 2024, to March 31, 2025
    • Quarterly income statements for the five quarters ending March 31, 2025 (starting from Q1 2024)
    • Quarterly balance sheets for the same five-quarter period
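One possible way to organize the yfinance retrieval described above is sketched here. It assumes that esg_proj_2025 also has a `Symbol` column (only confirmed for esg_proj_2024_data), and the helper names are hypothetical. The `end` date is set to April 1 because yfinance treats `end` as exclusive. Which quarters the statement attributes return depends on what Yahoo! Finance exposes at request time, so check the column dates rather than assuming exactly five quarters.

```python
import pandas as pd


def common_symbols(df_2024, df_2025, col="Symbol"):
    """Tickers that appear in both DataFrames, sorted alphabetically."""
    shared = set(df_2024[col].dropna()) & set(df_2025[col].dropna())
    return sorted(shared)


def fetch_company_data(symbol, start="2024-01-01", end="2025-04-01"):
    """Daily prices plus quarterly statements for one ticker (needs network)."""
    import yfinance as yf  # imported lazily; requires the yfinance package

    ticker = yf.Ticker(symbol)
    return {
        "prices": ticker.history(start=start, end=end, interval="1d"),
        "income": ticker.quarterly_income_stmt,
        "balance": ticker.quarterly_balance_sheet,
    }


# Tiny illustrative frames (hypothetical tickers) to show the intersection step.
demo_2024 = pd.DataFrame({"Symbol": ["AAA", "BBB", None]})
demo_2025 = pd.DataFrame({"Symbol": ["BBB", "CCC", "AAA"]})
print(common_symbols(demo_2024, demo_2025))  # prints ['AAA', 'BBB']
```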


2. Data Analysis

  • Below are the key components in the data analysis webpage.

    1. Title: A clear and concise title that gives an idea of the project topics.

    2. Introduction:

      • Background: Provide context for the research questions, explaining why they are significant, relevant, or interesting.
      • Statement of the Problem: Clearly articulate the specific problem or issue the project will address.
    3. Data Collection: Write a Python script (*.py) containing the code, with comments, that retrieves the financial, accounting, and ESG data using the Python libraries yfinance and selenium.

      • Do NOT include your data collection code on your webpage. Instead, submit your Python script for data collection to Brightspace.
    4. Descriptive Statistics

      • Provide both grouped and un-grouped descriptive statistics and distribution plots for the ESG data and the finance/accounting data.
      • Provide correlation heat maps using corr() and seaborn.heatmap(). The Python code below creates a correlation heatmap.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample DataFrame. Each column is an exact linear function of the others,
# so every pairwise correlation here is either +1.00 or -1.00.
data = {
    'Revenue': [100, 200, 300, 400, 500],
    'Profit': [20, 40, 60, 80, 100],
    'n_Employee': [50, 45, 40, 35, 30],
    'n_Customer': [10, 11, 12, 13, 14]
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Calculate the correlation matrix of the DataFrame
corr = df.corr()

# Set up the matplotlib figure size
plt.figure(figsize=(8, 6))

# Generate a heatmap in seaborn:
# - 'corr' is the correlation matrix
# - 'annot=True' enables annotations inside the squares with the correlation values
# - 'cmap="coolwarm"' assigns a color map from cool to warm (blue to red)
# - 'fmt=".2f"' formats the annotations to two decimal places
# - 'linewidths=.5' adds lines between each cell
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)

# Title of the heatmap
plt.title('Correlation Heatmap')

# Display the heatmap
plt.show()
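
For the grouped and un-grouped descriptive statistics requested under Descriptive Statistics, a minimal pandas sketch might look like this. The column names follow esg_proj_2024_data, but the four rows are invented purely for illustration.

```python
import pandas as pd

# Invented rows; the real data comes from esg_proj_2024_data and your scraping.
esg = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC", "DDD"],
    "Sector": ["Technology", "Technology", "Energy", "Energy"],
    "total_ESG": [15.0, 25.0, 30.0, 40.0],
    "Controversy": [1.0, 2.0, 3.0, None],
})

score_cols = ["total_ESG", "Controversy"]

# Un-grouped: count, mean, std, min, quartiles, and max for each column.
overall = esg[score_cols].describe()

# Grouped: the same columns summarized within each sector.
by_sector = esg.groupby("Sector")[score_cols].agg(["mean", "median", "count"])

print(overall)
print(by_sector)
```

Note that `count` skips missing values, so a sector with unavailable Controversy data shows a smaller count than its number of companies.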

    5. Exploratory Data Analysis:
      • Explore the trend of ESG scores from 2024 to 2025.
      • Additionally, list the questions you aim to answer.
      • Address the questions through data visualization with seaborn (or lets-plot) and pandas methods and attributes.
    6. Significance of the Project:
      • Explain its implications for real-world applications, business strategies, or public policy.
    7. References
      • List all sources cited in the project.
      • Include the web address of any reference taken from the web.
      • Indicate whether the code and the write-up were guided by generative AI, such as ChatGPT. There is no penalty for using generative AI.
      • Clearly state whether the code and the write-up result from collaboration with colleagues. There is no penalty for collaboration, provided that the shared portions are clearly indicated.
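
For the Exploratory Data Analysis item on the trend of ESG scores from 2024 to 2025, one possible starting point is to merge the two years on the ticker and take the difference. The sketch assumes your scraped 2025 data stores the score under the same `total_ESG` column name as esg_proj_2024_data. The function name and the sample rows are hypothetical.

```python
import pandas as pd


def esg_change(df_2024, df_2025, score="total_ESG", key="Symbol"):
    """Year-over-year change in one ESG score for companies present in both years."""
    merged = df_2024[[key, score]].merge(
        df_2025[[key, score]],
        on=key,
        how="inner",
        suffixes=("_2024", "_2025"),
    )
    merged["change"] = merged[f"{score}_2025"] - merged[f"{score}_2024"]
    return merged


# Invented rows for illustration only.
old = pd.DataFrame({"Symbol": ["AAA", "BBB", "CCC"], "total_ESG": [20.0, 35.0, 28.0]})
new = pd.DataFrame({"Symbol": ["AAA", "BBB", "DDD"], "total_ESG": [18.0, 40.0, 22.0]})

trend = esg_change(old, new)
print(trend)
```

Because these are risk scores, a negative `change` means the company's measured ESG risk fell between the two years.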


🧾 Interpreting ESG Data on Yahoo Finance

In Yahoo Finance, the ESG data helps investors evaluate a company’s sustainability profile and exposure to long-term environmental, social, and governance risks. Here’s how to interpret each metric:

🔢 Total ESG Risk Score

  • What it means: A composite score reflecting the company’s overall exposure to ESG-related risks.
  • How to interpret:
    • Lower score = lower risk → Better ESG performance.
    • Higher score = higher risk → More vulnerable to ESG-related issues.
    • Example: A company with total_ESG = 20 is considered to have lower ESG risk than one with total_ESG = 45.

🌍 Environmental Risk Score

  • What it measures: Exposure to environmental risks such as:
    • Carbon emissions
    • Energy efficiency
    • Waste management
    • Climate change strategy
  • Interpretation:
    • Lower score → better environmental practices.
    • Higher score → more environmental liabilities or poor sustainability measures.

👥 Social Risk Score

  • What it measures: Exposure to social risks, including:
    • Labor practices
    • Human rights
    • Diversity and inclusion
    • Customer and community relations
  • Interpretation:
    • Lower score = better social responsibility.
    • Higher score = more risk from labor issues, PR problems, etc.

🏛 Governance Risk Score

  • What it measures: Exposure to governance-related risks, such as:
    • Board structure and independence
    • Executive compensation
    • Shareholder rights
    • Transparency and ethics
  • Interpretation:
    • Lower score suggests better corporate oversight.
    • Higher score suggests poor governance structures.

🚨 Controversy Level

  • What it measures: Reflects recent ESG-related controversies involving the company.
  • Scale: Usually ranges from 0 (no controversies) to 5 (severe and ongoing issues).
  • Interpretation:
    • Low score (0–1): Minimal or no controversies.
    • High score (4–5): Major controversies — potential reputational or legal risks.
    • Note: A company may have good ESG scores but still be flagged due to a high controversy score.
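
The controversy bands above can be turned into a small labeling helper for grouped summaries or plots. The text only names the 0–1 and 4–5 ranges, so the "Moderate" label for values in between, the "Missing" label, and the function name itself are assumptions made for this sketch.

```python
import math


def controversy_band(level):
    """Map a Controversy Level to a coarse label."""
    if level is None or (isinstance(level, float) and math.isnan(level)):
        return "Missing"   # no controversy data available for the company
    if level <= 1:
        return "Low"       # 0-1: minimal or no controversies
    if level >= 4:
        return "High"      # 4-5: major controversies
    return "Moderate"      # assumed label; the text does not name this range


for value in [0, 3, 5, None]:
    print(value, controversy_band(value))
```

With pandas, the same function can be applied to a column, for example `df["Controversy"].apply(controversy_band)`.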

🧠 ESG Score Summary

| Metric        | Good Score | Bad Score |
|---------------|------------|-----------|
| total_ESG     | Low        | High      |
| Environmental | Low        | High      |
| Social        | Low        | High      |
| Governance    | Low        | High      |
| Controversy   | 0–1        | 4–5       |



Rubric

Project Write-up

| Attribute | Very Deficient (1) | Somewhat Deficient (2) | Acceptable (3) | Very Good (4) | Outstanding (5) |
|---|---|---|---|---|---|
| 1. Quality of research questions | • Not stated or very unclear • Entirely derivative • Anticipate no contribution | • Stated somewhat confusingly • Slightly interesting, but largely derivative • Anticipate minor contributions | • Stated explicitly • Somewhat interesting and creative • Anticipate limited contributions | • Stated explicitly and clearly • Clearly interesting and creative • Anticipate at least one good contribution | • Articulated very clearly • Highly interesting and creative • Anticipate several important contributions |
| 2. Quality of data visualization | • Very poorly visualized • Unclear • Unable to interpret figures | • Somewhat visualized • Somewhat unclear • Difficulty interpreting figures | • Mostly well visualized • Mostly clear • Acceptably interpretable | • Well organized • Well thought-out visualization • Almost all figures clearly interpretable | • Very well visualized • Outstanding visualization • All figures clearly interpretable |
| 3. Quality of exploratory data analysis | • Little or no critical thinking • Little or no understanding of data analytics concepts with Python | • Rudimentary critical thinking • Somewhat shaky understanding of data analytics concepts with Python | • Average critical thinking • Understanding of data analytics concepts with Python | • Mature critical thinking • Clear understanding of data analytics concepts with Python | • Sophisticated critical thinking • Superior understanding of data analytics concepts with Python |
| 4. Quality of business/economic analysis | • Little or no critical thinking • Little or no understanding of business/economic concepts | • Rudimentary critical thinking • Somewhat shaky understanding of business/economic concepts | • Average critical thinking • Understanding of business/economic concepts | • Mature critical thinking • Clear understanding of business/economic concepts | • Sophisticated critical thinking • Superior understanding of business/economic concepts |
| 5. Quality of writing | • Very poorly organized • Very difficult to read • Many typos and grammatical errors | • Somewhat disorganized • Somewhat difficult to read • Numerous typos and grammatical errors | • Mostly well organized • Mostly easy to read • Some typos and grammatical errors | • Well organized • Easy to read • Very few typos or grammatical errors | • Very well organized • Very easy to read • No typos or grammatical errors |
| 6. Quality of Jupyter Notebook usage | • Very poorly organized • Many redundant warning/error messages • Inappropriate code to produce outputs | • Somewhat disorganized • Numerous warning/error messages • Misses important code | • Mostly well organized • Some warning/error messages • Provides appropriate code | • Well organized • Very few warning/error messages • Provides advanced code | • Very well organized • No warning/error messages • Proposes highly advanced code |


Data Collection

| Evaluation | Description | Criteria |
|---|---|---|
| 1 (Very Deficient) | - Very poorly implemented - Data is unreliable. | - Ineffective use of yfinance, resulting in incomplete or inaccurate financial data. - Poor web scraping practices with selenium, leading to unreliable or incorrect data from Yahoo Finance. - Inadequate use of pandas, resulting in poorly structured DataFrames. |
| 2 (Somewhat Deficient) | - Somewhat effective implementation - Data has minor reliability issues. | - Basic use of yfinance with minor inaccuracies in data retrieval. - Basic web scraping with selenium that sometimes fails to capture all relevant data accurately. - Basic use of pandas, but with occasional issues in data structuring. |
| 3 (Acceptable) | - Competently implemented - Data is mostly reliable. | - Competent use of yfinance to retrieve most financial data accurately. - Effective web scraping with selenium, capturing most required data from Yahoo Finance. - Adequate use of pandas to structure data in a mostly logical format. |
| 4 (Very Good) | - Well-implemented and organized - Data is reliable. | - Advanced use of yfinance to reliably and accurately fetch financial data. - Thorough web scraping with selenium that consistently captures accurate and complete data from Yahoo Finance. - Skillful use of pandas for clear and logical data structuring. |
| 5 (Outstanding) | - Exceptionally implemented - Data is highly reliable. | - Expert use of yfinance to obtain comprehensive and precise financial data. - Expert web scraping with selenium, capturing detailed and accurate data from Yahoo Finance without fail. - Expert use of pandas to create exceptionally well-organized DataFrames that facilitate easy analysis. |