import pandas as pd
url_2024 = "https://bcdanl.github.io/data/esg_proj_2024_data.csv"
esg_proj_2024_data = pd.read_csv(url_2024)
Unifying Environmental, Social, and Governance (ESG) Metrics with Financial Analysis
DANL 210 Project
Project
- Collect the following data from Yahoo! Finance:
  - Environmental, Social, and Governance (ESG) data
  - Stock market data
- Publish a webpage presenting your data analysis project on your personal website, hosted via GitHub.
- Your analysis should center around the agenda: “Unifying ESG Metrics with Financial Analysis”
- Present your work using a Jupyter Notebook.
- Due Date: May 16, 2025 (Friday) by 11:59 P.M.
Project Data
- Below are the esg_proj_2024_data and esg_proj_2025 DataFrames.
1. esg_proj_2024_data
The esg_proj_2024_data DataFrame provides a list of companies and associated information, including ESG scores.
Variable | Description |
---|---|
Symbol | A company’s ticker |
Company Name | A company’s name |
Sector | The sector a company belongs to |
Industry | The industry a company belongs to |
Country | The country a company belongs to |
Market Cap | A company’s market capitalization as of December 20, 2024 (Source: Nasdaq’s Stock Screener). Market capitalization is the market value of a company’s outstanding shares, calculated by multiplying the total number of shares outstanding by the current share price. |
IPO_Year | The year a company first went public by offering its shares to be traded on a stock exchange |
total_ESG | The overall ESG (Environmental, Social, and Governance) risk score, summarizing the company’s exposure to ESG-related risks. A lower score indicates lower risk. |
Environmental | The company’s exposure to environmental risks (e.g., emissions, energy use, environmental policy) |
Social | The company’s exposure to social risks (e.g., labor practices, human rights, diversity, and customer relations) |
Governance | The company’s exposure to governance-related risks (e.g., board structure, executive pay, shareholder rights, transparency) |
Controversy | A score reflecting the severity of recent ESG-related controversies involving the company. Higher scores typically indicate greater or more serious controversies. |
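The Market Cap definition is a single multiplication. A tiny sketch, using a hypothetical helper name and a made-up company (2,000,000 shares at $35.50), shows the arithmetic:

```python
def market_cap(shares_outstanding: float, share_price: float) -> float:
    """Market capitalization = shares outstanding x current share price (hypothetical helper)."""
    return shares_outstanding * share_price


# Made-up company: 2,000,000 shares trading at $35.50 per share
cap = market_cap(2_000_000, 35.50)
print(f"${cap:,.0f}")
```

Here 2,000,000 × 35.50 gives a market capitalization of $71,000,000.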
2. esg_proj_2025
The esg_proj_2025 DataFrame provides a list of companies and associated information.
url_2025 = "https://bcdanl.github.io/data/esg_proj_2025.csv"
esg_proj_2025 = pd.read_csv(url_2025)
Project Tasks
1. Data Collection
- For data collection, use the esg_proj_2025 DataFrame.
1. selenium
- For each company in the esg_proj_2025 DataFrame, use the Python selenium library to gather the ESG Risk Ratings, along with the Controversy Level, from the Sustainability section of each company’s webpage on Yahoo! Finance.
- Tips for collecting data from Yahoo! Finance’s Sustainability section:
  - To avoid getting blocked for automated activity, use time.sleep(random.uniform(x, y)) between page visits.
  - Explicit waits are not required, but they help ensure that elements have loaded before scraping.
  - Be aware that some companies may not have data available for the Environmental, Social, or Governance Risk Scores, or for the Controversy Level.
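The tips above can be combined into one collection loop. The sketch below is only an outline under stated assumptions: fetch_one and collect_esg are hypothetical names, and fetch_one stands for a function you would write with selenium (for example, driver.get(...) followed by an optional explicit wait such as WebDriverWait(driver, 10).until(...)). The page locators are not specified here and are left to you. The output column names are assumed to mirror esg_proj_2024_data. The loop pauses for a random interval between visits and leaves any missing ESG field as NaN.

```python
import random
import time

import pandas as pd

# Assumed output columns, chosen to mirror esg_proj_2024_data
ESG_COLUMNS = ["Symbol", "total_ESG", "Environmental", "Social", "Governance", "Controversy"]


def collect_esg(tickers, fetch_one, min_wait=2.0, max_wait=5.0):
    """Visit each ticker politely and collect ESG fields into a DataFrame.

    fetch_one is a HYPOTHETICAL callable (your selenium code) that takes a ticker
    and returns a dict such as {"total_ESG": 21.3, "Controversy": 2}.
    Fields the page does not show should simply be left out of that dict.
    """
    rows = []
    for i, ticker in enumerate(tickers):
        if i > 0:
            # Random pause between page visits to reduce the chance of being blocked
            time.sleep(random.uniform(min_wait, max_wait))
        try:
            record = fetch_one(ticker)
        except Exception as exc:  # e.g., a timeout or a page with no Sustainability data
            print(f"{ticker}: skipped ({exc})")
            record = {}
        rows.append({"Symbol": ticker, **record})
    # reindex adds any absent column and fills missing values with NaN
    return pd.DataFrame(rows).reindex(columns=ESG_COLUMNS)


def fake_fetch(ticker):
    # Stand-in for a selenium scraper; tickers and values are made up
    pages = {
        "AAA": {"total_ESG": 20.5, "Environmental": 5.0, "Social": 9.0,
                "Governance": 6.5, "Controversy": 1},
        "BBB": {"total_ESG": 31.0, "Controversy": 3},
    }
    if ticker not in pages:
        raise ValueError("no ESG data on page")
    return pages[ticker]


esg = collect_esg(["AAA", "BBB", "CCC"], fake_fetch, min_wait=0.0, max_wait=0.1)
print(esg)
```

Passing the scraper in as a function keeps the pacing and missing-value handling testable without a browser; in your own script you would pass your selenium-based function instead of fake_fetch.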
2. yfinance
- For each company found in both the esg_proj_2024_data and esg_proj_2025 DataFrames, use the yfinance library to retrieve:
  - Daily stock prices from January 1, 2024, to March 31, 2025
  - Quarterly income statements for the five quarters ending March 31, 2025 (starting from Q1 2024)
  - Quarterly balance sheets for the same five-quarter period
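A possible outline of the yfinance step, assuming both DataFrames store tickers in a column named Symbol (esg_proj_2024_data does; for esg_proj_2025 this is an assumption). The function names are hypothetical. yfinance treats the end date of download as exclusive, so end="2025-04-01" is used to include March 31, 2025. yfinance typically exposes only the most recent several quarters of statements, so depending on when you run this, some of the five target quarters may be unavailable; the date filter is a simple heuristic, and companies with non-calendar fiscal quarters may need a manual check. Only fetch_finance needs yfinance and network access.

```python
import pandas as pd


def common_symbols(df_2024: pd.DataFrame, df_2025: pd.DataFrame, col: str = "Symbol") -> list:
    """Tickers present in both DataFrames (assumes both have a ticker column named `col`)."""
    return sorted(set(df_2024[col].dropna()) & set(df_2025[col].dropna()))


def quarters_in_window(stmt: pd.DataFrame, start="2024-01-01", end="2025-03-31") -> pd.DataFrame:
    """Keep statement columns whose period-end date falls inside [start, end]."""
    dates = pd.to_datetime(stmt.columns)
    mask = (dates >= pd.Timestamp(start)) & (dates <= pd.Timestamp(end))
    return stmt.loc[:, mask]


def fetch_finance(tickers):
    """Sketch of the retrieval; requires `pip install yfinance` and network access."""
    import yfinance as yf  # imported here so the helpers above work without yfinance

    # Daily prices, Jan 1, 2024 through Mar 31, 2025 (end date is exclusive)
    prices = yf.download(tickers, start="2024-01-01", end="2025-04-01")

    income, balance = {}, {}
    for t in tickers:
        tk = yf.Ticker(t)
        # Statement columns are period-end dates; keep those inside the project window
        income[t] = quarters_in_window(tk.quarterly_income_stmt)
        balance[t] = quarters_in_window(tk.quarterly_balance_sheet)
    return prices, income, balance
```

A typical call would be tickers = common_symbols(esg_proj_2024_data, esg_proj_2025) followed by fetch_finance(tickers), ideally in the separate data collection script.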
2. Data Analysis
Below are the key components of the data analysis webpage.
Title: A clear and concise title that gives an idea of the project topics.
Introduction:
- Background: Provide context for the research questions, explaining why they are significant, relevant, or interesting.
- Statement of the Problem: Clearly articulate the specific problem or issue the project will address.
Data Collection: In a Python script (*.py), write the code, with comments, that retrieves financial, accounting, and ESG data using the Python libraries yfinance and selenium.
- Do NOT include your data collection code on your webpage. Instead, submit your Python script for data collection to Brightspace.
Descriptive Statistics
- Provide both grouped and un-grouped descriptive statistics and distribution plots for the ESG data and the finance/accounting data.
- Provide correlation heatmaps using corr() and seaborn.heatmap(). The Python code below creates a correlation heatmap.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample DataFrame with positive and negative correlations
# (each column is an exact linear function of the others, so every pairwise correlation is +1 or -1)
data = {
    'Revenue': [100, 200, 300, 400, 500],
    'Profit': [20, 40, 60, 80, 100],
    'n_Employee': [50, 45, 40, 35, 30],
    'n_Customer': [10, 11, 12, 13, 14]
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Calculate the correlation matrix of the DataFrame
corr = df.corr()

# Set up the matplotlib figure size
plt.figure(figsize=(8, 6))

# Generate a heatmap in seaborn:
# - 'corr' is the correlation matrix
# - 'annot=True' enables annotations inside the squares with the correlation values
# - 'cmap="coolwarm"' assigns a color map from cool to warm (blue to red)
# - 'fmt=".2f"' formats the annotations to two decimal places
# - 'linewidths=.5' adds lines between each cell
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)

# Title of the heatmap
plt.title('Correlation Heatmap with Varied Correlations')

# Display the heatmap
plt.show()
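For the grouped and un-grouped descriptive statistics, a minimal pandas sketch on made-up rows that mimic a few columns of esg_proj_2024_data: describe() summarizes all companies at once, and groupby() with agg() repeats the summary within each Sector. Distribution plots could then be drawn with seaborn functions such as sns.histplot or sns.boxplot.

```python
import pandas as pd

# Made-up rows that mimic a few columns of esg_proj_2024_data
df = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
    "Sector": ["Energy", "Energy", "Technology", "Technology", "Technology", "Utilities"],
    "total_ESG": [35.2, 41.0, 16.5, 18.9, 22.1, 27.4],
    "Controversy": [3, 4, 1, 2, 1, 2],
})

# Un-grouped: summary of each numeric column over all companies
summary = df[["total_ESG", "Controversy"]].describe()
print(summary)

# Grouped: the same kind of statistics computed within each sector
# (std is NaN for a sector with only one company)
by_sector = df.groupby("Sector")["total_ESG"].agg(["count", "mean", "median", "std"])
print(by_sector)
```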
- Exploratory Data Analysis:
  - Explore the trend of ESG scores from 2024 to 2025.
  - Additionally, list the questions you aim to answer.
  - Address the questions through data visualization with seaborn (or lets-plot) and pandas methods and attributes.
- Significance of the Project:
  - Explain its implications for real-world applications, business strategies, or public policy.
- References:
  - List all sources cited in the project.
  - Include the web address of any reference that comes from the web.
  - Indicate whether the code and the write-up were guided by generative AI, such as ChatGPT. There is no penalty for using generative AI.
  - Clearly state whether the code and the write-up result from collaboration with colleagues. There is no penalty for collaboration, provided that the shared portions are clearly indicated.
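For the Exploratory Data Analysis item that asks for the trend of ESG scores from 2024 to 2025, one possible starting point is to merge the two years on the ticker and compute the change. The sketch below uses made-up scores and assumes your scraped 2025 data uses the same Symbol and total_ESG column names as esg_proj_2024_data.

```python
import pandas as pd

# Made-up scores; column names for 2025 are an assumption
esg_2024 = pd.DataFrame({"Symbol": ["AAA", "BBB", "CCC"],
                         "Sector": ["Energy", "Technology", "Technology"],
                         "total_ESG": [35.0, 18.0, 22.0]})
esg_2025 = pd.DataFrame({"Symbol": ["AAA", "BBB", "DDD"],
                         "total_ESG": [32.5, 19.0, 25.0]})

# Inner merge keeps only companies present in both years
trend = esg_2024.merge(esg_2025, on="Symbol", suffixes=("_2024", "_2025"))

# Positive change = higher ESG risk in 2025 (worse, since lower scores are better)
trend["ESG_change"] = trend["total_ESG_2025"] - trend["total_ESG_2024"]
print(trend)

# Average change by sector
sector_change = trend.groupby("Sector")["ESG_change"].mean()
print(sector_change)
```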
🧾 Interpreting ESG Data on Yahoo Finance
On Yahoo Finance, ESG data helps investors evaluate a company’s sustainability profile and its exposure to long-term environmental, social, and governance risks. Here’s how to interpret each metric:
🔢 Total ESG Risk Score
- What it means: A composite score reflecting the company’s overall exposure to ESG-related risks.
- How to interpret:
  - Lower score = lower risk → Better ESG performance.
  - Higher score = higher risk → More vulnerable to ESG-related issues.
  - Example: A company with total_ESG = 20 is considered to have lower ESG risk than one with total_ESG = 45.
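Because total_ESG is a risk score, ranking should run in ascending order so that rank 1 is the lowest-risk company. A minimal sketch with hypothetical companies, including the 20 vs. 45 example:

```python
import pandas as pd

# Hypothetical companies; lower total_ESG means lower ESG risk
scores = pd.DataFrame({"Symbol": ["AAA", "BBB", "CCC"], "total_ESG": [45.0, 20.0, 31.5]})

# Rank 1 = lowest ESG risk (ascending rank; ties share the smallest rank)
scores["ESG_rank"] = scores["total_ESG"].rank(method="min").astype(int)
print(scores.sort_values("total_ESG"))
```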
🌍 Environmental Risk Score
- What it measures: Exposure to environmental risks such as:
  - Carbon emissions
  - Energy efficiency
  - Waste management
  - Climate change strategy
- Interpretation:
  - Lower score → better environmental practices.
  - Higher score → more environmental liabilities or poor sustainability measures.
🏛 Governance Risk Score
- What it measures: Exposure to governance-related risks, such as:
  - Board structure and independence
  - Executive compensation
  - Shareholder rights
  - Transparency and ethics
- Interpretation:
  - Lower score suggests better corporate oversight.
  - Higher score suggests poor governance structures.
🚨 Controversy Level
- What it measures: The severity of recent ESG-related controversies involving the company.
- Scale: Usually ranges from 0 (no controversies) to 5 (severe and ongoing issues).
- Interpretation:
  - Low score (0–1): Minimal or no controversies.
  - High score (4–5): Major controversies, with potential reputational or legal risks.
- Note: A company may have good ESG scores but still be flagged because of a high controversy score.
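The note above suggests a simple screen: companies whose total_ESG looks good but whose Controversy level is high. A minimal sketch on made-up data; the cutoff of 20 for a "good" total_ESG is an illustrative assumption, not a Yahoo Finance rule, and a missing Controversy value is never flagged.

```python
import pandas as pd

# Made-up data for illustration
esg = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC", "DDD"],
    "total_ESG": [15.0, 18.5, 40.2, 12.3],
    "Controversy": [4, 1, 5, None],
})

GOOD_ESG_CUTOFF = 20  # assumed threshold for this example only

# Low ESG risk overall, but a high (4-5) controversy level
flagged = esg[(esg["total_ESG"] < GOOD_ESG_CUTOFF) & (esg["Controversy"] >= 4)]
print(flagged["Symbol"].tolist())
```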
🧠 ESG Score Summary
Metric | Good Score | Bad Score |
---|---|---|
total_ESG | Low | High |
Environmental | Low | High |
Social | Low | High |
Governance | Low | High |
Controversy | 0–1 | 4–5 |
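The Controversy row of the table can be turned into labeled bands with pd.cut. In this sketch, 0–1 maps to "Low" and 4–5 to "High" as in the table; labeling 2–3 as "Moderate" is an assumption, since the table does not name that range. Missing levels stay missing.

```python
import pandas as pd

# Example Controversy levels, including one missing value
levels = pd.Series([0, 1, 2, 3, 4, 5, None])

# Bins (-0.5, 1.5], (1.5, 3.5], (3.5, 5.5] cover the integer levels 0-1, 2-3, 4-5
bands = pd.cut(levels, bins=[-0.5, 1.5, 3.5, 5.5], labels=["Low", "Moderate", "High"])
print(bands)
```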
Rubric
Project Write-up
Attribute | Very Deficient (1) | Somewhat Deficient (2) | Acceptable (3) | Very Good (4) | Outstanding (5) |
---|---|---|---|---|---|
1. Quality of research questions | • Not stated or very unclear • Entirely derivative • Anticipate no contribution | • Stated somewhat confusingly • Slightly interesting, but largely derivative • Anticipate minor contributions | • Stated explicitly • Somewhat interesting and creative • Anticipate limited contributions | • Stated explicitly and clearly • Clearly interesting and creative • Anticipate at least one good contribution | • Articulated very clearly • Highly interesting and creative • Anticipate several important contributions |
2. Quality of data visualization | • Very poorly visualized • Unclear • Unable to interpret figures | • Somewhat visualized • Somewhat unclear • Difficulty interpreting figures | • Mostly well visualized • Mostly clear • Acceptably interpretable | • Well organized • Well thought-out visualization • Almost all figures clearly interpretable | • Very well visualized • Outstanding visualization • All figures clearly interpretable |
3. Quality of exploratory data analysis | • Little or no critical thinking • Little or no understanding of data analytics concepts with Python | • Rudimentary critical thinking • Somewhat shaky understanding of data analytics concepts with Python | • Average critical thinking • Understanding of data analytics concepts with Python | • Mature critical thinking • Clear understanding of data analytics concepts with Python | • Sophisticated critical thinking • Superior understanding of data analytics concepts with Python |
4. Quality of business/economic analysis | • Little or no critical thinking • Little or no understanding of business/economic concepts | • Rudimentary critical thinking • Somewhat shaky understanding of business/economic concepts | • Average critical thinking • Understanding of business/economic concepts | • Mature critical thinking • Clear understanding of business/economic concepts | • Sophisticated critical thinking • Superior understanding of business/economic concepts |
5. Quality of writing | • Very poorly organized • Very difficult to read • Many typos and grammatical errors | • Somewhat disorganized • Somewhat difficult to read • Numerous typos and grammatical errors | • Mostly well organized • Mostly easy to read • Some typos and grammatical errors | • Well organized • Easy to read • Very few typos or grammatical errors | • Very well organized • Very easy to read • No typos or grammatical errors |
6. Quality of Jupyter Notebook usage | • Very poorly organized • Many redundant warning/error messages • Inappropriate code to produce outputs | • Somewhat disorganized • Numerous warning/error messages • Misses important code | • Mostly well organized • Some warning/error messages • Provides appropriate code | • Well organized • Very few warning/error messages • Provides advanced code | • Very well organized • No warning/error messages • Proposes highly advanced code |
Data Collection
Evaluation | Description | Criteria |
---|---|---|
1 (Very Deficient) | • Very poorly implemented • Data is unreliable. | • Ineffective use of yfinance, resulting in incomplete or inaccurate financial data. • Poor web scraping practices with selenium, leading to unreliable or incorrect data from Yahoo Finance. • Inadequate use of pandas, resulting in poorly structured DataFrames. |
2 (Somewhat Deficient) | • Somewhat effective implementation • Data has minor reliability issues. | • Basic use of yfinance with minor inaccuracies in data retrieval. • Basic web scraping with selenium that sometimes fails to capture all relevant data accurately. • Basic use of pandas, but with occasional issues in data structuring. |
3 (Acceptable) | • Competently implemented • Data is mostly reliable. | • Competent use of yfinance to retrieve most financial data accurately. • Effective web scraping with selenium, capturing most required data from Yahoo Finance. • Adequate use of pandas to structure data in a mostly logical format. |
4 (Very Good) | • Well-implemented and organized • Data is reliable. | • Advanced use of yfinance to reliably and accurately fetch financial data. • Thorough web scraping with selenium that consistently captures accurate and complete data from Yahoo Finance. • Skillful use of pandas for clear and logical data structuring. |
5 (Outstanding) | • Exceptionally implemented • Data is highly reliable. | • Expert use of yfinance to obtain comprehensive and precise financial data. • Expert web scraping with selenium, capturing detailed and accurate data from Yahoo Finance without fail. • Expert use of pandas to create exceptionally well-organized DataFrames that facilitate easy analysis. |
👥 Social Risk Score