Lecture 1

Syllabus and Course Outline

Byeong-Hak Choe

SUNY Geneseo

January 21, 2026

Instructor

Current Appointment & Education

  • Name: Byeong-Hak Choe.

  • Assistant Professor of Data Analytics and Economics, School of Business at SUNY Geneseo.

  • Ph.D. in Economics from University of Wyoming.

  • M.S. in Economics from Arizona State University.

  • M.A. in Economics from SUNY Stony Brook.

  • B.A. in Economics & B.S. in Applied Mathematics from Hanyang University at Ansan, South Korea.

    • Minor in Business Administration.
    • Concentration in Finance.

Research 1: Social Media vs. Fossil Fuel Lobbying on Climate Legislation

  • Choe, B.-H., 2021. “Social Media Campaigns, Lobbying and Legislation: Evidence from #climatechange and Energy Lobbies.”

  • Question: To what extent do social media campaigns compete with fossil fuel lobbying on climate change legislation?

  • Data include:

    • 5.0 million tweets with #climatechange/#globalwarming around the globe;
    • 12.0 million retweets/likes to those tweets;
    • 0.8 million Twitter users who wrote those tweets;
    • 1.4 million Twitter users who retweeted or liked those tweets;
    • 0.3 million US Twitter users with their location at a city level;
    • Firm-level lobbying data (expenses, targeted bills, etc.).

Research 2: Measuring the Value of Statistical Life Using Big Data

  • Choe, B.-H. and Newbold, S., “Estimating the Value of Statistical Life (VSL) through Big Data.”

  • VSL is the monetary value associated with reducing the risk of death.

    • How much value would that be? How can we measure it?
    • How do government agencies use the VSL to decide which policies are worth the cost when they reduce the risk of death?

Research 3: Governance Barriers to Effective Climate Finance

  • Choe, B.-H. and Ore-Monago, T., 2024. “Governance and Climate Finance in the Developing World.”

  • Climate finance refers to the financial resources allocated for mitigating and adapting to climate change, including support for initiatives that reduce greenhouse gas emissions and enhance resilience to climate impacts.

    • We focus on transnational financing, in which rich countries provide poor countries with financial resources to help them adapt to climate change and mitigate greenhouse gas (GHG) emissions.
    • Because GHG emissions in developing countries are growing rapidly, it is crucial to assess the effectiveness of climate finance.
    • Poor governance (e.g., in the legal system, rule of law, and accountability) can be a significant barrier to emissions reductions.

Research 4: Climate Finance with Renegotiations

  • Choe, B.-H., 2026. “Climate Finance under Limited Commitment and Renegotiations: A Dynamic Contract Approach.”
    • Using climate funds (e.g., the Green Climate Fund) as the main mechanism for financing developing countries, this paper studies a long-term funding relationship between a rich and a poor country.
    • The parties disagree over (1) the level of funding and (2) how to allocate it between adaptation and mitigation. Commitment is limited, and contracts may be renegotiated each period with some probability.
    • Main results: Less frequent renegotiation improves contract efficiency and reduces inequality between the countries.
    • These findings suggest design principles for climate funds: strengthen commitment and limit renegotiation, especially as climate risks intensify.

Syllabus

Course & Meeting Info

  • Course: DANL 320-01 — Big Data Analytics (3 credits)
  • Semester: Spring 2026
  • Instructor: Byeong-Hak Choe
  • Email (preferred):
  • Class time: MW 3:30 P.M. – 4:45 P.M.
  • Class location: South 338
  • Office: South 227B
  • Office Hours: MWF 9:15 A.M. – 10:15 A.M. (or by appointment)

Course Website & Brightspace

Course Website (GitHub Pages): https://bcdanl.github.io/320

  • Lecture slides
  • Classwork
  • Homework
  • Project resources

Brightspace

  • Announcements
  • Grades
  • Assignment submissions

I will post a Brightspace announcement whenever new course materials (e.g., homework) are added to the course website.

What This Course Is About

DANL 320 focuses on practical machine learning for large, real-world datasets using R.

“Big data” in this course does not only mean huge files — it also means data that are:

  • high-dimensional (many variables/features)
  • computationally demanding
  • hard to model or evaluate responsibly

Key skills you’ll build:

  • Build supervised learning models on large and messy datasets
  • Evaluate models using appropriate metrics (including metrics suited to imbalanced data)
  • Tune models using train/test and cross-validation strategies
  • Use unsupervised learning for high-dimensional datasets
  • Communicate results clearly and responsibly (interpretability, bias, limitations)

Methods you’ll practice

  • Supervised learning (prediction & inference)
    • Linear regression
    • Logistic regression
    • Regularization
    • Tree-based models (e.g., random forests)
  • Unsupervised learning (structure discovery)
    • Clustering
    • Principal component analysis (PCA)
    • Association rules

Prerequisites

DANL 300

Communication Guidelines

  • I check email daily (Mon–Fri)
  • Expect responses within 12–72 hours

Course Contents & Schedule

  • Weeks 1–2: Portfolio website (Git, GitHub, Quarto)
  • Weeks 2–4: Linear Regression
    • HW 1: Feb 11
  • Weeks 5–6: Logistic Regression
    • HW 2: Feb 25
  • Weeks 7–8: Classification
    • HW 3: Mar 9
    • Midterm Exam: March 11 (during class time)
  • Week 9: Spring Break (March 14–21)

  • Weeks 10–11: Regularization
    • HW 4: Apr 8
  • Weeks 12–13: Trees and Forests
    • GREAT Day: April 22 (no class)
    • HW 5: Apr 27
  • Weeks 14–15: Unsupervised Learning
    • HW 6: May 6
  • Week 16: ML Project Presentations
    • May 4
    • May 6
  • Week 17: Final Exam + Project Report
    • Final Exam: May 13, 3:30 P.M. – 5:30 P.M.
    • Project Report Due: May 14

Grading

Grade components

  • Attendance — 5%
  • Participation — 5%
  • Group Project — 20%
  • Homework (total) — 20%
  • Exams (total) — 50%
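
To make the weighting concrete, here is a minimal sketch of how the components above could combine into a course percentage. It is written in Python purely for illustration (the course itself uses R). The function name, the 0-100 scale for each component, and the sample scores are all assumptions, and the sketch ignores any attendance-related deductions.

```python
# Weights copied from the grade components listed above.
WEIGHTS = {
    "attendance": 0.05,
    "participation": 0.05,
    "group_project": 0.20,
    "homework": 0.20,
    "exams": 0.50,
}


def course_grade(scores: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    # Sanity check: the listed weights should cover the whole grade.
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student, for illustration only.
example = {
    "attendance": 100,
    "participation": 90,
    "group_project": 85,
    "homework": 92,
    "exams": 80,
}
print(round(course_grade(example), 2))  # 84.9
```

Because exams carry half of the weight, a 10-point change in the exam total moves the course percentage by 5 points.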

Important grading details

  • Lowest homework score is dropped
  • Total exam grade is the maximum of:
    • Simple average of midterm and final, or
    • Weighted average (33% midterm / 67% final)
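
The two rules above can be sketched in a few lines of Python (illustration only; the course itself uses R). The function names and sample scores are hypothetical. Averaging the remaining homework scores with equal weight is an assumption, since the syllabus only says that the lowest score is dropped.

```python
def homework_total(scores: list[float]) -> float:
    """Mean of the homework scores after dropping the single lowest one.

    Assumption: the remaining scores are weighted equally.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop one")
    kept = sorted(scores)[1:]  # discard the lowest score
    return sum(kept) / len(kept)


def exam_total(midterm: float, final: float) -> float:
    """Larger of the simple average and the 33%/67% weighted average."""
    simple = (midterm + final) / 2
    weighted = 0.33 * midterm + 0.67 * final
    return max(simple, weighted)


# Hypothetical scores, for illustration only.
print(homework_total([60, 80, 90, 100]))  # 90.0
print(round(exam_total(70, 90), 1))       # 83.4 (weighted average is larger)
print(round(exam_total(90, 70), 1))       # 80.0 (simple average is larger)
```

The weighted average is the larger of the two whenever the final exam score exceeds the midterm score, so the rule rewards improvement over the semester.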

Grading Machine Learning Project

  • Peer evaluation on presentation: 5%
  • Instructor evaluation: 95%
    • Descriptive statistics (5%)
    • Data transformation (5%)
    • Data visualization (5%)
    • Data storytelling (10%)
    • Machine learning analysis (25%)
    • Presentation slides (10%)
    • Presentation (30%)
    • Code (10%)
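
As a rough sketch of how these pieces might combine, the Python snippet below (illustration only; the course itself uses R) treats the eight rubric percentages as shares of the instructor evaluation, which is an assumption based on the fact that they sum to 100%. The function name and the sample scores are hypothetical.

```python
# Rubric shares (in percent) copied from the instructor-evaluation list above.
RUBRIC = {
    "descriptive_statistics": 5,
    "data_transformation": 5,
    "data_visualization": 5,
    "data_storytelling": 10,
    "machine_learning_analysis": 25,
    "presentation_slides": 10,
    "presentation": 30,
    "code": 10,
}


def project_grade(peer: float, rubric_scores: dict[str, float]) -> float:
    """Combine the peer score (5%) with the instructor rubric score (95%).

    Assumption: every score is on a 0-100 scale and the rubric shares
    apply within the instructor evaluation.
    """
    assert sum(RUBRIC.values()) == 100
    instructor = sum(RUBRIC[k] * rubric_scores[k] for k in RUBRIC) / 100
    return 0.05 * peer + 0.95 * instructor


# Hypothetical project: 90 on every rubric item, 100 from peers.
scores = {item: 90 for item in RUBRIC}
print(round(project_grade(100, scores), 2))  # 90.5
```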

Course Policies

  • Late work: Homework is accepted up to 3 days late with a 30% penalty.
  • Make-up exams are allowed only with documented excuses or university-excused absences.
  • Attendance is taken with a sign-in sheet
    • You must email me before missing class for illness or other valid reasons
    • You are allowed up to 5 unexcused absences in the semester
      • Beyond that, your total percentage grade may be reduced

Academic integrity & AI use

  • Cheating and plagiarism violate college policy and may result in serious consequences.
  • You may use AI tools (e.g., ChatGPT, Gemini) to support your learning, but you must:
    • Disclose AI use when it meaningfully contributes to your work
    • Cite/attribute AI-generated material appropriately
  • Using AI without disclosure or citation may be treated as academic dishonesty.
  • There is no penalty for using AI on Homework and the Machine Learning Project.
    • However, you are still responsible for understanding, verifying, and being able to explain any AI-assisted work.

Accessibility

SUNY Geneseo is committed to equitable access. If you have approved accommodations, please coordinate through OAS and communicate with me early.

Course Tools

Useful Keyboard Shortcuts for Lecture Slides

  • CTRL + Shift + F: Search
  • M: Toggle menu
  • E: PDF export/printing mode

Let’s Practice Markdown!

  • Markdown is a lightweight writing format that lets you create well-structured documents using plain text (headings, lists, links, quotes, images, and code blocks).

  • Why Markdown is widely used on the internet:

    • Works everywhere: supported by many platforms, including Quarto, Jupyter Notebook, GitHub, Reddit, Slack/Discord/Obsidian, and documentation websites.
    • Great for publishing and future-proof: Markdown can be converted into professional webpages and PDFs, and because it is plain text, it is easy to store and reuse.
  • In this course, Markdown is essential because you will write narrative explanations in Quarto (*.qmd) when creating your Machine Learning Report.

  • Start here: Classwork 1