Lecture 1

Syllabus, Course Outline, and Introduction

Byeong-Hak Choe

SUNY Geneseo

August 27, 2024

Instructor

Instructor

Current Appointment & Education

  • Name: Byeong-Hak Choe.

  • Assistant Professor of Data Analytics and Economics, School of Business at SUNY Geneseo.

  • Ph.D. in Economics from University of Wyoming.

  • M.S. in Economics from Arizona State University.

  • M.A. in Economics from SUNY Stony Brook.

  • B.A. in Economics & B.S. in Applied Mathematics from Hanyang University at Ansan, South Korea.

    • Minor in Business Administration.
    • Concentration in Finance.

Instructor

Economics and Data Science

  • Choe, B.H., 2021. “Social Media Campaigns, Lobbying and Legislation: Evidence from #climatechange and Energy Lobbies.”

  • Question: To what extent do social media campaigns compete with fossil fuel lobbying on climate change legislation?

  • Data include:

    • 5.0 million tweets with #climatechange/#globalwarming around the globe;
    • 12.0 million retweets/likes to those tweets;
    • 0.8 million Twitter users who wrote those tweets;
    • 1.4 million Twitter users who retweeted or liked those tweets;
    • 0.3 million US Twitter users with their location at a city level;
    • Firm-level lobbying data (expenses, targeted bills, etc.).

Instructor

Economics and Data Science

  • Choe, B.H. and Ore-Monago, T., 2024. “Governance and Climate Finance in the Developing World”

  • Climate finance refers to the financial resources allocated for mitigating and adapting to climate change, including support for initiatives that reduce greenhouse gas emissions and enhance resilience to climate impacts.

    • We focus on transnational financing that rich countries provide poor countries with financial resources, in order to help them adapt to climate change and mitigate greenhouse gas (GHG) emissions.
    • Since the GHG emissions in developing countries are rapidly growing, it is crucial to assess the effectiveness of climate finance.
    • Poor governance can be significant barriers to emissions reductions within these countries.

Instructor

Economics and Data Science

  • Choe, B.H. and Ore-Monago, T., 2024. “Governance and Climate Finance in the Developing World”

  • Data include:

    • Global climate finance data (e.g., donors, recipients, characteristics of climate change projects)
    • World Bank Governance Indicators over the years (e.g., government effectiveness, voice and accountability, political stability and absence of violence/terrorism, regulatory quality, rule of law, control of corruption)
    • Various economic indicators (e.g., trade pattern of low carbon technology products, macroeconomic risks, energy)

Instructor

Economics and Data Science

  • Choe, B.H., James, Alex and Newbold, Steve, 2025. “Estimating the Value of Statistical Life (VSL) through Big Data”

  • VSL is the monetary value associated with reducing the risk of death.

    • How much value would that be? How can we measure it?

Syllabus

Syllabus

Email, Class & Office Hours

Syllabus

Course Description

  • This course offers a practical introduction to the data analytics process and methods.
  • The goal is to help you unlock the potential of data analysis and enhance your ability to transform data into a powerful decision-making tool.
  • You will develop foundational data analytics skills that will prepare you for a career or further studies in more advanced data analytics topics.
  • The course covers: (1) an introduction to data analytics thinking, (2) exploratory data analysis through data wrangling and visualization, and (3) an introduction to data science, social media analytics, and artificial intelligence (AI).
  • Throughout the course, you will gain hands-on experience with the Python and R programming languages, along with its associated data analysis libraries and AI tools.

Syllabus

Course Learning Outcomes

  • Grasp the basic principles of data analytics, including data types and data processing.
  • Gain introductory experience with programming languages commonly used in data analytics, such as Python or R.
  • Develop the ability to create and interpret various types of data visualizations.
  • Enhance critical thinking skills by learning to ask relevant questions and draw insights from data.
  • Apply data analytics techniques to solve real-world problems in various domains.

Syllabus

Reference Materials

Syllabus

Course Requirements

  • Homework: There will be six homework assignments.
  • Quiz: There will be in-class quizzes.
  • Class Participation: You are encouraged to ask questions and discuss topics during the class time.
  • Exam: There will be two short midterm exams and one comprehensive final exam.
  • Team presentation: There will be a team presentation.

Syllabus

Team Presentations

  • Each team will present on data storytelling with visualization.

  • Data storytelling with visualization is the art of communicating complex data insights in a clear, engaging, and impactful way by blending data analysis, visual design, and narrative techniques.

  • It goes beyond simply showing charts and graphs; it involves crafting a compelling story that guides the audience through the data, highlights key findings, and effectively conveys the intended message.

Syllabus

Team Presentations

  • Key aspects of data storytelling include:
    • Clarity: Make sure the visuals are easy to understand and focus on the most important information.
    • Structure: Just like a story, start with an introduction, present the main points, and finish with a conclusion.
    • Engagement: Tailor the story to your audience so it is relevant and interesting to them.

Syllabus

Class Schedule

  • There will be tentatively 42 class sessions.
    • The midterm exam 1 is scheduled on October 9, 2024, Wednesday, during the class time.
    • The midterm exam 2 is scheduled on November 8, 2024, Friday, during the class time.
    • The team presentation is scheduled during the first week of December, 2024.
    • The final exam is scheduled on December 16, 2024, Monday, noon–2:00 P.M.

Syllabus

Course Contents

Syllabus

Course Contents

Syllabus

Course Contents

Syllabus

Grading

\[ \begin{align} (\text{Total Percentage Grade}) =&\;\, 0.05\times(\text{Attendance}) \notag\\ &\,+\, 0.15\times(\text{Quiz & Class Participation})\notag\\ & \,+\, 0.15\times(\text{Homework})\notag\\ &\,+\, 0.15\times(\text{Presentation})\notag\\ & \,+\, 0.50\times(\text{Exam}).\notag \end{align} \]

Syllabus

Grading

  • You are allowed up to 6 absences without penalty.

    • Send me an email if you have standard excused reasons (illness, family emergency, transportation problems, etc.).
  • For each absence beyond the initial six, there will be a deduction of 1% from the Total Percentage Grade.

  • The single lowest homework score will be dropped when calculating the total homework score.

    • Each homework except for the homework with the lowest score accounts for 20% of the total homework score.

Syllabus

Grading

\[ \begin{align} &(\text{Total Exam Score}) \\ =\, &\text{max}\,\left\{0.50\times(\text{Midterm Exam Score}) \,+\, 0.50\times(\text{Final Exam Score})\right.,\notag\\ &\qquad\;\,\left.0.25\times(\text{Midterm Exam Score}) \,+\, 0.75\times(\text{Final Exam Score})\right\}.\notag \end{align} \]

  • The total exam score is the maximum between
    1. the simple average of the midterm exam score and the final exam score and
    2. the weighted average of them with one-fourth weight on the midterm exam score and three-third weight on the final exam score:

Syllabus

Make-up Policy

  • Make-up exams will not be given unless you have either a medically verified excuse or an absence excused by the University.

  • If you cannot take exams because of religious obligations, notify me by email at least two weeks in advance so that an alternative exam time may be set.

  • A missed exam without an excused absence earns a grade of zero.

  • Late submissions for homework assignment will be accepted with a penalty.

  • A zero will be recorded for a missed assignment.

Syllabus

Academic Integrity and Plagiarism

  • All homework assignments and exams must be the original work by you.

  • Examples of academic dishonesty include:

    • representing the work, thoughts, and ideas of another person as your own
    • allowing others to represent your work, thoughts, or ideas as theirs, and
    • being complicit in academic dishonesty by suspecting or knowing of it and not taking action.
  • Geneseo’s Library offers frequent workshops to help you understand how to paraphrase, quote, and cite outside sources properly.

Syllabus

Artificial Intelligence (AI) Policy

  • Unless AI tools are explicitly permitted for homework or in-class quizzes, you must complete your work independently.

  • This means you should not use tools like ChatGPT for any aspect of our coursework.

  • Such use is a form of academic dishonesty. Use of such tools is not only cheating, it will also cheat you of the opportunity to learn and develop your own skills.

  • While AI will undoubtedly play important roles in our future society, you will be better able to utilize AI if you have developed your own critical thinking, writing, and analytical skills by doing your own work.

  • If you have any questions about this, please ask.

Syllabus

Accessibility

  • The Office of Accessibility will coordinate reasonable accommodations for persons with physical, emotional, or cognitive disabilities to ensure equal access to academic programs, activities, and services at Geneseo.

  • Please contact me and the Office of Accessibility Services for questions related to access and accommodations.

Syllabus

Career Design

  • To get information about career development, you can visit the Career Development Events Calendar (https://www.geneseo.edu/career_development/events/calendar).

  • You can stop by South 112 to get assistance in completing your Handshake Profile https://app.joinhandshake.com/login.

    • Handshake is ranked #1 by students as the best place to find full-time jobs.
    • 50% of the 2018-2020 graduates received a job or internship offer on Handshake.
    • Handshake is trusted by all 500 of the Fortune 500.

Prologue

Why Data Analytics?

  • Fill in the gaps left by traditional business and economics classes.
    • Practical skills that will benefit your future career.
    • Neglected skills like how to actually find datasets in the wild and clean them.
  • Data analytics skills are largely distinct from (and complementary to) the core quantitative works familiar to business undergrads.
    • Data visualization, cleaning and wrangling; databases; machine learning; etc.
  • In short, we will cover things that I wish someone had taught me when I was undergraduate.

You, at the end of this course

Why Data Analytics?

  • Data analysts use analytical tools and techniques to extract meaningful insights from data.
    • Skills in data analytics are also useful for business analysts or market analysts.
  • Breau of Labor Statistics forecasts that the projected growth rate of the employment in the industry related to data analytics from 2021 to 2031 is 36%.
    • The average growth rate for all occupations is 5%.

Why R, Python, and Databases?

Why R, Python, and Databases?

  • Stack Overflow is the most popular Q & A website specifically for programmers and software developers in the world.

  • See how programming languages have trended over time based on use of their tags in Stack Overflow from 2008 to 2022.

Data Science and Big Data

The State of the Art

Generative AI and ChatGPT

Data Science and Big Data Trend

From 2008 to 2023

Programmers in 2024

The State of the Art

Generative AI and ChatGPT

  • Generative AI refers to a category of artificial intelligence (AI) that is capable of generating new content, ranging from text, images, and videos to music and code.
  • In the early 2020s, advances in transformer-based deep neural networks enabled a number of generative AI systems notable for accepting natural language prompts as input.
    • These include large language model (LLM) chatbots (e.g., ChatGPT, Claude, Copilot, Gemini, LLaMA, Grok).
  • ChatGPT (Chat Generative Pre-trained Transformer) is a chatbot developed by OpenAI and launched on November 30, 2022.
    • By January 2023, it had become what was then the fastest-growing consumer software application in history.

The State of the Art

Generative AI and ChatGPT

  • Users around the world have explored how to best utilize GPT for writing essays and programming codes.
  • Is AI a threat to data analytics?
    • Fundamental understanding of the subject matter is still crucial for effectively utilizing AI’s capabilities.
  • If you use Generative AI such as ChatGPT, please try to understand what ChatGPT gives you.
    • Copying and pasting it without any understanding harms your learning opportunity.

DANL Tools

What is Git?

\(\quad\)

  • Git is the most popular version control tool for any software development.
    • It tracks changes in a series of snapshots of the project, allowing developers to revert to previous versions, compare changes, and merge different versions.
    • It is the industry standard and ubiquitous for coding collaboration.

What is GitHub?

  • GitHub is a web-based hosting platform for Git repositories to store, manage, and share code.

  • Out class website is hosted on a GitHub repository.

  • Course contents will be posted not only in Brightspace but also in our GitHub repositories (“repos”) and websites.

  • Github is useful for many reasons, but the main reason is how user friendly it makes uploading and sharing code.

What is GitHub?

What is R?

  • R is a programming language and software environment designed for statistical computing and graphics.

  • R has become a major tool in data analysis, statistical modeling, and visualization.

    • It is widely used among statisticians and data scientists for developing statistical software and performing data analysis.
    • R is open source and freely available.

What is RStudio?

  • RStudio is an integrated development environment (IDE) for R.
    • An IDE is a software application that provides comprehensive facilities (e.g., text code editor, graphical user interface (GUI)) to computer programmers for software development.
  • RStudio is a user-friendly interface that makes using R easier and more interactive.
    • It provides a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging, and workspace management.
  • We will use a free cloud version of RStudio, which is Posit Cloud.

What is Python?

  • Python is a versatile programming language known for its simplicity and readability.

  • Python has become a dominant tool in various fields including data analysis, machine learning, and web development.

    • It is widely used among developers, data scientists, and researchers for building applications and performing data-driven tasks.
    • Python is open source and has a vast ecosystem of libraries and frameworks.

What is Jupyter?

  • Jupyter is an open-source integrated development environment (IDE) primarily for Python, though it supports many other languages.
    • Jupyter provides a notebook interface that allows users to write and execute code in a more interactive and visual format.
  • Jupyter Notebook is a user-friendly environment that enhances coding, data analysis, and visualization.
    • It offers a web-based interface that combines live code, equations, visualizations, and narrative text.
    • Jupyter is widely used for data science, machine learning, and research, enabling easy sharing and collaboration.
  • We will use a free cloud version of Jupyter, which is Google Colab.

Python vs. R