Syllabus, Course Outline, and Introduction
January 22, 2025
Name: Byeong-Hak Choe.
Assistant Professor of Data Analytics and Economics, School of Business at SUNY Geneseo.
Ph.D. in Economics from University of Wyoming.
M.S. in Economics from Arizona State University.
M.A. in Economics from SUNY Stony Brook.
B.A. in Economics & B.S. in Applied Mathematics from Hanyang University at Ansan, South Korea.
Email: bchoe@geneseo.edu
Class Homepage:
Office: South Hall 227B
Office Hours:
This course is designed to provide a comprehensive overview of data handling techniques, focusing on practical application through case studies.
Key topics include:
These areas will be explored through detailed, real-world examples to address common data analysis challenges.
Throughout the course, students will gain hands-on experience with Python and its data analysis libraries, along with practical applications of git and GitHub.
Python for Data Analysis (3rd Edition) by Wes McKinney
Python Programming for Data Science by Tomas Beuzen
Coding for Economists by Arthur Turrell
Python for Econometrics in Economics by Fabian H. C. Raters
QuantEcon DataScience - Python Fundamentals by Chase Coleman, Spencer Lyon, and Jesse Perla
QuantEcon DataScience - pandas by Chase Coleman, Spencer Lyon, and Jesse Perla
Laptop: You should bring your own laptop (Mac or Windows) to the classroom.
Homework: There will be six homework assignments.
Project: There will be one project on a personal website.
Exams: There will be two Midterm Exams and one Final Exam.
Discussions: You are encouraged to participate in GitHub-based online discussions, in-class discussions, and office hours.
You will create your own website using Quarto, RStudio, and Git.
You will publish your homework assignments and team project on your website.
Your website will be hosted on GitHub.
The basics of Markdown will be discussed.
References:
Team formation is scheduled for late March.
The project report should include data collection and exploratory data analysis using summary statistics, visual representations, and data wrangling.
The document for the team project must be published on each member’s website.
Any changes to team composition require approval from Byeong-Hak Choe.
There will tentatively be 42 class sessions.
Midterm Exam I is scheduled for Friday, February 28, 2025, during class time.
Midterm Exam II is scheduled for Wednesday, April 9, 2025, during class time.
The Final Exam is scheduled for Wednesday, May 14, 8:30 A.M.–10:30 A.M.
No class on
The team project is due on Friday, May 16, 2025, at 11:59 P.M. Eastern Time.
\[ \begin{align} (\text{Total Percentage Grade}) =&\quad\;\, 0.05\times(\text{Attendance Score})\notag\\ &\,+\, 0.05\times(\text{Participation Score})\notag\\ &\,+\, 0.15\times(\text{Project and Website Score})\notag\\ &\,+\, 0.25\times(\text{Total Homework Score})\notag\\ &\,+\, 0.50\times(\text{Total Exam Score}).\notag \end{align} \]
You are allowed up to 5 absences without penalty.
For each absence beyond the initial five, there will be a deduction of 1% from the Total Percentage Grade.
Participation will be evaluated by the quantity and quality of your contributions to GitHub-based online discussions and in-person discussions.
The single lowest homework score will be dropped when calculating the total homework score.
\[ \begin{align} &(\text{Midterm Exam Score}) \\ =\, &\text{max}\,\left\{0.50\times(\text{Midterm Exam I Score}) \,+\, 0.50\times(\text{Midterm Exam II Score})\right.,\notag\\ &\qquad\;\,\left.0.25\times(\text{Midterm Exam I Score}) \,+\, 0.75\times(\text{Midterm Exam II Score})\right\}.\notag \end{align} \]
\[ \begin{align} &(\text{Total Exam Score}) \\ =\, &\text{max}\,\left\{0.50\times(\text{Midterm Exam Score}) \,+\, 0.50\times(\text{Final Exam Score})\right.,\notag\\ &\qquad\;\,\left.0.25\times(\text{Midterm Exam Score}) \,+\, 0.75\times(\text{Final Exam Score})\right\}.\notag \end{align} \]
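The grading rules above can be sketched in Python as a rough self-check. This is only an illustration, not an official grade calculator: the function names are hypothetical, all scores are assumed to be on a 0 to 100 scale, the total homework score is assumed to be the simple average of the scores left after dropping the lowest one, and the absence penalty is assumed to mean one percentage point per absence beyond five.

```python
# A minimal sketch of the grading formulas, under the assumptions stated above.
# Function names and the 0-100 score scale are illustrative, not from the syllabus.


def midterm_score(midterm1: float, midterm2: float) -> float:
    """Better of the 50/50 and 25/75 weightings of Midterm Exams I and II."""
    return max(
        0.50 * midterm1 + 0.50 * midterm2,
        0.25 * midterm1 + 0.75 * midterm2,
    )


def total_exam_score(midterm: float, final: float) -> float:
    """Better of the 50/50 and 25/75 weightings of the midterm and final scores."""
    return max(
        0.50 * midterm + 0.50 * final,
        0.25 * midterm + 0.75 * final,
    )


def total_homework_score(scores: list[float]) -> float:
    """Average of the homework scores after dropping the single lowest one.

    Averaging the remaining scores is an assumption; the syllabus only says
    that the lowest score is dropped.
    """
    if len(scores) < 2:
        raise ValueError("need at least two homework scores to drop one")
    kept = sorted(scores)[1:]  # drop the single lowest score
    return sum(kept) / len(kept)


def total_percentage_grade(
    attendance: float,
    participation: float,
    project: float,
    homework_scores: list[float],
    midterm1: float,
    midterm2: float,
    final: float,
    absences: int = 0,
) -> float:
    """Weighted total grade, minus 1 point per absence beyond the first five."""
    exam = total_exam_score(midterm_score(midterm1, midterm2), final)
    grade = (
        0.05 * attendance
        + 0.05 * participation
        + 0.15 * project
        + 0.25 * total_homework_score(homework_scores)
        + 0.50 * exam
    )
    penalty = 1.0 * max(0, absences - 5)  # assumed: 1 percentage point each
    return grade - penalty


if __name__ == "__main__":
    example = total_percentage_grade(
        attendance=100,
        participation=80,
        project=90,
        homework_scores=[100, 90, 80, 70, 95, 85],
        midterm1=60,
        midterm2=80,
        final=90,
        absences=7,
    )
    print(round(example, 3))
```

Because both exam formulas take a maximum, a stronger Midterm Exam II or Final Exam automatically receives the heavier 0.75 weight, so a weak early exam counts for less.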
Make-up exams will not be given unless you have either a medically verified excuse or an absence excused by the University.
If you cannot take exams because of religious obligations, notify me by email at least two weeks in advance so that an alternative exam time may be set.
A missed exam without an excused absence earns a grade of zero.
Late submissions of homework assignments will be accepted with a penalty.
A zero will be recorded for a missed assignment.
All homework assignments and exams must be your own original work.
Examples of academic dishonesty include:
Geneseo’s Library offers frequent workshops to help you understand how to paraphrase, quote, and cite outside sources properly.
The Office of Accessibility will coordinate reasonable accommodations for persons with physical, emotional, or cognitive disabilities to ensure equal access to academic programs, activities, and services at Geneseo.
Please contact me and the Office of Accessibility Services for questions related to access and accommodations.
You are strongly encouraged to communicate your needs to faculty and staff and seek support if you are experiencing unmanageable stress or are having difficulties with daily functioning.
Liz Felski, the School of Business Student Advocate (felski@geneseo.edu, South Hall 303), or the Dean of Students (585-245-5706) can assist and provide direction to appropriate campus resources.
For more information, see https://www.geneseo.edu/dean_students.
To get information about career development, you can visit the Career Development Events Calendar (https://www.geneseo.edu/career_development/events/calendar).
You can stop by South Hall 112 to get assistance in completing your Handshake Profile (https://app.joinhandshake.com/login).
From 2008 to 2023
GitHub is a web-based hosting platform for Git repositories to store, manage, and share code.
Our class website is hosted in a GitHub repository.
Course contents will be posted not only in Brightspace but also in our GitHub repositories (“repos”) and websites.
GitHub is useful for many reasons, but the main one is that it makes uploading and sharing code user-friendly.
Python is a versatile programming language known for its simplicity and readability.
Python has become a dominant tool in various fields including data analysis, machine learning, and web development.
Jupyter Notebook (*.ipynb) is a user-friendly environment that enhances coding, data analysis, and visualization.
RStudio-*.dmg file.
The tidyverse is a collection of R packages designed for data science that share an underlying design philosophy, grammar, and data structures.
The tidyverse packages work harmoniously together for data manipulation, exploration, and visualization.
We will use tidyverse packages (e.g., ggplot2, dplyr, tidyr) throughout the course.
R packages are installed with install.packages("packageName").
To install tidyverse, type and run install.packages("tidyverse") from the R Console.
If prompted during installation, type no in the R Console, and then hit Enter.
Once installed, a package must be loaded with library(packageName) so that its functions and data can be used.
To load tidyverse, type and run library(tidyverse) from an R script.
mpg is the data.frame provided by the R package ggplot2, one of the R packages in tidyverse.
Jupyter Notebook, Quarto, and GitHub-based Discussion Boards use Markdown as their underlying document syntax.
Let’s do Classwork 2.