Syllabus, Course Outline, and Introduction
January 22, 2025
Name: Byeong-Hak Choe.
Assistant Professor of Data Analytics and Economics, School of Business at SUNY Geneseo.
Ph.D. in Economics from University of Wyoming.
M.S. in Economics from Arizona State University.
M.A. in Economics from SUNY Stony Brook.
B.A. in Economics & B.S. in Applied Mathematics from Hanyang University at Ansan, South Korea.
Email: bchoe@geneseo.edu
Class Homepage:
Office: South Hall 227B
Office Hours:
This course delves into the tools and methodologies essential for creating visually engaging and informative data representations. Its focus is on enhancing data comprehension and facilitating effective data analytics through aesthetically pleasing graphics. The curriculum includes:
Key topics include:
These areas will be explored through detailed, real-world examples to address common data analysis challenges.
Throughout the course, practical experience is emphasized, with hands-on projects using tools like R, Python, RStudio, Quarto, Jupyter Notebook, Shiny, Git, and GitHub.
Data Visualization: A Practical Introduction by Kieran Healy
Python Programming for Data Science by Tomas Beuzen
Coding for Economists by Arthur Turrell
Python for Econometrics in Economics by Fabian H. C. Raters
QuantEcon DataScience - Python Fundamentals by Chase Coleman, Spencer Lyon, and Jesse Perla
QuantEcon DataScience - pandas by Chase Coleman, Spencer Lyon, and Jesse Perla
Laptop: You should bring your own laptop (Mac or Windows) to the classroom.
Homework: There will be six homework assignments.
Project: There will be one project presentation and a write-up on a personal website.
Exams: There will be one Midterm Exam.
Discussions: You are encouraged to participate in GitHub-based online discussions, class discussions, and office hours.
You will create your own website using Quarto, RStudio, and Git.
You will publish your homework assignments and team project on your website.
Your website will be hosted on GitHub.
The basics in Markdown will be discussed.
References:
Team formation is scheduled for late March.
For the team project, a team must choose data related to business or socioeconomic issues.
The project report should include exploratory data analysis using summary statistics, visual representations, and data wrangling.
The document for the team project must be published on each member’s website.
The team project must include a Shiny dashboard.
Any changes to team composition require approval from Byeong-Hak Choe.
Tentatively, there will be 28 class sessions.
The Midterm Exam is scheduled on March 31, 2025, Wednesday, during the class time.
The Project Presentation is scheduled on May 9, 2025, Friday, 3:30 P.M.-5:30 P.M.
The due date for the Project write-up is May 16, 2025, Friday.
\[
\begin{aligned}
(\text{Total Percentage Grade}) =&\quad\;\, 0.05\times(\text{Total Attendance Score})\\
&\,+\, 0.05\times(\text{Total Participation Score})\\
&\,+\, 0.10\times(\text{Website Score})\\
&\,+\, 0.30\times(\text{Total Homework Score})\\
&\,+\, 0.50\times(\text{Total Exam and Project Score}).
\end{aligned}
\]
You are allowed up to 2 absences without penalty.
For each absence beyond the initial two, there will be a deduction of 1% from the Total Percentage Grade.
Participation will be evaluated by quantity and quality of GitHub-based online discussions and in-person discussion.
The single lowest homework score will be dropped when calculating the total homework score.
Make-up exams will not be given unless you have either a medically verified excuse or an absence excused by the University.
If you cannot take exams because of religious obligations, notify me by email at least two weeks in advance so that an alternative exam time may be set.
A missed exam without an excused absence earns a grade of zero.
Late submissions of homework assignments will be accepted with a penalty.
A zero will be recorded for a missed assignment.
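As a worked illustration of the grading formula and the attendance and homework rules above, the following Python sketch computes a Total Percentage Grade. It assumes that every component score is on a 0-100 scale and that the Total Homework Score is the simple average of the scores that remain after the lowest one is dropped. Neither assumption is stated in the syllabus, and the function name and sample numbers are hypothetical.

```python
def total_percentage_grade(attendance, participation, website,
                           homework_scores, exam_project, absences=0):
    """Sketch of the syllabus grading formula (hypothetical helper).

    Assumption: all scores are on a 0-100 scale.
    Assumption: the homework total is the mean of the scores kept
    after dropping the single lowest one.
    """
    if not homework_scores:
        raise ValueError("homework_scores must contain at least one score")

    # Drop the single lowest homework score (keep it if there is only one).
    kept = sorted(homework_scores)[1:] if len(homework_scores) > 1 else list(homework_scores)
    homework = sum(kept) / len(kept)

    # Weighted sum from the syllabus formula.
    weighted = (0.05 * attendance
                + 0.05 * participation
                + 0.10 * website
                + 0.30 * homework
                + 0.50 * exam_project)

    # Two absences are free; each additional absence deducts 1 percentage point.
    penalty = 1.0 * max(0, absences - 2)
    return weighted - penalty


# Hypothetical student: six homework scores, three absences.
grade = total_percentage_grade(
    attendance=100, participation=90, website=95,
    homework_scores=[80, 90, 100, 70, 85, 95],
    exam_project=88, absences=3,
)
print(round(grade, 2))  # 89.0
```

In this example the lowest homework score (70) is dropped, the remaining five average 90, the weighted sum is 90, and the third absence deducts one point.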
RStudio-*.dmg file.
Script Pane is where you write R commands in a script file that you can save.
tidyverse
R packages are collections of R functions, compiled code, and data that are combined in a structured format.
The tidyverse is a collection of R packages designed for data science that share an underlying design philosophy, grammar, and data structures.
tidyverse packages work harmoniously together to make data manipulation, exploration, and visualization more.
tidyverse throughout the course. (e.g., ggplot2, dplyr, tidyr)
install.packages("packageName")
install.packages("packageName").
To install tidyverse, type and run the following from the R Console:
install.packages("tidyverse")
no in the R Console, and then hit Enter.
library(packageName)
library(packageName) so that its functions and data can be used.
To load tidyverse, type and run the following command from an R script:
library(tidyverse)
mpg is the data.frame provided by the R package ggplot2, one of the R packages in tidyverse.
Mac: <-.
Windows: <-.
Press o or Esc for an overview of the lecture slides.
You can also click the menu button at the top-right corner, and go to a specific slide.
Ctrl + Shift + F to search.
::: {.panel-tabset} ## Python Option
We can run Python code within RStudio.
Select the Python interpreter in RStudio from Tools > Global Options > Python:
reticulate
reticulate Quarto to use Python and R objects interactively within one Quarto document.
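After choosing an interpreter under Tools > Global Options > Python, it can help to confirm which Python is actually running. The short sketch below, meant to be run from a Python script or a Python chunk, uses only the standard library. The helper name is hypothetical, and the reported path may differ depending on how RStudio launches Python.

```python
import platform
import sys


def interpreter_summary():
    """Return the path and version of the running Python interpreter."""
    return {
        "executable": sys.executable,            # path to the Python binary in use
        "version": platform.python_version(),    # version string, e.g. "3.x.y"
    }


info = interpreter_summary()
print("Python executable:", info["executable"])
print("Python version:", info["version"])
```

If the printed path does not match the interpreter selected in Global Options, restarting the R session is a reasonable first thing to try.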