Syllabus, Course Outline, and DANL Career
January 22, 2025
Name: Byeong-Hak Choe
Assistant Professor of Data Analytics and Economics, School of Business at SUNY Geneseo.
Ph.D. in Economics from University of Wyoming.
M.S. in Economics from Arizona State University.
M.A. in Economics from SUNY Stony Brook.
B.A. in Economics & B.S. in Applied Mathematics from Hanyang University at Ansan, South Korea.
Email: bchoe@geneseo.edu
Class Homepage:
Office: South Hall 227B
Office Hours:
This course teaches you how to analyze big data sets and is specifically designed to bring you up to speed on one of the best technologies for this task: Apache Spark!
Leading technology companies such as Google, Facebook, Netflix, Airbnb, Amazon, and many others use Spark to solve their big data problems!
With the Spark 3.0 DataFrame framework, Spark can run certain in-memory workloads up to 100x faster than Hadoop MapReduce.
This course will review the basics of Python and then move on to the Spark DataFrame API, using Spark 3.0 syntax!
In addition, you will learn how to perform supervised and unsupervised machine learning on massive datasets using Spark's Machine Learning Library (MLlib).
You will also gain hands-on experience using PySpark within the Jupyter Notebook environment. This course also covers other Spark components, such as Spark SQL and Spark Streaming, as well as advanced data analytics modeling methodologies.
Introduction to pyspark by Pedro Duarte Faria
Python Programming for Data Science by Tomas Beuzen
Coding for Economists by Arthur Turrell
Python for Econometrics in Economics by Fabian H. C. Raters
QuantEcon DataScience - Python Fundamentals by Chase Coleman, Spencer Lyon, and Jesse Perla
QuantEcon DataScience - pandas by Chase Coleman, Spencer Lyon, and Jesse Perla
Laptop: You should bring your own laptop (Mac or Windows) to the classroom.
Homework: There will be six homework assignments.
Project: There will be one team project, published on your personal website.
Exams: There will be one Midterm Exam.
Participation: You are encouraged to participate in GitHub-based online discussions, class discussions, and office hours.
You will create your own website using Quarto, RStudio, and Git.
You will publish your homework assignments and team project on your website.
Your website will be hosted on GitHub.
The basics of Markdown will be covered.
References:
Team formation is scheduled for early April.
For the team project, a team must choose data related to business or socioeconomic issues.
The project report should include both (1) exploratory data analysis using summary statistics, visual representations, and data wrangling, and (2) machine learning analysis.
The document for the team project must be published on each member's website.
Any changes to team composition require approval from Byeong-Hak Choe.
There will tentatively be 28 class sessions.
The Midterm Exam is scheduled for Monday, March 31, 2025, during class time.
The Project Presentation is scheduled for Friday, May 9, 2025, 3:30 P.M.-5:30 P.M.
The Project write-up is due Friday, May 16, 2025.
\[ \begin{align} (\text{Total Percentage Grade}) =&\quad\;\, 0.05\times(\text{Total Attendance Score})\notag\\ &\,+\, 0.05\times(\text{Total Participation Score})\notag\\ &\,+\, 0.10\times(\text{Website Score})\notag\\ &\,+\, 0.30\times(\text{Total Homework Score})\notag\\ &\,+\, 0.50\times(\text{Total Exam and Project Score}).\notag \end{align} \]
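As a rough illustration, the weighted grading formula can be computed in a few lines of Python. This is a sketch only, assuming every component score is on a 0-100 scale; the function name `total_percentage_grade`, the component keys, and the example scores are hypothetical and not part of any official grading tool.

```python
# Weights from the Total Percentage Grade formula; they sum to 1.0.
WEIGHTS = {
    "attendance": 0.05,
    "participation": 0.05,
    "website": 0.10,
    "homework": 0.30,
    "exam_and_project": 0.50,
}


def total_percentage_grade(scores: dict[str, float]) -> float:
    """Weighted sum of component scores (each assumed to be on a 0-100 scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical example scores for one student.
example = {
    "attendance": 100,
    "participation": 90,
    "website": 95,
    "homework": 88,
    "exam_and_project": 82,
}
print(round(total_percentage_grade(example), 2))  # 86.4
```

Because the exam-and-project component carries half the weight, a 10-point change there moves the total by 5 points, while the same change in attendance moves it by only 0.5.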
You are allowed up to 2 absences without penalty.
For each absence beyond the initial two, there will be a deduction of 1% from the Total Percentage Grade.
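A minimal sketch of the absence rule, assuming "1%" means one percentage point subtracted from the Total Percentage Grade. The helper `absence_deduction` and its constants are illustrative names, not an official calculator.

```python
FREE_ABSENCES = 2           # absences allowed without penalty
PENALTY_PER_ABSENCE = 1.0   # assumed: one percentage point per extra absence


def absence_deduction(absences: int) -> float:
    """Points deducted from the Total Percentage Grade for a number of absences."""
    if absences < 0:
        raise ValueError("absences cannot be negative")
    extra = max(0, absences - FREE_ABSENCES)  # only absences beyond the first two count
    return extra * PENALTY_PER_ABSENCE


for n in (0, 2, 3, 5):
    print(n, absence_deduction(n))
```

For example, a student with five absences has three beyond the allowance, so 3 points would be deducted under this reading.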
Participation will be evaluated by the quantity and quality of GitHub-based online discussions and in-person discussions.
The single lowest homework score will be dropped when calculating the total homework score.
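A sketch of the drop-lowest rule, assuming the homework assignments are equally weighted, scored on a 0-100 scale, and averaged after the single lowest score is removed. `total_homework_score` and the sample scores are hypothetical.

```python
def total_homework_score(scores: list[float]) -> float:
    """Average of homework scores after dropping the single lowest one.

    Assumes equally weighted assignments on a 0-100 scale.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop the lowest")
    kept = sorted(scores)[1:]  # drop exactly one lowest score, even if it is tied
    return sum(kept) / len(kept)


# Six assignments; a missed assignment recorded as zero is the one dropped.
hw = [92, 85, 0, 88, 95, 90]
print(total_homework_score(hw))  # 90.0
```

Only one score is removed even when several are tied for lowest, matching "the single lowest homework score."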
Make-up exams will not be given unless you have either a medically verified excuse or an absence excused by the University.
If you cannot take exams because of religious obligations, notify me by email at least two weeks in advance so that an alternative exam time may be set.
A missed exam without an excused absence earns a grade of zero.
Late submissions of homework assignments will be accepted with a penalty.
A zero will be recorded for a missed assignment.
Jaehyung Lee (Andy), Class of 2022