Lecture 1

Syllabus, Course Outline, and DANL Career

Byeong-Hak Choe

SUNY Geneseo

January 22, 2025

Instructor

Instructor

Current Appointment & Education

  • Name: Byeong-Hak Choe.

  • Assistant Professor of Data Analytics and Economics, School of Business at SUNY Geneseo.

  • Ph.D. in Economics from University of Wyoming.

  • M.S. in Economics from Arizona State University.

  • M.A. in Economics from SUNY Stony Brook.

  • B.A. in Economics & B.S. in Applied Mathematics from Hanyang University at Ansan, South Korea.

    • Minor in Business Administration.
    • Concentration in Finance.

Instructor

Economics and Data Science

  • Choe, B.H., Newbold, S. and James, A., “Estimating the Value of Statistical Life through Big Data”
    • Question: How much is the society willing to pay to reduce the likelihood of fatality?
  • Choe, B.H., “Social Media Campaigns, Lobbying and Legislation: Evidence from #climatechange and Energy Lobbies.”
    • Question: To what extent do social media campaigns compete with fossil fuel lobbying on climate change legislation?
  • Choe, B.H. and Ore-Monago, T., 2024. “Governance and Climate Finance in the Developing World”
    • Question: In what ways and through what forms does poor governance act as a significant barrier to reducing greenhouse gas emissions in developing countries?

Syllabus

Syllabus

Email, Class & Office Hours

Syllabus

Course Description

  • This course teaches you how to analyze big data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark!

  • The top technology companies like Google, Facebook, Netflix, Airbnb, 3 Amazon, and many more are all using Spark to solve their big data problems!

  • With the Spark 3.0 DataFrame framework, it can perform up to 100x faster than Hadoop MapReduce.

Syllabus

Course Description

  • This course will review the basics in Python, continuing on to learning how to use Spark DataFrame API with the latest Spark 3.0 syntax!

  • In addition, you will learn how to perform supervised an unsupervised machine learning on massive datasets using the Machine Learning Library (MLlib).

  • You will also gain hands-on experience using PySpark within the Jupyter Notebook environment. This course also covers the latest Spark technologies, like Spark SQL, Spark Streaming, and advanced data analytics modeling methodologies.

Syllabus

Required Materials

Syllabus

Reference Materials - PySpark

Syllabus

Reference Materials - Sports Data

  • Introduction to Sports Analytics Using R by Ryan Elmore and Andrew Urbaczewski

Syllabus

Reference Materials - Python

Syllabus

Reference Materials - Website

Syllabus

Course Requirements

  • Laptop: You should bring your own laptop (Mac or Windows) to the classroom.

    • The minimum specification for your laptop in this course is 2+ core CPU, 4+ GB RAM, and 500+ GB disk storage.
  • Homework: There will be six homework assignments.

  • Project: There will be one project on a personal website.

  • Exams: There will be one Midterm Exam.

  • Participation: You are encouraged to participate in GitHub-based online discussions and class discussion, and office hours.

    • Checkout the netiquette policy in the syllabus.

Syllabus

Personal Website

  • You will create your own website using Quarto, R Studio, and Git.

  • You will publish your homework assignments and team project on your website.

  • Your website will be hosted in GitHub.

  • The basics in Markdown will be discussed.

  • References:

Syllabus

Team Project

  • Team formation is scheduled for early April.

    • Each team must have one to two students.
  • For the team project, a team must choose data related to business or socioeconomic issues.

  • The project report should include both (1) exploratory data analysis using summary statistics, visual representations, and data wrangling, and (2) machine learning analysis.

  • The document for the team project must be published in each member’s website.

  • Any changes to team composition require approval from Byeong-Hak Choe.

Syllabus

Class Schedule and Exams

  • There will be tentatively 28 class sessions.

  • The Midterm Exam is scheduled on March 31, 2025, Wednesday, during the class time.

  • The Project Presentation is scheduled on May 9, 2025, Friday, 3:30 P.M.-5:30 P.M.

  • The due for the Project write-up is May 16, 2024, Friday.

Syllabus

Course Contents

Syllabus

Course Contents

Syllabus

Grading

\[ \begin{align} (\text{Total Percentage Grade}) =&\quad\;\, 0.05\times(\text{Total Attendance Score})\notag\\ &\,+\, 0.05\times(\text{Total Participation Score})\notag\\ &\,+\, 0.10\times(\text{Website Score})\notag\\ &\,+\, 0.30\times(\text{Total Homework Score})\notag\\ &\,+\, 0.50\times(\text{Total Exam and Project Score}).\notag \end{align} \]

Syllabus

Grading

  • You are allowed up to 2 absences without penalty.

    • Send me an email if you have standard excused reasons (illness, family emergency, transportation problems, etc.).
  • For each absence beyond the initial two, there will be a deduction of 1% from the Total Percentage Grade.

  • Participation will be evaluated by quantity and quality of GitHub-based online discussions and in-person discussion.

  • The single lowest homework score will be dropped when calculating the total homework score.

Syllabus

Make-up Policy

  • Make-up exams will not be given unless you have either a medically verified excuse or an absence excused by the University.

  • If you cannot take exams because of religious obligations, notify me by email at least two weeks in advance so that an alternative exam time may be set.

  • A missed exam without an excused absence earns a grade of zero.

  • Late submissions for homework assignment will be accepted with a penalty.

  • A zero will be recorded for a missed assignment.

:::

DANL Career Session

DANL Career Session

Introduction: Share Your Background and Experience

Marcie Hogan, Class of 2023

Jaehyung Lee (Andy), Class of 2022

Jason Rappazzo, Class of 2023 - Part 1

Jason Rappazzo, Class of 2023 - Part 2

DANL Career Session

Q. What challenges have you encountered in your role at the workplace?

Marcie Hogan, Class of 2023

Jason Rappazzo, Class of 2023

Jaehyung Lee (Andy), Class of 2022

DANL Career Session

Q. How do you envision the future of AI?

Jaehyung Lee (Andy), Class of 2022

Marcie Hogan, Class of 2023

DANL Career Session

Q. How does working at a large company like Meta compare to your previous experiences?

Marcie Hogan, Class of 2023

DANL Career Session

Q. What has been the most fulfilling aspect of your work?

Jason Rappazzo, Class of 2023

Jaehyung Lee (Andy), Class of 2022

Marcie Hogan, Class of 2023

DANL Career Session

Q. Can you share advice for students who are preparing for internships or job applications?

Marcie Hogan, Class of 2023

Jaehyung Lee (Andy), Class of 2022

DANL Career Session

DANL Career Session - 1. Marcie Hogan

  • Background
    • Graduated from SUNY Geneseo in 2023 with a degree in Geography and minors in Mathematics and Physics.
    • Took several data analytics courses and was a tutor in the Data Analytics Lab.
    • Initially worked at Crown Castle as a Geospatial Data Analyst, including an internship, totaling about two years.
  • Career Path
    • At Crown Castle, Marcie managed database operations for spatial data, frequently working with Python and spatial packages.
    • In July 2024, she began a new role at Meta as a Community Project Manager within the Maps Organization, working on a contract basis.

DANL Career Session - 1. Marcie Hogan

  • Role at Meta
    • Works extensively on Mapillary, an open-source street-level imagery hosting platform acquired by Meta.
    • Conducts analytics to understand the user community and guide product development.
    • While her job title doesn’t explicitly mention data, data analytics is heavily integrated into her role.

DANL Career Session - 2. Jaehyung Lee (Andy)

  • Background
    • Graduated from SUNY Geneseo in 2022 with a major in Mathematics and a concentration in Data Analytics.
    • Served in the South Korean military for two years before resuming his academic and professional pursuits.
    • Took several data analytics courses and was a tutor in the Data Analytics Lab.

DANL Career Session - 2. Jaehyung Lee (Andy)

  • Invisible Technologies
    • Works as Analytics Associate at Invisible Technologies, a fast-growing startup that partners with leading AI platforms like OpenAI, Cohere, Google, and Character.AI.
    • The company specializes in refining AI models through techniques like Reinforcement Learning from Human Feedback (RLHF) and Supervised Fine-Tuning (SFT).

DANL Career Session - 2. Jaehyung Lee (Andy)

  • Role at Invisible Technologies
    • Holds monthly business reviews for executives.
    • Provides visibility on departmental targets for hiring, people operations, and client services.
    • Analyzes client data to ensure targets are met, particularly for clients like OpenAI.
    • Creates numerous data pipelines using Python and develops data models with SQL.

DANL Career Session - 2. Jaehyung Lee (Andy)

  • Projects
    • Self-Evaluation Project: Tracked over 20 personal metrics daily for two years to improve his lifestyle and productivity, such as wake-up times, tasks completed, reading, and meditation.
    • Used this project to enhance his coding skills and had valuable discussions during job interviews.
    • Emphasized the importance of personal projects in learning and showcasing skills.

DANL Career Session - 2. Jaehyung Lee (Andy)

  • Impact of Academic Courses
    • Took courses like Data Analytics 100, 200, 210, and 310, which helped him build a meaningful relationship with data.
    • Learned programming languages like R and Python, and developed skills in data visualization and presentation.
    • Created interactive dashboards using Shiny, which impressed interviewers and aided in job acquisition.
    • Stressed the applicability of classroom knowledge to real-world scenarios.

DANL Career Session - 3. Jason Rappazzo

  • Background
    • Graduated from SUNY Geneseo in 2023 with a major in Economics and a minor in Data Analytics.
    • Served as a tutor in the Data Analytics Lab and was part of the track team alongside Marcie.
  • Role at Momentive
    • Works as a data analyst in the Global IT Automation and Reporting at Momentive, a chemical manufacturing company.
    • The company produces a wide array of products found in everyday items, from tires to mattresses and aerospace components.
    • Responsible for automating manual business processes and turning complex data into actionable insights.

DANL Career Session - 3. Jason Rappazzo

  • Technologies Used
    • Alteryx: For data extraction and consolidating data from various company sources.
    • Snowflake: Serves as the data warehouse where data is transformed and prepared.
    • Tableau: Used to create dashboards and visualizations for different departments.

DANL Career Session - 3. Jason Rappazzo

  • Insights on Learning
    • Emphasized the importance of adaptability and the ability to learn new technologies in a rapidly changing tech landscape.
    • Encouraged persistence in coding, highlighting that overcoming errors and frustrations is part of the learning process.
    • Believes that proficiency in one programming language can facilitate learning others.

DANL Career Session - 3. Jason Rappazzo

  • Projects
    • SG&A Dashboard: Acted as the project manager for a dashboard visualizing the company’s Selling, General, and Administrative expenses.
    • Digital Growth Dashboard: Visualizes online sales data and tracks the company’s shift toward automated online product sales.
    • Regularly combines data from various ERP-related tables into coherent views for analysis and presentation.

DANL Career Session

Challenges Faced

  • Marcie
    • Experienced corporate instability at Crown Castle, witnessing four rounds of layoffs, a CEO change, and office closures.
    • Had to make an earlier-than-expected career move due to the company’s uncertain future.
    • Adjusted to Meta’s faster-paced environment and the need to learn new internal tools and processes.
    • Navigated strict data privacy and legal considerations at Meta, a shift from her previous role.

DANL Career Session

Challenges Faced

  • Andy
    • Initially believed data analytics was primarily technical but realized the importance of interpersonal skills.
    • Learned to communicate effectively with stakeholders to meet their needs and deliver valuable insights.
    • Continues to work on improving communication and presentation skills.

DANL Career Session

Challenges Faced

  • Jason
    • Faced a learning curve due to a limited background in finance, necessitating on-the-job learning of financial data analysis.
    • Recognized the importance of understanding data context to create meaningful visualizations for stakeholders.
    • Learned that real-world data is often messy and inconsistent, requiring significant data preparation and cleaning.

DANL Career Session

Insights on AI

  • Marcie
    • Observed Meta’s significant investment in AI technologies, such as open-source models like Llama 3.
    • Believes AI applications will expand across various sectors, including geospatial data and map-making.
    • Emphasized the importance of having industry-specific knowledge to optimize AI models effectively.

DANL Career Session

Insights on AI

  • Andy
    • Acknowledged the hype surrounding AI and the possibility of an AI bubble.
    • Mentioned that his company is preparing by diversifying into other industries beyond AI.
    • Believes AI will become more prominent across all businesses, requiring adaptability and broader skill sets.

DANL Career Session

Most Fulfilling Aspects

  • Marcie
    • Finds fulfillment when her work aligns with her passions, particularly in geospatial data and map-making.
    • Enjoys seeing the direct impact of her insights on public-facing products and contributing to an interconnected world.
    • Prefers roles where she can witness the end results and user impact of her work.

DANL Career Session

Most Fulfilling Aspects

  • Andy
    • Derives satisfaction from building solutions from scratch and seeing them positively impact stakeholders.
    • Values moments when his work translates into tangible benefits for the company and receives appreciation.

DANL Career Session

Most Fulfilling Aspects

  • Jason
    • Finds it rewarding when code runs flawlessly and data is accurately prepared.
    • Enjoys impressing stakeholders with the final product, especially when it simplifies their work.
    • Values making a tangible impact on colleagues’ efficiency and effectiveness.

DANL Career Session

Advice to Students Who Are Looking for a Job

  • Marcie
    • Encouraged students to leverage opportunities at Geneseo to build their resumes, such as tutoring, research, and extracurricular activities.
    • Highlighted the competitive job market and the necessity of relevant experience to secure internships and jobs.
    • Advised students to combine their data analytics skills with industries they are passionate about for a more fulfilling career.

DANL Career Session

Advice to Students Who Are Looking for a Job

  • Andy
    • Emphasized the significance of working on personal projects to deepen coding and data science expertise.
    • Recommended mastering Python, R, and SQL as essential tools for data-related jobs.
    • Advised students to have one or two significant projects to discuss during interviews.
    • Encouraged networking with professionals in the field to gain insights and collaboration opportunities.
    • Highlighted that exploring various projects builds confidence and aids in transitioning from academia to industry.

DANL Career Session

Advice to Students Who Are Looking for a Job

  • Jason
    • Recommended learning basic finance and accounting concepts to better understand and meet business needs.
    • Advised developing strong interpersonal and communication skills to effectively collaborate with non-technical stakeholders.
    • Encouraged being open to learning new tools and technologies to stay relevant in the field.

DANL Career Session

Summary

  • The speakers highlighted the importance of:
    • Technical Skills: Proficiency in programming languages like Python, R, and SQL is crucial.
    • Continuous Learning: Staying adaptable and willing to learn new technologies and methodologies.
    • Interpersonal Skills: Effective communication with stakeholders and team members is essential.
    • Passion Alignment: Combining data analytics skills with personal interests leads to a more fulfilling career.
    • Practical Experience: Engaging in projects, internships, and leveraging academic opportunities enhances employability.

DANL Career

Useful Skills for DANL Students

  • Data Analysis with Python and R
  • Data Science with Python and R
  • Programming Languages for databases such as SQL
  • Business Intelligence Software such as Power BI and/or Tableau
  • Version Control with Git and GitHub
  • Data Analytics/Science Portfolio with a Capstone Project on GitHub
  • Prompt Engineering in Generative AI

DANL Career

  1. General Data Analytics, SQL, and Business Intelligence
  2. Data Science
  3. Generative AI