Lecture 2
Introduction to DANL and its tools
Byeong-Hak Choe
SUNY Geneseo
August 27, 2025
Why Data Analytics?
- Fill in the gaps left by traditional business and economics classes.
- Practical skills that will benefit your future career.
- Neglected skills like how to actually find datasets in the wild and clean them.
- Data analytics skills are largely distinct from (and complementary to) the core quantitative coursework familiar to business undergrads.
- Data visualization, cleaning and wrangling; databases; machine learning; etc.
- In short, we will cover things that I wish someone had taught me when I was an undergraduate.
You, at the end of this course
Why Data Analytics?
- Data analysts use analytical tools and techniques to extract meaningful insights from data.
- Skills in data analytics are also useful for business analysts, market analysts, financial analysts, human resource analysts, or economists.
- The Bureau of Labor Statistics projects that employment in industries related to data analytics will grow by 36% from 2021 to 2031.
- The average growth rate for all occupations is 5%.
Why R, Python, and Databases?
Stack Overflow Trends
Stack Overflow is the world's most popular Q&A website for programmers and software developers.
See how programming languages have trended over time based on use of their tags in Stack Overflow from 2008 to 2023.
Most Popular Languages
Data Science and Big Data
Data Analytics and Generative Artificial Intelligence (AI)
Data Analytics and Big Data Trend
From 2008 to 2025
Programmers in 2025
Data Analytics and Generative AI
- Generative AI refers to a category of AI that is capable of generating new content, ranging from text, images, and videos to music and code.
- In the early 2020s, advances in transformer-based deep neural networks enabled a number of generative AI systems notable for accepting natural language prompts as input.
- These include large language model (LLM) chatbots (e.g., ChatGPT, Claude, Gemini, Copilot, Grok).
- ChatGPT (Chat Generative Pre-trained Transformer) is a chatbot developed by OpenAI and launched on November 30, 2022.
- By January 2023, it had become what was then the fastest-growing consumer software application in history.
Data Analytics and Generative AI
- Users around the world have explored how best to use ChatGPT for writing essays and code.
- Is AI a threat to data analytics?
- Fundamental understanding of the subject matter is still crucial for effectively utilizing AI’s capabilities.
- If you use generative AI such as ChatGPT, make sure you understand what it gives you.
- Copying and pasting its output without understanding it undermines your learning.
What is R?
R is a programming language and software environment designed for statistical computing and graphics.
R has become a major tool in data analysis, statistical modeling, and visualization.
- It is widely used among statisticians and data scientists for developing statistical software and performing data analysis.
- R is open source and freely available.
What is RStudio?
- RStudio is an integrated development environment (IDE) for R.
- An IDE is a software application that provides computer programmers with comprehensive facilities for software development (e.g., a code editor and a graphical user interface (GUI)).
- RStudio is a user-friendly interface that makes using R easier and more interactive.
- It provides a console and a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging, and workspace management.
- We will use a free cloud version of RStudio, which is Posit Cloud.
What is Python?
Python is a versatile programming language known for its simplicity and readability.
Python has become a dominant tool in various fields including data analysis, machine learning, and web development.
- It is widely used among developers, data scientists, and researchers for building applications and performing data-driven tasks.
- Python is open source and has a vast ecosystem of libraries and frameworks.
What is Jupyter?
- Jupyter is an open-source integrated development environment (IDE) primarily for Python, though it supports many other languages.
- Jupyter provides a notebook interface that allows users to write and execute code in a more interactive and visual format.
- Jupyter Notebook is a user-friendly environment that enhances coding, data analysis, and visualization.
- It offers a web-based interface that combines live code, equations, visualizations, and narrative text.
- Jupyter is widely used for data science, machine learning, and research, enabling easy sharing and collaboration.
- You can use a free cloud version of Jupyter, which is Google Colab.
- Google Colab can be used for R as well.
What is Git?
- Git is the most widely used version control tool in software development.
- It tracks changes in a series of snapshots of the project, allowing developers to revert to previous versions, compare changes, and merge different versions.
- It is the industry standard and is ubiquitous in collaborative coding.
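The snapshot idea can be sketched in a few lines of Python. This is a toy model for illustration only, not how Git is actually implemented: the `ToyRepo` class, its method names, and the file contents are all hypothetical. It saves a full copy of the project at each commit, can hand back an earlier version, and can compare two versions of a file. Merging is left out.

```python
import difflib
import hashlib


class ToyRepo:
    """A toy snapshot store (illustration only, not Git's real implementation)."""

    def __init__(self):
        # Each entry is (snapshot_id, message, files), kept in commit order.
        self.snapshots = []

    def commit(self, files, message):
        """Save a full copy of the project's files and return a short ID."""
        content = repr(sorted(files.items())).encode("utf-8")
        snapshot_id = hashlib.sha1(content).hexdigest()[:7]
        self.snapshots.append((snapshot_id, message, dict(files)))
        return snapshot_id

    def checkout(self, snapshot_id):
        """Return the files exactly as they were in an earlier snapshot."""
        for sid, _, files in self.snapshots:
            if sid == snapshot_id:
                return dict(files)
        raise KeyError(snapshot_id)

    def diff(self, old_id, new_id, filename):
        """List the lines removed (-) and added (+) in one file."""
        old = self.checkout(old_id).get(filename, "").splitlines()
        new = self.checkout(new_id).get(filename, "").splitlines()
        return [line for line in difflib.ndiff(old, new)
                if line.startswith(("+ ", "- "))]


repo = ToyRepo()
first = repo.commit({"analysis.py": "x = 1\nprint(x)"}, "first version")
second = repo.commit({"analysis.py": "x = 2\nprint(x)"}, "change x")

print(repo.diff(first, second, "analysis.py"))  # compare two versions
print(repo.checkout(first)["analysis.py"])      # revert to the first version
```

In real projects you would use Git itself rather than anything like this; the sketch only shows why keeping snapshots makes reverting and comparing straightforward.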
What is GitHub?
GitHub is a web-based hosting platform for Git repositories to store, manage, and share code.
Our course website is hosted on a GitHub repository.
Course contents will be posted not only in Brightspace but also in my GitHub repositories (“repos”).
GitHub is useful for many reasons, but the main one is how easy it makes uploading and sharing code.
What is Machine Learning?
ML Example 1: Image Recognition 🖼️
- Traditional programming:
- Try to define a “cat” in code (pointy ears, whiskers, etc.). Very hard!
- Machine learning:
- Show the computer thousands of pictures labeled cat or not cat.
- It learns the patterns in pixels and can recognize new cat photos.
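As a minimal sketch of "learning patterns in pixels", the Python below averages the labeled examples for each label and assigns a new image to the closest average (a nearest-centroid classifier). The tiny 2x2 "images", their labels, and the function names are invented for illustration. Real image recognition uses far larger images and far more capable models.

```python
# Each "image" is a flattened 2x2 grid of pixel brightness values (0 to 1).
# The data and labels are invented; real photos have thousands of pixels.
LABELED_IMAGES = [
    ([0.9, 0.8, 0.1, 0.2], "cat"),
    ([0.8, 0.9, 0.2, 0.1], "cat"),
    ([0.1, 0.2, 0.9, 0.8], "not cat"),
    ([0.2, 0.1, 0.8, 0.9], "not cat"),
]


def train_centroids(examples):
    """Learn one average image (centroid) per label."""
    sums, counts = {}, {}
    for pixels, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify_image(pixels, centroids):
    """Label a new image by its closest centroid (squared distance)."""
    def distance(label):
        return sum((p - c) ** 2 for p, c in zip(pixels, centroids[label]))
    return min(centroids, key=distance)


centroids = train_centroids(LABELED_IMAGES)
print(classify_image([0.7, 0.7, 0.3, 0.3], centroids))  # an image not seen in training
```

Notice that no rule about ears or whiskers appears anywhere in the code: the only "knowledge" is the averages computed from labeled examples.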
ML Example 2: Music Recommendations 🎵
- Traditional programming:
- Code rules like: “If a user likes pop songs, recommend other pop songs.”
- Machine learning:
- Use data from millions of listeners.
- The algorithm finds hidden patterns (e.g., “People who like Artist A often also like Artist B”).
- Then it recommends songs you might enjoy.
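The "people who like Artist A often also like Artist B" pattern can be sketched by counting how often two artists appear in the same listener's history. The listeners, artists, and function names below are invented for illustration. Real recommender systems work with millions of users and more sophisticated models.

```python
from collections import Counter
from itertools import permutations

# Invented listening histories (hypothetical users and artists).
LISTENERS = {
    "user1": {"Artist A", "Artist B", "Artist C"},
    "user2": {"Artist A", "Artist B"},
    "user3": {"Artist A", "Artist B", "Artist D"},
    "user4": {"Artist C", "Artist D"},
}


def co_listen_counts(listeners):
    """Count, for every ordered pair of artists, how many users like both."""
    counts = Counter()
    for liked in listeners.values():
        for a, b in permutations(sorted(liked), 2):
            counts[(a, b)] += 1
    return counts


def recommend(artist, listeners, top_n=1):
    """Return the artists most often liked together with `artist`."""
    counts = co_listen_counts(listeners)
    related = Counter({b: n for (a, b), n in counts.items() if a == artist})
    return [name for name, _ in related.most_common(top_n)]


print(recommend("Artist A", LISTENERS))
```

Here three of the four invented users like both Artist A and Artist B, so Artist B is what the sketch recommends to a fan of Artist A. Nobody wrote that rule by hand; it came from the data.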
ML Example 3: Spam Email Detection 📧
- Traditional programming:
- Write rules like: “If an email contains the word ‘lottery,’ mark it as spam.”
- Machine learning:
- Provide thousands of emails labeled spam or not spam.
- The computer learns the patterns and applies them to new emails.
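The contrast between the two approaches can be sketched in Python. The emails, labels ("ham" here means not spam), and function names are invented for illustration. The learning part is a small naive Bayes word-count model, chosen as one simple possibility rather than what any real spam filter necessarily uses.

```python
import math
from collections import Counter

# Invented toy emails (hypothetical data, for illustration only).
TRAINING_EMAILS = [
    ("win the lottery now", "spam"),
    ("claim your free prize today", "spam"),
    ("free money click now", "spam"),
    ("meeting moved to friday", "ham"),
    ("lecture notes for friday class", "ham"),
    ("lottery economics homework due today", "ham"),
]


def rule_based(email):
    """Traditional programming: one hand-written rule."""
    return "spam" if "lottery" in email.split() else "ham"


def train_spam_model(examples):
    """Count how often each word appears under each label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts


def classify_email(email, word_counts, label_counts):
    """Naive Bayes with add-one smoothing: pick the more probable label."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    scores = {}
    for label in ("spam", "ham"):
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in email.split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, label_counts = train_spam_model(TRAINING_EMAILS)

for email in ["lottery economics homework", "free prize now"]:
    print(email, "| rule:", rule_based(email),
          "| learned:", classify_email(email, word_counts, label_counts))
```

On this toy data the hand-written rule flags a homework email just because it contains "lottery" and misses a spam email that does not, while the learned model weighs all the words it has seen in labeled examples.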