Lecture 17

Nobel Prize for Machine Learning; Midterm I Review

Byeong-Hak Choe

SUNY Geneseo

October 11, 2024

Nobel Prize in Physics and Chemistry in 2024

The Royal Swedish Academy of Sciences has decided to award the 2024 Nobel Prize in Physics to U.S. scientist John J. Hopfield and British-Canadian Geoffrey E. Hinton for discoveries and inventions in ____________________, a field that enables computers to learn from and make predictions or decisions based on data, which paved the way for the artificial intelligence boom.

2024 Nobel Prizes: Laying the Foundations for ML

The Breakthroughs Behind AI’s Modern Revolution

Recognizes groundbreaking contributions to ML and AI
- Hopfield and Hinton: Deep learning and neural network architecture
- Hassabis and Jumper: AI and protein folding breakthroughs
Why It Matters: These discoveries laid the foundation for the ML revolution we are living through today

Deep Learning

Geoffrey Hinton’s Innovations:
- Co-developed and popularized backpropagation, a key algorithm for training deep neural networks
Deep learning: Multi-layered networks that autonomously learn complex patterns
Impact on AI:
- Enabled machines to learn from data without explicit instructions
- Core technology behind language models (like ChatGPT), image recognition, and more

ML Today

The Scale of Modern AI:
- Deep neural networks now contain billions to trillions of parameters
- Hugging Face: Open-source community for ML and AI
Applications: Voice assistants, self-driving cars, medical diagnostics, and so on.
Importance of Data and Computing Power:
- The explosion in data and cloud computing has fueled rapid progress in AI
- AI models like GPT-4 build on the neural network foundations laid by Hopfield and Hinton’s early work

Demis Hassabis’s Story: From Chess to AI Mastery

Boardgame Master Enters the Protein Olympics:
Early Life:
- Started playing chess at age 4, achieved master level at 13
- Transitioned to programming and video game development as a teenager

Co-founded DeepMind in 2010, a company that revolutionized AI for boardgames
DeepMind’s Global Attention:
- Sold to Google in 2014
- In 2016, DeepMind’s AI defeated the world champion of Go, a breakthrough in AI’s problem-solving abilities

AI’s True Purpose for Hassabis:
- Games were just a stepping stone to developing AI for more meaningful applications, like predicting protein structures

The Future of Machine Learning

What’s Next?:
- AI is rapidly evolving, expanding into areas like business, economics, climate science, healthcare, personalized medicine, and so on.
- Challenges include transparency, ethics, and responsible AI development
Ethical Considerations:
- How do we ensure AI benefits humanity and minimizes harm?
Final Thoughts:
- As AI continues to grow, it holds the potential to solve many of humanity’s greatest challenges

Midterm Exam I Review

Questions 6-8

For Questions 6-8, consider the following data.frame, twitter_data, displayed below:

Question 6

What type of variable is Country in the dataset?

a. Nominal
b. Ordinal
c. Interval
d. Ratio

Country: Country of residence

Question 7

What type of variable is LastLoginHour in the dataset?

a. Nominal
b. Ordinal
c. Interval
d. Ratio

LastLoginHour: Time of last login in hours since midnight

Question 8

What type of variable is SatisfactionLevel in the dataset?

a. Nominal
b. Ordinal
c. Interval
d. Ratio

SatisfactionLevel: User satisfaction level
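
As a rough illustration of how the four measurement scales in Questions 6-8 might be represented in R, the sketch below uses made-up values. The vectors `color`, `size`, `temp_c`, and `weight_kg` are hypothetical and are not part of twitter_data.

```r
# Hypothetical values, for illustration only (not taken from twitter_data)
color <- factor(c("red", "blue", "red"))          # nominal: categories with no order
size  <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)                   # ordinal: categories with an order
temp_c    <- c(10, 20, 30)    # interval: differences are meaningful, zero is arbitrary
weight_kg <- c(50, 75, 100)   # ratio: true zero, so ratios are meaningful

is.ordered(color)             # FALSE
is.ordered(size)              # TRUE
size[1] < size[2]             # TRUE: "small" comes before "large"
weight_kg[3] / weight_kg[1]   # 2: the third weight is twice the first
```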

Question 14

Which of the following lines of R code correctly assigns the data.frame nycflights13::airlines to the variable airlines_df? (Note that airlines_df is simply the name of the R object and can be any valid name in R.)

a. nycflights13::airlines <- airlines_df
b. airlines_df <- nycflights13::airlines
c. nycflights13::airlines >= airlines_df
d. airlines_df == nycflights13::airlines
e. All of the above
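
A minimal sketch of how assignment differs from comparison in R. It uses a hypothetical vector `scores` instead of the nycflights13 data, so it runs without any packages.

```r
# `scores` and `scores_copy` are hypothetical names for illustration
scores <- c(90, 85, 70)      # `<-` stores the right-hand value under the left-hand name
scores_copy <- scores        # the new object name goes on the left-hand side

identical(scores_copy, scores)   # TRUE: both names now hold equal values
scores_copy == scores            # TRUE TRUE TRUE: a comparison, nothing is assigned
```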

Question 15

Write the R code that creates a new variable called total and assigns it the sum of 8 and 12.

Answer: ______________________________________________

Question 16

Given the data.frame df with variables height and name, which of the following expressions returns a vector containing the values in the height variable?

a. df:height
b. df::height
c. df$height
d. Both b and c
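
The sketch below shows one way to pull a single column out of a data.frame as a vector. The data.frame `pets` and its columns are made up for illustration.

```r
# `pets` is a hypothetical data.frame for illustration
pets <- data.frame(name = c("Rex", "Tom"), weight = c(30, 4))

w <- pets$weight     # `$` extracts one column by name
is.vector(w)         # TRUE: the result is a plain numeric vector
w                    # 30 4

# `[[ ]]` with the column name as a string is an equivalent way to extract it
identical(pets$weight, pets[["weight"]])   # TRUE
```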

Question 17

The expression as.numeric("456") will return the numeric value 456.

a. True
b. False
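
A short sketch of type conversion with as.numeric(), using strings other than the one in the question. The names `x` and `y` are arbitrary.

```r
x <- as.numeric("3.14")   # a string that looks like a number converts cleanly
class(x)                  # "numeric"
x + 1                     # 4.14

# A string that is not a number becomes NA (R also issues a warning,
# which is suppressed here to keep the output clean)
y <- suppressWarnings(as.numeric("abc"))
is.na(y)                  # TRUE
```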

Question 18

What is the result of the expression (1 + 2 * 3) ^ 2 in R?

a. 36
b. 49
c. 81
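
To illustrate the order of operations R applies, here is a small sketch with different numbers from the question: exponentiation binds tighter than multiplication, which binds tighter than addition, and parentheses override both.

```r
2 + 3 * 4        # 14: multiplication happens before addition
(2 + 3) * 4      # 20: parentheses force the addition first
2 * 3^2          # 18: exponentiation happens before multiplication
(2 + 3 * 4)^2    # 196: the parenthesized value 14 is squared
```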

Question 19

Given vectors a <- c(2, 4, 6) and b <- c(1, 3, 5), what is the result of a + b?

a. c(3, 7, 11)
b. c(2, 4, 6, 1, 3, 5)
c. c(1, 2, 3, 4, 5, 6)
d. Error
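
A brief sketch of vectorized arithmetic, using hypothetical vectors `u` and `v` in place of the ones in the question. Arithmetic operators work element by element, while c() joins vectors end to end.

```r
# `u` and `v` are hypothetical vectors of equal length
u <- c(10, 20, 30)
v <- c(1, 2, 3)

u + v      # 11 22 33: elements are added position by position
c(u, v)    # 10 20 30 1 2 3: c() concatenates instead of adding
u * 2      # 20 40 60: a single number is applied to every element
```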

Question 20

To use the function read_csv() from the readr package, one of the packages in tidyverse, you first need to load the package using the R code ________.

a. library(readr)
b. library(skimr)
c. library(tidyverse)
d. All of the above
e. Both a and c
f. Both b and c

Question 21

Consider the following data.frame df0:

x	y
NA	7
2	NA
3	9

What is the result of mean(df0$y)?

a. 7
b. NA
c. 8
d. 9
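
The sketch below shows how missing values affect mean(), using a hypothetical vector `v` and not df0. By default a single NA makes the result NA; the na.rm = TRUE argument drops missing values before averaging.

```r
# `v` is a hypothetical vector with one missing value
v <- c(4, NA, 10)

mean(v)                  # NA: the missing value propagates
mean(v, na.rm = TRUE)    # 7: the average of 4 and 10 only
```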

Questions 22-23

Consider the following data.frame df for Questions 22-23:

id	name	age	score
1	Anna	22	90
2	Ben	28	85
3	Carl	NA	95
4	Dana	35	NA
5	Ella	40	80

Question 22

Which of the following code snippets filters observations where score is strictly between 85 and 95 (i.e., excluding 85 and 95)?

a. df |> filter(score >= 85 | score <= 95)
b. df |> filter(score > 85 | score < 95)
c. df |> filter(score > 85 & score < 95)
d. df |> filter(score >= 85 & score <= 95)

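A minimal sketch contrasting `&` (both conditions must hold) with `|` (either condition may hold) inside filter(). It assumes the dplyr package is installed and uses a made-up data.frame `scores`, not the df from the question.

```r
library(dplyr)

# `scores` is a hypothetical data.frame for illustration
scores <- data.frame(player = c("A", "B", "C", "D", "E"),
                     points = c(50, 60, 70, 80, NA))

# `&` keeps rows strictly between the two bounds
inside <- scores |> filter(points > 60 & points < 80)
inside$player    # "C"

# `|` keeps rows meeting either condition, which here is every non-missing row
either <- scores |> filter(points > 60 | points < 80)
nrow(either)     # 4: the row with NA is dropped because its condition is NA
```
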
Question 23

Which of the following expressions correctly keeps observations from df where the age variable does not have any missing values?

a. df |> filter(is.na(age))
b. df |> filter(!is.na(age))
c. df |> filter(age == NA)
d. df |> filter(age != NA)
e. Both a and c
f. Both b and d
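
To see why missing values need is.na() and cannot be tested with `==`, here is a base R sketch on a hypothetical vector `x`. Any comparison with NA yields NA, so it never evaluates to TRUE or FALSE.

```r
# `x` is a hypothetical vector with one missing value
x <- c(1, NA, 3)

x == NA          # NA NA NA: comparing with NA never gives TRUE or FALSE
is.na(x)         # FALSE TRUE FALSE: the dedicated test for missingness
x[!is.na(x)]     # 1 3: keep only the non-missing values
```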

Question 24

Consider the following data.frame df3:

id	value
1	15
1	15
2	25
3	35
3	35
4	45
5	55

Which of the following code snippets returns a data.frame of unique id values from df3?

a. df3 |> select(id) |> distinct()
b. df3 |> distinct(value)
c. df3 |> distinct(id)
d. Both a and c
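
A small sketch of how distinct() behaves, assuming dplyr is installed. The data.frame `orders` and its columns are hypothetical and stand in for df3.

```r
library(dplyr)

# `orders` is a hypothetical data.frame with repeated rows
orders <- data.frame(customer = c("a", "a", "b", "c", "c"),
                     amount   = c(5, 5, 7, 9, 9))

# Naming a column inside distinct() keeps only that column, de-duplicated
by_arg <- orders |> distinct(customer)

# Selecting the column first and then calling distinct() gives the same values
by_select <- orders |> select(customer) |> distinct()

by_arg$customer                                  # "a" "b" "c"
identical(by_arg$customer, by_select$customer)   # TRUE
nrow(orders |> distinct())                       # 3: whole-row duplicates removed
```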

Question 25

Which of the following code snippets correctly renames the variable name to first_name in df?

a. df |> rename(first_name = name)
b. df |> rename(name = first_name)
c. df |> rename("name" = "first_name")
d. df |> rename_variable(name = first_name)
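
A minimal sketch of the argument order in dplyr's rename(), which takes the form new_name = old_name. It assumes dplyr is installed; the data.frame `staff` and its columns are made up.

```r
library(dplyr)

# `staff` is a hypothetical data.frame for illustration
staff <- data.frame(nm = c("Ana", "Bo"), yrs = c(3, 5))

# rename(new_name = old_name)
renamed <- staff |> rename(full_name = nm)

names(renamed)   # "full_name" "yrs"
names(staff)     # "nm" "yrs": the original object is unchanged
```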

Question 26

Which of the following code snippets correctly removes the score variable from df?

a. df |> select(-score)
b. df |> select(-"score")
c. df |> select(!score)
d. df |> select(, -score)
e. df |> select(desc(score))
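
As a rough illustration of dropping a column with select(), the sketch below uses a hypothetical data.frame `staff` and assumes dplyr is installed. It shows only two common forms, a minus sign and the `!` complement operator.

```r
library(dplyr)

# `staff` is a hypothetical data.frame for illustration
staff <- data.frame(nm = c("Ana", "Bo"), yrs = c(3, 5), dept = c("HR", "IT"))

drop_minus <- staff |> select(-dept)   # minus sign removes the named column
drop_bang  <- staff |> select(!dept)   # `!` selects everything except the column

names(drop_minus)   # "nm" "yrs"
names(drop_bang)    # "nm" "yrs"
```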

Question 27

Which of the following code snippets filters observations where age is not NA, then arranges them in ascending order of age, and then selects the name and age variables?

a. df |> filter(!is.na(age)) |> arrange(age) |> select(name, age)
b. df |> select(name, age) |> arrange(age) |> filter(!is.na(age))
c. df |> arrange(age) |> filter(!is.na(age)) |> select(name, age)
d. df |> filter(is.na(age)) |> arrange(desc(age)) |> select(name, age)
e. All of the above
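
The sketch below chains filter(), arrange(), and select() with the native pipe so that each step reads left to right. It assumes dplyr is installed and uses a hypothetical data.frame `staff`, not the df from the question.

```r
library(dplyr)

# `staff` is a hypothetical data.frame for illustration
staff <- data.frame(nm   = c("Ana", "Bo", "Cy", "Di"),
                    yrs  = c(5, NA, 2, 9),
                    dept = c("HR", "IT", "IT", "HR"))

result <- staff |>
  filter(!is.na(yrs)) |>   # step 1: drop rows where yrs is missing
  arrange(yrs) |>          # step 2: sort ascending by yrs
  select(nm, yrs)          # step 3: keep only the two columns of interest

result$nm        # "Cy" "Ana" "Di"
names(result)    # "nm" "yrs"
```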

Question 28

Consider the two related data.frames, students and majors:

students

student_id	name	age
1	Brad	20
2	Jason	22
4	Marcie	21

majors

student_id	major
1	Business Administration
2	Economics
3	Data Analytics

Which of the following code snippets returns the data.frame below?

student_id	major	name	age
1	Business Administration	Brad	20
2	Economics	Jason	22
3	Data Analytics	NA	NA

a. students |> left_join(majors)
b. majors |> left_join(students)
c. Both a and b
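
A minimal sketch of how the table on the left of left_join() determines which rows and column order appear in the result. It assumes dplyr is installed; the data.frames `employees` and `offices` are hypothetical, and the `by` argument is spelled out to avoid the automatic key message.

```r
library(dplyr)

# `employees` and `offices` are hypothetical data.frames sharing the key emp_id
employees <- data.frame(emp_id = c(1, 2, 4), name = c("Ana", "Bo", "Cy"))
offices   <- data.frame(emp_id = c(1, 2, 3), office = c("North", "South", "East"))

# All rows of the left table are kept; unmatched right-table columns become NA
left_a <- employees |> left_join(offices, by = "emp_id")
left_b <- offices |> left_join(employees, by = "emp_id")

left_a$emp_id            # 1 2 4: keys come from employees
is.na(left_a$office[3])  # TRUE: emp_id 4 has no office
left_b$emp_id            # 1 2 3: keys come from offices
names(left_b)            # "emp_id" "office" "name": left table's columns come first
```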

Question 29

In R, what does the function sd(x) compute, and why can it be more useful than var(x)?
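
As a rough illustration of the relationship between sd() and var(), the sketch below uses a hypothetical vector of heights in centimeters. The variance is expressed in squared units, while the standard deviation is its square root and so is in the same units as the data.

```r
# `heights_cm` is a hypothetical vector for illustration
heights_cm <- c(150, 160, 170, 180, 190)

var(heights_cm)   # 250, in squared centimeters
sd(heights_cm)    # about 15.81, in centimeters like the data

# sd() is the square root of var()
all.equal(sd(heights_cm), sqrt(var(heights_cm)))   # TRUE
```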

Question 30

List at least four applications of data analytics in sports analytics mentioned in the lecture, and briefly describe each one.