Lecture 17

Nobel Prize for Machine Learning; Midterm I Review

Byeong-Hak Choe

SUNY Geneseo

October 11, 2024

Nobel Prize in Physics and Chemistry in 2024

The Royal Swedish Academy of Sciences has decided to award the 2024 Nobel Prize in Physics to U.S. scientist John J. Hopfield and British-Canadian Geoffrey E. Hinton for discoveries and inventions in ____________________, a field that enables computers to learn from and make predictions or decisions based on data, which paved the way for the artificial intelligence boom.

2024 Nobel Prizes: Laying the Foundations for ML

The Breakthroughs Behind AI’s Modern Revolution

  • Recognizes groundbreaking contributions to ML and AI
    • Hopfield and Hinton: Deep learning and neural network architecture
    • Hassabis and Jumper: AI and protein folding breakthroughs
  • Why It Matters: These discoveries laid the foundation for the ML revolution we are living through today

Deep Learning

  • Geoffrey Hinton’s Innovations:

    • Helped develop and popularize backpropagation, a key algorithm for training deep neural networks
  • Deep learning: Multi-layered networks that autonomously learn complex patterns

  • Impact on AI:

    • Enabled machines to learn from data without explicit instructions
    • Core technology behind language models (like ChatGPT), image recognition, and more

ML Today

  • The Scale of Modern AI:

    • Deep neural networks now contain billions to trillions of parameters
    • Hugging Face: Open-source community for ML and AI
  • Applications: Voice assistants, self-driving cars, medical diagnostics, and so on.

  • Importance of Data and Computing Power:

    • The explosion in data and cloud computing has fueled rapid progress in AI
    • AI models like GPT-4 are direct descendants of Hopfield and Hinton’s early work

Demis Hassabis’s Story: From Chess to AI Mastery

  • Board Game Master Enters the Protein Olympics:
  • Early Life:
    • Started playing chess at age 4, achieved master level at 13
    • Transitioned to programming and video game development as a teenager

  • In 2010, co-founded DeepMind, a company that revolutionized AI for board games

  • DeepMind’s Global Attention:

    • Sold to Google in 2014
    • In 2016, DeepMind’s AI defeated the world champion of Go, a breakthrough in AI’s problem-solving abilities

  • AI’s True Purpose for Hassabis:
    • Games were just a stepping stone to developing AI for more meaningful applications, like predicting protein structures

The Future of Machine Learning

  • What’s Next?:
    • AI is rapidly evolving, expanding into areas like business, economics, climate science, healthcare, personalized medicine, and so on.
    • Challenges include transparency, ethics, and responsible AI development
  • Ethical Considerations:
    • How do we ensure AI benefits humanity and minimizes harm?
  • Final Thoughts:
    • As AI continues to grow, it holds the potential to solve many of humanity’s greatest challenges

Midterm Exam I Review

Questions 6-8

For Questions 6-8, consider the data.frame twitter_data displayed below:

Question 6

What type of variable is Country in the dataset?

  1. Nominal
  2. Ordinal
  3. Interval
  4. Ratio

Country: Country of residence

Question 7

What type of variable is LastLoginHour in the dataset?

  1. Nominal
  2. Ordinal
  3. Interval
  4. Ratio

LastLoginHour: Time of last login in hours since midnight

Question 8

What type of variable is SatisfactionLevel in the dataset?

  1. Nominal
  2. Ordinal
  3. Interval
  4. Ratio

SatisfactionLevel: User satisfaction level
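
A rough sketch of how measurement scales are often represented in R. The twitter_data table is not reproduced in this text, so every value and level name below is a hypothetical stand-in rather than the actual data.

```r
# Hypothetical values; the real twitter_data is not shown here
country <- factor(c("USA", "Korea", "USA"))        # unordered categories

satisfaction <- factor(
  c("Low", "High", "Medium"),
  levels  = c("Low", "Medium", "High"),            # assumed level names
  ordered = TRUE                                   # categories with a ranking
)

last_login_hour <- c(0, 13.5, 22)                  # numeric hours since midnight

is.ordered(country)                  # FALSE: no ranking among countries
is.ordered(satisfaction)             # TRUE
satisfaction[1] < satisfaction[2]    # TRUE: "Low" ranks below "High"
```

Comparisons such as < are only meaningful for ordered factors and numeric variables, which is one practical way to tell the scales apart.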

Question 14

Which of the following lines of R code correctly assigns the data.frame nycflights13::airlines to the variable airlines_df? (Note that airlines_df is simply the name of the R object and can be any valid name in R.)

  1. nycflights13::airlines <- airlines_df
  2. airlines_df <- nycflights13::airlines
  3. nycflights13::airlines >= airlines_df
  4. airlines_df == nycflights13::airlines
  5. All of the above
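
A minimal sketch of the assignment pattern. It assumes the built-in datasets::mtcars as a stand-in for nycflights13::airlines, so that it runs without installing extra packages; cars_df is a hypothetical object name.

```r
# Assumption: datasets::mtcars stands in for nycflights13::airlines
cars_df <- datasets::mtcars   # object name on the left, value on the right

class(cars_df)                # "data.frame"
```

The operators >= and == compare values and return TRUE/FALSE; they do not create or modify an object.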

Question 15

Write the R code that creates a new variable called total and assigns to it the sum of 8 and 12.

Answer: ______________________________________________

Question 16

Given the data.frame df with variables height and name, which of the following expressions returns a vector containing the values in the height variable?

  1. df:height
  2. df::height
  3. df$height
  4. Both 2 and 3
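
A small sketch of column extraction, using a hypothetical two-row df (the names and heights are made up for illustration):

```r
# Hypothetical data.frame with the two variables named in the question
df <- data.frame(name = c("Ann", "Bo"), height = c(160, 175))

heights <- df$height   # `$` pulls one column out as a vector
is.vector(heights)     # TRUE
heights                # 160 175
```

For contrast, : builds sequences (as in 1:5) and :: reaches into a package (as in readr::read_csv), so neither extracts a column.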

Question 17

The expression as.numeric("456") will return the numeric value 456.

  1. True
  2. False
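
A quick check that can be run in the console. The second string, "abc", is a made-up example of text that cannot be converted.

```r
x <- as.numeric("456")
class(x)    # "numeric"
x + 1       # 457

# Text that is not a number converts to NA (R also issues a warning)
y <- suppressWarnings(as.numeric("abc"))
is.na(y)    # TRUE
```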

Question 18

What is the result of the expression (1 + 2 * 3) ^ 2 in R?

  1. 36
  2. 49
  3. 81
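
The order of operations can be traced step by step. The intermediate names inner and result are introduced only for this illustration.

```r
inner  <- 1 + 2 * 3   # multiplication happens before addition: 1 + 6 = 7
result <- inner ^ 2   # then the parenthesized value is squared: 49

result == (1 + 2 * 3) ^ 2   # TRUE
```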

Question 19

Given vectors a <- c(2, 4, 6) and b <- c(1, 3, 5), what is the result of a + b?

  1. c(3, 7, 11)
  2. c(2, 4, 6, 1, 3, 5)
  3. c(1, 2, 3, 4, 5, 6)
  4. Error
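
A short sketch contrasting element-wise addition with concatenation, using the vectors from the question:

```r
a <- c(2, 4, 6)
b <- c(1, 3, 5)

a + b      # 3 7 11: adds matching positions
c(a, b)    # 2 4 6 1 3 5: concatenation, shown for contrast
```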

Question 20

To use the function read_csv() from the readr package, one of the packages in tidyverse, you first need to load the package using the R code ________.

  1. library(readr)
  2. library(skimr)
  3. library(tidyverse)
  4. All of the above
  5. Both 1 and 3
  6. Both 2 and 3
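
A minimal sketch of loading readr and calling read_csv(). It assumes readr 2.0 or later is installed; the inline CSV text and the object names are hypothetical, used so that no file on disk is needed.

```r
library(readr)   # library(tidyverse) would also attach readr

# Hypothetical inline CSV; I() tells read_csv() to treat the string as data
csv_text <- "id,score\n1,90\n2,85"
scores <- read_csv(I(csv_text), show_col_types = FALSE)

names(scores)    # "id" "score"
nrow(scores)     # 2
```

skimr is a separate package for summarizing data; loading it does not make read_csv() available.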

Question 21

Consider the following data.frame df0:

x y
NA 7
2 NA
3 9

What is the result of mean(df0$y)?

  1. 7
  2. NA
  3. 8
  4. 9
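
The behavior of mean() with missing values can be checked by rebuilding df0 from the table above:

```r
df0 <- data.frame(x = c(NA, 2, 3), y = c(7, NA, 9))

mean(df0$y)                 # NA: a missing value makes the mean unknown
mean(df0$y, na.rm = TRUE)   # 8: average of 7 and 9 after dropping the NA
```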

Questions 22-23

Consider the following data.frame df for Questions 22-23:

id name age score
1 Anna 22 90
2 Ben 28 85
3 Carl NA 95
4 Dana 35 NA
5 Ella 40 80

Question 22

Which of the following code snippets filters observations where score is strictly between 85 and 95 (i.e., excluding 85 and 95)?

  1. df |> filter(score >= 85 | score <= 95)
  2. df |> filter(score > 85 | score < 95)
  3. df |> filter(score > 85 & score < 95)
  4. df |> filter(score >= 85 & score <= 95)
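
A sketch comparing & and | inside filter(), assuming dplyr is installed and R is version 4.1 or later (for the |> pipe). df is rebuilt from the table above; strict and either are hypothetical names.

```r
library(dplyr)

df <- data.frame(
  id    = 1:5,
  name  = c("Anna", "Ben", "Carl", "Dana", "Ella"),
  age   = c(22, 28, NA, 35, 40),
  score = c(90, 85, 95, NA, 80)
)

# Both conditions must hold: strictly between 85 and 95
strict <- df |> filter(score > 85 & score < 95)

# Either condition may hold: true for every non-missing score
either <- df |> filter(score > 85 | score < 95)

strict$name    # "Anna"
nrow(either)   # 4 (the row with a missing score is dropped)
```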

Question 23

Which of the following expressions correctly keeps observations from df where the age variable does not have any missing values?

  1. df |> filter(is.na(age))
  2. df |> filter(!is.na(age))
  3. df |> filter(age == NA)
  4. df |> filter(age != NA)
  5. Both 1 and 3
  6. Both 2 and 4
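
A sketch of why is.na() is the usual tool for missing values, assuming dplyr is installed. df is rebuilt from the table above; kept and none are hypothetical names.

```r
library(dplyr)

df <- data.frame(
  id    = 1:5,
  name  = c("Anna", "Ben", "Carl", "Dana", "Ella"),
  age   = c(22, 28, NA, 35, 40),
  score = c(90, 85, 95, NA, 80)
)

kept <- df |> filter(!is.na(age))   # rows where age is present
none <- df |> filter(age != NA)     # comparing with NA yields NA for every row

nrow(kept)   # 4
nrow(none)   # 0: filter() keeps only rows where the condition is TRUE
```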

Question 24

Consider the following data.frame df3:

id value
1 15
1 15
2 25
3 35
3 35
4 45
5 55

Which of the following code snippets returns a data.frame of unique id values from df3?

  1. df3 |> select(id) |> distinct()
  2. df3 |> distinct(value)
  3. df3 |> distinct(id)
  4. Both 1 and 3
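
A sketch of two ways to list unique id values, assuming dplyr is installed. df3 is rebuilt from the table above; the result names are hypothetical.

```r
library(dplyr)

df3 <- data.frame(
  id    = c(1, 1, 2, 3, 3, 4, 5),
  value = c(15, 15, 25, 35, 35, 45, 55)
)

via_select   <- df3 |> select(id) |> distinct()   # keep id, then drop duplicates
via_distinct <- df3 |> distinct(id)               # distinct() on the id column only

via_select$id     # 1 2 3 4 5
via_distinct$id   # 1 2 3 4 5
```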

Question 25

Which of the following code snippets correctly renames the variable name to first_name in df?

  1. df |> rename(first_name = name)
  2. df |> rename(name = first_name)
  3. df |> rename("name" = "first_name")
  4. df |> rename_variable(name = first_name)
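
A sketch of the rename() argument order, assuming dplyr is installed. df is rebuilt from the table for Questions 22-23; renamed is a hypothetical name.

```r
library(dplyr)

df <- data.frame(
  id    = 1:5,
  name  = c("Anna", "Ben", "Carl", "Dana", "Ella"),
  age   = c(22, 28, NA, 35, 40),
  score = c(90, 85, 95, NA, 80)
)

# rename() takes new_name = old_name
renamed <- df |> rename(first_name = name)

names(renamed)   # "id" "first_name" "age" "score"
```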

Question 26

Which of the following code snippets correctly removes the score variable from df?

  1. df |> select(-score)
  2. df |> select(-"score")
  3. df |> select(!score)
  4. df |> select(, -score)
  5. df |> select(desc(score))
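
A sketch of dropping a column with a minus sign inside select(), assuming dplyr is installed. df is rebuilt from the table for Questions 22-23; no_score is a hypothetical name.

```r
library(dplyr)

df <- data.frame(
  id    = 1:5,
  name  = c("Anna", "Ben", "Carl", "Dana", "Ella"),
  age   = c(22, 28, NA, 35, 40),
  score = c(90, 85, 95, NA, 80)
)

# A minus sign in select() keeps every column except the one named
no_score <- df |> select(-score)

names(no_score)   # "id" "name" "age"
```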

Question 27

Which of the following code snippets filters observations where age is not NA, then arranges them in ascending order of age, and then selects the name and age variables?

  1. df |> filter(!is.na(age)) |> arrange(age) |> select(name, age)
  2. df |> select(name, age) |> arrange(age) |> filter(!is.na(age))
  3. df |> arrange(age) |> filter(!is.na(age)) |> select(name, age)
  4. df |> filter(is.na(age)) |> arrange(desc(age)) |> select(name, age)
  5. All of the above
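
A sketch of the filter, arrange, select sequence written as one pipeline, assuming dplyr is installed. df is rebuilt from the table for Questions 22-23; result is a hypothetical name.

```r
library(dplyr)

df <- data.frame(
  id    = 1:5,
  name  = c("Anna", "Ben", "Carl", "Dana", "Ella"),
  age   = c(22, 28, NA, 35, 40),
  score = c(90, 85, 95, NA, 80)
)

result <- df |>
  filter(!is.na(age)) |>   # step 1: drop rows with a missing age
  arrange(age) |>          # step 2: sort by age, smallest first
  select(name, age)        # step 3: keep only name and age

result$name   # "Anna" "Ben" "Dana" "Ella"
```

Reading the pipeline top to bottom gives the order in which the steps run, which is what the question asks about.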

Question 28

Consider the two related data.frames, students and majors:

  • students
student_id name age
1 Brad 20
2 Jason 22
4 Marcie 21
  • majors
student_id major
1 Business Administration
2 Economics
3 Data Analytics

Which of the following code snippets returns the data.frame below?

student_id major name age
1 Business Administration Brad 20
2 Economics Jason 22
3 Data Analytics NA NA

  1. students |> left_join(majors)
  2. majors |> left_join(students)
  3. Both 1 and 2
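
A sketch showing how the left-hand table determines which rows a left join keeps, assuming dplyr is installed. The two data.frames are rebuilt from the tables above; from_students and from_majors are hypothetical names, and by = "student_id" is written out to make the join key explicit.

```r
library(dplyr)

students <- data.frame(
  student_id = c(1, 2, 4),
  name       = c("Brad", "Jason", "Marcie"),
  age        = c(20, 22, 21)
)
majors <- data.frame(
  student_id = c(1, 2, 3),
  major      = c("Business Administration", "Economics", "Data Analytics")
)

# left_join() keeps every row of the table on the left of the pipe
from_students <- students |> left_join(majors, by = "student_id")
from_majors   <- majors   |> left_join(students, by = "student_id")

from_students$student_id   # 1 2 4
from_majors$student_id     # 1 2 3
names(from_majors)         # "student_id" "major" "name" "age"
```

Rows without a match on the right-hand side are filled with NA, which is why student_id 3 has no name or age in from_majors.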

Question 29

In R, what does the function sd(x) compute, and why can it be more useful than var(x)?
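
A small sketch relating sd() to var(). The vector x is a hypothetical sample, not data from the lecture.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical sample

var(x)   # sample variance, in squared units of x
sd(x)    # sample standard deviation, in the same units as x

all.equal(sd(x), sqrt(var(x)))   # TRUE: sd is the square root of var
```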

Question 30

List at least four applications of data analytics in sports analytics mentioned in the lecture, and briefly describe each one.