Lecture 13

Nobel Prize for Machine Learning; Midterm I Review

Byeong-Hak Choe

SUNY Geneseo

October 10, 2024

Nobel Prize in Physics and Chemistry in 2024

The Royal Swedish Academy of Sciences has decided to award the 2024 Nobel Prize in Physics to U.S. scientist John J. Hopfield and British-Canadian Geoffrey E. Hinton for discoveries and inventions in ____________________, a field that enables computers to learn from and make predictions or decisions based on data, which paved the way for the artificial intelligence boom.

2024 Nobel Prizes: Laying the Foundations for ML

The Breakthroughs Behind AI’s Modern Revolution

  • Recognizes groundbreaking contributions to ML and AI
    • Hopfield and Hinton: Deep learning and neural network architecture
    • Hassabis and Jumper: AI and protein folding breakthroughs
  • Why It Matters: These discoveries laid the foundation for the ML revolution we are living through today

Deep Learning

  • Geoffrey Hinton’s Innovations:

    • Co-developed and popularized backpropagation, a key algorithm for training deep neural networks
  • Deep learning: Multi-layered networks that autonomously learn complex patterns

  • Impact on AI:

    • Enabled machines to learn from data without explicit instructions
    • Core technology behind language models (like ChatGPT), image recognition, and more

ML Today

  • The Scale of Modern AI:

    • Deep neural networks now contain billions to trillions of parameters
    • Hugging Face: Open-source community for ML and AI
  • Applications: Voice assistants, self-driving cars, medical diagnostics, and so on.

  • Importance of Data and Computing Power:

    • The explosion in data and cloud computing has fueled rapid progress in AI
    • AI models like GPT-4 are direct descendants of Hopfield and Hinton’s early work

Demis Hassabis’s Story: From Chess to AI Mastery

  • Board Game Master Enters the Protein Olympics:
  • Early Life:
    • Started playing chess at age 4, achieved master level at 13
    • Transitioned to programming and video game development as a teenager

  • Co-founded DeepMind in 2010, a company that revolutionized AI for board games

  • DeepMind’s Global Attention:

    • Sold to Google in 2014
    • In 2016, DeepMind’s AlphaGo defeated Go world champion Lee Sedol, a breakthrough in AI’s problem-solving abilities

  • AI’s True Purpose for Hassabis:
    • Games were just a stepping stone to developing AI for more meaningful applications, like predicting protein structures

The Future of Machine Learning

  • What’s Next?
    • AI is rapidly evolving and expanding into areas such as business, economics, climate science, healthcare, and personalized medicine.
    • Challenges include transparency, ethics, and responsible AI development
  • Ethical Considerations:
    • How do we ensure AI benefits humanity and minimizes harm?
  • Final Thoughts:
    • As AI continues to grow, it holds the potential to solve many of humanity’s greatest challenges

Midterm Exam I Review

Questions 10-13

For Questions 10-13, consider the data.frame netflix_data shown below:

Question 10

What type of variable is FavoriteGenre in the dataset?

  1. Nominal
  2. Ordinal
  3. Interval
  4. Ratio

FavoriteGenre: User’s favorite genre

Question 11

What type of variable is SubscriptionPlan in the dataset?

  1. Nominal
  2. Ordinal
  3. Interval
  4. Ratio

SubscriptionPlan: Type of Netflix subscription

Question 12

What type of variable is LastLoginTime in the dataset?

  1. Nominal
  2. Ordinal
  3. Interval
  4. Ratio

LastLoginTime: Time of last login in hours since midnight

Question 13

What type of variable is Satisfaction in the dataset?

  1. Nominal
  2. Ordinal
  3. Interval
  4. Ratio

Satisfaction: User satisfaction rating (1 to 5 stars)

Question 20

Which of the following lines of R code correctly assigns the nycflights13::airlines data.frame to the variable df_airlines? (Note that df_airlines is simply the name of the R object and can be any valid name in R.)

  1. nycflights13::airlines <- df_airlines
  2. df_airlines <- nycflights13::airlines
  3. nycflights13::airlines <= df_airlines
  4. df_airlines == nycflights13::airlines

Question 21

Which of the following lines of R code correctly calculates the number of elements in a vector x <- c(1,2,3,4,5)?

  1. nrow(x)
  2. sd(x)
  3. sum(x)
  4. length(x)
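
A minimal sketch for comparing the four candidate functions on the vector from the question. The object names on the left-hand side (n_elements, n_rows, total, spread) are illustrative and not part of the exam.

```r
x <- c(1, 2, 3, 4, 5)

n_elements <- length(x)  # number of elements in the vector
n_rows     <- nrow(x)    # NULL, because a plain vector has no rows
total      <- sum(x)     # adds the elements together
spread     <- sd(x)      # sample standard deviation of the elements

print(n_elements)
print(n_rows)
```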

Question 22

Write the R code to create a new variable called result and assign to it the sum of 5 and 7 in R.

Question 23

Given the data.frame df with variables age and name, which of the following expressions returns a vector containing the values in the age variable?

  1. df:age
  2. df::age
  3. df$age
  4. Both 2 and 3

Question 24

The expression as.numeric("123") will return the numeric value 123.

  1. True
  2. False

Question 25

What is the result of the expression (4 + 3) ^ 2 in R?

  1. 3.5
  2. 9
  3. 14
  4. 49
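
A short sketch of how the parentheses change the order of evaluation. The second expression, without parentheses, is added only for contrast and does not appear in the question.

```r
with_parens <- (4 + 3)^2  # the sum is evaluated first: 7^2
no_parens   <- 4 + 3^2    # the exponent is evaluated first: 4 + 9

print(with_parens)
print(no_parens)
```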

Question 26

Given vectors a <- c(1, 2, 3) and b <- c(4, 5, 6), what is the result of a + b?

  1. c(5, 7, 9)
  2. c(4, 5, 6, 1, 2, 3)
  3. c(1, 2, 3, 4, 5, 6)
  4. Error
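
A small sketch contrasting element-wise addition with concatenation, using the vectors from the question. The names elementwise and combined are illustrative.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

elementwise <- a + b    # adds the values in matching positions
combined    <- c(a, b)  # concatenates the two vectors instead

print(elementwise)
print(combined)
```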

Question 27

Which of the following functions is part of the tidyverse package and is used to read a CSV file into a data.frame?

  1. read.csv()
  2. read_csv()
  3. read.table()
  4. load()

Question 28

To use the function skim() from the skimr package, you first need to load the package using the R code ________.

  1. library(skimr)
  2. load(skimr)
  3. skimr
  4. skimr::skim

Question 29

The filter() function can use both logical operators like & and comparison operators like > within the same logical condition.

  1. True
  2. False

Question 30

Consider the following data.frame df0:

x    y
1    4
2    NA
NA   6

What is the result of mean(df0$y)?

  1. 4
  2. NA
  3. 5
  4. 6
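
A minimal sketch that rebuilds df0 and shows how a missing value affects mean(). The na.rm = TRUE call is added for contrast and is not part of the question.

```r
df0 <- data.frame(
  x = c(1, 2, NA),
  y = c(4, NA, 6)
)

mean_default <- mean(df0$y)                # NA, because y contains a missing value
mean_no_na   <- mean(df0$y, na.rm = TRUE)  # drops the NA first: (4 + 6) / 2

print(mean_default)
print(mean_no_na)
```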

Questions 31-32

Consider the following data.frame df for Questions 31-32:

id  name     age  score
1   Alice    25   85
2   Bob      30   90
3   Charlie  35   75
4   David    NA   80
5   Eve      45   NA

Question 31

Which of the following code snippets keeps observations where score is between 80 and 90 inclusive?

  1. df |> filter(score > 80 & score < 90)
  2. df |> filter(score >= 80 & score <= 90)
  3. df |> filter(score >= 80 | score <= 90)
  4. df |> filter(score > 80 | score < 90)

Question 32

Which of the following expressions correctly keeps observations from df where the age variable has missing values?

  1. df |> filter(is.na(age))
  2. df |> filter(!is.na(age))
  3. df |> filter(age == NA)
  4. df |> filter(age != NA)
  5. Both 1 and 3
  6. Both 2 and 4
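
A sketch of why testing for missing values with == behaves differently from is.na(), assuming the dplyr package is installed. The data.frame is rebuilt from the table for Questions 31-32, and the result names are illustrative.

```r
library(dplyr)

df <- data.frame(
  id    = 1:5,
  name  = c("Alice", "Bob", "Charlie", "David", "Eve"),
  age   = c(25, 30, 35, NA, 45),
  score = c(85, 90, 75, 80, NA)
)

# is.na() returns TRUE exactly where age is missing
missing_age <- df |> filter(is.na(age))

# age == NA evaluates to NA for every row, and filter() drops rows
# whose condition is NA, so no rows are kept
compared_na <- df |> filter(age == NA)

print(missing_age)
print(nrow(compared_na))
```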

Question 33

The arrange() function can sort data based on multiple variables.

  1. True
  2. False

Question 34

Consider the following data.frame df3:

id  value
1   10
2   20
2   20
3   30
4   40
4   40
5   50

Which of the following code snippets returns a data.frame of unique id values from df3?

  1. df3 |> select(id) |> distinct()
  2. df3 |> distinct(value)
  3. df3 |> distinct(id)
  4. Both 1 and 3
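
A minimal sketch comparing the candidate snippets on df3, assuming the dplyr package is installed. The result names are illustrative.

```r
library(dplyr)

df3 <- data.frame(
  id    = c(1, 2, 2, 3, 4, 4, 5),
  value = c(10, 20, 20, 30, 40, 40, 50)
)

via_select   <- df3 |> select(id) |> distinct()  # keep id, then drop duplicate rows
via_distinct <- df3 |> distinct(id)              # distinct() on id keeps only that column
by_value     <- df3 |> distinct(value)           # unique values of value, not of id

print(via_select)
print(via_distinct)
print(by_value)
```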

Question 35

Which of the following code snippets correctly renames the variable age to years in df?

  1. df |> rename(years = age)
  2. df |> rename(age = years)
  3. df |> rename("age" = "years")
  4. df |> rename_variable(age = years)

Question 36

Which of the following code snippets correctly removes the age variable from df?

  1. df |> select(-age)
  2. df |> select(-"age")
  3. df |> select(!age)
  4. df |> select(, -age)
  5. df |> select(desc(age))

Question 37

Which of the following code snippets filters observations where age is not NA, then arranges them in descending order of age, and then selects the name and age variables?

  1. df |> filter(!is.na(age)) |> arrange(desc(age)) |> select(name, age)
  2. df |> select(name, age) |> arrange(desc(age)) |> filter(!is.na(age))
  3. df |> arrange(desc(age)) |> filter(!is.na(age)) |> select(name, age)
  4. df |> filter(is.na(age)) |> arrange(age) |> select(name, age)
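
A sketch of the filter, arrange, select pipeline described in the question, assuming the dplyr package is installed. The data.frame is rebuilt from the table for Questions 31-32, and the name result is illustrative.

```r
library(dplyr)

df <- data.frame(
  id    = 1:5,
  name  = c("Alice", "Bob", "Charlie", "David", "Eve"),
  age   = c(25, 30, 35, NA, 45),
  score = c(85, 90, 75, 80, NA)
)

result <- df |>
  filter(!is.na(age)) |>   # drop the row with a missing age
  arrange(desc(age)) |>    # sort from oldest to youngest
  select(name, age)        # keep only the two requested variables

print(result)
```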

Question 38

Consider the two related data.frames, df_1 and df_2:

  • df_1
id  name    age
1   Alice   19
2   Bob     21
4   Olivia  20
  • df_2
id  major
1   Economics
2   Business Administration
3   Data Analytics

Which of the following code snippets returns the data.frame below?

id  major                    name   age
1   Economics                Alice  19
2   Business Administration  Bob    21
3   Data Analytics           NA     NA

  1. df_1 |> left_join(df_2)
  2. df_2 |> left_join(df_1)
  3. Both 1 and 2
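
A minimal sketch of how the direction of a left join decides which rows are kept, assuming the dplyr package is installed. The by = "id" argument is spelled out here as an assumption for clarity; the snippets in the question rely on the shared id column being detected automatically. The result names are illustrative.

```r
library(dplyr)

df_1 <- data.frame(
  id   = c(1, 2, 4),
  name = c("Alice", "Bob", "Olivia"),
  age  = c(19, 21, 20)
)

df_2 <- data.frame(
  id    = c(1, 2, 3),
  major = c("Economics", "Business Administration", "Data Analytics")
)

# Keeps every row of df_1; id 4 has no match in df_2, so major is NA
left_1 <- df_1 |> left_join(df_2, by = "id")

# Keeps every row of df_2; id 3 has no match in df_1, so name and age are NA
left_2 <- df_2 |> left_join(df_1, by = "id")

print(left_1)
print(left_2)
```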

Question 39

In R, what does the function sd(x) compute, and why can it be more useful than var(x)?
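
A small sketch relating the two functions: sd(x) is the square root of var(x), so it is expressed in the same units as x. The values in x are made up for illustration.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # illustrative values only

variance <- var(x)  # sample variance, in squared units of x
std_dev  <- sd(x)   # sample standard deviation, in the units of x

same_value <- isTRUE(all.equal(std_dev, sqrt(variance)))

print(variance)
print(std_dev)
print(same_value)
```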

Question 40

What is the primary limitation of Hadoop’s MapReduce, and how is it addressed by technologies like Apache Storm and Apache Spark?