Nobel Prize for Machine Learning; Midterm I Review
October 11, 2024
The Royal Swedish Academy of Sciences has decided to award the 2024 Nobel Prize in Physics to U.S. scientist John J. Hopfield and British-Canadian Geoffrey E. Hinton for discoveries and inventions in ____________________, a field that enables computers to learn from and make predictions or decisions based on data, which paved the way for the artificial intelligence boom.
Geoffrey Hinton’s Innovations:
Deep learning: Multi-layered networks that autonomously learn complex patterns
Impact on AI:
The Scale of Modern AI:
Applications: Voice assistants, self-driving cars, medical diagnostics, and so on.
Importance of Data and Computing Power:
Co-founded DeepMind in 2010, a company that revolutionized AI for board games
DeepMind’s Global Attention:
For Questions 6-8, consider the following data.frame, `twitter_data`, displayed below:
What type of variable is `Country` in the dataset?
`Country`: Country of residence
What type of variable is `LastLoginHour` in the dataset?
`LastLoginHour`: Time of last login in hours since midnight
What type of variable is `SatisfactionLevel` in the dataset?
`SatisfactionLevel`: User satisfaction level
Which of the following lines of R code correctly assigns the data.frame `nycflights13::airlines` to the variable `airlines_df`? (Note that `airlines_df` is simply the name of the R object and can be any valid name in R.)
nycflights13::airlines <- airlines_df
airlines_df <- nycflights13::airlines
nycflights13::airlines >= airlines_df
airlines_df == nycflights13::airlines
Write the R code to create a new variable called `total` and assign to it the sum of 8 and 12.
Answer: ______________________________________________
Given the data.frame `df` with variables `height` and `name`, which of the following expressions returns a vector containing the values in the `height` variable?
df:height
df::height
df$height
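The three operators in these options behave very differently. A minimal sketch, using a small made-up data.frame (the heights and names are illustrative assumptions, not data from the exam):

```r
# Hypothetical data.frame with the two variables named in the question
df <- data.frame(
  height = c(160, 172, 185),
  name   = c("Ann", "Bo", "Cy")
)

# `$` extracts a single column as a plain vector
heights <- df$height
print(heights)
print(is.numeric(heights))
```

By contrast, `:` is the sequence operator and `::` looks up an object inside a package namespace, so neither one extracts a column from a data.frame.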
The expression `as.numeric("456")` will return the numeric value 456.
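One way to check a statement like this is to run the conversion and inspect the result. A short sketch (the variable names `value` and `bad` are illustrative):

```r
# Convert a string of digits to a number
value <- as.numeric("456")
print(value)
print(class(value))

# Text that cannot be parsed as a number becomes NA (R also emits a warning,
# suppressed here to keep the output clean)
bad <- suppressWarnings(as.numeric("abc"))
print(is.na(bad))
```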
What is the result of the expression `(1 + 2 * 3) ^ 2` in R?
36
49
81
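The expression can be worked through step by step. The sketch below splits it into parts (the names `inner` and `result` are illustrative):

```r
# Inside the parentheses, multiplication is evaluated before addition
inner <- 1 + 2 * 3    # 1 + 6
result <- inner ^ 2   # the parenthesized value is then squared

print(inner)
print(result)

# Without parentheses, ^ binds tighter than * and +
no_parens <- 1 + 2 * 3 ^ 2   # 1 + 2 * 9
print(no_parens)
```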
Given vectors `a <- c(2, 4, 6)` and `b <- c(1, 3, 5)`, what is the result of `a + b`?
c(3, 7, 11)
c(2, 4, 6, 1, 3, 5)
c(1, 2, 3, 4, 5, 6)
Error
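Arithmetic on two vectors of equal length is applied element by element in R. A minimal sketch contrasting addition with concatenation (the names `elementwise` and `combined` are illustrative):

```r
a <- c(2, 4, 6)
b <- c(1, 3, 5)

# `+` adds matching positions: 2 + 1, 4 + 3, 6 + 5
elementwise <- a + b
print(elementwise)

# c() concatenates instead, producing a vector of length 6
combined <- c(a, b)
print(combined)
```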
To use the function `read_csv()` from the `readr` package, one of the packages in `tidyverse`, you first need to load the package using the R code ________.
library(readr)
library(skimr)
library(tidyverse)
Consider the following data.frame `df0`:
x | y |
---|---|
NA | 7 |
2 | NA |
3 | 9 |
What is the result of `mean(df0$y)`?
7
NA
8
9
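Missing values propagate through `mean()` unless they are removed explicitly. A short sketch that rebuilds `df0` from the table above (the names `with_na` and `without_na` are illustrative):

```r
df0 <- data.frame(
  x = c(NA, 2, 3),
  y = c(7, NA, 9)
)

# By default, a single NA makes the mean NA
with_na <- mean(df0$y)
print(with_na)

# na.rm = TRUE drops missing values first, leaving 7 and 9
without_na <- mean(df0$y, na.rm = TRUE)
print(without_na)
```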
Consider the following data.frame `df` for Questions 22-23:
id | name | age | score |
---|---|---|---|
1 | Anna | 22 | 90 |
2 | Ben | 28 | 85 |
3 | Carl | NA | 95 |
4 | Dana | 35 | NA |
5 | Ella | 40 | 80 |
Which of the following code snippets filters observations where `score` is strictly between 85 and 95 (i.e., excluding 85 and 95)?
df |> filter(score >= 85 | score <= 95)
df |> filter(score > 85 | score < 95)
df |> filter(score > 85 & score < 95)
df |> filter(score >= 85 & score <= 95)
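The difference between `&` and `|` can be seen by running both on the table above. A sketch assuming the `dplyr` package is installed (the names `strict` and `either` are illustrative):

```r
library(dplyr)

df <- data.frame(
  id    = 1:5,
  name  = c("Anna", "Ben", "Carl", "Dana", "Ella"),
  age   = c(22, 28, NA, 35, 40),
  score = c(90, 85, 95, NA, 80)
)

# `&` requires both conditions, so only scores strictly inside (85, 95) remain
strict <- df |> filter(score > 85 & score < 95)

# `|` requires only one condition, which every non-missing score satisfies
either <- df |> filter(score > 85 | score < 95)

print(strict$name)
print(nrow(either))
```

Note that `filter()` also drops rows where the condition evaluates to `NA`, so the observation with a missing `score` appears in neither result.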
Which of the following expressions correctly keeps observations from `df` where the `age` variable does not have any missing values?
df |> filter(is.na(age))
df |> filter(!is.na(age))
df |> filter(age == NA)
df |> filter(age != NA)
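Comparisons against `NA` are a common pitfall, because any comparison with `NA` itself yields `NA`. A sketch assuming `dplyr` is installed and using only the `name` and `age` columns from the table above (the names `kept` and `none` are illustrative):

```r
library(dplyr)

df <- data.frame(
  name = c("Anna", "Ben", "Carl", "Dana", "Ella"),
  age  = c(22, 28, NA, 35, 40)
)

# is.na() tests for missingness; `!` negates it
kept <- df |> filter(!is.na(age))

# `age != NA` is NA for every row, and filter() drops rows whose condition is NA
none <- df |> filter(age != NA)

print(nrow(kept))
print(nrow(none))
```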
Consider the following data.frame `df3`:
id | value |
---|---|
1 | 15 |
1 | 15 |
2 | 25 |
3 | 35 |
3 | 35 |
4 | 45 |
5 | 55 |
Which of the following code snippets returns a data.frame of unique `id` values from `df3`?
df3 |> select(id) |> distinct()
df3 |> distinct(value)
df3 |> distinct(id)
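To compare the options, each one can be run on `df3`. A sketch assuming `dplyr` is installed (the result names are illustrative):

```r
library(dplyr)

df3 <- data.frame(
  id    = c(1, 1, 2, 3, 3, 4, 5),
  value = c(15, 15, 25, 35, 35, 45, 55)
)

# Keep only the id column, then drop duplicate rows
via_select <- df3 |> select(id) |> distinct()

# distinct(id) keeps only the named column and its unique values
via_distinct <- df3 |> distinct(id)

# distinct(value) returns unique values of a different column
by_value <- df3 |> distinct(value)

print(via_select)
print(via_distinct)
print(names(by_value))
```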
Which of the following code snippets correctly renames the variable `name` to `first_name` in `df`?
df |> rename(first_name = name)
df |> rename(name = first_name)
df |> rename("name" = "first_name")
df |> rename_variable(name = first_name)
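In `dplyr::rename()`, the new name goes on the left of `=` and the existing name on the right. A sketch assuming `dplyr` is installed and using a reduced version of `df` (the name `renamed` is illustrative):

```r
library(dplyr)

df <- data.frame(
  id   = 1:5,
  name = c("Anna", "Ben", "Carl", "Dana", "Ella")
)

# new_name = old_name
renamed <- df |> rename(first_name = name)

print(names(renamed))
```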
Which of the following code snippets correctly removes the `score` variable from `df`?
df |> select(-score)
df |> select(-"score")
df |> select(!score)
df |> select(, -score)
df |> select(desc(score))
Which of the following code snippets filters observations where `age` is not `NA`, then arranges them in ascending order of `age`, and then selects the `name` and `age` variables?
df |> filter(!is.na(age)) |> arrange(age) |> select(name, age)
df |> select(name, age) |> arrange(age) |> filter(!is.na(age))
df |> arrange(age) |> filter(!is.na(age)) |> select(name, age)
df |> filter(is.na(age)) |> arrange(desc(age)) |> select(name, age)
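The three steps in the question map onto one pipeline, read left to right. A sketch assuming `dplyr` is installed, with `df` rebuilt from the table above (the name `result` is illustrative):

```r
library(dplyr)

df <- data.frame(
  id    = 1:5,
  name  = c("Anna", "Ben", "Carl", "Dana", "Ella"),
  age   = c(22, 28, NA, 35, 40),
  score = c(90, 85, 95, NA, 80)
)

result <- df |>
  filter(!is.na(age)) |>   # step 1: drop observations with missing age
  arrange(age) |>          # step 2: sort ascending by age
  select(name, age)        # step 3: keep only name and age

print(result)
```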
Consider the two related data.frames, `students` and `majors`:
`students`
student_id | name | age |
---|---|---|
1 | Brad | 20 |
2 | Jason | 22 |
4 | Marcie | 21 |
`majors`
student_id | major |
---|---|
1 | Business Administration |
2 | Economics |
3 | Data Analytics |
student_id | major | name | age |
---|---|---|---|
1 | Business Administration | Brad | 20 |
2 | Economics | Jason | 22 |
3 | Data Analytics | NA | NA |
students |> left_join(majors)
majors |> left_join(students)
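A left join keeps every row of the left-hand data.frame and fills unmatched columns with `NA`, so the order of the two tables matters. A sketch assuming `dplyr` is installed; the `by = "student_id"` argument is spelled out here for clarity, whereas the options above rely on the default of joining on shared column names:

```r
library(dplyr)

students <- data.frame(
  student_id = c(1, 2, 4),
  name       = c("Brad", "Jason", "Marcie"),
  age        = c(20, 22, 21)
)

majors <- data.frame(
  student_id = c(1, 2, 3),
  major      = c("Business Administration", "Economics", "Data Analytics")
)

# Keeps student_id 1, 2, 4; Marcie has no matching major
from_students <- students |> left_join(majors, by = "student_id")

# Keeps student_id 1, 2, 3; Data Analytics has no matching student
from_majors <- majors |> left_join(students, by = "student_id")

print(from_students)
print(from_majors)
```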
In R, what does the function `sd(x)` compute, and why can it be more useful than `var(x)`?
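The relationship between the two functions can be checked numerically. A sketch using a made-up vector (the values in `x` are illustrative assumptions): `sd()` is the square root of `var()`, so it is expressed in the same units as the data rather than in squared units.

```r
# Hypothetical measurements
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

variance <- var(x)   # sample variance, in squared units of x
std_dev  <- sd(x)    # sample standard deviation, in the units of x

print(variance)
print(std_dev)

# sd(x) matches sqrt(var(x)) up to floating-point tolerance
print(all.equal(std_dev, sqrt(variance)))
```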
List at least four applications of data analytics in sports mentioned in the lecture, and briefly describe each one.