Nobel Prize for Machine Learning; Midterm I Review
October 10, 2024
The Royal Swedish Academy of Sciences has decided to award the 2024 Nobel Prize in Physics to U.S. scientist John J. Hopfield and British-Canadian Geoffrey E. Hinton for discoveries and inventions in ____________________, a field that enables computers to learn from and make predictions or decisions based on data, which paved the way for the artificial intelligence boom.
Geoffrey Hinton’s Innovations:
Deep learning: Multi-layered networks that autonomously learn complex patterns
Impact on AI:
The Scale of Modern AI:
Applications: Voice assistants, self-driving cars, medical diagnostics, and so on.
Importance of Data and Computing Power:
Co-founded DeepMind in 2010, a company that revolutionized AI for board games
DeepMind’s Global Attention:
For Questions 10-13, consider the following data.frame, netflix_data, displayed below:
What type of variable is FavoriteGenre in the dataset?
FavoriteGenre: User’s favorite genre
What type of variable is SubscriptionPlan in the dataset?
SubscriptionPlan: Type of Netflix subscription
What type of variable is LastLoginTime in the dataset?
LastLoginTime: Time of last login in hours since midnight
What type of variable is Satisfaction in the dataset?
Satisfaction: User satisfaction rating (1 to 5 stars)
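One way to start on these questions is to ask R how each column is stored. The sketch below builds a small, hypothetical version of netflix_data (the rows are invented for illustration; only the column names come from the questions) and checks each column's class. The storage class is only a starting point, not the answer: Satisfaction is stored as a number here, yet it only takes the ordered ratings 1 to 5.

```r
# Hypothetical rows; only the column names come from the questions above
netflix_data <- data.frame(
  FavoriteGenre    = c("Drama", "Comedy", "Documentary"),
  SubscriptionPlan = c("Basic", "Premium", "Standard"),
  LastLoginTime    = c(21.5, 8.25, 23),   # hours since midnight
  Satisfaction     = c(5, 3, 4)           # rating from 1 to 5 stars
)

# How R stores each column (storage class, not the statistical variable type)
col_classes <- sapply(netflix_data, class)
print(col_classes)
```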
Which of the following lines of R code correctly assigns the nycflights13::airlines data.frame to the variable df_airlines? (Note that df_airlines is simply the name of the R object and can be any valid name in R.)
nycflights13::airlines <- df_airlines
df_airlines <- nycflights13::airlines
nycflights13::airlines <= df_airlines
df_airlines == nycflights13::airlines
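The assignment operator <- stores the value on its right under the name on its left. Because nycflights13 may not be installed everywhere, this sketch swaps in the built-in datasets::mtcars; the package::object pattern works the same way.

```r
# Assign: the new name goes on the left of <-, the existing object on the right
df_cars <- datasets::mtcars

# The new object is an identical copy of the package data
identical(df_cars, datasets::mtcars)
```

Writing the arrow the other way (datasets::mtcars <- df_cars) asks R to assign into a package's data, which fails with an error.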
Which of the following lines of R code correctly calculates the number of elements in the vector x <- c(1,2,3,4,5)?
nrow(x)
sd(x)
sum(x)
length(x)
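Running each option on the vector makes the difference concrete:

```r
x <- c(1, 2, 3, 4, 5)

length(x)  # number of elements: 5
nrow(x)    # NULL, since a plain vector has no rows
sum(x)     # total of the elements: 15
sd(x)      # standard deviation, about 1.58
```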
Write the R code to create a new variable called result and assign to it the sum of 5 and 7 in R.
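A one-line answer using the assignment operator:

```r
# Create a new variable called result holding 5 + 7
result <- 5 + 7
result  # 12
```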
Given the data.frame df with variables age and name, which of the following expressions returns a vector containing the values in the age variable?
df:age
df::age
df$age
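The $ operator pulls a single column out of a data.frame as a vector. The small df below is a hypothetical example with the two variables named in the question.

```r
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))

ages <- df$age   # $ extracts one column as a plain vector
ages             # 25 30
is.vector(ages)  # TRUE
```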
The expression as.numeric("123") will return the numeric value 123.
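A quick check in the console: the string is converted to a number you can do arithmetic with, while a string that does not look like a number becomes NA.

```r
as.numeric("123")        # 123, now a number rather than a string
as.numeric("123") + 1    # arithmetic works: 124

# A non-numeric string becomes NA (R also issues a warning, suppressed here)
suppressWarnings(as.numeric("abc"))  # NA
```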
What is the result of the expression (4 + 3) ^ 2 in R?
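Parentheses are evaluated first; without them, ^ binds more tightly than +:

```r
(4 + 3) ^ 2  # 7 ^ 2 = 49
4 + 3 ^ 2    # 4 + 9 = 13
```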
Given vectors a <- c(1, 2, 3) and b <- c(4, 5, 6), what is the result of a + b?
c(5, 7, 9)
c(4, 5, 6, 1, 2, 3)
c(1, 2, 3, 4, 5, 6)
Error
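Arithmetic on vectors of the same length works element by element, whereas c() concatenates:

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

a + b     # element-wise: c(1 + 4, 2 + 5, 3 + 6) = c(5, 7, 9)
c(b, a)   # c() combines vectors instead: c(4, 5, 6, 1, 2, 3)
```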
Which of the following functions is part of the tidyverse package and is used to read a CSV file into a data.frame?
read.csv()
read_csv()
read.table()
load()
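read_csv() comes from readr, one of the core tidyverse packages, while read.csv() is its base-R counterpart in utils. This sketch writes a tiny, made-up CSV to a temporary file and reads it both ways; it assumes readr 2.0 or later, where show_col_types = FALSE simply silences the column-type message.

```r
library(readr)

# Write a small, made-up CSV file to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,85", "2,90"), path)

df_tidy <- read_csv(path, show_col_types = FALSE)  # tidyverse (readr): returns a tibble
df_base <- read.csv(path)                          # base R (utils): returns a data.frame

class(df_tidy)
class(df_base)
```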
To use the function skim() from the skimr package, you first need to load the package using the R code ________.
library(skimr)
load(skimr)
skimr
skimr::skim
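A minimal sketch, assuming skimr is already installed and using the built-in mtcars data: attach the package once per session, then call skim().

```r
library(skimr)   # attach the package once per session

skim(mtcars)     # summary of every variable in mtcars

# The package::function form also works without attaching the package first
skimr::skim(mtcars)
```

Note that load() reads saved .RData files; it does not attach packages.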
The filter() function can use both logical operators like & and comparison operators like > within the same logical condition.
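An illustration on the built-in mtcars data (chosen only as convenient example data): one filter() condition combining two comparisons with &.

```r
library(dplyr)

# One condition mixing comparison operators (>, ==) with the logical operator &
out <- mtcars |>
  filter(mpg > 25 & cyl == 4)

out
```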
Consider the following data.frame df0:
x | y |
---|---|
1 | 4 |
2 | NA |
NA | 6 |
What is the result of mean(df0$y)?
NA
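Rebuilding df0 from the table shows why a single missing value matters, and how na.rm changes the result:

```r
df0 <- data.frame(x = c(1, 2, NA), y = c(4, NA, 6))

mean(df0$y)                # NA: one missing value makes the whole mean unknown
mean(df0$y, na.rm = TRUE)  # 5: drop the NA first, then average 4 and 6
```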
Consider the following data.frame df for Questions 31-32:
id | name | age | score |
---|---|---|---|
1 | Alice | 25 | 85 |
2 | Bob | 30 | 90 |
3 | Charlie | 35 | 75 |
4 | David | NA | 80 |
5 | Eve | 45 | NA |
Which of the following code snippets keeps observations where score is between 80 and 90 inclusive?
df |> filter(score > 80 & score < 90)
df |> filter(score >= 80 & score <= 90)
df |> filter(score >= 80 | score <= 90)
df |> filter(score > 80 | score < 90)
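Inclusive bounds call for >= and <= joined with &. The sketch rebuilds df from the table above and also shows dplyr's between(), which checks the same inclusive range.

```r
library(dplyr)

df <- data.frame(
  id    = 1:5,
  name  = c("Alice", "Bob", "Charlie", "David", "Eve"),
  age   = c(25, 30, 35, NA, 45),
  score = c(85, 90, 75, 80, NA)
)

in_range   <- df |> filter(score >= 80 & score <= 90)
in_range_2 <- df |> filter(between(score, 80, 90))  # between() includes both ends

in_range$name  # "Alice" "Bob" "David"; filter() drops Eve because her score is NA
```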
Which of the following expressions correctly keeps observations from df where the age variable has missing values?
df |> filter(is.na(age))
df |> filter(!is.na(age))
df |> filter(age == NA)
df |> filter(age != NA)
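Missing values have to be tested with is.na(); comparing with == NA yields NA for every row, and filter() keeps only rows where the condition is TRUE. A sketch using the name and age columns from the table:

```r
library(dplyr)

df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  age  = c(25, 30, 35, NA, 45)
)

missing_age <- df |> filter(is.na(age))
missing_age$name  # "David"

# age == NA is NA for every row (never TRUE), so filter() keeps nothing
nrow(df |> filter(age == NA))  # 0
```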
The arrange() function can sort data based on multiple variables.
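A small, hypothetical example: the first variable sets the main order, and later variables break ties within it.

```r
library(dplyr)

# Hypothetical data: sort by team, then by points (highest first) within each team
games <- data.frame(team = c("B", "A", "B", "A"), pts = c(3, 5, 9, 1))

sorted <- games |> arrange(team, desc(pts))
sorted
```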
Consider the following data.frame df3:
id | value |
---|---|
1 | 10 |
2 | 20 |
2 | 20 |
3 | 30 |
4 | 40 |
4 | 40 |
5 | 50 |
Which of the following code snippets returns a data.frame of unique id values from df3?
df3 |> select(id) |> distinct()
df3 |> distinct(value)
df3 |> distinct(id)
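Rebuilding df3 and running two of the options side by side: by default, distinct() keeps only the columns you name, so both pipelines end up with a single id column.

```r
library(dplyr)

df3 <- data.frame(id = c(1, 2, 2, 3, 4, 4, 5), value = c(10, 20, 20, 30, 40, 40, 50))

ids_a <- df3 |> select(id) |> distinct()  # keep only id, then drop duplicate rows
ids_b <- df3 |> distinct(id)              # returns just the id column, deduplicated

ids_a$id  # 1 2 3 4 5
```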
Which of the following code snippets correctly renames the variable age to years in df?
df |> rename(years = age)
df |> rename(age = years)
df |> rename("age" = "years")
df |> rename_variable(age = years)
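The pattern for rename() is new_name = old_name, shown here on a small, hypothetical df:

```r
library(dplyr)

df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))

# rename(new_name = old_name)
renamed <- df |> rename(years = age)
names(renamed)  # "name" "years"
```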
Which of the following code snippets correctly removes the age variable from df?
df |> select(-age)
df |> select(-"age")
df |> select(!age)
df |> select(, -age)
df |> select(desc(age))
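A sketch of two ways to drop a column, assuming dplyr 1.0 or later (where select() accepts the ! complement operator):

```r
library(dplyr)

df <- data.frame(id = 1:2, name = c("Alice", "Bob"), age = c(25, 30), score = c(85, 90))

no_age   <- df |> select(-age)  # minus drops the column
no_age_2 <- df |> select(!age)  # ! selects every column except age

names(no_age)  # "id" "name" "score"
```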
Which of the following code snippets filters observations where age is not NA, then arranges them in descending order of age, and then selects the name and age variables?
df |> filter(!is.na(age)) |> arrange(desc(age)) |> select(name, age)
df |> select(name, age) |> arrange(desc(age)) |> filter(!is.na(age))
df |> arrange(desc(age)) |> filter(!is.na(age)) |> select(name, age)
df |> filter(is.na(age)) |> arrange(age) |> select(name, age)
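Running the pipeline written in the order the question describes, on df rebuilt from the table above:

```r
library(dplyr)

df <- data.frame(
  id    = 1:5,
  name  = c("Alice", "Bob", "Charlie", "David", "Eve"),
  age   = c(25, 30, 35, NA, 45),
  score = c(85, 90, 75, 80, NA)
)

result <- df |>
  filter(!is.na(age)) |>   # 1. drop rows with a missing age
  arrange(desc(age)) |>    # 2. oldest first
  select(name, age)        # 3. keep only these two columns

result
```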
Consider the two related data.frames, df_1 and df_2:
df_1
id | name | age |
---|---|---|
1 | Alice | 19 |
2 | Bob | 21 |
4 | Olivia | 20 |
df_2
id | major |
---|---|
1 | Economics |
2 | Business Administration |
3 | Data Analytics |
Which of the following code snippets produces the data.frame below?
id | major | name | age |
---|---|---|---|
1 | Economics | Alice | 19 |
2 | Business Administration | Bob | 21 |
3 | Data Analytics | NA | NA |
df_1 |> left_join(df_2)
df_2 |> left_join(df_1)
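left_join() keeps every row of the left (piped-in) table and fills unmatched columns with NA. This sketch rebuilds both tables and writes by = "id" explicitly; without it, dplyr joins on the shared column id and prints a message saying so.

```r
library(dplyr)

df_1 <- data.frame(id = c(1, 2, 4), name = c("Alice", "Bob", "Olivia"), age = c(19, 21, 20))
df_2 <- data.frame(id = c(1, 2, 3),
                   major = c("Economics", "Business Administration", "Data Analytics"))

from_2 <- df_2 |> left_join(df_1, by = "id")  # ids 1, 2, 3; id 3 gets NA name and age
from_1 <- df_1 |> left_join(df_2, by = "id")  # ids 1, 2, 4; id 4 gets NA major

from_2
```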
In R, what does the function sd(x) compute, and why can it be more useful than var(x)?
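sd(x) is the square root of var(x), so it is expressed in the same units as the data rather than in squared units. A sketch with hypothetical heights in centimeters:

```r
# Hypothetical heights in centimeters
x <- c(160, 165, 170, 175, 180)

var(x)  # 62.5, in squared units (cm^2)
sd(x)   # about 7.91, back in cm: the square root of the variance
```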
What is the primary limitation of Hadoop’s MapReduce, and how is it addressed by technologies like Apache Storm and Apache Spark?