R Basics I
Classwork 1
Question 1.
Base R provides the R object state.name. Write R code to assign state.name to a variable, US_states.
Answer:
US_states <- state.name
state.name is a predefined character vector in base R that contains the names of all 50 U.S. states. The code assigns the contents of state.name to a new variable, US_states, so the same state names are now also stored under this new name.
- In the general R environment, a variable is a name assigned to any object or data stored in memory, whether it is a simple value, a vector, or a more complex structure such as a data frame. For this reason, I often refer to a variable as “the name of this object” in the general R environment.
- In a data.frame, a variable is a column that represents a particular attribute of the data.frame.
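To see what the assignment stored, the new object can be inspected with a few base R functions. This is a minimal sketch; the commented values are what these calls return for the built-in state.name vector.

```r
# Assign the built-in vector of state names to a new variable
US_states <- state.name

class(US_states)    # "character": a character vector
length(US_states)   # 50: one element per state
head(US_states, 3)  # "Alabama" "Alaska" "Arizona"

# US_states holds exactly the same values as state.name
identical(US_states, state.name)  # TRUE
```

Because R copies an object when it is modified, changing US_states later does not alter state.name.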
Question 2.
The temp_F vector contains the average high temperatures in January for the following cities: Seoul, Lagos, Paris, Rio de Janeiro, San Juan, and Rochester.
temp_F <- c(35, 88, 42, 84, 81, 30)
Create a new vector named temp_C that stores the converted Celsius temperatures. Below is the conversion formula:
\[ C = \frac{5}{9}\times(F - 32) \]
Answer:
temp_F <- c(35, 88, 42, 84, 81, 30)
temp_C <- (5/9) * (temp_F - 32)
temp_C
[1] 1.666667 31.111111 5.555556 28.888889 27.222222 -1.111111
Because arithmetic in R is vectorized, the Fahrenheit-to-Celsius formula is applied element-wise: 32 is subtracted from every value in temp_F, and each difference is multiplied by 5/9. The resulting Celsius temperatures are assigned to the new vector temp_C, in the same city order as temp_F.
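Vectorized arithmetic also keeps element names, which makes the result easier to read. In this sketch, cities is a hypothetical name introduced only to label each temperature with its city (same order as in the question); the last line converts back to Fahrenheit as a sanity check.

```r
temp_F <- c(35, 88, 42, 84, 81, 30)

# Hypothetical helper vector: label each temperature with its city
cities <- c("Seoul", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Rochester")
names(temp_F) <- cities

# The same element-wise conversion; names carry over to the result
temp_C <- (5/9) * (temp_F - 32)
round(temp_C, 1)

# Sanity check: converting back with F = 9/5 * C + 32 recovers temp_F
all.equal(temp_C * 9/5 + 32, temp_F)  # TRUE
```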
Question 3.
- Write R code to calculate the standard deviation (SD) of the integer vector x below manually, that is, without using the sd() or var() functions.
x <- 1:25
- Also, write R code to test whether the standard deviation you calculated manually above is equal to sd(x).
Answer:
# Manual calculation of standard deviation
n <- length(x)
mean_x <- sum(x) / n
variance_manual <- sum((x - mean_x)^2) / (n - 1)
sd_manual <- sqrt(variance_manual)
# Test if it is equal to sd(x)
sd_manual == sd(x)
[1] TRUE
- The standard deviation is the square root of the sample variance, which is the sum of the squared deviations from the mean divided by the number of observations minus 1:
\[ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} \]
- sd_manual is then compared with the result of the built-in sd() function to check correctness.
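Printing the intermediate values makes the manual formula concrete for x <- 1:25: the mean is 325 / 25 = 13, the squared deviations sum to 1300, and the variance is 1300 / 24. Exact == happens to return TRUE here, but comparing floating-point results with == can fail because of tiny rounding differences, so this sketch uses all.equal(), which compares up to a small tolerance. The name ss is introduced here only for illustration.

```r
x <- 1:25
n <- length(x)

mean_x <- sum(x) / n                  # 325 / 25 = 13
ss <- sum((x - mean_x)^2)             # sum of squared deviations = 1300
variance_manual <- ss / (n - 1)       # 1300 / 24, about 54.17
sd_manual <- sqrt(variance_manual)    # about 7.36

# Tolerance-based comparison; isTRUE() turns a mismatch message into FALSE
isTRUE(all.equal(sd_manual, sd(x)))   # TRUE
```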
Question 4.
- Consider the vectors:
my_vec <- c(-10, -20, 30, 10, 50, 40, -100)
beers <- c("BUD LIGHT", "BUSCH LIGHT", "COORS LIGHT",
           "GENESEE LIGHT", "MILLER LITE", "NATURAL LIGHT")
- Write R code to filter only the positive values in my_vec.
- Write R code to access the beers that are in positions 2, 4, and 6 using indexing.
Answer:
# Filtering positive values
positive_values <- my_vec[my_vec > 0]
# Accessing beers in positions 2, 4, and 6
beers <- c("BUD LIGHT", "BUSCH LIGHT", "COORS LIGHT",
           "GENESEE LIGHT", "MILLER LITE", "NATURAL LIGHT")
selected_beers <- beers[c(2, 4, 6)]
positive_values
[1] 30 10 50 40
selected_beers
[1] "BUSCH LIGHT"   "GENESEE LIGHT" "NATURAL LIGHT"
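- The positive values from my_vec are filtered using logical indexing: my_vec > 0 returns a logical vector that is TRUE wherever an element is positive, and my_vec[my_vec > 0] keeps only those elements. The selected beers are accessed by position through direct indexing (beers[c(2, 4, 6)]), which returns the 2nd, 4th, and 6th elements in that order.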
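Breaking the filter into steps shows what logical indexing does under the hood. In this sketch, is_positive and even_positions are hypothetical names; which() converts the logical vector into the matching positions, and seq() builds the positions 2, 4, 6 instead of typing them.

```r
my_vec <- c(-10, -20, 30, 10, 50, 40, -100)
beers <- c("BUD LIGHT", "BUSCH LIGHT", "COORS LIGHT",
           "GENESEE LIGHT", "MILLER LITE", "NATURAL LIGHT")

# Step 1: the comparison gives one TRUE/FALSE per element
is_positive <- my_vec > 0
is_positive                    # FALSE FALSE TRUE TRUE TRUE TRUE FALSE

# Step 2: indexing with a logical vector keeps the TRUE elements
my_vec[is_positive]            # 30 10 50 40

# which() returns the positions of the TRUE values
which(is_positive)             # 3 4 5 6

# Positions 2, 4, 6 generated with seq() rather than typed by hand
even_positions <- seq(2, length(beers), by = 2)
beers[even_positions]          # "BUSCH LIGHT" "GENESEE LIGHT" "NATURAL LIGHT"
```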
Question 5.
- Write R code to read the CSV file, https://bcdanl.github.io/data/mlb_teams.csv, using the tidyverse's read_csv() function, and assign it to MLB_teams.
Answer:
library(tidyverse) # to use the read_csv() function
MLB_teams <- read_csv("https://bcdanl.github.io/data/mlb_teams.csv")
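- The read_csv() function, which is attached by library(tidyverse) and comes from its readr package, reads the CSV file from the given URL, and the result is assigned to the name MLB_teams. read_csv() parses the file automatically: it guesses a type (for example, numeric or character) for each column and returns a tibble, the tidyverse's version of a data.frame.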
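The column-type guessing can be seen without downloading anything. This sketch writes a small hypothetical CSV (made-up teams AAA and BBB, not rows from mlb_teams.csv) to a temporary file and reads it back; show_col_types = FALSE only silences the column-specification message that read_csv() prints.

```r
library(readr)  # read_csv() comes from readr, which library(tidyverse) also loads

# Hypothetical toy data written to a temporary file so the example runs offline
toy_path <- tempfile(fileext = ".csv")
writeLines(c("teamID,W,L",
             "AAA,90,72",
             "BBB,75,87"), toy_path)

toy_teams <- read_csv(toy_path, show_col_types = FALSE)

class(toy_teams)          # includes "tbl_df": the result is a tibble
sapply(toy_teams, class)  # teamID is "character"; W and L are "numeric"
dim(toy_teams)            # 2 3
```

With the real file loaded, dim(MLB_teams) should report the 300 rows and 56 columns listed in the skim() output.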
Question 6.
Write R code to provide descriptive statistics (mean, standard deviation, minimum, first quartile, median, third quartile, and maximum) for the variables in the MLB_teams data.frame.
Answer:
library(skimr)
skim(MLB_teams)
Name | MLB_teams |
---|---|
Number of rows | 300 |
Number of columns | 56 |
Column type frequency: | |
character | 13 |
numeric | 43 |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
lgID | 0 | 1 | 2 | 2 | 0 | 2 | 0 |
teamID | 0 | 1 | 3 | 3 | 0 | 31 | 0 |
franchID | 0 | 1 | 3 | 3 | 0 | 30 | 0 |
divID | 0 | 1 | 1 | 1 | 0 | 3 | 0 |
DivWin | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
WCWin | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
LgWin | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
WSWin | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
name | 0 | 1 | 12 | 29 | 0 | 31 | 0 |
park | 0 | 1 | 8 | 31 | 0 | 37 | 0 |
teamIDBR | 0 | 1 | 3 | 3 | 0 | 31 | 0 |
teamIDlahman45 | 0 | 1 | 3 | 3 | 0 | 30 | 0 |
teamIDretro | 0 | 1 | 3 | 3 | 0 | 31 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
yearID | 0 | 1 | 2013.50 | 2.88 | 2009.00 | 2011.00 | 2013.50 | 2016.00 | 2018.00 | ▇▇▇▇▇ |
Rank | 0 | 1 | 3.01 | 1.44 | 1.00 | 2.00 | 3.00 | 4.00 | 6.00 | ▇▃▅▃▁ |
G | 0 | 1 | 161.99 | 0.26 | 161.00 | 162.00 | 162.00 | 162.00 | 163.00 | ▁▁▇▁▁ |
Ghome | 0 | 1 | 80.99 | 0.53 | 78.00 | 81.00 | 81.00 | 81.00 | 84.00 | ▁▁▇▁▁ |
W | 0 | 1 | 80.99 | 11.40 | 47.00 | 73.00 | 81.00 | 90.00 | 108.00 | ▁▅▇▇▂ |
L | 0 | 1 | 80.99 | 11.37 | 54.00 | 72.00 | 81.00 | 89.00 | 115.00 | ▂▇▇▅▁ |
R | 0 | 1 | 707.24 | 73.22 | 513.00 | 650.75 | 707.00 | 755.00 | 915.00 | ▁▆▇▅▁ |
AB | 0 | 1 | 5519.63 | 70.84 | 5294.00 | 5465.00 | 5519.50 | 5565.00 | 5735.00 | ▁▅▇▅▁ |
H | 0 | 1 | 1405.71 | 73.91 | 1199.00 | 1353.50 | 1403.00 | 1452.00 | 1625.00 | ▁▆▇▃▁ |
X2B | 0 | 1 | 278.00 | 25.55 | 219.00 | 260.00 | 276.50 | 294.00 | 363.00 | ▂▇▇▂▁ |
X3B | 0 | 1 | 29.05 | 9.12 | 5.00 | 22.00 | 29.00 | 35.00 | 57.00 | ▁▆▇▃▁ |
HR | 0 | 1 | 167.32 | 35.86 | 91.00 | 141.75 | 164.00 | 191.25 | 267.00 | ▃▇▇▅▁ |
BB | 0 | 1 | 504.87 | 64.18 | 375.00 | 457.00 | 503.00 | 547.00 | 672.00 | ▃▇▇▅▂ |
SO | 0 | 1 | 1235.67 | 135.21 | 905.00 | 1142.75 | 1232.00 | 1324.25 | 1594.00 | ▂▅▇▅▁ |
SB | 0 | 1 | 93.12 | 29.71 | 19.00 | 71.00 | 91.00 | 112.25 | 194.00 | ▂▇▇▂▁ |
CS | 0 | 1 | 35.53 | 9.59 | 13.00 | 29.00 | 34.00 | 42.00 | 74.00 | ▂▇▅▂▁ |
HBP | 0 | 1 | 54.38 | 13.74 | 26.00 | 44.00 | 53.00 | 63.00 | 101.00 | ▃▇▅▂▁ |
SF | 0 | 1 | 41.70 | 7.94 | 24.00 | 36.00 | 41.00 | 47.00 | 64.00 | ▃▇▇▃▁ |
RA | 0 | 1 | 707.24 | 79.76 | 525.00 | 646.75 | 704.50 | 760.00 | 894.00 | ▂▆▇▅▂ |
ER | 0 | 1 | 652.10 | 74.82 | 478.00 | 598.00 | 649.50 | 700.50 | 846.00 | ▂▇▇▅▂ |
ERA | 0 | 1 | 4.06 | 0.49 | 2.94 | 3.71 | 4.04 | 4.37 | 5.36 | ▂▆▇▃▂ |
CG | 0 | 1 | 3.83 | 2.91 | 0.00 | 2.00 | 3.00 | 6.00 | 18.00 | ▇▅▁▁▁ |
SHO | 0 | 1 | 10.36 | 4.07 | 2.00 | 7.00 | 10.00 | 13.00 | 23.00 | ▃▇▇▂▁ |
SV | 0 | 1 | 41.44 | 7.09 | 24.00 | 37.00 | 41.00 | 46.00 | 62.00 | ▂▇▇▃▁ |
IPouts | 0 | 1 | 4341.87 | 40.37 | 4235.00 | 4314.75 | 4340.00 | 4369.00 | 4485.00 | ▂▇▇▂▁ |
HA | 0 | 1 | 1405.71 | 84.18 | 1125.00 | 1350.50 | 1405.00 | 1462.25 | 1637.00 | ▁▃▇▆▁ |
HRA | 0 | 1 | 167.32 | 28.32 | 96.00 | 147.00 | 167.00 | 184.00 | 258.00 | ▂▆▇▂▁ |
BBA | 0 | 1 | 504.87 | 55.82 | 352.00 | 466.00 | 504.00 | 540.00 | 653.00 | ▁▅▇▅▂ |
SOA | 0 | 1 | 1235.67 | 132.99 | 911.00 | 1153.00 | 1231.00 | 1312.50 | 1687.00 | ▂▇▇▂▁ |
E | 0 | 1 | 96.28 | 15.14 | 54.00 | 86.00 | 97.00 | 106.25 | 143.00 | ▁▆▇▃▁ |
DP | 0 | 1 | 144.00 | 17.18 | 95.00 | 133.00 | 144.00 | 155.00 | 190.00 | ▁▅▇▅▁ |
FP | 0 | 1 | 0.98 | 0.00 | 0.98 | 0.98 | 0.98 | 0.99 | 0.99 | ▁▃▇▅▁ |
attendance | 0 | 1 | 2439186.37 | 647151.08 | 811104.00 | 1924926.50 | 2373285.50 | 2988512.50 | 3857500.00 | ▁▆▇▆▃ |
BPF | 0 | 1 | 100.09 | 5.58 | 88.00 | 96.00 | 99.50 | 103.00 | 120.00 | ▃▇▇▁▁ |
PPF | 0 | 1 | 100.09 | 5.45 | 88.00 | 97.00 | 100.00 | 103.00 | 121.00 | ▂▇▅▁▁ |
TB | 0 | 1 | 2243.78 | 154.63 | 1810.00 | 2136.75 | 2235.00 | 2346.25 | 2703.00 | ▁▅▇▅▁ |
WinPct | 0 | 1 | 0.50 | 0.07 | 0.29 | 0.45 | 0.50 | 0.56 | 0.67 | ▁▅▇▇▂ |
rpg | 0 | 1 | 4.37 | 0.45 | 3.17 | 4.02 | 4.37 | 4.66 | 5.65 | ▁▆▇▅▁ |
hrpg | 0 | 1 | 1.03 | 0.22 | 0.56 | 0.88 | 1.01 | 1.18 | 1.65 | ▃▇▇▅▁ |
tbpg | 0 | 1 | 13.85 | 0.95 | 11.17 | 13.19 | 13.81 | 14.47 | 16.69 | ▁▅▇▅▁ |
kpg | 0 | 1 | 7.63 | 0.83 | 5.59 | 7.06 | 7.61 | 8.18 | 9.84 | ▂▅▇▅▁ |
k2bb | 0 | 1 | 2.48 | 0.40 | 1.53 | 2.20 | 2.49 | 2.74 | 3.75 | ▂▇▇▃▁ |
whip | 0 | 1 | 1.32 | 0.07 | 1.16 | 1.27 | 1.31 | 1.37 | 1.56 | ▂▇▆▂▁ |
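skimr::skim() is a function from the skimr package in R that provides a more detailed summary of data than base R's summary() function.
- It generates descriptive statistics for a vector or for each column (variable) in a data.frame, in an easy-to-read layout with more details than basic summaries.
- In the numeric table, p0, p25, p50, p75, and p100 are the 0th, 25th, 50th, 75th, and 100th percentiles, that is, the minimum, first quartile, median, third quartile, and maximum requested in the question; hist is a small inline histogram of each variable.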
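skim() already covers everything the question asks for. As a cross-check, the same seven statistics can be computed with base R; describe_num() and toy_df are hypothetical names used only for this sketch. I assume skimr computes its percentiles with quantile() at default settings, so applying describe_num() to a column such as MLB_teams$W should line up with that variable's row in the skim() table.

```r
# Hypothetical helper: the seven statistics requested in the question
describe_num <- function(v) {
  c(mean   = mean(v, na.rm = TRUE),
    sd     = sd(v, na.rm = TRUE),
    min    = min(v, na.rm = TRUE),
    q1     = unname(quantile(v, 0.25, na.rm = TRUE)),
    median = median(v, na.rm = TRUE),
    q3     = unname(quantile(v, 0.75, na.rm = TRUE)),
    max    = max(v, na.rm = TRUE))
}

describe_num(1:25)  # mean 13, sd about 7.36, min 1, q1 7, median 13, q3 19, max 25

# Apply it to every numeric column of a (hypothetical) data.frame
toy_df <- data.frame(a = 1:25,
                     b = seq(2, 50, by = 2),
                     label = letters[1:25])
col_stats <- sapply(Filter(is.numeric, toy_df), describe_num)
round(col_stats, 2)  # one column per numeric variable; label is skipped
```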
Discussion
Welcome to our Classwork 1 Discussion Board! 👋
This space is designed for you to engage with your classmates about the material covered in Classwork 1.
Whether you want to dig deeper into the content, share insights, or ask questions, this is the place for you.
If you have specific questions for Byeong-Hak (@bcdanl) or a classmate (@GitHub-Username) about the Classwork 1 materials, or need clarification on any point, don’t hesitate to ask here.
Let’s collaborate and learn from each other!