R Basics
September 13, 2024
Descriptive statistics condense data into manageable summaries, making it easier to understand key characteristics of the data.
\[ \overline{x} = \frac{x_{1} + x_{2} + \cdots + x_{N}}{N} \]
mean() calculates the mean of the values in a vector.
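As a small sketch of the formula above, the mean can be computed by hand and compared with mean(). The vector x is made up for illustration.

```r
# Illustrative data (not from the course files)
x <- c(2, 4, 4, 7, 8)

# Mean by the formula: sum of the values divided by how many there are
manual_mean <- sum(x) / length(x)

# Mean with the built-in function
builtin_mean <- mean(x)

print(manual_mean)
print(builtin_mean)
```

Both approaches should agree; mean() is simply the more convenient form.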
median() calculates the median of the values in a vector.
The mode is the value(s) that occur most frequently in a given vector.
Mode is useful, although it is often not a very good representation of centrality.
The R package modeest provides the mfv(x) function, which calculates the mode of the values in vector x.
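The median and the mode can also be sketched with base R only, in case no extra package is installed. The helper name mode_values() and the vector x are hypothetical, introduced here for illustration; this is a stand-in, not the package function mentioned above.

```r
# Illustrative data (not from the course files)
x <- c(2, 4, 4, 7, 8, 8, 9)

# Median: the middle value once the data are sorted
x_median <- median(x)

# A base-R sketch of the mode (hypothetical helper):
# count each distinct value, then keep the value(s) with the highest count
mode_values <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}

x_modes <- mode_values(x)

print(x_median)
print(x_modes)  # more than one value is returned when there is a tie
```

Returning every tied value, rather than only the first, matches the definition of the mode as the value(s) that occur most frequently.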
\[ (\text{range of x}) \,=\, (\text{maximum value in x}) \,-\, (\text{minimum value in x}) \]
max(x) returns the maximum value of the values in a given vector \(x\).
min(x) returns the minimum value of the values in a given vector \(x\).
\[ \overline{s}^{2} = \frac{(x_{1}-\overline{x})^{2} + (x_{2}-\overline{x})^{2} + \cdots + (x_{N}-\overline{x})^{2}}{N-1} \]
var(x) calculates the variance of the values in a vector \(x\).
\[ \overline{s} = \sqrt{ \frac{(x_{1}-\overline{x})^{2} + (x_{2}-\overline{x})^{2} + \cdots + (x_{N}-\overline{x})^{2}}{N-1} } \]
sd(x) calculates the standard deviation of the values in a vector \(x\).
quantile(x)
quantile(x, 0) # the minimum
quantile(x, 0.25) # the 1st quartile
quantile(x, 0.5) # the 2nd quartile
quantile(x, 0.75) # the 3rd quartile
quantile(x, 1) # the maximum
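The measures of spread above can be checked against their formulas on a small vector. This is a minimal sketch with made-up data; note that base R's range(x) returns the minimum and maximum as a pair, so the range as defined above is their difference.

```r
# Illustrative data (not from the course files)
x <- c(2, 4, 4, 7, 8)

# Range as defined above: maximum minus minimum
x_range <- max(x) - min(x)
x_range_alt <- diff(range(x))  # range(x) gives c(min, max)

# Variance by the formula: squared deviations from the mean, divided by N - 1
manual_var <- sum((x - mean(x))^2) / (length(x) - 1)
builtin_var <- var(x)

# Standard deviation: the square root of the variance
manual_sd <- sqrt(manual_var)
builtin_sd <- sd(x)

# Quartiles: the 0.5 quantile is the median
x_quartiles <- quantile(x, c(0.25, 0.5, 0.75))

print(x_range)
print(c(manual_var, builtin_var))
print(c(manual_sd, builtin_sd))
print(x_quartiles)
```

Both var() and sd() divide by N - 1, which is why the manual versions use length(x) - 1.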
A CSV (comma-separated values) file is a text file in which values are separated by commas.
CSV files are most commonly encountered in spreadsheets and databases.
Example
https://bcdanl.github.io/data/tvshows.csv
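To make the format concrete, here is a small sketch that parses CSV text directly. The rows and column names are invented for illustration and are not taken from the file linked above; base R's read.csv() is used here only so the example runs without extra packages.

```r
# A tiny CSV: the first line holds column names, each later line is one row,
# and values within a line are separated by commas (made-up data)
csv_text <- "title,year,rating
Show A,2019,8.1
Show B,2021,7.4"

# Parse the text as if it were the contents of a .csv file
shows <- read.csv(text = csv_text)

print(shows)
str(shows)  # column types are inferred from the values
```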
getwd() returns the pathname of the working directory.
/cloud/project/
custdata_rev.csv
/Users/user/documents/data/custdata_rev.csv
C:\\Users\\user\\Documents\\data\\custdata_rev.csv
Path relative to the working directory.
Example:
The absolute pathname of custdata_rev.csv is /Users/user/documents/data/custdata_rev.csv.
If the working directory is /Users/user/documents/, the relative pathname of custdata_rev.csv is data/custdata_rev.csv.
When using the Posit Cloud project, we can use a relative path to read a file.
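The difference between the two kinds of pathname can be sketched in a throwaway directory. The folder project_demo and the file contents are assumptions made for this example; only the file name custdata_rev.csv and the sub-directory data come from the text.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Build a temporary stand-in for a project folder with a data sub-directory
project_dir <- file.path(tempdir(), "project_demo")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("id,value", "1,10"),
           file.path(project_dir, "data", "custdata_rev.csv"))

# Make the stand-in project the working directory
setwd(project_dir)

# Relative pathname: interpreted starting from the working directory
rel_path <- file.path("data", "custdata_rev.csv")
rel_exists <- file.exists(rel_path)

# Absolute pathname: the full location, independent of the working directory
abs_path <- normalizePath(rel_path)

print(rel_path)
print(abs_path)

# Restore the original working directory
setwd(old_wd)
```

After the working directory is restored, rel_path no longer points at the file, while abs_path still does.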
The read_csv() function reads a comma-separated values (CSV) file.
Download the CSV file, custdata_rev.csv, from the Class Files module in our Brightspace.
Create a sub-directory, data, by clicking “New Folder” in the Files Pane in Posit Cloud.
Upload the custdata_rev.csv file to the sub-directory data.
Provide the relative pathname for the file, custdata_rev.csv, to the read_csv() function.
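The steps above can be sketched as follows. Because the real file is not available here, the example writes a small stand-in custdata_rev.csv into a temporary data folder; its columns (custid, age, state) are hypothetical. It assumes the readr package is installed and falls back to base R's read.csv() otherwise.

```r
# Stand-in for the project folder and its data sub-directory (temporary)
old_wd <- getwd()
project_dir <- file.path(tempdir(), "read_csv_demo")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Stand-in file with hypothetical columns; the real file will differ
writeLines(c("custid,age,state", "1,34,NY", "2,51,CA", "3,27,NY"),
           file.path(project_dir, "data", "custdata_rev.csv"))

setwd(project_dir)

# In the project itself, this relative pathname is all that is needed
rel_path <- "data/custdata_rev.csv"

if (requireNamespace("readr", quietly = TRUE)) {
  custdata <- readr::read_csv(rel_path, show_col_types = FALSE)
} else {
  custdata <- read.csv(rel_path)  # base R fallback when readr is missing
}

setwd(old_wd)

print(custdata)
```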
View() displays the data in a simple spreadsheet-like grid.
dim() shows how many rows and columns a data.frame has.
nrow() and ncol() show the number of rows and columns of a data.frame, respectively.
skimr::skim() refers to the skim() function from the skimr package.
It is an alternative to summary(), offering both numerical and categorical summaries.
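A brief sketch of these inspection functions on a small data.frame follows. The data are made up for illustration; View() is only called in an interactive session, and skim() only if the skimr package happens to be installed.

```r
# Made-up data.frame standing in for data read from a CSV file
custdata <- data.frame(
  custid = c(1, 2, 3),
  age    = c(34, 51, 27),
  state  = c("NY", "CA", "NY")
)

data_dim <- dim(custdata)  # number of rows, then number of columns
n_rows <- nrow(custdata)
n_cols <- ncol(custdata)

print(data_dim)
print(summary(custdata))   # per-column summaries

# Spreadsheet-like viewer, only meaningful in an interactive session
if (interactive()) View(custdata)

# Optional: skim() from the skimr package, if it is installed
if (requireNamespace("skimr", quietly = TRUE)) {
  print(skimr::skim(custdata))
}
```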