DANL Project

Data-Driven Mastery: Unlocking Business Potential

Authors

Name 1

Name 2

Name 3

Name 4

Name 5

1 Introduction

About this project 👏

1 + 1

[1] 2

2 Data

The data.frame mpg contains a subset of the fuel economy data that the EPA makes available on https://fueleconomy.gov/. It contains only models which had a new release every year between 1999 and 2008 - this was used as a proxy for the popularity of the car. 🚘

2.1 Summary Statistics

mpg <- ggplot2::mpg

skim(mpg) %>% 
  select(-n_missing)

Data summary
Name	mpg
Number of rows	234
Number of columns	11
_______________________
Column type frequency:
character	6
numeric	5
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
manufacturer	1	4	10	15
model	1	2	22	38
trans	1	8	10	10
drv	1	1	1	3
fl	1	1	1	5
class	1	3	10	7

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
displ	1	3.47	1.29	1.6	2.4	3.3	4.6	7	▇▆▆▃▁
year	1	2003.50	4.51	1999.0	1999.0	2003.5	2008.0	2008	▇▁▁▁▇
cyl	1	5.89	1.61	4.0	4.0	6.0	8.0	8	▇▁▇▁▇
cty	1	16.86	4.26	9.0	14.0	17.0	19.0	35	▆▇▃▁▁
hwy	1	23.44	5.95	12.0	18.0	24.0	27.0	44	▅▅▇▁▁

2.2 MPG and a Type of Cars

The following boxplot shows how the distribution of highway MPG (hwy) varies by a type of cars (class) 🚙 🚚 🚐.

ggplot(data = mpg) +
  geom_boxplot(aes(x = class, y = hwy, fill = class),
               show.legend = F) +
  labs(x = "Class", y = "Highway\nMPG")