Unsupervised Learning: Association Rules
April 27, 2026
| Transaction ID | Books checked out |
|---|---|
| 1 | The Hobbit, The Princess Bride |
| 2 | The Princess Bride, The Last Unicorn |
| 3 | The Hobbit |
| 4 | The Neverending Story |
| 5 | The Last Unicorn |
| 6 | The Hobbit, The Princess Bride, The Fellowship of the Ring |
| 7 | The Hobbit, The Fellowship of the Ring, The Two Towers, The Return of the King |
| 8 | The Fellowship of the Ring, The Two Towers, The Return of the King |
| 9 | The Hobbit, The Princess Bride, The Last Unicorn |
| 10 | The Last Unicorn, The Neverending Story |
The rule “if X, then Y”: every time you see the item X in a transaction, you expect to also see the item Y (with a given confidence).
support(X): the number of transactions that contain X divided by the total number of transactions in database T.
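As a small illustration of this definition, the sketch below computes support in base R for the ten library transactions in the table above. The `baskets` list and the `support()` helper are hypothetical names introduced here for illustration; they are not part of any package.

```r
# The ten library transactions from the table, as a list of character vectors.
baskets <- list(
  c("The Hobbit", "The Princess Bride"),
  c("The Princess Bride", "The Last Unicorn"),
  c("The Hobbit"),
  c("The Neverending Story"),
  c("The Last Unicorn"),
  c("The Hobbit", "The Princess Bride", "The Fellowship of the Ring"),
  c("The Hobbit", "The Fellowship of the Ring", "The Two Towers", "The Return of the King"),
  c("The Fellowship of the Ring", "The Two Towers", "The Return of the King"),
  c("The Hobbit", "The Princess Bride", "The Last Unicorn"),
  c("The Last Unicorn", "The Neverending Story")
)

# Hypothetical helper: fraction of baskets that contain every item in `items`.
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

support("The Hobbit", baskets)                           # 5 of 10 baskets
support(c("The Hobbit", "The Princess Bride"), baskets)  # 3 of 10 baskets
```

Passing a vector of several items gives the support of the itemset, that is, the fraction of transactions containing all of them at once.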
The confidence of the rule “if X, then Y”: \[
\text{Confidence}(X \Rightarrow Y) = \frac{\texttt{support}(\{X, Y\})}{\texttt{support}(X)}
\]
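As a worked example with the library table above (counts tallied by hand): The Hobbit appears in 5 of the 10 transactions, and The Hobbit and The Princess Bride appear together in 3 of them (transactions 1, 6, and 9). So:
\[
\text{Confidence}(\text{The Hobbit} \Rightarrow \text{The Princess Bride}) = \frac{3/10}{5/10} = 0.6
\]
In other words, 60% of the checkouts that include The Hobbit also include The Princess Bride.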
The goal in association rule mining is to find all the interesting rules with at least a given minimum support (say, 10%) and a given minimum confidence (say, 60%).
read.transactions() reads transaction data in two common formats:
format = "single": each row corresponds to one item in one transaction, usually with a transaction ID and an item name.
format = "basket": each row corresponds to one transaction, with multiple items listed in that row.
rm.duplicates = TRUE removes duplicate items within the same transaction, so each item appears at most once in a given basket.
The data is read into a transactions object.
Think of a transactions object as a 0/1 matrix, with one row for every transaction (a customer) and one column for every possible item (a book).
The entry in row \(i\), column \(j\) is 1 if the \(i\)-th transaction contains item \(j\).
```r
library(arules)   # provides size()
library(ggplot2)  # provides ggplot() and friends

# Number of books in each basket; bookbaskets is the transactions object
# returned by read.transactions()
basketSizes <- size(bookbaskets)
summary(basketSizes)
quantile(basketSizes, probs = seq(0, 1, 0.1))

# Histogram of basket sizes on a log-scaled x axis
basketSizes_df <- data.frame(count = basketSizes)
ggplot(basketSizes_df) +
  # geom_density(aes(x = count)) +
  geom_histogram(aes(x = count)) +
  scale_x_log10() +
  labs(title = "Distribution of Basket Sizes (log scale)", x = "Basket size", y = "Count")
```
itemFrequency() tells you how often each book shows up in the transaction data.
Use apriori() to find the association rules.
The quality measures on the rules include a rule’s support, confidence, the support count, and a quantity called lift.
Lift compares the frequency of an observed pattern with how often you’d expect to see that pattern just by chance: \[ \text{lift} = \frac{\texttt{support}(\{X, Y\})}{\texttt{support}(X) \times \texttt{support}(Y)} \]
Lift less than 1: X and Y occur together less often than expected.
Lift close to 1: X and Y occur together about as often as expected by chance.
Lift greater than 1: X and Y occur together more often than expected.
The larger the lift, the more likely the pattern is real.
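Two worked examples from the library table (counts tallied by hand) show both sides of the threshold. The Hobbit has support 0.5 and The Princess Bride has support 0.4; they appear together in 3 of 10 transactions:
\[ \text{lift} = \frac{0.3}{0.5 \times 0.4} = 1.5 \]
The Last Unicorn also has support 0.4, but it appears with The Hobbit in only 1 of 10 transactions:
\[ \text{lift} = \frac{0.1}{0.5 \times 0.4} = 0.5 \]
So The Hobbit and The Princess Bride are checked out together more often than chance would suggest, while The Hobbit and The Last Unicorn are checked out together less often.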
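Coverage is the support of the left side of the rule (X); it tells you how often the rule would be applied in the dataset.
To restrict which items may appear on each side of a rule, use the appearance parameter.
In the appearance list, default = "lhs" means that, by default, all the books can go into the left side of the rules.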
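To tie the steps together, here is a minimal end-to-end sketch on the ten library transactions from the table at the top, assuming the arules package is installed. The temporary file and the names `toy_file`, `toy_baskets`, `toy_rules`, and `pb_rules` are hypothetical, chosen only for this illustration. The thresholds are the 10% support and 60% confidence mentioned earlier.

```r
library(arules)

# Write the ten library transactions to a temporary file in "basket" format:
# one transaction per line, items separated by commas.
basket_lines <- c(
  "The Hobbit,The Princess Bride",
  "The Princess Bride,The Last Unicorn",
  "The Hobbit",
  "The Neverending Story",
  "The Last Unicorn",
  "The Hobbit,The Princess Bride,The Fellowship of the Ring",
  "The Hobbit,The Fellowship of the Ring,The Two Towers,The Return of the King",
  "The Fellowship of the Ring,The Two Towers,The Return of the King",
  "The Hobbit,The Princess Bride,The Last Unicorn",
  "The Last Unicorn,The Neverending Story"
)
toy_file <- tempfile(fileext = ".csv")  # hypothetical file, for illustration only
writeLines(basket_lines, toy_file)

# Read the file into a transactions object.
toy_baskets <- read.transactions(toy_file, format = "basket", sep = ",",
                                 rm.duplicates = TRUE)

# Relative frequency (support) of each individual book.
itemFrequency(toy_baskets)

# Mine rules with at least 10% support and 60% confidence;
# minlen = 2 skips rules with an empty left side.
toy_rules <- apriori(toy_baskets,
                     parameter = list(support = 0.1, confidence = 0.6, minlen = 2))
inspect(sort(toy_rules, by = "lift"))

# Restrict the right side to one book; default = "lhs" lets every other
# book appear on the left side only.
pb_rules <- apriori(toy_baskets,
                    parameter = list(support = 0.1, confidence = 0.6, minlen = 2),
                    appearance = list(rhs = "The Princess Bride", default = "lhs"))
inspect(pb_rules)
```

With only ten transactions, a single checkout is enough to meet the 10% support threshold, so the rules found here are illustrative rather than meaningful. On real data, a higher minimum support count is usually needed before lift and confidence can be trusted.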