ML Series: How Likely Is Y, Given X? Association Rule Mining

Association rule mining is an unsupervised learning technique that finds patterns in our data: it tells us how likely it is that Y occurs given that X has occurred. In other words, it finds features (or dimensions) which frequently appear together.

Note: just because someone buys burgers with ketchup does not mean that someone who buys ketchup will buy burgers. Make sure you keep an eye on the direction of the relationship.

If we think of an example in real life, consider that you run a grocery store. You have a list of historical transactions. Here is that list:

Burgers, ketchup, rolls, fries
Burgers, cheese, rolls
Ketchup, rolls, fries
Grapes, vegetables, bread, cereal

From the above we can intuitively say that in 100% of the instances where someone bought burgers, they also bought rolls; but the opposite is not true. Only in 67% (two out of three) of the transactions where someone bought rolls did they also buy a burger.
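That intuition can be checked with a few lines of plain Python over the four transactions listed above (a minimal sketch, using sets rather than any library):

```python
# The four transactions from the list above, as Python sets.
transactions = [
    {"burgers", "ketchup", "rolls", "fries"},
    {"burgers", "cheese", "rolls"},
    {"ketchup", "rolls", "fries"},
    {"grapes", "vegetables", "bread", "cereal"},
]

def confidence(antecedent, consequent, baskets):
    """P(consequent | antecedent): the fraction of baskets containing
    the antecedent that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    with_both = [b for b in with_antecedent if consequent <= b]
    return len(with_both) / len(with_antecedent)

print(confidence({"burgers"}, {"rolls"}, transactions))  # 1.0
print(confidence({"rolls"}, {"burgers"}, transactions))  # 0.6666666666666666
</imports>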

This is the concept behind the Apriori algorithm. It finds the itemsets which occur together frequently and works out the strength of the association between them.

As you can imagine, this would be really useful in a retail setting. If you know which items are often purchased together, you can co-locate them within the store, so that buyers don’t forget to pick them up.

Let’s talk terminology:

  • Support: how frequently the item occurs. For example, if you have 1,000 transactions and bread appears in 50 of them, then the support for bread is 0.05. This is a threshold we can set on our algorithm: at what percentage of transactions does a product become significant enough to ‘care’ about?
  • Confidence: the probability of Y given X. So, how likely is it that, if a customer has purchased bread, they will also purchase butter?
  • Lift: if Y is popular in its own right, it will inherently appear frequently in baskets that include X. Lift corrects for this: it is the ratio of the confidence of X → Y to the overall support of Y. So, if lift is 4.0, Y is four times more likely to be purchased when X has also been purchased than it is in general.

    A lift ratio larger than 1 indicates that item Y is likely to be bought if item X is bought; a lift ratio less than 1 indicates that item Y is unlikely to be bought if item X is bought.

    If lift is exactly 1, the occurrences of X and Y are independent of one another; hence, there is no association between the two items.
  • Leverage: the difference between the probability of X and Y occurring together and the product of their independent probabilities. If X and Y occur together more often than their independent probabilities would suggest, leverage is greater than zero; if less often, it is below zero. Leverage is zero when the two items are statistically independent.
  • Conviction: a high conviction means that the consequent is highly dependent on the antecedent; if the two are independent, conviction is 1.
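The metrics above can be worked through by hand for a single hypothetical rule X → Y. A quick sketch with made-up probabilities (not taken from any real dataset):

```python
# Made-up example probabilities for a hypothetical rule X -> Y.
supp_x = 0.5    # support of X: fraction of baskets containing X
supp_y = 0.4    # support of Y
supp_xy = 0.3   # support of X and Y together

confidence = supp_xy / supp_x                  # P(Y | X) = 0.3 / 0.5 = 0.6
lift = confidence / supp_y                     # 0.6 / 0.4 = 1.5
leverage = supp_xy - supp_x * supp_y           # 0.3 - 0.2 = 0.1
conviction = (1 - supp_y) / (1 - confidence)   # 0.6 / 0.4 = 1.5

print(confidence, lift, leverage, conviction)
```

Here lift and leverage are both above their independence baselines (1 and 0 respectively), so X and Y are positively associated.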

Practical Implementation

First, let’s import our required libraries:

import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.preprocessing import TransactionEncoder

Then import our data, encoding it to give us 1/0 values and to convert the dataset to look like the below – where each transaction row has a one or zero for each product, depending on whether it was purchased:

df = pd.read_csv('/home/Datasets/retail_dataset.csv/retail_dataset.csv')

# One-hot encode, then collapse duplicate item columns so each row holds a
# single 1/0 per product. (DataFrame.max(axis=1, level=0) was removed in
# pandas 2.0; grouping the transposed frame achieves the same result.)
df1 = pd.get_dummies(df, prefix='', prefix_sep='').T.groupby(level=0).max().T

df1.to_csv('/home/Datasets/retail_dataset.csv/retail_out.csv')

Next, we implement the algorithm, where min_support sets how frequently an item must appear (e.g. 0.1 means we only consider items which appear in at least 10% of transactions).

freq_items = apriori(df1, min_support=0.1, use_colnames=True)

Now we generate the full matrix of association rules:

rules = association_rules(freq_items, metric="confidence", min_threshold=0.1)

OK, let’s look at the sample output. In the below, antecedents are items already in the basket and consequents are items which could land in the basket. In row 1 we are asking: if someone has purchased bread, what is the likelihood that they will also purchase a bagel?

In this example dataset, 50.4% of all transactions contain bread and 42% of all transactions include bagels. These are defined as the antecedent and consequent support.

Then, we look at how frequently these things occur together. Here, it seems that 27.9% of transactions include both bread and bagels.

Now, we look at the confidence: the probability that someone will buy a bagel, given that they have also purchased bread, is 55.3%. When we take into account the popularity of both products independently, we get a lift of 1.30.

In our example, with bread as the antecedent and bagel as the consequent, the lift of 1.3 (greater than 1) shows a positive association between X and Y.
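As a sanity check, lift is just confidence divided by the consequent’s support, so the figures quoted above can be recomputed in a couple of lines (a quick sketch, using the rounded percentages as reported in the text):

```python
# The supports quoted above for the bread -> bagel rule.
antecedent_support = 0.504   # bread
consequent_support = 0.42    # bagel
joint_support = 0.279        # bread and bagel together

confidence = joint_support / antecedent_support   # P(bagel | bread)
lift = confidence / consequent_support

print(round(confidence, 3), round(lift, 2))  # 0.554 1.32
```

The small gap between this 1.32 and the quoted 1.30 comes from the inputs being rounded percentages rather than exact supports.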

Kodey