In this post, I will explain what association analysis is used for and how it can be applied to real-life problems. Let’s begin by defining association analysis.

What is Association analysis?

In short, association analysis looks for relationships between variables: how often the presence of certain inputs is accompanied by certain outputs. Inputs are termed “antecedents” and outputs are termed “consequents”.

A common application of association analysis is market basket analysis, which identifies the items that consumers frequently purchase together. Using the results, a store can create new discounts or bundles, or even change its layout, to increase sales of targeted products.

An example of market basket transactions is as follows:

Itemset 1: {jam,ham,bread}

Itemset 2: {jam,milk,bread}

Itemset 3: {milk,rice,jam}

By running an association analysis on the market basket transactions, the analyst can uncover various relationships between the items a customer buys. For example, jam -> bread (if a customer buys jam, they may also buy bread).

One of the most commonly used algorithms for association analysis is the Apriori algorithm. It generates association rules in the form of antecedents and consequents, as mentioned above.

Here X is the antecedent, Y is the consequent, and the rule is written X -> Y. Each rule is scored by two measures, termed “support” and “confidence”, which describe how often the rule applies and how often it holds.

However, unlike usual logical rules, association rules involve some level of uncertainty. To quantify this uncertainty, we can apply the Support and Confidence Framework.

The framework uses two measures: the rule support, which is the proportion of transactions in which X and Y appear together, and the confidence, which is the proportion of transactions containing X that also contain Y.

Rule Support = P(X and Y occurring together)

Confidence = P(X and Y) / P(X)
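To make the two formulas concrete, here is a minimal sketch in Python that applies them to the three example baskets above. The function names `support` and `confidence` are my own choices for illustration, not part of any particular library.

```python
# The three example baskets from earlier in the post.
transactions = [
    {"jam", "ham", "bread"},
    {"jam", "milk", "bread"},
    {"milk", "rice", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(X and Y) / P(X): how often Y appears among transactions containing X."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

# Rule: jam -> bread
rule_support = support({"jam", "bread"}, transactions)
rule_confidence = confidence({"jam"}, {"bread"}, transactions)
print(f"jam -> bread: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
# prints: jam -> bread: support=0.67, confidence=0.67
```

Jam and bread appear together in two of the three baskets, and jam appears in all three, so both values come out at 2/3.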

Additionally, the Apriori algorithm works best with categorical data in a tabular or transactional format. It does not work well with numeric data; numeric fields first have to be binned or otherwise converted into categories, which I will not cover in detail in this post.

The tabular data format, also known as truth-table or basket data, has one flag field per item indicating whether that item is present or absent, as seen in the table below.

ID      Item 1  Item 2  Item 3
Cust 1  T       T       T
Cust 2  F       F       T
Cust 3  T       F       F

Unlike the tabular format, the transactional format has a separate record for each item in a transaction, as seen in the table below.

ID      Items
Cust 1  A
Cust 2  B
Cust 3  C
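The two formats hold the same information, so one can be turned into the other. As a rough sketch, the Python below pivots transactional records into tabular flags. The records are hypothetical: I have written them so that they reproduce the T/F flags in the tabular example above, and the helper name `to_tabular` is my own.

```python
# Hypothetical transactional records: one (customer ID, item) pair per row.
records = [
    ("Cust 1", "Item 1"),
    ("Cust 1", "Item 2"),
    ("Cust 1", "Item 3"),
    ("Cust 2", "Item 3"),
    ("Cust 3", "Item 1"),
]

def to_tabular(records):
    """Pivot (id, item) records into {id: {item: True/False}} flags."""
    items = sorted({item for _, item in records})
    baskets = {}
    for cust_id, item in records:
        baskets.setdefault(cust_id, set()).add(item)
    return {
        cust_id: {item: item in bought for item in items}
        for cust_id, bought in baskets.items()
    }

table = to_tabular(records)
for cust_id, flags in table.items():
    # One row per customer, one T/F flag per item.
    print(cust_id, " ".join("T" if flags[item] else "F" for item in sorted(flags)))
```

Going the other way is just as simple: emit one record for every flag that is true.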

Thus, by applying the Apriori algorithm, we can generate rules based on user-specified minimum support and confidence percentages. These act as thresholds: only rules that meet both are created.
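To show how the two thresholds drive rule generation, here is a simplified sketch of the Apriori idea in plain Python, run on the three example baskets from earlier. It is a teaching sketch rather than a tuned implementation, and the function name `apriori` and the threshold values of 0.6 are my own assumptions for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules meeting both thresholds."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: single items that meet the support threshold.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c) >= min_support
        }

    # Split each frequent itemset into antecedent -> consequent and keep confident rules.
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), sup, conf))
    return rules

transactions = [
    {"jam", "ham", "bread"},
    {"jam", "milk", "bread"},
    {"milk", "rice", "jam"},
]
rules = apriori(transactions, min_support=0.6, min_confidence=0.6)
for antecedent, consequent, sup, conf in rules:
    print(f"{sorted(antecedent)} -> {sorted(consequent)}: support={sup:.2f}, confidence={conf:.2f}")
```

Raising either threshold shrinks the rule list. With `min_confidence=0.9`, for instance, only bread -> jam and milk -> jam survive, since jam is in every basket.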

However, not all rules with high support and confidence values are useful. For example, if nearly all customers buy jam and nearly all customers buy bread, the confidence of jam -> bread will be high regardless of whether there is any real association between the two items.
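A quick calculation shows the problem. The counts below are made up purely for illustration: 100 baskets, with jam and bread each in 95 of them and both together in 90, which is roughly what you would expect if the two purchases were unrelated.

```python
# Hypothetical counts, invented for this illustration.
n_total = 100   # all baskets
n_jam = 95      # baskets containing jam
n_bread = 95    # baskets containing bread
n_both = 90     # baskets containing both jam and bread

support = n_both / n_total      # P(jam and bread)
confidence = n_both / n_jam     # P(bread | jam)
baseline = n_bread / n_total    # P(bread), ignoring jam entirely

print(f"support={support:.2f} confidence={confidence:.3f} baseline={baseline:.2f}")
# prints: support=0.90 confidence=0.947 baseline=0.95
```

The rule jam -> bread has 90% support and about 95% confidence, yet a customer picked at random already buys bread 95% of the time. Knowing that they bought jam tells us nothing new, which is why confidence should be compared against this baseline. The ratio of the two is commonly called lift.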

There are also alternative measures that can be used to evaluate association rules. These include:

Confidence Difference

Confidence Ratio

Information Difference

Normalised Chi-Square
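As a rough sketch of the first two measures, the Python below compares a rule's confidence with its "prior" confidence, meaning how often the consequent occurs overall. The definitions are an assumption on my part, based on how I understand these measures to be commonly defined in data mining tools, and the input numbers are hypothetical. I have left out Information Difference and Normalised Chi-Square.

```python
def confidence_difference(confidence, prior):
    """Absolute gap between the rule's confidence and the consequent's overall frequency."""
    return abs(confidence - prior)

def confidence_ratio(confidence, prior):
    """1 minus the ratio of the smaller value to the larger; 0 means no change at all."""
    return 1 - min(confidence / prior, prior / confidence)

# Hypothetical rule A: confidence barely differs from the prior (uninformative).
diff_a = confidence_difference(0.947, 0.95)
ratio_a = confidence_ratio(0.947, 0.95)

# Hypothetical rule B: confidence is far above the prior (informative).
diff_b = confidence_difference(0.80, 0.20)
ratio_b = confidence_ratio(0.80, 0.20)

print(f"rule A: difference={diff_a:.3f}, ratio={ratio_a:.3f}")
print(f"rule B: difference={diff_b:.3f}, ratio={ratio_b:.3f}")
```

Under these definitions, rule A scores close to zero on both measures despite its high confidence, while rule B scores 0.60 and 0.75, so a threshold on either measure would filter out the uninformative rule.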

Well, that pretty much sums up association analysis. What would you apply it to?