Consumers store correlation with Association Rules

Association Rules and Apriori analysis to better understand customers' store usage during a shopping mall campaign
Chord diagram for customers

A shopping mall is running a promotion in which customers send in their invoices for a chance to win one of several prizes. Shopping malls usually don't have transactional information about their clients, since purchases are made directly in the stores, but a promotional event gives us access to this valuable data. How can we use it to better understand our customers' journey during the campaign?


We can take inspiration from a similar problem, Market Basket Analysis. The classic example is the unexpected correlation between sales of diapers and beer in a store, most likely due to young parents doing some late-night shopping. The store used this discovery to place the two items together, and sales skyrocketed. (At least that's how the legend goes.)

Instead of looking at which items are bought together in one store, we can take a step back and look at which stores the same client bought from.

We use the Apriori algorithm and association rules to find pairs of stores that the same clients frequently buy from. We can tune the minimum percentage of clients that makes a pair "frequent" until we get a reasonable number of pairs for our dataset.

This analysis produces a table describing each frequent pair: how many times each store appears on its own, how many times the two stores appear together, and the probability of both appearing together.
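To make the counts in that table concrete, here is a minimal sketch in plain Python. The toy data, the store names and the 50% threshold are all made up for illustration; the real analysis uses a library and a much larger dataset.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: the set of stores each client bought from.
visits = {
    "client_1": {"Store A", "Store B"},
    "client_2": {"Store A", "Store B", "Store C"},
    "client_3": {"Store A"},
    "client_4": {"Store B", "Store C"},
}

n_clients = len(visits)
store_counts = Counter()  # number of clients per store
pair_counts = Counter()   # number of clients per pair of stores

for stores in visits.values():
    store_counts.update(stores)
    # Sort so that each pair is always counted under the same key.
    pair_counts.update(combinations(sorted(stores), 2))

# Assumed threshold: a pair is "frequent" if at least 50% of clients visited both.
min_support = 0.5

frequent_pairs = {
    pair: {
        "count_a": store_counts[pair[0]],   # first store on its own
        "count_b": store_counts[pair[1]],   # second store on its own
        "count_both": both,                 # both stores together
        "support": both / n_clients,        # share of clients visiting both
    }
    for pair, both in pair_counts.items()
    if both / n_clients >= min_support
}

for pair, stats in frequent_pairs.items():
    print(pair, stats)
```

Raising or lowering min_support is the "tweak" described above: a higher value keeps fewer, stronger pairs.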

To make all this information easier to share and understand, we generated a Chord diagram that shows every frequent store and draws a chord between each pair of correlated stores. Going one step further, hovering over a chord reveals additional information about the pair: the total number of visits to each store, and the average value spent by customers who visited only one of the stores versus both of them.

This allows us to extract new insights to improve the store mix in the shopping mall while prioritizing commercial demands, and to evaluate the best way to stimulate customers by understanding which pairs of stores promote upsell and which are affected by downsell.


To use the Apriori algorithm, we need to reshape our dataset into a binary (one-hot) matrix in which each row is a client and each column is a store (if we have 1000 stores, we will have 1000 columns). Most entries are zero, so the matrix is sparse. To build it, we can group by client to get every unique store they visited and then use the get_dummies function to split the stores into columns.
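A minimal sketch of this reshaping step with pandas, assuming one row per invoice. The column names client_id and store and the toy rows are hypothetical, not the campaign's real schema.

```python
import pandas as pd

# Hypothetical invoice data: one row per invoice sent to the campaign.
invoices = pd.DataFrame({
    "client_id": [1, 1, 1, 2, 2, 2, 3, 4, 4],
    "store": ["Store A", "Store A", "Store B",
              "Store A", "Store B", "Store C",
              "Store A",
              "Store B", "Store C"],
})

# One-hot encode the store column, then collapse to one row per client.
# Taking the max per client means repeated invoices from the same store
# still produce a single True.
basket = (
    pd.get_dummies(invoices["store"])
    .groupby(invoices["client_id"])
    .max()
    .astype(bool)
)

print(basket)
```

Each cell only records whether the client bought from the store at all, which is the input shape the Apriori step expects.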

We use the mlxtend library for the Apriori and Association Rules algorithms. Here we set the minimum support: a pattern counts as frequent when it occurs in at least X% of the rows in our dataset.

This generates a table of all frequent patterns, split into antecedents and consequents, along with metrics such as support ("this pattern happened Y% of the time") and confidence ("Z% of the people who bought at the antecedent also bought at the consequent"), to name a few.

To better visualize our results, we use the Chord library to create the chord diagram shown above, where each chord corresponds to the number of times both stores appeared together. Hovering over a chord shows more information about the pattern, such as the average purchase value for customers who visited both stores or only one of them.
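A chord diagram is typically drawn from a square, symmetric matrix of pair counts plus a list of labels. The sketch below only builds that input from a hypothetical one-hot matrix; the call into the Chord library itself is left out, since its exact API is not given here, and for brevity the sketch keeps every pair rather than only the frequent ones.

```python
import numpy as np
import pandas as pd

# Hypothetical one-hot matrix: rows are clients, columns are stores.
basket = pd.DataFrame(
    {
        "Store A": [True, True, True, False],
        "Store B": [True, True, False, True],
        "Store C": [False, True, False, True],
    },
    index=pd.Index([1, 2, 3, 4], name="client_id"),
)

# X.T @ X counts, for every pair of stores, how many clients visited both.
X = basket.to_numpy(dtype=int)
co_occurrence = X.T @ X

# The diagonal holds each store's own visit count; zero it out so the
# diagram only shows links between different stores.
np.fill_diagonal(co_occurrence, 0)

names = list(basket.columns)     # labels around the circle
matrix = co_occurrence.tolist()  # chord widths between stores

print(names)
print(matrix)
```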

You can also read more about this analysis in Earnings Release 4Q20 (page 14) at

Direct link ➔

