Three Machine Learning Techniques for Building Great Targeted Marketing Strategy

Aji Samudra
Published in Life at Telkomsel · 6 min read · Feb 21, 2021


Nowadays, companies hold a vast amount of information about their customers, whether it comes from past interactions on websites or apps or from third parties that help them understand those customers better. This information lets companies do things differently and improve their business, for example by enhancing their targeted marketing strategy so that they can offer more personalized products to customers.

In this article, we will explore how to use that data with three different ML techniques to build a great targeted marketing strategy. The three techniques are (1) Clustering, (2) Predictive Machine Learning, and (3) Causal Machine Learning.

1. Clustering


What is it?
Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups (Wikipedia, 2021).

How could we use it?
The output of this exercise is a set of clustered objects, which in this case are customers. The characteristics of each cluster depend on the features we use to describe a customer. It is recommended to involve subject-matter experts who can advise which features best capture the characteristics we want the clusters to distinguish.

With similar customers inside one cluster and different customers in other clusters, we can then (1) better understand market opportunities; (2) iterate on marketing initiatives that work for each cluster; and (3) design special products for them!

When to use it?
When you want to get inspired about the population of interest!

What do we need to create one?
A historical dataset that has only features, without a target variable.

Notes
1. Some clustering algorithms scale poorly, so mind the number of samples as well as the number of features used.
2. It is suggested to scale the features to the same value range so that features with a large value range won’t dominate the distance calculation.
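
As a minimal sketch of both notes, the snippet below scales two features before clustering. The data is synthetic, the feature names (monthly spend and app visits) are hypothetical, and scikit-learn's StandardScaler and KMeans are just one possible choice of tools, not necessarily what was used in practice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic, hypothetical customer features: monthly spend and app visits.
rng = np.random.default_rng(0)
low_spenders = np.column_stack(
    [rng.normal(50_000, 5_000, 100), rng.normal(5, 1, 100)]
)
high_spenders = np.column_stack(
    [rng.normal(200_000, 5_000, 100), rng.normal(20, 2, 100)]
)
features = np.vstack([low_spenders, high_spenders])

# Scale every feature to zero mean and unit variance so that spend
# (tens of thousands) does not dominate visits (single or double digits)
# when distances are computed.
scaled = StandardScaler().fit_transform(features)

# Group customers into two clusters; no target variable is involved.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# Cluster sizes, useful as a first sanity check before profiling clusters.
cluster_sizes = np.bincount(labels)
```

The number of clusters is fixed at two here only because the synthetic data was built that way; on real data it is a choice to be made together with subject-matter experts.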

2. Predictive Machine Learning


What is it?
It is also referred to as supervised machine learning: the task of learning a function that maps an input (features) to an output (target variable) based on example input-output pairs (Wikipedia, 2021).

How could we use it?
The output of supervised machine learning can be a probability or a real value, depending on the target variable it learns from. We can then adjust our strategy based on the predicted future value. To make this clearer, let’s look at two examples of different problems at hand. Let’s say your team is responsible for the revenue and marketing cost of product A, and it has built a model that predicts, with great accuracy, whether a customer will buy product A in the next month. You might then have two different objectives: (1) to increase total revenue or (2) to reduce marketing cost. For simplicity, we’ll focus on the segment of customers with a high propensity to buy and look at a different strategy for each objective.

To increase total revenue, we assume that the segment is likely to buy organically, and that these customers might need other products and will likely buy them too. We bundle product A with product B and offer the bundle to them, so we have potential additional revenue!

To reduce marketing cost, we again assume that the segment is likely to buy organically. We decide these customers don’t need any marketing intervention to complete their purchase, so we can reduce the total marketing cost!

When to use it?
1. Have clear business metrics to optimize.
2. Have a target variable for the model to optimize. It could be the same as the business metric or correlated with it.
3. Have enough observations of the target variable.
4. Have relevant features so the model can learn the correlation between the features and the target variable.
5. Have representative training data.

What do we need to create one?
A historical dataset that has features plus a target variable. The target variable can be binary or continuous, and it should align with the problem the company wants to address.
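
To make this concrete, here is a small sketch of a propensity model trained on features plus a binary target. Everything in it is an assumption for illustration: the data is synthetic, logistic regression is just one possible model, and the 0.8 cut-off for the "high propensity" segment is a hypothetical threshold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, hypothetical history: two features per customer and a binary
# target (1 = bought product A in the following month).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
logits = 2.0 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(1000) < 1 / (1 + np.exp(-logits))).astype(int)

# Hold out part of the history to check the model on unseen customers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Predicted probability of buying for each held-out customer.
propensity = model.predict_proba(X_test)[:, 1]

# Hypothetical cut-off: customers above 0.8 form the high-propensity segment.
high_propensity = propensity > 0.8
```

The boolean mask high_propensity marks the segment that the two strategies above would focus on; with a continuous target, a regressor would take the place of the classifier.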

Notes
Note that the assumptions behind a new strategy might be wrong, since we don’t know for certain whether customers will keep buying the product once we apply the strategy. That’s because supervised machine learning is best at finding correlations between features and the target variable in past data, but it cannot answer what-if questions. This kind of question is addressed by the next technique: Causal Machine Learning.

3. Causal Machine Learning


What is it?
From the previous case, we understand there is a gap between the output of the predictive model and the decisions made on top of it. Let’s explore that a little more. In the context of increasing total revenue, what if one group of customers won’t buy product A because it’s bundled with product B, while another group is more willing to buy if product A is bundled with product C? Which decision should we take to maximize total revenue?

To find the optimal decision, we need to learn the treatment effect of each product that could be paired with product A. We call the additional product (B or C) an intervention: an action that might affect the outcome (total revenue) in the future.

Causal machine learning learns to estimate the treatment effect of an intervention given the characteristics of the treated samples.

How could we use it?
One model estimates the treatment effect of one intervention. The treatment effect for different customers may vary. To maximize the total revenue, we could:
1. Calculate different treatment effects of several interventions for each customer/segment of customers.
2. Choose the best intervention that maximizes the outcome.
3. Leave a customer without any intervention if all the treatments have a negative effect on the outcome.
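
The three steps above can be sketched in a few lines of plain Python. The customer IDs, bundle names, and effect values are hypothetical placeholders; in practice the effects would come from one causal model per intervention. Treating an effect of exactly zero as "no intervention" is an extra assumption made here.

```python
from typing import Dict, Optional


def choose_intervention(effects: Dict[str, float]) -> Optional[str]:
    """Return the intervention with the largest positive effect, else None."""
    if not effects:
        return None
    best = max(effects, key=effects.get)
    # Assumption: an effect of zero or less is not worth acting on.
    return best if effects[best] > 0 else None


# Step 1: hypothetical estimated treatment effects on revenue per customer.
estimated_effects = {
    "customer_1": {"bundle_B": 12.0, "bundle_C": 30.0},
    "customer_2": {"bundle_B": 8.0, "bundle_C": -5.0},
    "customer_3": {"bundle_B": -3.0, "bundle_C": -1.0},
}

# Steps 2 and 3: pick the best intervention, or none if all are negative.
decisions = {
    customer: choose_intervention(effects)
    for customer, effects in estimated_effects.items()
}
```

Here customer_1 gets bundle_C, customer_2 gets bundle_B, and customer_3 is left without an intervention because both estimated effects are negative.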

When to use it?
1. All five conditions listed under predictive machine learning are met.
2. Have various marketing treatments that are proven to affect the target variable.
3. Want to choose the best treatment that drives maximal treatment effect for each customer or group of customers.

What do we need to create one?
An experimental dataset from a randomized controlled trial or A/B test. It must have a control group and a treatment group, and each unit in the dataset is in either the control or the treatment group. Just like predictive machine learning, it needs features plus a target variable.
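
One simple way to turn such a dataset into per-customer treatment effects is a two-model approach, sometimes called a T-learner: fit one outcome model on the control group, another on the treatment group, and take the difference of their predictions. The sketch below is only an illustration under stated assumptions: the data is synthetic, the single feature and the linear outcome are made up, and it is not meant to represent how any particular causal ML library works internally.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic randomized experiment: one customer feature, random assignment.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 1))
treated = rng.integers(0, 2, size=n)  # 0 = control, 1 = treatment

# Assumed ground truth: the treatment effect grows with the feature.
true_effect = 1.0 + 2.0 * x[:, 0]
outcome = (
    3.0
    + 0.5 * x[:, 0]
    + treated * true_effect
    + rng.normal(scale=0.1, size=n)
)

# Fit one outcome model per experimental group.
control_model = LinearRegression().fit(x[treated == 0], outcome[treated == 0])
treatment_model = LinearRegression().fit(x[treated == 1], outcome[treated == 1])

# Estimated effect for each customer: predicted outcome if treated
# minus predicted outcome if left in control.
estimated_effect = treatment_model.predict(x) - control_model.predict(x)

# Only possible on synthetic data, where the true effect is known.
mean_abs_error = np.mean(np.abs(estimated_effect - true_effect))
```

Random assignment is what makes the comparison between the two models meaningful: because the groups differ only by chance, the gap in predicted outcomes can be read as the effect of the treatment.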

Notes
Various libraries provide implementations of causal machine learning, e.g. EconML, CausalML, DoWhy, and Pylift.

Bringing data to life

At Telkomsel, with a huge user base (more than 170 million at the time of writing), it is really challenging to know the characteristics of our customers and to have a great strategy for providing products that fit their needs.

Telkomsel has been employing these techniques to build its marketing strategy and plans to keep applying them to new sets of problems.

I hope this post provides some perspective on which ML technique could be the solution for the problem at hand. They are powerful when you know how to use them, either individually or together.
