My (Maybe) Segmentation Journey - I

"Don't put me in a box!"

                                  Rawan Hassunah


When another team approached me about what they called a "clustering" project, I immediately took it on. This would be my first machine learning project at my new job, and it had been a long time coming! They had previously outsourced a segmentation model, based on survey data, to help define the types of customers that shop at our stores. They wanted it recreated internally because the previous model was ultimately a black box.

 

As I was preparing for my first meeting, I realized that the first question I had to ask was "what will this be used for?" The answer would tell me whether I am segmenting or clustering. I make this distinction because when segmenting, you start with preset definitions of what a segment looks like: the similarity criteria are decided up front. When clustering, you are searching for the similarities that will define your groups. Further, I have to understand whether these segments will be used once or whether this will be an ongoing project that classifies incoming data points. If the former, a simple clustering model will do the trick. If the latter, I have to find a way to accurately define my segments and use a model that will classify incoming data points without changing my clusters.
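
To make the distinction concrete, here is a minimal sketch rather than what the project will actually use. The two features, the thresholds, the segment names, and the centroid values are all invented for illustration. The first function segments with rules fixed in advance; the second assigns an incoming customer to the nearest of a set of cluster centers that were found once and then frozen, so new data never moves the clusters.

```python
import math

# Hypothetical features per customer: (visits_per_month, avg_basket_value).
# Every threshold and centroid below is made up for illustration.


def segment_by_rule(visits_per_month, avg_basket_value):
    """Segmenting: the segment definitions are decided before looking at data."""
    if visits_per_month >= 4 and avg_basket_value >= 50:
        return "loyal big spender"
    if visits_per_month >= 4:
        return "frequent small basket"
    return "occasional"


def assign_to_cluster(point, centroids):
    """Assign a new point to the nearest frozen centroid, without refitting."""
    distances = [math.dist(point, centroid) for centroid in centroids]
    return distances.index(min(distances))


# Assumed to come from a one-off clustering run; kept fixed afterwards.
FROZEN_CENTROIDS = [(1.0, 20.0), (6.0, 25.0), (5.0, 80.0)]

new_customer = (5.5, 70.0)
print(segment_by_rule(*new_customer))
print(assign_to_cluster(new_customer, FROZEN_CENTROIDS))
```

In practice the features would probably need scaling first, since a raw Euclidean distance lets the attribute with the largest range dominate.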

 

Currently, there are two main objectives that are set for this project:

  1. Segment our customers.
  2. Find the attributes that most strongly define the segments, and reduce the number of questions in the survey accordingly.
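
One possible way to approach the second objective, sketched under heavy assumptions: once every respondent has a segment label, score each survey question by how much of its variance is explained by segment membership (between-segment sum of squares divided by total sum of squares), and treat low-scoring questions as candidates for removal. The question names and answers below are invented, and the sketch assumes answers can be treated as numeric. It is an illustration of the idea, not the method this project has settled on.

```python
from collections import defaultdict
from statistics import mean


def separation_score(values, labels):
    """Share of an attribute's variance explained by segment membership."""
    overall = mean(values)
    total_ss = sum((v - overall) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # a constant answer cannot separate anything
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    between_ss = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values())
    return between_ss / total_ss


def rank_attributes(answers, labels):
    """Return (attribute, score) pairs, most segment-defining first."""
    scores = {name: separation_score(vals, labels) for name, vals in answers.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy data: six respondents, two segments, two survey questions.
labels = ["A", "A", "A", "B", "B", "B"]
answers = {
    "q_price_sensitivity": [1, 2, 1, 5, 4, 5],
    "q_store_cleanliness": [3, 4, 3, 4, 3, 3],
}

ranking = rank_attributes(answers, labels)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Here the price question separates the two toy segments almost perfectly while the cleanliness question does not separate them at all, so the latter would be the first candidate to cut.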

 

After taking a look at a subset of the data, which was already populated with countless rows and columns, I realized that the sheer volume of records and number of variables made it impossible to create intelligent segments without the use of a model. That being said, I now have to brainstorm how I would go about this. Do I segment based on set rules for a few variables and slowly introduce more variables to create the best segments, or do I blindly cluster and move on? (Just kidding, there's a third choice: more research, more brainstorming, trial and error. Although... the first option is interesting. There's something there!)

 

Stay tuned as I take you on this adventure with me. I'm excited to see where this series will go.