Categorization workflows

Best practices to fully leverage SimpleX AI features

Stephane_SimpleDecisions

Last Update 2 years ago

Definitions

When it comes to textual comments, classification, categorization, clustering and topic modeling all share the same basic intent: to group the different answers consistently into groups, clusters, categories, topics or any equivalent name.


In SimpleX, we use the term topic to describe these groups, and in particular groups of text answers that are semantically consistent, i.e. groups whose answers are semantically similar and roughly talk about the same thing.

We use the term clustering to describe the action of generating these topics. 

A categorization is a particular way of grouping the answers into topics, illustrated by a topic treemap.

There are two types of categorization:

Mono-label: the answers are grouped into mutually exclusive topics. One answer belongs to one topic only. Ideally, every answer belongs to a topic.

Multi-label: the answers are grouped into topics. One answer may belong to zero, one or several topics.


With SimpleX, you are free to select mono-label or multi-label categorization, as one answer can belong to several topics.
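To make the difference between the two types concrete, here is a minimal sketch of how each could be represented as data. The answer identifiers, topic names and the `is_mono_label` helper are hypothetical and only serve as an illustration; they are not part of SimpleX.

```python
# Hypothetical data model; names are illustrative, not SimpleX's API.
answers = {
    "a1": "Delivery was late and the box was damaged",
    "a2": "Great customer service",
    "a3": "ok",
}

# Mono-label: each answer maps to exactly one topic.
mono_label = {
    "a1": "Delivery",
    "a2": "Customer service",
    "a3": "Other",
}

# Multi-label: each answer maps to a set of zero, one or several topics.
multi_label = {
    "a1": {"Delivery", "Packaging"},
    "a2": {"Customer service"},
    "a3": set(),
}


def is_mono_label(assignment):
    """Return True if every answer has exactly one topic."""
    return all(len(topics) == 1 for topics in assignment.values())


# A mono-label categorization can be rewritten as sets of size one,
# which makes it a special case of a multi-label categorization.
mono_as_sets = {answer_id: {topic} for answer_id, topic in mono_label.items()}
```

Seen this way, a mono-label categorization is simply a multi-label one in which every answer carries exactly one topic.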


Selecting the right way for your business problem

SimpleX is the most universal and flexible AI-augmented AI platform. This means there are multiple ways and workflows to perform an analysis, depending on your specific objective, your dataset and your way of thinking. It also means you need to follow one of these workflows.


This article aims to clarify the different workflows available and the criteria for selecting one over another, considering that the optimal workflow is the one that maximizes the ratio of performance to time spent.


Even for one given use case, there is not one single workflow. For instance, it will depend on:

  • dataset size
  • mono or multi label
  • nature of categories
  • length/complexity of text answers

4 categorization workflows

There are 4 different approaches that you can follow when it comes to categorization in SimpleX:


  • Industry-specific/Custom models
  • Sequential
  • Top-down
  • Discovery


Industry-specific or Custom models

Categorizing datasets along a stable list of preset categories that are specific to a given industry, in order to analyze different datasets with the same analytical grid over time.

There are several industry-specific models available for Premium Plans in SimpleX. It is also possible to build and embed a custom model in SimpleX.


Pros: minimum workload, 10k quotes analyzed in 30 sec in one click

Cons: requires sticking to the same categories across time and datasets


Sequential

Building topics one after the other, starting with the most obvious/visible ones and ending with the outliers and exceptions.


Pros: most intuitive, step-by-step approach, easy to share and explain, universal

Cons: requires some analysis upfront to detect topics, more work


Top-down

Defining a coding plan of categories upfront and feeding it with the different text answers


Pros: best control, guarantee about end-result look and feel

Cons: requires some analysis upfront to detect topics, more checks
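The idea of a coding plan defined upfront can be sketched as follows. This is only an illustration under simple assumptions: the `CODING_PLAN` dictionary, its keywords and the `apply_coding_plan` function are hypothetical, and plain keyword matching stands in for whatever matching SimpleX actually performs.

```python
# Hypothetical coding plan: topic name -> keywords that signal the topic.
CODING_PLAN = {
    "Price": ["price", "expensive", "cheap"],
    "Delivery": ["delivery", "late", "shipping"],
    "Support": ["support", "helpdesk", "agent"],
}


def apply_coding_plan(answers, plan):
    """Assign each answer to every topic whose keywords it contains."""
    topics = {topic: [] for topic in plan}
    unassigned = []  # answers that match no topic and need a manual check
    for answer in answers:
        text = answer.lower()
        matched = [
            topic
            for topic, keywords in plan.items()
            if any(keyword in text for keyword in keywords)
        ]
        for topic in matched:
            topics[topic].append(answer)
        if not matched:
            unassigned.append(answer)
    return topics, unassigned


topics, unassigned = apply_coding_plan(
    ["Too expensive and delivery was late", "The agent was helpful", "Nice colors"],
    CODING_PLAN,
)
```

The list of unassigned answers is where the extra checks come in: answers that fit none of the predefined categories have to be reviewed, and the plan adjusted if needed.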


Discovery

Using AI-driven automatic clustering as a starting point for categorization


Pros: immediate initial results, non-prejudiced approach, universal

Cons: requires fine-tuning and checks to achieve good precision
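As a rough illustration of automatic clustering as a starting point, the sketch below groups answers by word overlap. It is a deliberately simple stand-in: SimpleX's clustering is AI-driven and semantic, whereas this example only compares shared words, and the function names and the `threshold` parameter are assumptions made for the example.

```python
def tokens(text):
    """Lowercase a text and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def greedy_cluster(answers, threshold=0.3):
    """Put each answer in the first cluster whose seed answer is similar enough."""
    clusters = []  # each cluster is a list of answers; the first item is the seed
    for answer in answers:
        for cluster in clusters:
            if jaccard(tokens(answer), tokens(cluster[0])) >= threshold:
                cluster.append(answer)
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append([answer])
    return clusters


clusters = greedy_cluster([
    "delivery was late",
    "the delivery was very late",
    "friendly staff",
    "very friendly staff",
])
```

Changing the threshold changes how many clusters come out and how clean they are, which gives a feel for why an automatic first pass still needs tuning and checking.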

To learn more about each approach, read the detailed article 

