Categorization workflows
Best practices to fully leverage SimpleX AI features
Stephane_SimpleDecisions
Last Update 2 years ago
Definitions
When it comes to textual comments, classification, categorization, clustering and topic modeling all relate to the same basic intent: grouping the different answers in a consistent way into groups, clusters, categories, topics or any equivalent name.
In SimpleX, we use the term topic to describe these groups, and in particular groups of text answers that are semantically consistent, i.e. groups whose answers roughly talk about the same thing.
We use the term clustering to describe the action of generating these topics.
A categorization is a particular way of grouping the answers into topics, illustrated by a topic treemap.

There are two types of categorization:
Mono-label: the answers are grouped into mutually exclusive topics. One answer belongs to one topic only. Ideally, every answer belongs to a topic.
Multi-label: the answers are grouped into topics. One answer may belong to zero, one or several topics.
With SimpleX, you are free to choose mono-label or multi-label categorization, since one answer can belong to several topics.
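To make the difference between the two types concrete, here is a minimal Python sketch. It is not SimpleX code: the answer ids, the topic names and the two helper functions are assumptions chosen for illustration only.

```python
# Hypothetical assignments: each answer id maps to the set of topics it belongs to.
mono_label = {"a1": {"Price"}, "a2": {"Price"}, "a3": {"Delivery"}}
multi_label = {
    "a1": {"Price", "Delivery"},  # one answer, two topics
    "a2": {"Price"},
    "a3": set(),                  # an answer that belongs to no topic
}


def is_mono_label(assignment):
    """True when every answer belongs to at most one topic."""
    return all(len(topics) <= 1 for topics in assignment.values())


def coverage(assignment):
    """Share of answers that belong to at least one topic."""
    if not assignment:
        return 0.0
    covered = sum(1 for topics in assignment.values() if topics)
    return covered / len(assignment)
```

In this sketch, a mono-label categorization is one where `is_mono_label` holds and `coverage` is ideally 1.0, while a multi-label categorization relaxes both conditions.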
Selecting the right way for your business problem
SimpleX is the most universal and flexible AI-augmented AI platform. This means there are multiple ways and workflows to perform an analysis, depending on your specific objective, your data set and your ways of thinking. It also means you need to follow one of the workflows.
This article aims to clarify the different workflows available and the criteria for selecting one over another, considering that the optimal workflow is the one that maximizes the ratio Performance/Time spent.
Even for a given use case, there is no single workflow. For instance, the choice will depend on:
- dataset size
- mono or multi label
- nature of categories
- length/complexity of text answers
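The Performance/Time spent criterion can be sketched as a simple comparison. The function below and the figures passed to it are purely hypothetical placeholders, not measurements from SimpleX. In practice you would plug in your own estimate of result quality and of the hours each workflow would take for your dataset.

```python
def best_workflow(estimates):
    """Return the workflow with the highest performance / time-spent ratio.

    `estimates` maps a workflow name to a (performance, hours_spent) pair.
    """
    return max(estimates, key=lambda name: estimates[name][0] / estimates[name][1])


# Made-up numbers, for illustration only.
example_estimates = {
    "Sequential": (0.90, 6.0),
    "Top-down": (0.92, 8.0),
    "Discovery": (0.80, 2.0),
}
chosen = best_workflow(example_estimates)
```

With these placeholder values the highest ratio belongs to the workflow that takes the least time, even though its estimated performance is the lowest of the three. This is the trade-off the ratio is meant to expose.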
4 categorization workflows
There are 4 different approaches that you can follow when it comes to categorization in SimpleX:
- Industry-specific/Custom models
- Sequential
- Top-down
- Discovery
Industry-specific or Custom models
Categorizing datasets along a stable list of preset categories that are specific to a given industry, in order to analyze different datasets with the same analytical grid over time.
There are several industry-specific models available for Premium Plans in SimpleX. It is also possible to build and embed a custom model in SimpleX.
Pros: minimum workload, 10k quotes analyzed in 30 sec in one click
Cons: requires sticking to the same categories over time and across datasets
Sequential
Building topics one after the other, starting with the most obvious/visible ones and ending with the outliers and exceptions.
Pros: most intuitive, step-by-step approach, easy to share and explain, universal
Cons: requires some analysis upfront to detect topics, more work
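As an illustration of the step-by-step logic, the sketch below builds topics one after the other and passes the leftover answers to the next step. It is an assumption-laden example, not the SimpleX implementation: naive keyword matching stands in for whatever selection method you use, and the function name and rule format are hypothetical.

```python
def build_topics_sequentially(answers, topic_rules):
    """Build topics one at a time; each step only sees answers not yet assigned.

    `topic_rules` is an ordered list of (topic_name, keywords) pairs,
    most obvious topics first.
    """
    remaining = list(answers)
    topics = {}
    for topic, keywords in topic_rules:
        matched, rest = [], []
        for answer in remaining:
            text = answer.lower()
            if any(keyword in text for keyword in keywords):
                matched.append(answer)
            else:
                rest.append(answer)
        topics[topic] = matched
        remaining = rest  # the next topic is built from what is left
    return topics, remaining
```

Whatever is still in `remaining` at the end corresponds to the outliers and exceptions that are handled last.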
Top-down
Defining a coding plan of categories upfront and feeding it with the different text answers
Pros: best control, guaranteed look and feel of the end result
Cons: requires some analysis upfront to detect topics, more checks
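A minimal sketch of the top-down idea, assuming a coding plan expressed as categories with keywords. The plan content, the function name and the substring matching are illustrative assumptions rather than SimpleX features. The matching is deliberately naive, so a keyword such as "late" would also match "plate".

```python
# Hypothetical coding plan, defined before looking at any answer.
coding_plan = {
    "Price": ["price", "expensive", "cheap"],
    "Delivery": ["delivery", "late", "shipping"],
    "Staff": ["staff", "support"],
}


def apply_coding_plan(answers, plan):
    """Feed answers into a fixed plan (multi-label) and list the unassigned ones."""
    result = {category: [] for category in plan}
    unassigned = []
    for answer in answers:
        text = answer.lower()
        hits = [
            category
            for category, keywords in plan.items()
            if any(keyword in text for keyword in keywords)
        ]
        for category in hits:
            result[category].append(answer)
        if not hits:
            unassigned.append(answer)
    return result, unassigned
```

The set of categories in the result is known in advance, which is where the control over the end result comes from. The `unassigned` list is what the extra checks would focus on.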
Discovery
Using AI-driven automatic clustering as a starting point for categorization
Pros: immediate initial results, non-prejudiced approach, universal
Cons: requires fine-tuning and checks to reach good precision
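To give a feel for what an automatic starting point looks like, here is a toy clustering sketch. It is not the algorithm used by SimpleX, which this article does not describe. It swaps in a simple greedy grouping based on word overlap (Jaccard similarity), and the function names and the default threshold are assumptions.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_clusters(answers, threshold=0.3):
    """Put each answer in the first cluster whose seed answer is similar enough."""
    clusters = []  # each cluster is a list of answers; its first item is the seed
    for answer in answers:
        for cluster in clusters:
            if jaccard(tokens(answer), tokens(cluster[0])) >= threshold:
                cluster.append(answer)
                break
        else:
            clusters.append([answer])  # no similar cluster found: start a new one
    return clusters
```

The output of such a pass is only a first draft. Changing `threshold` changes the clusters, which is why the results still need fine-tuning and checking.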
To learn more about each approach, read the detailed article