Success Story

Thematic clustering of dialogues

Customer's motivation for launching the project:
Contact center (CC) analysts need to quickly understand which topics make up a corpus of dialogues so that they can automate its processing sooner. Building such a taxonomy entirely by hand is very labor-intensive, so the process itself needs to be automated.

Description of the initial situation:
Automating contact center operators' responses requires a taxonomy of the issues that clients raise. Such a taxonomy makes it possible to categorize requests and process them further. When working with a large number of contact centers on various topics, a system for quickly analyzing a corpus of dialogues is needed. The task was to create a tool that automatically builds ready-made taxonomies for dialogue corpora.

MIL Team solution:
We asked our partner for a labeled sample of synonymous dialogues, which let us compare different models and tune their parameters for this specific problem.
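One way such a labeled sample can be used to compare models is sketched below. This is an illustration under our own assumptions, not the exact metric used in the project: the function name `pairwise_scores`, the dialogue ids, and the two toy models are all hypothetical. The idea is to check how many pairs of dialogues labeled as synonymous end up in the same cluster.

```python
from itertools import combinations


def pairwise_scores(gold_groups, predicted_cluster):
    """Pairwise precision and recall of a clustering on a labeled sample.

    gold_groups: list of sets of dialogue ids labeled as synonymous.
    predicted_cluster: dict mapping dialogue id -> cluster id from a model.
    Only dialogues that appear in the labeled sample are scored.
    """
    # Every pair inside one labeled group is a "should be together" pair.
    gold_pairs = set()
    for group in gold_groups:
        gold_pairs.update(frozenset(p) for p in combinations(sorted(group), 2))

    # Group the labeled dialogues by the cluster the model assigned to them.
    labeled_ids = set().union(*gold_groups) if gold_groups else set()
    by_cluster = {}
    for dialogue_id in labeled_ids:
        if dialogue_id in predicted_cluster:
            by_cluster.setdefault(predicted_cluster[dialogue_id], []).append(dialogue_id)

    # Every pair inside one predicted cluster is a "model put together" pair.
    predicted_pairs = set()
    for members in by_cluster.values():
        predicted_pairs.update(frozenset(p) for p in combinations(sorted(members), 2))

    hits = len(gold_pairs & predicted_pairs)
    precision = hits / len(predicted_pairs) if predicted_pairs else 0.0
    recall = hits / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


# Hypothetical labeled sample and two hypothetical model outputs.
gold = [{"d1", "d2", "d3"}, {"d4", "d5"}]
model_a = {"d1": 0, "d2": 0, "d3": 1, "d4": 2, "d5": 2}  # splits one group
model_b = {"d1": 0, "d2": 0, "d3": 0, "d4": 0, "d5": 0}  # one big cluster

print(pairwise_scores(gold, model_a))
print(pairwise_scores(gold, model_b))
```

Looking at precision and recall together keeps a model from scoring well simply by merging everything into one cluster or by splitting every dialogue into its own.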
We tested several approaches: various neural network methods for paraphrase retrieval and hierarchical multimodal topic models. The topic models performed better.
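To give a feel for how a topic model groups dialogues, here is a minimal sketch of plain PLSA trained with EM in pure Python. It is deliberately much simpler than the hierarchical multimodal models used in the project and does not use the BigARTM or TopicNet APIs; the function name `plsa`, its parameters, and the toy dialogues are assumptions made for illustration only.

```python
import random


def plsa(docs, num_topics, iterations=50, seed=0):
    """Fit a plain PLSA topic model with EM.

    docs: list of tokenized dialogues (lists of strings).
    Returns (phi, theta): phi[t][w] = p(word | topic),
    theta[d][t] = p(topic | document).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})

    # Word counts per document.
    counts = []
    for doc in docs:
        c = {}
        for w in doc:
            c[w] = c.get(w, 0) + 1
        counts.append(c)

    def normalized(values):
        total = sum(values)
        return [v / total for v in values]

    # Random positive initialization, normalized to probability distributions.
    phi = [dict(zip(vocab, normalized([rng.random() + 0.1 for _ in vocab])))
           for _ in range(num_topics)]
    theta = [normalized([rng.random() + 0.1 for _ in range(num_topics)])
             for _ in docs]

    for _ in range(iterations):
        n_wt = [dict.fromkeys(vocab, 0.0) for _ in range(num_topics)]
        n_td = [[0.0] * num_topics for _ in docs]

        # E-step: split each word count between topics by p(topic | doc, word).
        for d, c in enumerate(counts):
            for w, n in c.items():
                joint = [phi[t][w] * theta[d][t] for t in range(num_topics)]
                z = sum(joint)
                if z == 0:
                    continue
                for t in range(num_topics):
                    share = n * joint[t] / z
                    n_wt[t][w] += share
                    n_td[d][t] += share

        # M-step: re-estimate both distributions from the expected counts.
        for t in range(num_topics):
            total = sum(n_wt[t].values())
            if total > 0:
                phi[t] = {w: v / total for w, v in n_wt[t].items()}
        for d in range(len(docs)):
            total = sum(n_td[d])
            if total > 0:
                theta[d] = [v / total for v in n_td[d]]

    return phi, theta


# Hypothetical toy corpus of tokenized dialogues.
dialogues = [
    ["roaming", "abroad", "tariff", "roaming"],
    ["tariff", "roaming", "price"],
    ["router", "internet", "slow", "router"],
    ["internet", "slow", "connection"],
]
phi, theta = plsa(dialogues, num_topics=2)

# The dominant topic of each dialogue serves as its cluster label.
labels = [max(range(2), key=lambda t: row[t]) for row in theta]
print(labels)
```

Each dialogue is assigned to the topic with the highest weight in its `theta` row, and the highest-probability words in each `phi[t]` can serve as a draft name for the corresponding taxonomy category.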
The final solution was packaged in a Docker container that implemented the business logic required by the partner.

Results:
  • Reduced load on the analyst
  • Reduced time to identify new categories
  • Detection of new intents in the request flow

Difficulties resolved:
  • Model robustness to changing topics
  • Model stability when the size of the text corpus changes
  • Correction of typos (including for a corpus with very specific vocabulary)
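Typo correction for a corpus with specific vocabulary can be approached by building the dictionary from the corpus itself, so that domain terms are not "corrected" into general-language words. The sketch below illustrates that idea using only the Python standard library; it is an assumption about one possible approach, not a description of the project's implementation, and `build_corrector`, `min_count`, and `cutoff` are hypothetical names.

```python
from collections import Counter
from difflib import get_close_matches


def build_corrector(tokenized_dialogues, min_count=3, cutoff=0.8):
    """Return a function that maps rare tokens to similar frequent ones.

    Tokens seen at least `min_count` times in the corpus are trusted as
    correct spellings, including domain-specific terms. Any other token is
    replaced by its closest trusted token if the similarity is >= `cutoff`.
    """
    freq = Counter(tok for dialogue in tokenized_dialogues for tok in dialogue)
    trusted = [word for word, n in freq.items() if n >= min_count]
    trusted_set = set(trusted)

    def correct(token):
        if token in trusted_set:
            return token
        # get_close_matches returns candidates sorted by similarity, best first.
        candidates = get_close_matches(token, trusted, n=1, cutoff=cutoff)
        return candidates[0] if candidates else token

    return correct


# Hypothetical corpus: frequent correct spellings plus a few typos.
corpus = [["roaming", "tariff"]] * 3 + [["roamnig", "tarif"]]
correct = build_corrector(corpus)

print([correct(tok) for tok in ["roamnig", "tarif", "tariff", "esim"]])
```

Tokens with no sufficiently similar trusted word are left unchanged, which keeps new or rare domain terms from being overwritten.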

Customer: Telecom
Technology stack: TopicNet, BigARTM, Flask, Python, PyTorch, gensim, UMAP
NLP Research Engineering CC Prompter