Motivation for the customer to launch the project: the customer wanted to add a new feature to its product, namely the ability to find a translation of a scientific article in any of the most widely used languages.
Description of the initial situation: Antiplagiarism had no functionality for finding translations of scientific articles, so this functionality had to be added.
Project goals: build a topic model that solves two problems with high quality: semantic search for translations of scientific articles, and classification of scientific articles by scientific heading.
MIL Team solution: drawing on its experience in topic modeling and microservice architecture, the team built a service that searches for translations of scientific articles and determines the scientific headings of articles. The service can be launched in a virtual machine.
To build the model we used:
A parallel corpus of scientific articles from the elibrary website;
A parallel corpus of Wikipedia articles in 100 languages;
Scientific heading labels from different rubricators (UDC, OECD).
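The translation-search goal can be illustrated with a minimal sketch. It assumes that the topic model maps every article, whatever its language, to a vector of topic probabilities in a shared topic space, and that a translation is found by ranking candidates by cosine similarity to the query vector. The function names, document IDs, and three-topic vectors below are hypothetical and only illustrate the idea; they are not the team's actual implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector carries no topic information
    return dot / (norm_u * norm_v)


def rank_translations(query_vec, candidates, top_k=3):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical topic distributions (assumed output of the topic model).
query = [0.7, 0.2, 0.1]  # article in the source language
candidates = {
    "en_article_17": [0.68, 0.22, 0.10],  # near-identical topic profile
    "en_article_42": [0.10, 0.10, 0.80],  # different subject
    "en_article_99": [0.30, 0.60, 0.10],  # partially related
}

ranking = rank_translations(query, candidates, top_k=2)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Under the same assumption, the classification goal could reuse these topic vectors as input features for a classifier trained on the heading labels.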