Motivation for the customer to launch the project:
The customer needed to add a new feature to its product: the ability to find translations of a scientific article across the most widely used languages.
Description of the initial situation:
Antiplagiarism had no functionality for finding translations of scientific articles, so this capability had to be added.
Project goals:
Build a topic model that performs well on two tasks: semantic search for translations of scientific articles, and classification of scientific articles by scientific heading.
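To illustrate how a topic model can support translation search, here is a minimal sketch. It assumes that the multilingual model maps every article, whatever its language, to a topic distribution θ (non-negative weights summing to 1), and that an article and its translation receive similar θ vectors. Translation search then reduces to ranking candidate articles by their distance to the query's θ. The Hellinger distance used here is a common choice for comparing probability distributions; the project's actual similarity measure is not stated, and the document IDs and θ values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def hellinger(p: List[float], q: List[float]) -> float:
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def find_translations(
    query_theta: List[float],
    candidates: Dict[str, List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank candidate articles (in any language) by topic-profile distance to the query."""
    scored = [(doc_id, hellinger(query_theta, theta)) for doc_id, theta in candidates.items()]
    scored.sort(key=lambda item: item[1])  # smallest distance = most likely translation
    return scored[:top_k]


# Hypothetical topic profiles over three topics.
query = [0.70, 0.20, 0.10]  # e.g. a Russian-language article
candidates = {
    "en_42": [0.68, 0.22, 0.10],
    "fr_3": [0.50, 0.30, 0.20],
    "de_7": [0.10, 0.10, 0.80],
}
ranking = find_translations(query, candidates)
```

Here `ranking` lists `en_42` first, because its topic profile nearly matches the query's. In a production setting the candidate set would typically be pre-filtered (for example with an approximate nearest-neighbour index) rather than scanned exhaustively.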
MIL Team solution:
Drawing on its experience in topic modeling and microservice architecture, the team built a service that finds translations of scientific articles and assigns articles to scientific headings. The service can be run in a virtual machine.
To build the model we used:
- A parallel corpus of scientific articles from the elibrary website;
- A parallel corpus of Wikipedia articles in 100 languages;
- Labels assigning articles to scientific headings from several classification schemes (UDC, OECD).
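The heading labels make it possible to train a classifier on top of the topic model. The project's actual classifier is not described. The stack lists sklearn, so a real implementation might, for instance, fit a scikit-learn model on the topic vectors. The dependency-free sketch below uses a different, deliberately simple technique: a nearest-centroid classifier, similar in spirit to scikit-learn's `NearestCentroid`. It assumes each article is represented by its topic distribution θ. The training vectors and the labels "UDC 53" and "UDC 57" are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def fit_centroids(thetas: Sequence[List[float]], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average the topic vectors of all training articles that share a heading."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for theta, label in zip(thetas, labels):
        if label not in sums:
            sums[label] = [0.0] * len(theta)
        for i, value in enumerate(theta):
            sums[label][i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict_heading(theta: List[float], centroids: Dict[str, List[float]]) -> str:
    """Assign the heading whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(theta, centroids[label]))


# Hypothetical training data: topic vectors of labelled articles.
train_thetas = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]]
train_labels = ["UDC 53", "UDC 53", "UDC 57", "UDC 57"]
centroids = fit_centroids(train_thetas, train_labels)
heading = predict_heading([0.6, 0.3, 0.1], centroids)
```

With this data, `heading` is `"UDC 53"`. A nearest-centroid rule is easy to inspect, but a discriminative model trained on the same θ features would usually be a stronger baseline when many headings overlap.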
Modeling results:
- A topic model of scientific headings;
- A virtual machine on which the model can be run.
Client: Antiplagiarism
Technology stack: gRPC, Python, sklearn, BigARTM