Compression Group

We help implement advanced deep learning models and optimize the resources they consume:

Neural architecture search

We study methods for automatically searching and enumerating neural network architectures that belong to a given class, satisfy the constraints defined by the task, and solve the target task with the best quality.
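
As a rough illustration of constrained architecture search, the sketch below enumerates a tiny class of candidates, discards those that violate a resource limit, and keeps the best-scoring one. The search space, the parameter budget, and the `proxy_quality` score are all hypothetical placeholders; a real search would train and validate candidates rather than use a closed-form score.

```python
from itertools import product

# Hypothetical search space: each (depth, width) pair is one candidate MLP.
DEPTHS = (2, 4, 8)
WIDTHS = (64, 128, 256)
MAX_PARAMS = 200_000  # hypothetical resource constraint set by the task


def param_count(depth, width):
    # Rough size of `depth` square hidden layers (weights plus biases).
    return depth * (width * width + width)


def proxy_quality(depth, width):
    # Placeholder score (an assumption); stands in for validation quality.
    return 1.0 - 1.0 / (depth * width) ** 0.5


def search():
    # Keep only candidates that satisfy the constraint, then pick the best.
    feasible = [
        (d, w) for d, w in product(DEPTHS, WIDTHS)
        if param_count(d, w) <= MAX_PARAMS
    ]
    return max(feasible, key=lambda arch: proxy_quality(*arch))


best = search()
print(best, param_count(*best))
```

Exhaustive enumeration only works for very small spaces; it is used here to keep the constraint-then-rank structure visible.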

Pruning

We create methods for sparsifying model weights and connections in order to reduce the resources the model consumes.
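
For readers new to the idea, here is a minimal sketch of magnitude pruning, a common baseline and not necessarily the method used by the group. The function name `magnitude_prune` and the sample weights are illustrative assumptions.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending |w|; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]  # illustrative values
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```

In practice pruning is usually followed by fine-tuning so the remaining weights can compensate for the removed ones.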

Distillation

We explore ways to train lightweight models on the outputs of heavier counterparts without losing quality on the target task.
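
One widely used formulation of this idea trains the student to match the teacher's temperature-softened output distribution. The sketch below computes that loss for a single example; it is an illustration under assumed names (`distillation_loss`, the sample logits), not a description of the group's own training procedure.

```python
import math


def softmax(logits, temperature=1.0):
    # Numerically stable softmax over temperature-scaled logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]  # illustrative logits
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]
print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge.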

Quantization

We reduce the bit width of the operations and weights of neural network models so that they can run on low-bit processors and compute faster.
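
A minimal sketch of what reducing bit width can look like, assuming symmetric per-tensor quantization to signed 8-bit integers. This is one simple scheme among many; the function names and sample weights are illustrative.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    # Map integers back to approximate floating-point values.
    return [qi * scale for qi in q]


weights = [0.5, -1.27, 0.003, 1.0]  # illustrative values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, scale)
```

Each restored value differs from the original by at most half a quantization step, which is the accuracy cost paid for the smaller representation.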

Evaluation of the potential quality of the model

We create methods for predicting a model's expected quality on specific datasets, which automates the selection of the best candidates.

Efficient model training methods

We apply algorithms for automated initialization and optimization, and adapt training approaches, to accelerate convergence to the best model configuration.

Compression Group clients set model optimization goals, such as reducing OPEX for training and deploying neural network models.
The most frequent optimization requests are:

Optimization at the inference stage
Our customers want to reduce the resources consumed by the model: RAM, CPU and GPU, SSD, and power. Lower resource requirements improve user-facing characteristics such as response speed and device battery life.
Optimization of training processes
Training complex neural network architectures takes a lot of time and requires a large amount of computing resources. To save money, the processes of training and selecting the best models need to be automated and optimized.
Preparing for on-device deployment
Resources can also be saved by moving the computing load from centralized servers to user devices. The model must be optimized so that the device has enough resources to run it.
Compatibility with new hardware
Analog chips that can host models, low-bit processors that accelerate low-bit operations, and similar hardware are now available on the market. Only models that satisfy the hardware's constraints can run on them.

Quality

Our methods deliver higher quality than the ready-made PyTorch or TensorFlow methods on complex architectures.

Measurability

Our results are backed by fair comparisons of the resources consumed.

Flexibility

We adapt our methods to the customer's tasks, solution architecture, and research approach.

Guarantees

The compression team's results are confirmed by a track record of successful projects.

The research results are handed over to the client in full:

Software implementation
An easy-to-use library with readable and reproducible code
Database of materials
A collection of materials with literature reviews for a quick introduction to the field, plus a technical report
Trained models
Parameters of trained models packed in the format required by the client
Anything else
We can prepare project artifacts in the format required by the customer