Customer's motivation for the project: searching for and comparing optimal neural network configurations is an important stage in research and in the implementation of knowledge-intensive projects, but it is time-consuming and requires substantial computing resources. The problem becomes especially acute when many architectures must be compared on a large dataset. The task was therefore to create a fast and accurate method for comparing neural networks that does not require fully training every architecture being compared.
Description of the initial situation:
there is a large number of promising neural network architecture configurations drawn from a given search space;
there is a large dataset, ImageNet, on which the quality of these architectures must be assessed;
quality assessment means correctly ranking the architectures relative to one another. The reference (fully correct) ranking is the one obtained by training each architecture individually from scratch on the entire dataset until convergence.
Existing neural architecture search (NAS) approaches focus on finding one or a few “best” architectures. This problem, by contrast, requires correctly ranking all potentially trainable models across the entire search space. In addition, the search space is a special one that differs from those used in the literature on the topic.
Project goals: to design and implement methods for fast and accurate comparison of neural network architectures. The methods should be significantly faster than the direct approach of fully training each network individually, at the cost of only a slight drop in ranking quality. Specifically, architecture comparison should be accelerated tenfold with a loss in ranking quality of no more than 10%, as measured by the ranking metric Kendall's tau.
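The acceptance criterion can be checked mechanically. The following is a minimal sketch, assuming that "loss in ranking quality" means 1 - tau, since the reference ranking from full training has tau = 1 with itself; the function names, accuracies, proxy scores and cost figures are hypothetical. It uses the simple tau-a variant, which ignores ties; `scipy.stats.kendalltau` computes the tie-corrected tau-b variant by default.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two score lists (no correction for ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # A pair is concordant if both lists order items i and j the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 (a tie in either list) counts as neither.
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def meets_targets(true_acc, proxy_score, full_cost, proxy_cost,
                  min_speedup=10.0, max_tau_loss=0.10):
    """Check a cheap evaluation method against the speed and ranking targets.

    Assumption: ranking-quality loss is 1 - tau, because the reference
    ranking (full training from scratch) has tau = 1 with itself.
    """
    tau = kendall_tau(true_acc, proxy_score)
    speedup = full_cost / proxy_cost
    ok = speedup >= min_speedup and (1.0 - tau) <= max_tau_loss
    return tau, speedup, ok


# Hypothetical example: 5 architectures, full-training accuracy vs. proxy score.
true_acc = [0.71, 0.74, 0.69, 0.76, 0.73]
proxy = [0.40, 0.41, 0.38, 0.47, 0.45]
print(meets_targets(true_acc, proxy, full_cost=100.0, proxy_cost=8.0))
# (0.8, 12.5, False)
```

Here the proxy is 12.5 times cheaper, but it swaps one of the 10 architecture pairs, so tau = 0.8 and the 10% loss budget is exceeded.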
MIL Team solution: implementation, analysis and improvement of various methods for assessing architecture quality. One of the implemented methods builds a super-network over the space of the considered architectures. This approach trains all models in the space at once, in one-shot mode, which can save significant time and computing resources. Other methods considered include assessing model quality on less training data, early stopping of training, and the use of classifiers and regression models.
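The weight-sharing idea behind a one-shot super-network can be sketched in a few lines. The source does not say which super-network variant was used, so this sketch assumes one common option, uniform single-path sampling: at each step one operation per layer is drawn at random and only those shared weights are updated. The search space, operation names and the `SuperNet` class are hypothetical, and real training is replaced by an update counter so the structure stays visible. In this toy space, 9 shared operation slots cover all 27 architectures. After super-network training, each architecture is typically ranked by evaluating it with the weights it inherits, without training it from scratch.

```python
import itertools
import random

# Hypothetical search space: each layer picks one of several candidate operations.
SEARCH_SPACE = [
    ["conv3x3", "conv5x5", "skip"],
    ["conv3x3", "conv5x5", "skip"],
    ["conv3x3", "conv5x5", "sep5x5"],
]


class SuperNet:
    """Toy single-path super-network: one shared parameter slot per (layer, op)."""

    def __init__(self, space):
        self.space = space
        # A real super-network would hold a weight tensor per candidate op;
        # here each slot only counts how many updates it has received.
        self.updates = {(i, op): 0 for i, ops in enumerate(space) for op in ops}

    def sample_path(self, rng):
        # Uniformly choose one op per layer: the resulting path is one architecture.
        return tuple(rng.choice(ops) for ops in self.space)

    def train_step(self, path):
        # Only the ops on the sampled path are "trained"; all other slots are untouched.
        for i, op in enumerate(path):
            self.updates[(i, op)] += 1

    def architectures(self):
        # Every architecture in the space is a path through the shared slots.
        return list(itertools.product(*self.space))


net = SuperNet(SEARCH_SPACE)
rng = random.Random(0)
for _ in range(300):
    net.train_step(net.sample_path(rng))

print(len(net.architectures()), "architectures share", len(net.updates), "op slots")
# 27 architectures share 9 op slots
```

Storage grows with the sum of choices per layer, while the number of architectures grows with their product. This gap is what makes one-shot evaluation of a whole space affordable.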
To build the models, we used:
the open datasets ImageNet and CIFAR-10;
a dataset of architectures annotated with their quality after full training on ImageNet.