Success Story - en

Recognizing human gestures using a smartwatch

Client's motivation for the project: while eating, a person makes characteristic hand movements that can be recognized from the sensor data of a smartwatch or fitness tracker. The number and frequency of these movements indicate how much food was eaten and how fast, which is useful for people who watch what they eat. The task was to build a model that extracts this information from the sensor data.


What we had initially:

  • IMU sensor data from a smartwatch, recorded while a person is eating;
  • a set of videos of people eating;
  • annotations of typical hand movements during eating for 3% of the videos;
  • video and sensor timestamps that were often not synchronized.

Project goals:

  • build an algorithm that detects each movement of the hand wearing the device during a meal, based on data from the smartwatch's IMU sensors.

MIL Team's solution: first, we had an outsourced team annotate an additional 6% of the videos. Before sending the videos for annotation, we anonymized them by running an open face detection model and blurring the detected faces. Next, we ran a pose estimation model on the meal videos and used its output as input to a video-based gesture recognition model. We ran the trained video model on the remaining 91% of the videos to annotate them automatically. This annotation was used to train a gesture recognition model based on IMU data. We then synchronized the video and sensor timelines using the correlation between the outputs of the two models on the video and IMU series. The final model was trained on the synchronized automatic annotation and sensor data. We also solved the related problem of classifying, from sensor data, whether a person is standing while eating.
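The synchronization step can be illustrated with a minimal sketch. It assumes that both models produce a per-sample gesture score resampled to the same rate, and it uses plain cross-correlation in NumPy; the function name `estimate_lag` and this exact procedure are illustrative assumptions, not the project's actual code.

```python
import numpy as np


def estimate_lag(video_scores, imu_scores):
    """Estimate how many samples the IMU series lags behind the video series.

    Both inputs are hypothetical per-sample gesture scores at the same
    sampling rate. A positive result means the IMU series is delayed.
    """
    v = np.asarray(video_scores, dtype=float)
    s = np.asarray(imu_scores, dtype=float)
    # Remove the mean so the constant baseline does not dominate the correlation.
    v = v - v.mean()
    s = s - s.mean()
    # Full cross-correlation: entry k corresponds to sum_n s[n + k] * v[n].
    xcorr = np.correlate(s, v, mode="full")
    lags = np.arange(-(len(v) - 1), len(s))
    return int(lags[np.argmax(xcorr)])


# Toy example: two gesture "bumps" in the video scores,
# and the same pattern delayed by 7 samples in the IMU scores.
t = np.arange(200)
video = np.exp(-0.5 * ((t - 50) / 3) ** 2) + np.exp(-0.5 * ((t - 120) / 3) ** 2)
imu = np.roll(video, 7)
lag = estimate_lag(video, imu)
```

Once the lag is known, the sensor timestamps can be shifted by `lag / sample_rate` seconds so that the automatic video annotation lines up with the IMU data.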


Tools for building the model:

  • the output of a pose estimation model run on the videos;
  • an outsourced team for video annotation;
  • an open-source face detection model.

The model results: 

  • the moments at which a person's hand moves with cutlery.
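As an illustration of how such moments relate to the eating speed mentioned in the motivation, here is a minimal sketch. It assumes the model outputs a per-sample score in [0, 1]; the threshold of 0.5 and the helper names `movement_moments` and `movements_per_minute` are hypothetical and not taken from the project.

```python
import numpy as np


def movement_moments(scores, sample_rate_hz, threshold=0.5):
    """Return the start times (in seconds) of detected movements.

    A movement starts where the score crosses the threshold from below.
    """
    active = np.asarray(scores, dtype=float) >= threshold
    # Shift by one sample to compare each sample with the previous one.
    previous = np.concatenate(([False], active[:-1]))
    onsets = np.flatnonzero(active & ~previous)
    return onsets / sample_rate_hz


def movements_per_minute(moments, duration_s):
    """Average number of detected movements per minute of recording."""
    return 60.0 * len(moments) / duration_s


# Toy example: 8 samples at 2 Hz (4 seconds) containing two movements.
scores = [0.1, 0.9, 0.8, 0.2, 0.1, 0.7, 0.1, 0.1]
moments = movement_moments(scores, sample_rate_hz=2)
rate = movements_per_minute(moments, duration_s=4)
```

The count of moments approximates the number of bites, and the rate gives a simple measure of eating speed.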

Client: under NDA

Technology stack: Python, PyTorch

Research Group Sensors