Middle+/Senior ML Engineer
Salary range: 250-400k net for full-time

Format: full-time or part-time (20 hours/week), remote work (including outside the Russian Federation)

Description:
We are mil-team.ru, a team that does cool research projects, publishes scientific articles, and develops compressa.ai, a product for optimizing LLM inference. We make LLMs (and not only them) run on-premises, cheaper, and faster. We are looking for a cool, proactive specialist to join the core team.

What problems you will solve:
  • Develop the LLM inference engine (making it faster than vLLM);
  • Propose improvements to LLM compression methods and bring them to market;
  • Integrate SotA inference and model compression technologies into the platform.

Your experience:
  • Used and modified neural network inference frameworks (ONNX, TensorRT-LLM, llama.cpp, vLLM) and wrote custom CUDA kernels;
  • Applied and modified software optimization methods for LLMs and know the SotA: sparsification, quantization, distillation;
  • Trained LoRA adapters or fine-tuned LLMs.

Next steps:
  • Attach your CV (links to your open-source code in a Git repository are a plus);
  • Add a brief description (up to 3 paragraphs) of your experience in the topic;
  • Add a short motivation note (1 paragraph) on why we are on the same path;
  • We will review your application and set up an interview with the team.
DO YOU WANT TO BE ON THE TEAM?
SUBMIT YOUR RESUME!
CV
Name the file CV_Name_Surname.pdf
Why do you like us as a team? Why choose our position? You can say a few words about yourself and your plans.
Description of example research projects.