Senior/Staff Machine Learning Engineer
We foster a dynamic culture rooted in strong engineering, ownership, and transparency that empowers our team. As part of the expanding VirtusLab Group, we offer a compelling environment for those seeking to make a substantial impact in the software industry within a forward-thinking organization.
About the role
Join our team to drive business innovation with production-ready machine learning pipelines. You will play a key role in deploying and maintaining ML workflows, using Azure for cloud computing and on-prem clusters for ETL jobs. Working closely with data scientists, you will contribute to AI-powered projects while shaping the organization’s technical culture.
We provide price optimisation solutions in close collaboration with a major UK retailer’s data science team. Together, we build tools that enable quick exploration and productionisation of ML models and their optimisation algorithms in a hybrid-cloud environment. The end goal is to provide optimiser APIs that solve a class of pricing problems across multiple business domains.
- Implementing the end-to-end machine learning lifecycle, from data preparation and experimentation to continuous monitoring of data and models in production.
- Building PySpark data pipelines that load and transform large volumes of data into smaller, meaningful features for modelling and analytics.
- Provisioning all cloud resources that support model development in Azure ML as infrastructure as code (IaC) with Terraform.
- Selecting the best architectural patterns to solve business problems.
- Writing robust, maintainable code in the cloud and on-prem to bring models to production quickly and reliably.
- Establishing a mature DevOps culture and building solutions that are reusable across multiple business domains.
Tech stack: Python (PySpark, Airflow, Azure SDK), Spark on a Kubernetes (K8s) cluster, Azure ML, IaC with Terraform, CI/CD with GitHub Actions, Docker
Many pricing problems in global retail share similar structures and constraints. We focus on building robust solutions that are reusable across multiple domains, leveraging a hybrid on-prem and cloud infrastructure, and keeping quality high while iterating quickly.
The core engineering team consists of 5-6 talented people in Poland. We collaborate closely with the client’s product management and data science teams.
Our client is a NASDAQ-listed B2B data company powering Go-To-Market strategies with a 360-degree view of every customer, a view whose value depends on the quality of billions of person and company records.
Anomalsky is the ML system we built to catch what traditional observability misses: row-level semantic anomalies (e.g., in fields such as first_name, title, or company_name). It has three layers: an ML layer (embeddings + unsupervised clustering) flags suspicious records at scale; an LLM layer removes false positives and explains each cluster; and an optional human-in-the-loop layer lets domain experts resolve whole clusters at once. The MVP has already driven ~40k critical record corrections in production.
What’s next: the MVP is landing on GCP now. Once it’s operational, the mission is to scale Anomalsky across the entire organization, embedding it into Acquisition pipelines and building a real-time variant that scans data before it reaches customers.
- Productionize Anomalsky on GCP and scale it to operational, organization-wide use.
- Evolve the ML / LLM / human-in-the-loop design and the feedback loop that turns expert reviews into reusable knowledge.
- Prototype the low-latency real-time variant.
- Integrate Anomalsky into existing workflows, starting with Acquisition.
Tech stack: Python, Airflow, BigQuery, Snowflake, Spark (Dataproc), Databricks, Iceberg, Starburst, Trino, AWS, GCP, Docker, Terraform, Jenkins, GitHub, scikit-learn, unsupervised anomaly detection (kNN, Isolation Forest, autoencoders), recursive clustering, classifiers trained on real and synthetic data, MLflow, LLM-based reasoning.
Team: ML and data engineers from VirtusLab working alongside the client’s data engineers and a manager.
What we expect in general:
- 5+ years of hands-on machine learning engineering experience.
- Hands-on experience in deploying Python projects.
- Strong experience in writing high-quality Python code.
- Experience with orchestration tools such as Airflow.
- Knowledge of Spark or other distributed data processing tools.
- Experience using the Kubernetes ecosystem.
- Strong experience with Azure and Docker.
- Ability to work in a team and participate in the design process.
- Good command of English (B2/C1).
- Strong communicator
- Team player with mentoring ability
- Proactive and responsible
- Strategic thinker with a big-picture perspective
- A hybrid model is preferred (1-2 days per week in the Kraków office); alternatively, candidates must be available for on-site collaboration as required (approx. once a month).
Seems like lots of expectations, huh? Don’t worry! You don’t have to meet all the requirements.
What matters most is your passion and willingness to develop. Apply and find out!