Software Engineer with ML and Data Skills
VirtusLab is a leading European software consulting and engineering company. Our mission is to craft clean code and practical solutions with precision and purpose. We foster a dynamic culture rooted in strong engineering, a sense of ownership, and transparency, empowering professionals to make a substantial impact in the software industry.
About the role
Productionizing and scaling an ML-driven data quality system across the organization. The scope includes building and tuning anomaly-detection and clustering pipelines, pairing classic ML with LLM reasoning to flag and explain issues, collaborating with data producers to fix root causes, and creating and maintaining validator models that turn detected anomalies into better future data.
Our client is a NASDAQ-listed B2B data company powering Go-To-Market strategies with a 360-degree view of every customer, a view whose value depends on the quality of billions of person and company records.
Anomalsky is the ML system built to catch what traditional observability misses: row-level semantic anomalies (e.g., in first_name, title, or company_name values). It works in three layers: an ML layer (embeddings + unsupervised clustering) flags suspicious records at scale, an LLM layer removes false positives and explains each cluster, and an optional human-in-the-loop step lets domain experts resolve whole clusters at once. The MVP has already driven ~40k crucial record corrections in production.
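For readers less familiar with this kind of pipeline, here is a minimal sketch of the ML layer’s flag-then-cluster idea. It is not Anomalsky’s implementation: it assumes hand-crafted string features as a stand-in for learned embeddings, a mean kNN distance as the unsupervised anomaly score, and simple leader clustering in place of the production clustering. The feature set and the `threshold` and `radius` values are hypothetical, and the LLM and human-in-the-loop layers are not modelled.

```python
import math


def features(value: str) -> list[float]:
    # Hypothetical hand-crafted features standing in for learned embeddings:
    # scaled length, share of digits, share of non-letters, share of spaces.
    n = max(len(value), 1)
    return [
        len(value) / 20,
        sum(ch.isdigit() for ch in value) / n,
        sum(not ch.isalpha() for ch in value) / n,
        value.count(" ") / n,
    ]


def knn_scores(vectors: list[list[float]], k: int = 3) -> list[float]:
    # Unsupervised anomaly score: mean Euclidean distance to the k nearest
    # other records. Records far from all their neighbours score high.
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(math.dist(v, w) for j, w in enumerate(vectors) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


def leader_clusters(items: list[tuple[str, list[float]]], radius: float) -> list[list[str]]:
    # Leader clustering: each record joins the first cluster whose leader
    # lies within `radius`; otherwise it starts a new cluster as its leader.
    clusters: list[tuple[list[float], list[str]]] = []
    for value, vec in items:
        for leader, members in clusters:
            if math.dist(leader, vec) <= radius:
                members.append(value)
                break
        else:
            clusters.append((vec, [value]))
    return [members for _, members in clusters]


def flag_and_cluster(values: list[str], threshold: float = 0.2, radius: float = 0.3) -> list[list[str]]:
    # Flag records whose anomaly score exceeds `threshold`, then group the
    # flagged ones so a reviewer (or an LLM) can handle a whole cluster at once.
    vectors = [features(v) for v in values]
    scores = knn_scores(vectors)
    flagged = [(v, vec) for v, vec, s in zip(values, vectors, scores) if s > threshold]
    return leader_clusters(flagged, radius)
```

On a toy first_name column such as `["Anna", "John", "Maria", "Peter", "Kate", "Lucas", "Emma", "12345", "CEO & Founder", "00000", "VP of Sales"]`, this returns `[["12345", "00000"], ["CEO & Founder", "VP of Sales"]]`: the four non-name values are flagged and grouped into a numeric cluster and a job-title cluster, the kind of group a later layer could explain and resolve in one step.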
What’s next: the MVP is landing on GCP now. Once it’s operational, the mission is to scale Anomalsky across the entire organization, embedding it into Acquisition pipelines and building a real-time variant that scans data before it reaches customers.
- Productionizing Anomalsky on GCP and scaling it to operational, organization-wide use.
- Evolving the ML / LLM / human-in-the-loop design and the feedback loop that turns expert reviews into reusable knowledge.
- Prototyping the low-latency real-time variant.
- Integrating Anomalsky into existing workflows, starting with Acquisition.
Tech stack: Python, Airflow, BigQuery, Snowflake, Spark (Dataproc), Databricks, Iceberg, Starburst, Trino, AWS, GCP, Docker, Terraform, Jenkins, GitHub, scikit-learn, unsupervised anomaly detection (kNN, Isolation Forest, autoencoders), recursive clustering, classifiers trained on real and synthetic data, MLflow, LLM-based reasoning.
Team: ML and data engineers from VirtusLab working alongside the client’s data engineers and product management.
What we expect in general
- Strong Python and production ML skills, with a proven track record of shipping models into real production pipelines.
- Hands-on experience using classic ML to surface data quality issues at scale: unsupervised anomaly detection (kNN, Isolation Forest, autoencoders) and clustering on messy real-world tabular data.
- Practical experience pairing classic ML with LLMs: using models to flag suspicious records and LLMs for reasoning, false-positive filtering, and the final verification of anomalies.
- Solid data engineering background across the modern stack (Airflow, Spark/Dataproc, BigQuery, Snowflake, Iceberg/Trino) and the production toolchain (GCP, Docker, Terraform, CI, MLflow).
- Pragmatic, product-oriented approach focused on incremental value delivery and seamless integration into existing workflows.
- Professional fluency in English, enabling smooth technical and business discussions in an international environment.
Seems like a lot of expectations, huh? Don’t worry! You don’t have to meet all the requirements.
What matters most is your passion and willingness to develop. Apply and find out!
A few perks of being with us