Software Engineer with ML and Data Skills

B2B: 140 - 185 PLN NET PER HOUR

LOCATION + Remote Poland

VirtusLab is a leading European software consulting and engineering company. Our mission is to craft clean code and practical solutions with precision and purpose. We foster a dynamic culture rooted in strong engineering, a sense of ownership, and transparency, empowering professionals to make a substantial impact in the software industry.

About the role

The role focuses on productionizing and scaling an ML-driven data quality system across the organization. The scope of services involves building and tuning anomaly-detection and clustering pipelines, pairing classic ML with LLM reasoning to flag and explain issues, collaborating with data producers to fix root causes, and creating and maintaining validator models that turn detected anomalies into better future data.

Python: Expert

Airflow: Advanced

Spark (Dataproc): Advanced

Scikit-learn: Advanced

Apache Iceberg: Advanced

BigQuery: Regular

Snowflake: Regular

Trino/Starburst with Iceberg: Regular

AWS / GCP: Regular

GitHub Actions: Regular

Jenkins: Basic

Terraform: Basic

Docker: Basic

Project

Anomalsky

Project Scope

Our client is a NASDAQ-listed B2B data company powering Go-To-Market strategies with a 360-degree view of every customer, a view whose value depends on the quality of billions of person and company records.

Anomalsky is the ML system built to catch what traditional observability misses: row-level semantic anomalies (e.g., in fields such as first_name, title, or company_name). It has three layers: an ML layer (embeddings + unsupervised clustering) flags suspicious records at scale, an LLM layer removes false positives and explains each cluster, and an optional human-in-the-loop layer lets domain experts resolve whole clusters at once. The MVP has already driven ~40k crucial record corrections in production.
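To give a feel for what the ML layer's "flag suspicious records" step could look like, here is a minimal sketch in plain Python. It is not Anomalsky's implementation: character trigram counts stand in for real embeddings, a mean k-nearest-neighbour cosine distance stands in for the production anomaly detectors, and all function names (`char_ngrams`, `knn_anomaly_scores`, `flag_suspicious`) and sample titles are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Bag of character n-grams; a crude stand-in for a learned embedding."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors (1.0 = nothing shared)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_anomaly_scores(values, k=2):
    """Score each value by its mean distance to its k nearest neighbours."""
    vectors = [char_ngrams(value) for value in values]
    scores = []
    for i, vec in enumerate(vectors):
        dists = sorted(
            cosine_distance(vec, other)
            for j, other in enumerate(vectors)
            if j != i
        )
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


def flag_suspicious(values, k=2, top_n=1):
    """Return the top_n values that sit furthest from their neighbours."""
    scores = knn_anomaly_scores(values, k)
    ranked = sorted(zip(values, scores), key=lambda pair: pair[1], reverse=True)
    return [value for value, _ in ranked[:top_n]]


# Hypothetical sample of a "title" column with one semantically odd row.
titles = [
    "Software Engineer",
    "Senior Software Engineer",
    "Software Engineer II",
    "Staff Software Engineer",
    "asdf qwerty 123",
]
print(flag_suspicious(titles))
```

In a pipeline shaped like the one described above, records flagged this way would then be grouped into clusters and handed to the LLM layer for false-positive filtering and explanation.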

What’s next: the MVP is landing on GCP now. Once it’s operational, the mission is to scale Anomalsky across the entire organization, embedding it into Acquisition pipelines and building a real-time variant that scans data before it reaches customers.

The scope of cooperation covers

Productionizing Anomalsky on GCP and scaling it to operational, organization-wide use.
Evolving the ML / LLM / human-in-the-loop design and the feedback loop that turns expert reviews into reusable knowledge.
Prototyping the low-latency real-time variant.
Integrating Anomalsky into existing workflows, starting with Acquisition.

Tech Stack

Python, Airflow, BigQuery, Snowflake, Spark (Dataproc), Databricks, Iceberg, Starburst, Trino, AWS, GCP, Docker, Terraform, Jenkins, GitHub, Scikit-learn, unsupervised anomaly detection (kNN, Isolation Forest, autoencoders), recursive clustering, classifiers on real + synthetic data, MLflow, LLM-based reasoning.

Project environment

ML and data engineers from VirtusLab collaborating with customer data engineers and product management.

What we expect in general

Strong Python and production ML skills, with a proven track record of shipping models into real production pipelines.
Hands-on experience using classic ML to surface data quality issues at scale: unsupervised anomaly detection (kNN, Isolation Forest, autoencoders) and clustering on messy real-world tabular data.
Practical experience pairing classic ML with LLMs: using models to flag suspicious records and LLMs for reasoning, false-positive filtering, and the final verification of anomalies.
Solid data engineering background across the modern stack (Airflow, Spark/Dataproc, BigQuery, Snowflake, Iceberg/Trino) and the production toolchain (GCP, Docker, Terraform, CI, MLflow).
Pragmatic, product-oriented approach focused on incremental value delivery and seamless integration into existing workflows.
Professional fluency in English, enabling smooth technical and business discussions in an international environment.

Seems like lots of expectations, huh? Don’t worry! You don’t have to meet all the requirements.
What matters most is your passion and willingness to develop. Apply and find out!

A few perks of being with us

Building tech community

Flexible hybrid work model

Home office reimbursement

Language lessons

MyBenefit points

Private healthcare

Training Package

Virtusity / in-house training

Access to the above perks is optional and completely voluntary for B2B contractors


Please submit a CV no longer than two pages.


Coordinated by

Aleksandra Grabowska

IT Talent Acquisition Specialist
