Data Engineer (PySpark & Airflow)
We are #VLteam – tech enthusiasts constantly striving for growth. The team is our foundation, which is why we care most about a friendly atmosphere, plenty of self-development opportunities, and good working conditions. Trust and autonomy are two essential qualities that drive our performance. We simply believe in the idea of "measuring outcomes, not hours". Join us & see for yourself!
About the role
The project aims to construct, scale, and maintain data pipelines for a simulation platform. You will be working on a solution that provides connectivity between AWS S3 and Cloudian S3. A previously completed Proof of Concept used Airflow to spin up a Spark job for data extraction and then expose the collected data via Airflow's built-in XCom feature. Further work involves productionizing the PoC solution, testing it at scale, or proposing an alternative solution.
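To give a rough feel for that PoC shape, here is a minimal sketch of an Airflow DAG that launches a Spark extraction job and then exposes a small result summary through XCom. The script path, bucket names, connection ID, and summary contents are illustrative assumptions, not project details.

```python
# Minimal sketch of the PoC flow: Airflow spins up a Spark job for extraction,
# then exposes a small summary of the result via XCom.
# Paths, bucket names, and connection IDs below are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator


def collect_extract_summary():
    # Placeholder: gather a small summary of the extracted data (e.g. row count).
    # Airflow pushes the return value to XCom automatically.
    return {"rows_extracted": 12345, "target": "s3://example-bucket/extracts/"}


with DAG(
    dag_id="poc_extract_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    extract = SparkSubmitOperator(
        task_id="spark_extract",
        application="/opt/jobs/extract_job.py",  # hypothetical PySpark script
        conn_id="spark_default",
        application_args=["--output", "s3://example-bucket/extracts/"],
    )

    expose = PythonOperator(
        task_id="expose_via_xcom",
        python_callable=collect_extract_summary,
    )

    extract >> expose
```

Worth noting for the "testing it at scale" part of the brief: XCom is designed for small pieces of metadata rather than large datasets, which is one reason the PoC may need rework or an alternative design in production.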
As a Data Engineer in Store Ops, you will dive into projects that streamline retail operations through analytics and ML, applying your Python, Spark, Airflow, and Kubernetes skills.
- Developing heavy data pipelines in cooperation with data scientists and other engineers.
- Working with distributed data processing tools such as Spark to parallelise computation for Machine Learning and data pipelines (see the sketch after this list).
- Diagnosing and resolving technical issues, ensuring availability of high-quality solutions that can be adapted and reused.
- Collaborating closely with different engineering and data science teams, providing advice and technical guidance to streamline daily work.
- Championing best practices in code quality, security, and scalability by leading by example.
- Making your own, informed decisions that move the business forward.
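To give a flavour of the distributed-processing work mentioned above, here is a minimal PySpark sketch of parallelising a feature computation across a cluster. The input path and column names are made up for illustration.

```python
# Minimal sketch: let Spark distribute a feature aggregation across executors
# instead of computing it on a single machine. Paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-pipeline-sketch").getOrCreate()

# Read raw events from object storage (path is an assumption).
events = spark.read.parquet("s3://example-bucket/raw/events/")

# Aggregate per store; Spark parallelises the shuffle and aggregation.
features = (
    events
    .groupBy("store_id")
    .agg(
        F.count("*").alias("n_events"),
        F.avg("basket_value").alias("avg_basket_value"),
    )
)

features.write.mode("overwrite").parquet("s3://example-bucket/features/store_daily/")
```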
Python, PySpark, Airflow, Docker, Kubernetes, Dask, xgboost, pandas, scikit-learn, numpy, GitHub Actions, Azure DevOps, Terraform, Git @ GitHub
- Enhancing the monitoring, reliability, and stability of deployed solutions, including the development of automated testing suites (see the testing sketch after this list).
- Productionizing a new data pipeline responsible for exposing data on demand and improving its performance in production.
- Collaborating with cross-functional teams to enhance customer experiences through innovative technologies.
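For the automated-testing point above, a minimal pytest-style sketch of unit-testing a single pipeline transformation against a local SparkSession. The transformation, schema, and values are hypothetical.

```python
# Illustrative only: unit-testing a pipeline transformation with pytest and a
# local SparkSession. The transformation and data are made up for the example.
import pytest
from pyspark.sql import SparkSession, functions as F


@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()


def add_basket_flag(df):
    # Example transformation under test: flag large baskets.
    return df.withColumn("large_basket", F.col("basket_value") > 100)


def test_add_basket_flag(spark):
    df = spark.createDataFrame(
        [("s1", 50.0), ("s2", 150.0)],
        ["store_id", "basket_value"],
    )
    result = add_basket_flag(df).collect()
    assert [row.large_basket for row in result] == [False, True]
```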
Team size: 5 engineers
What we expect in general:
- Hands-on experience with Python.
- Proven experience with PySpark.
- Proven experience with data manipulation libraries (pandas, NumPy, and scikit-learn).
- Regular-level experience with Apache Airflow.
- Strong background in ETL/ELT design.
- Regular-level proficiency in Docker and Kubernetes to containerize and scale simulation platform components.
- Ability to occasionally visit our Krakow office.
- Good command of English (B2/C1).
Seems like lots of expectations, huh? Don’t worry! You don’t have to meet all the requirements.
What matters most is your passion and willingness to develop. Apply and find out!
A few perks of being with us
Apply now
"*" indicates required fields