Data Engineer (PySpark & Airflow)
B2B: 21 000 - 27 000 PLN net
Location: Kraków, Poland (hybrid)
Apply now

We are #VLteam – tech enthusiasts constantly striving for growth. The team is our foundation, which is why we care most about a friendly atmosphere, plenty of self-development opportunities, and good working conditions. Trust and autonomy are two essential qualities that drive our performance. We simply believe in the idea of “measuring outcomes, not hours”. Join us & see for yourself!

About the role

Join our team to develop high-volume data pipelines in cooperation with data scientists and other engineers. You will work with distributed data processing tools such as Spark to parallelise computation for machine learning and data pipelines, diagnose and resolve technical issues, and ensure the availability of high-quality solutions that can be adapted and reused. You will collaborate closely with engineering and data science teams, provide advice and technical guidance to streamline daily work, champion best practices in code quality, security, and scalability, and make your own informed decisions that move the business forward.

Skills
  • Python: advanced
  • PySpark: regular
  • Airflow: regular
  • Docker: regular
  • Kubernetes: regular
  • XGBoost: regular
  • pandas: regular
  • scikit-learn: regular
  • NumPy: regular
  • GitHub Actions: regular
  • Azure DevOps: regular
Project: STORE OPS
Project Scope

The project aims to construct, scale, and maintain data pipelines for a simulation platform. You will work on a solution that provides connectivity between AWS S3 and Cloudian S3. A previously completed proof of concept (PoC) used Airflow to spin up a Spark job for data extraction and then exposed the collected data via Airflow's built-in XCom feature. Further work involves productionising the PoC solution and testing it at scale, or proposing an alternative solution.
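To make the PoC shape concrete, here is a minimal sketch of an Airflow DAG that submits a Spark job and exposes a result location via XCom, as described above. All DAG, task, path, and connection names are illustrative assumptions, not the project's actual code.

```python
# Minimal sketch of the PoC described above: Airflow spins up a Spark job
# and exposes the collected data via XCom. Names and paths are hypothetical.
from datetime import datetime

from airflow.decorators import dag, task
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def store_ops_extract():
    # Submit the extraction job to the Spark cluster; the job writes its
    # output to an agreed S3 location (AWS S3 or Cloudian S3).
    extract = SparkSubmitOperator(
        task_id="spark_extract",
        application="/opt/jobs/extract.py",  # hypothetical job script
        conn_id="spark_default",
    )

    @task
    def expose_result() -> str:
        # Return values from TaskFlow tasks are pushed to XCom automatically,
        # which is how the PoC exposed the collected data downstream.
        return "s3://store-ops/extracts/latest"  # hypothetical location

    extract >> expose_result()


store_ops_extract()
```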

As a Data Engineer in Store Ops, you will dive into projects that streamline retail operations through analytics and ML, applying your Python, Spark, Airflow, and Kubernetes skills.

Responsibilities
  • Developing high-volume data pipelines in cooperation with data scientists and other engineers.
  • Working with distributed data processing tools such as Spark to parallelise computation for machine learning and data pipelines (see the sketch after this list).
  • Diagnosing and resolving technical issues, ensuring the availability of high-quality solutions that can be adapted and reused.
  • Collaborating closely with different engineering and data science teams, providing advice and technical guidance to streamline daily work.
  • Championing best practices in code quality, security, and scalability by leading by example.
  • Making your own informed decisions that move the business forward.
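As a rough illustration of the parallelisation bullet above, the sketch below scores a Spark DataFrame with a scikit-learn model through a pandas UDF, so the work runs in parallel across executors. The model, column names, and data are hypothetical, not the team's actual code.

```python
# A minimal sketch of parallelising ML scoring with Spark: a pandas UDF
# applies a scikit-learn model to each partition of a DataFrame.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from sklearn.linear_model import LinearRegression

spark = SparkSession.builder.appName("parallel-scoring").getOrCreate()

# Toy model trained on the driver; in practice it would be loaded from storage.
model = LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 2.0, 4.0])


@pandas_udf("double")
def predict(x: pd.Series) -> pd.Series:
    # Each executor scores its own partition of rows, so the model is
    # applied to the whole DataFrame in parallel.
    return pd.Series(model.predict(x.to_numpy().reshape(-1, 1)))


df = spark.createDataFrame([(float(i),) for i in range(1000)], ["x"])
df.withColumn("prediction", predict("x")).show(5)
```

Spark pickles the driver-side model into the UDF closure and ships it to the executors, so each partition is scored in parallel without explicit coordination code.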
Tech Stack

Python, PySpark, Airflow, Docker, Kubernetes, Dask, XGBoost, pandas, scikit-learn, NumPy, GitHub Actions, Azure DevOps, Terraform, Git @ GitHub

Project Challenges
  • Enhancing the monitoring, reliability, and stability of deployed solutions, including the development of automated testing suites.
  • Productionising a new data pipeline responsible for exposing data on demand, and improving its performance in production.
  • Collaborating with cross-functional teams to enhance customer experiences through innovative technologies.
Team

5 engineers

What we expect in general:

  • Hands-on experience with Python.
  • Proven experience with PySpark.
  • Proven experience with data manipulation libraries (pandas, NumPy, and scikit-learn).
  • Regular-level experience with Apache Airflow.
  • Strong background in ETL/ELT design.
  • Regular-level proficiency in Docker and Kubernetes to containerize and scale simulation platform components.
  • Ability to occasionally visit the Kraków office.
  • Good command of English (B2/C1).


Seems like lots of expectations, huh? Don’t worry! You don’t have to meet all the requirements.
What matters most is your passion and willingness to develop. Apply and find out!

A few perks of being with us

  • Building tech community
  • Flexible hybrid work model
  • Home office reimbursement
  • Language lessons
  • MyBenefit points
  • Private healthcare
  • Training Package
  • Virtusity / in-house training
And a lot more!

Apply now

Data Engineer (PySpark & Airflow)

"*" indicates required fields

Accepted file types: PDF. Maximum file size: 5 MB.
Please submit a CV no longer than two pages.
Current recruitment process: For the purpose of recruitment, I hereby give consent as per art. 6.1.a of the GDPR to the processing of my personal data (other than that listed in art. 22[1] § 1 of the Labour Code) by Virtus Lab Sp. z o. o. (as Co-Controller; for a full list of joint controllers, see the Privacy Policy) with its headquarters at Szlak 49 Street, 31-153 Cracow. At the same time I accept the Privacy Policy of the Data Controller. I acknowledge that my personal data will be kept for the duration of the recruitment process and, as regards any potential claims, for a period of 36 months maximum, and that I have the right to access this data or have it rectified or deleted on demand. This consent can be withdrawn at any time, but this withdrawal does not make the previous processing illegal.* (Required)
Future recruitment processes: I hereby give consent as per art. 6.1.a of the GDPR to the processing of my personal data by Virtus Lab Sp. z o. o. (as Co-Controller; for a full list of joint controllers, see the Privacy Policy) with its headquarters at Szlak 49 Street, 31-153 Cracow, in order to use this data in future recruitment processes. I hereby agree to the possible storage of my personal data for this purpose in Virtus Lab's database for a period of 36 months maximum. At the same time I accept the Privacy Policy of the Data Controller. I acknowledge that I have the right to access this data or have it rectified or deleted on demand. This consent can be withdrawn at any point, but this does not make the previous processing illegal.*

Coordinated by Aleksandra Grabowska, IT Talent Acquisition Specialist
Not sure if this role is right for you?
It doesn't mean that you don't match. Tell us about yourself and let us work on it together.
Contact us