
Principal Data Engineer

Skills
  • CI/CD
  • GCP
  • Airflow
  • cloud data platforms
  • MLOps
  • batch tools
  • Dataflow
  • streaming tools
  • data modeling
  • data structures
  • DBT
  • BigQuery

Zedge is seeking a Principal Data Engineer


At Zedge, we rely on data to make decisions and solve problems. We’re looking for a highly experienced Principal Data Engineer with strong skills in building and maintaining large-scale data infrastructure.


You will maintain the systems that empower our data scientists, analysts and less technical stakeholders. You will lead the creation of new data pipelines and tables, and the overhaul of existing ones, so that they are stable, accurate and well monitored. You will challenge ‘table creep’ and push data users in the business towards a single source of truth, in a data ecosystem that is understandable and ergonomic for data scientists and analysts.


You will lead the team that makes our data infrastructure elegant, robust, stable and well-maintained. You will spar with other tech leaders on the best methods, tooling and integration.


Responsibilities:

  • Create and maintain pipeline architectures in Airflow and DBT
  • Assemble large and/or complex datasets to meet business requirements
  • Improve our processes and infrastructure for scale, delivery and automation
  • Maintain and improve our data warehouse structure so that it is fit for purpose
  • Adjust methods, queries and techniques to suit our very large data environment
  • Adopt best-practice coding and review processes
  • Communicate technical details and edge cases in the data to specialist and non-specialist stakeholders
  • Detect, investigate, resolve and communicate anomalies in the data
  • Develop and maintain brief, relevant documentation for data products


Requirements:

  • Advanced degree in computer science, mathematics, data science or a related field
  • At least 8 years of proven work experience as a data engineer
  • Proficiency with Airflow and DBT or similar products (e.g. Dataflow, streaming and batch tools)
  • Advanced knowledge of data structures and data modeling
  • CI/CD pipeline and MLOps experience
  • Experience with very large datasets (several terabytes per day) is essential
  • Experience with cloud data platforms is essential; deep experience with GCP / BigQuery is advantageous
  • Good communication and presentation skills

