DevJobs

Senior Data Engineer

Skills
  • Python ꞏ 5y
  • Scala
  • Flink
  • Kafka
  • AWS ꞏ 5y
  • Azure
  • GCP
  • Apache Spark ꞏ 5y
  • Apache Spark Streaming ꞏ 5y
  • Databricks ꞏ 5y
  • Compliance practices
  • Data governance
  • Feature engineering
  • Feature store concepts
  • Kinesis
  • ML workflow systems
Responsibilities:

  • Lead the Design and Development of Data Platform Infrastructure: Design and architect software solutions with a focus on scalability, high availability, security, robustness, and performance. Build and maintain the data platform infrastructure.
  • Pipeline Development: Create and manage data pipelines using Python and Spark for efficient data processing, transformation, and integration.
  • Machine Learning Workflow Management: Implement ML workflow management systems to support model development and deployment.
  • Feature Store Implementation: Design and develop a feature store for efficient feature engineering and reuse.
  • Data Quality Assurance: Take ownership of ensuring data quality by designing and implementing data validation and quality checks.
  • Technology Evaluation and Adoption: Stay current with emerging technologies and introduce new tools into the ecosystem when needed.
  • Cross-Functional Collaboration: Work closely with product managers, data scientists, other software engineers, architects, and team leads to align data engineering efforts with business goals.
  • Automation and CI/CD: Implement automation, continuous integration, and continuous delivery practices in the data engineering processes.
  • Testing and Quality Assurance: Establish testing processes, including Test-Driven Development (TDD), to ensure the reliability of data pipelines and platform components.
  • Oversee Development Lifecycle: Understand and oversee all phases of the development life cycle, including integration, builds, and deployment.

Requirements:

  • Experience: At least 5 years of hands-on experience in data engineering, with a proven track record of building data platforms and pipelines.
  • Programming Skills: Proficiency and strong coding ability in at least one language (e.g., Python, Scala).
  • Big Data Frameworks: Experience with real-time streaming frameworks such as Apache Spark Streaming or Apache Flink, and with batch processing frameworks such as Apache Spark and Databricks.
  • Cloud Platforms: Strong experience with cloud platforms such as AWS, GCP, or Azure.
  • Message Queues: Familiarity with distributed message queues like Kinesis or Kafka is a plus.
  • Teamwork and Communication: Excellent teamwork and communication skills to collaborate effectively with cross-functional teams.
  • Self-Learning: High self-learning abilities to keep up with evolving technologies and industry trends.
  • Education: Bachelor's degree in Computer Science or equivalent; a Master's or PhD is an advantage.
  • Language: Strong spoken and written English skills.
  • Global Experience: Prior experience working for a global company is a plus.

Preferred Skills:

  • Experience in building and maintaining ML workflow systems.
  • Understanding of feature engineering and feature store concepts.
  • Knowledge of data governance and compliance practices.