Lead the Design and Development of Data Platform Infrastructure: Design and architect software solutions with a focus on scalability, high availability, security, robustness, and performance. Build and maintain the data platform infrastructure.
Pipeline Development: Create and manage data pipelines using Python and Spark for efficient data processing, transformation, and integration.
Machine Learning Workflow Management: Implement ML workflow management systems to support model development and deployment.
Feature Store Implementation: Design and develop a feature store for efficient feature engineering and reuse.
Data Quality Assurance: Take ownership of ensuring data quality by designing and implementing data validation and quality checks.
Technology Evaluation and Adoption: Stay current with emerging technologies, evaluate them, and introduce new tools into the ecosystem when needed.
Cross-Functional Collaboration: Work closely with product management, data scientists, other software engineers, architects, and team leads to align data engineering efforts with business goals.
Automation and CI/CD: Implement automation, continuous integration, and continuous delivery practices in the data engineering processes.
Testing and Quality Assurance: Establish testing processes, including Test-Driven Development (TDD), to ensure the reliability of data pipelines and platform components.
Oversee Development Lifecycle: Understand and oversee all phases of the development lifecycle, including integration, builds, and deployment.
Requirements:
Experience: At least 5 years of hands-on experience in data engineering, with a proven track record of building data platforms and pipelines.
Programming Skills: Proficiency in at least one programming language (e.g., Python or Scala) and strong coding ability.
Big Data Frameworks: Experience with real-time streaming frameworks such as Apache Spark Streaming or Apache Flink, and with batch processing frameworks and platforms such as Apache Spark and Databricks.
Cloud Platforms: Strong experience with cloud platforms such as AWS, GCP, or Azure.
Message Queues: Familiarity with distributed message queues like Kinesis or Kafka is a plus.
Teamwork and Communication: Excellent teamwork and communication skills to collaborate effectively with cross-functional teams.
Self-Learning: Strong ability to learn independently and keep up with evolving technologies and industry trends.
Education: Bachelor's degree in Computer Science or equivalent. Master's or PhD is an advantage.
Language: Strong spoken and written English skills.
Global Experience: Prior experience working for a global company is a plus.
Preferred Skills:
Experience in building and maintaining ML workflow systems.
Understanding of feature engineering and feature store concepts.
Knowledge of data governance and compliance practices.