Senior Data Engineer, Network Clustering

Overview
Skills
  • SQL
  • Scala
  • Python
  • Kafka
  • Flink
  • CI/CD
  • AWS
  • GCP
  • Azure
  • Airflow
  • Databricks
  • Apache Spark
  • Delta Lake
  • Great Expectations
  • Iceberg
  • PySpark
  • Alation
  • Spark Structured Streaming
  • Prefect
  • Infrastructure-as-code
  • DataHub
  • Dagster
  • Collibra

We are looking for an expert Data Engineer to build and evolve the data backbone of our R&D telemetry and performance analytics ecosystem. You will process large volumes of raw data from live systems at the cluster level: hardware, communication units, software, and efficiency indicators. You'll be part of a fast-paced R&D organization where system behavior, schemas, and requirements evolve constantly. Your mission is to develop flexible, reliable, and scalable data pipelines that adapt to rapid change and deliver clean, trusted data to engineers and researchers.

What You’ll Be Doing

  • Build flexible data ingestion and transformation frameworks that can easily handle evolving schemas and changing data contracts
  • Develop and maintain ETL/ELT workflows for refining, enriching, and classifying raw data into analytics-ready form
  • Collaborate with R&D, hardware, DevOps, ML engineers, data scientists, and performance analysts to ensure accurate data collection from embedded systems, firmware, and performance tools
  • Automate schema detection, versioning, and validation to ensure smooth evolution of data structures over time
  • Maintain data quality and reliability standards, including tagging, metadata management, and lineage tracking
  • Enable self-service analytics by providing curated datasets, APIs, and Databricks notebooks

What We Need To See

  • B.Sc. or M.Sc. in Computer Science, Computer Engineering, or a related field
  • 5+ years of experience in data engineering, ideally in telemetry, streaming, or performance analytics domains
  • Hands-on experience with Databricks and Apache Spark (PySpark or Scala)
  • Understanding of streaming technologies and their applications (e.g., Apache Kafka for ingestion, schema registry, event processing)
  • Proficiency in Python and SQL for data transformation and automation
  • Demonstrated knowledge of schema evolution, data versioning, and data validation frameworks (e.g., Delta Lake, Great Expectations, Iceberg, or similar)
  • Experience working with cloud platforms (AWS, GCP, or Azure) — AWS preferred
  • Familiarity with data orchestration tools (Airflow, Prefect, or Dagster)
  • Experience handling time-series, telemetry, or real-time data from distributed systems

Ways To Stand Out From The Crowd

  • Exposure to hardware, firmware, or embedded telemetry environments
  • Knowledge of real-time analytics frameworks (Spark Structured Streaming, Flink, Kafka Streams)
  • Understanding of system performance metrics (latency, throughput, resource utilization)
  • Experience with data cataloging or governance tools (DataHub, Collibra, Alation)
  • Familiarity with CI/CD for data pipelines and infrastructure-as-code practices

With competitive salaries and a generous benefits package, NVIDIA is widely considered one of the technology world's most desirable employers. Our team comprises some of the most forward-thinking and hardworking individuals in the industry. Due to unprecedented growth, our elite engineering teams are rapidly expanding. If you're a creative engineer with a real passion for technology, we want to hear from you.

JR2009129

NVIDIA