We're looking for a Senior MLOps Engineer to join a group that specializes in Security and Networking as they relate to ML/AI development. As a Senior MLOps Engineer, you’ll build and maintain the infrastructure, tools and processes necessary to support the machine learning and AI lifecycle in a production environment. You’ll collaborate closely with data scientists, software engineers and DevOps teams to ensure smooth deployment, modeling and optimization of AI models. This role involves creative problem solving alongside engineering teams and is pivotal to the continued success of AI networking security.
What You’ll Be Doing
- Developing, improving and optimizing scalable infrastructure for handling and deploying security and networking AI models in production, ensuring high availability, scalability and performance.
- Designing and implementing data pipelines to efficiently process and transform large volumes of data for training and inference purposes.
- Optimizing and fine-tuning ML models for performance, scalability, and resource utilization, considering factors such as latency, efficiency, and cost.
- Collaborating closely with data scientists and software engineers to operationalize and deploy ML models, including versioning, packaging and integration with existing systems. Participating in the development and review of code, design documents, use cases and test plans.
- Collaborating with DevOps teams to integrate pipelines and workflows into the CI/CD process, ensuring flawless deployments and rollbacks.
- Implementing and managing A/B testing frameworks.
- Building and maintaining monitoring and alerting systems to proactively identify and resolve issues relating to quality, performance and infrastructure.
- Implementing access controls, authentication mechanisms, and encryption standards for ML models and data.
- Documenting guidelines and standard operating procedures for MLOps processes and sharing knowledge with the wider team.
- Developing proofs of concept for new features.
What We Need To See
- BS/MSc in CS/CE or related field (or equivalent experience)
- Strong background in machine learning, with at least 5 years of experience deploying and maintaining models in production.
- Proficiency in programming languages such as Python, Java, or Scala, along with experience in using ML frameworks and libraries (e.g. TensorFlow, PyTorch).
- Proficiency in microservices architecture, container orchestration, and cloud platforms for deploying and scaling ML applications.
- Knowledge of inference optimization techniques.
- Experience with tools for data processing and storage (e.g. Apache Spark, Hadoop, SQL databases, NoSQL databases).
- Understanding of build infrastructure and CI/CD tools and practices (e.g. Jenkins).
- You are detail-oriented and care deeply about robust, well-tested, high-performance code in production environments.
- You are proactive, take full ownership of your deliverables, have a can-do approach, and bring excellent communication and collaboration skills that let you work effectively in multifunctional teams.
Ways To Stand Out From The Crowd
- Knowledge of network protocols and Linux internals
- Security and networking background, with knowledge of security protocols, network architectures, firewalls, intrusion detection systems, and other relevant security and networking concepts
- Familiarity with generative models and their serving
- Experience with vector databases, similarity search and reranking algorithms
- Knowledge of network security principles and practices
NVIDIA has some of the most forward-thinking and hardworking people on the planet working for us and, due to unprecedented growth, our specialized engineering teams are growing fast. If you're a creative and autonomous engineer with a genuine passion for technology, we want to hear from you.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
JR1994287