Senior Performance Engineer - OpenShift AI - Red Hat - Raanana

Senior Performance Engineer - OpenShift AI

No longer accepting applications

Overview

Job Type: On-site

Experience: 5 years

Job Position: AI/ML

Updated: Aug 30, 2023

Location: Raanana

Salary: N/A

About The Job

The Red Hat Performance and Scale Engineering team is looking for a Senior Performance Engineer to join the PSAP (Performance and Scale for AI Platforms) team. As recent advances in AI technologies have taken the world by storm, IBM and Red Hat are jointly engineering an enterprise-grade platform for leveraging the full potential of generative AI technologies. As part of this team, you will be responsible for performance and scalability assessments of large-scale multi-node, multi-GPU distributed training jobs. Our goal is to make OpenShift AI the platform of choice for our customers when leveraging generative AI technologies. You will help us achieve that goal through targeted improvements in the performance and scalability of the platform for large-scale distributed training.

You will be required to formulate and execute performance test plans; investigate Linux, OpenShift, cloud infrastructure, and OpenShift AI performance tuning knobs; triage and potentially fix performance issues; create new benchmarking tests and tools as needed; and share performance results on a regular basis. This role needs an engineer who thinks creatively, adapts to rapid change, and is willing to learn and apply new technologies. You will be joining a vibrant open source culture and helping promote performance and innovation in this Red Hat engineering team.

The broader mission of the Performance and Scale team is to establish performance and scale leadership of the Red Hat product and cloud services portfolio. The scope includes component-level, system, and solution analysis and targeted enhancements. The team collaborates with engineering, product management, product marketing, and customer support, as well as hardware and software partners.

What You Will Do

Execute performance and scalability benchmarks against OpenShift AI with a targeted focus on large-scale multi-node, multi-GPU distributed training jobs
Collaborate with development teams to resolve performance issues
Triage, debug, and solve customer cases related to AI performance
Publish results, conclusions, recommendations, and best practices via documents and blogs to the support team, partners, and customers
Participate in internal and external conferences about your work and results

What You Will Bring

5+ years of relevant technical experience
Experience in running performance tests, data capture, data analysis, and visualization
Programming experience in Python or willingness to learn
Experience working with the Linux operating system (RHEL, Fedora or CentOS preferred)
Experience with AI/ML technologies and frameworks (classifiers, PyTorch, TensorFlow, etc.)
Good written and verbal language skills in English

The following are considered a plus

Bachelor’s degree or equivalent experience
Experience with container technologies (Podman, Kubernetes, Docker)
Experience with systems performance engineering and metrics collection tools such as iostat, vmstat, sar, perf, and Prometheus
Knowledge of AI/ML benchmarking suites such as MLPerf
Knowledge of generative AI (such as transformers) and distributed training technologies (such as Ray)


