
The Weizmann Institute is looking to fill a DevOps position in the High Performance Computing Section
We are seeking to recruit a senior, highly motivated DevOps professional with at least 3 years' experience in GPU/AI operations to play a key role in operating HPC/AI/hybrid cloud systems and evaluating new technologies that support the frontier research of WIS scientists.
This individual will be part of a group that designs and builds HPC/AI/cloud solutions, ensuring that upgrades and changes comply with product/project management guidelines.
He/she will work under the supervision of the head of the HPC section on the planning and development of a robust and scalable infrastructure for AI/ML/DL workloads, DL/ML framework integration, application profiling, and researcher support.
Requirements:
* B.A./B.Sc in information technology or equivalent academic degree.
* Experience with GPU technologies and AI/ML/DL frameworks such as TensorFlow, MXNet, PyTorch, Keras.
* Experience supporting centralized systems, at the core of the data center.
* Familiarity and experience with systems performance analysis and benchmarking of standalone machines, HPC clusters, and GPU workloads.
* Strong shell scripting knowledge and experience installing and maintaining clustered environments, including automated installation, patch updates, and monitoring methods (Chef, Jenkins, Puppet, Ansible).
* Container automation and orchestration (experience with Docker, Kubernetes).
* Service- and customer-oriented attitude.
* Strong troubleshooting skills.
* Strong interpersonal and communication skills.
* Ability to work as a team player.
* Proactive and solution-oriented problem solver.
Desired skills:
* Experience working with public cloud service providers – AWS, GCP, Azure.
* M.Sc degree in information technology is an advantage.
* Experience with any of the following HPC schedulers: Slurm, SGE, Torque/PBS, LSF, or similar.
* Experience with CI/CD in complex distributed systems.
* Documenting system administration procedures for routine and complex tasks.
* Knowledge of storage operations, with a focus on performance-oriented parallel filesystems (GPFS, Lustre, OrangeFS, BeeGFS).
* Experience with InfiniBand technology.
Send your CV by email to [email protected], quoting job number 62785.