DevJobs

DevOps Engineer

Overview
Skills
  • Shell ꞏ 3y
  • TensorFlow ꞏ 3y
  • PyTorch ꞏ 3y
  • ML ꞏ 3y
  • Keras ꞏ 3y
  • Jenkins ꞏ 3y
  • CI/CD
  • AWS
  • Azure
  • GCP
  • Kubernetes ꞏ 3y
  • Docker ꞏ 3y
  • Ansible ꞏ 3y
  • Puppet ꞏ 3y
  • Chef ꞏ 3y
  • AI ꞏ 3y
  • Systems performance analysis ꞏ 3y
  • DL frameworks ꞏ 3y
  • Containers automation ꞏ 3y
  • Automated installation ꞏ 3y
  • Monitoring methods ꞏ 3y
  • MXNet ꞏ 3y
  • GPU technologies ꞏ 3y
  • Orchestration ꞏ 3y
  • Patch updates ꞏ 3y
  • Benchmarking ꞏ 3y
  • OrangeFS
  • Torque
  • Slurm
  • SGE
  • BeeGFS
  • PBS
  • Lustre
  • LSF
  • GPFS
  • InfiniBand technology
  • HPC schedulers

The Weizmann Institute is looking for a DevOps engineer to join its High Performance Computing Section.



We are seeking to recruit a senior, highly motivated DevOps professional with at least 3 years’ experience in GPU/AI operations to play a key role in operating HPC/AI/hybrid cloud systems and evaluating new technologies that support the frontier research activities of WIS scientists.

This individual will be part of a group that designs and builds HPC/AI/Cloud solutions, ensuring that upgrades and changes comply with product and project management guidelines.

He/she will work under the supervision of the head of the HPC section on the planning and development of a robust and scalable infrastructure for AI/ML/DL workloads, DL/ML framework integration, application profiling, and researcher support.


Requirements:

* B.A./B.Sc in information technology or equivalent academic degree.

* Experience with GPU technologies and AI/ML/DL frameworks such as TensorFlow, MXNet, PyTorch, and Keras.

* Experience supporting centralized systems, at the core of the data center.

* Familiarity and experience with systems performance analysis and benchmarking of standalone machines, HPC clusters, and GPU workloads.

* Strong shell scripting knowledge and experience installing and maintaining clustered environments, including automated installation, patch updates, and monitoring methods (Chef, Jenkins, Puppet, Ansible).

* Container automation and orchestration (experience with Docker and Kubernetes).

* Service- and customer-oriented attitude.

* Strong troubleshooting skills.

* Strong interpersonal and communication skills.

* Ability to work as a team player.

* Proactive and solution-oriented problem solver.



Desired skills:

* Experience working with public cloud service providers – AWS, GCP, Azure.

* M.Sc degree in information technology is an advantage.

* Experience with any of the following HPC schedulers: Slurm, SGE, Torque/PBS, LSF, or similar.

* Experience with CI/CD in complex distributed systems.

* Experience documenting system administration procedures for routine and complex tasks.

* Knowledge of storage operations, particularly performance-oriented parallel filesystems (GPFS, Lustre, OrangeFS, BeeGFS).

* Experience with InfiniBand technology.


Send your CV to [email protected], quoting job number 62785.

Weizmann Institute of Science