DevJobs

Senior Linux & Infrastructure IT Engineer

Skills
  • Python
  • Bash
  • Linux · 7y
  • GitHub Actions
  • AWS · 7y
  • GCP
  • Azure
  • Terraform
  • Ansible
  • Grafana
  • VPC
  • Slurm
  • S3
  • Prometheus
  • Network performance tuning
  • LSF
  • VPN
  • Kernel
  • IAM
  • Grid Engine
  • GitLab CI
  • FSx
  • Filesystem
  • EC2
  • EBS
  • FinOps
About The Role

We are a fast-growing semiconductor startup building next-generation silicon. Our design and verification pipelines rely on large-scale Linux compute infrastructure spanning AWS and on-prem environments.

We are seeking a senior, hands-on Cloud & Infrastructure IT Engineer to own the reliability, performance, and automation of our mission-critical EDA platforms. You will work directly with chip design teams to ensure our compute environments are fast, stable, secure, and ready to scale.

What You’ll Do

  • Operate and scale hybrid AWS + on-prem Linux compute infrastructure for chip design and verification workloads.
  • Own day-to-day reliability, performance tuning, capacity planning, and incident response.
  • Build and maintain AWS environments using Terraform and Ansible.
  • Automate provisioning of VPCs, IAM, EC2, FSx, EBS, S3, VPNs, and security controls.
  • Tune Linux systems for CPU-, memory-, and I/O-intensive EDA workloads.
  • Operate and optimize grid / job scheduling platforms such as Slurm, LSF, or Grid Engine.
  • Design and manage high-throughput storage solutions for simulation pipelines.
  • Develop automation and self-service tooling using Python and Bash.
  • Implement observability and alerting using Prometheus and Grafana.
  • Participate in on-call rotation and lead root-cause analysis for production incidents.

Required Qualifications

  • AWS: VPC, EC2, IAM, FSx, EBS, S3, VPN, security controls
  • Infrastructure as Code: Terraform, Ansible
  • Linux / HPC: Kernel, filesystem, and network performance tuning
  • Schedulers: Slurm / LSF / Grid Engine
  • Automation: Python, Bash
  • Observability: Prometheus, Grafana
  • CI/CD: GitHub Actions / GitLab CI

Requirements

  • 7+ years of hands-on experience operating large-scale Linux infrastructure.
  • Strong experience managing AWS production environments.
  • Advanced proficiency with Terraform, Ansible, Python, and Bash.
  • Deep understanding of networking, storage, and Linux internals.
  • Comfortable owning business-critical systems in a fast-moving startup.
  • Experience supporting semiconductor / EDA / HPC workloads.

Preferred

  • Exposure to Azure or GCP.
  • Experience with cloud cost optimization / FinOps.
Retym