Job Description
Join our DeviantArt team as a Senior DevOps Engineer and play a pivotal role in architecting and maintaining the robust infrastructure that powers one of the largest online art communities. You'll be at the forefront of ensuring the high availability, performance, and security of our platform, which handles over 1.5 billion page views per month.
The DeviantArt DevOps Team is a very small remote team that covers the full scope of work normally handled by SRE, DevOps, and Infrastructure Engineers, with a bit of networking, security, and database administration mixed in. We are responsible for the day-to-day management and implementation of large-scale, mission-critical production systems that run on a public cloud.
This role requires wearing a lot of hats and is equal parts fun and challenging. In this role, you will:
- Architect and maintain a highly available infrastructure with a focus on proactive and reactive DDoS mitigation, autoscaling, self-healing, site performance, and cost optimization
- Participate in a 24/7 on-call rotation, responding swiftly to outages and performance issues, and address less urgent alerts during normal working hours
- Develop and maintain a developer environment and CI/CD pipelines that stay at parity with production systems, enabling seamless testing and release of changes
- Automate infrastructure provisioning and management using configuration management tools, complete with tests and documentation
- Optimize and support sharded MySQL databases to handle data efficiently and reliably as read and write volumes grow
- Regularly update system components to address security vulnerabilities and keep our technology stack current
We take our work seriously, but we don’t take ourselves too seriously! We enjoy designing and building systems using open source tools and industry standards, and we are in the fortunate position of being able to decide as a team when to adopt newer technologies and redesign our infrastructure.
This role is on a fully remote and distributed team, and asynchronous communication within and across teams is crucial. To be successful in this role, a candidate will need to work flexibly, balancing server and service issues, requests from development teams, security needs, and shifting priorities within our own infrastructure management work.
Qualifications
- 5+ years of experience managing systems at scale as a DevOps Engineer, Site Reliability Engineer, or Platform Engineer
- Excellent technical and analytical skills, with the ability to implement DDoS mitigation, troubleshoot complex problems, analyze system bottlenecks, and deliver effective solutions across frontend and backend systems, sometimes during a production degradation or outage on a high-traffic site
- Exceptional Linux command-line skills, with proficiency in Bash and Python for investigating server and service issues, scripting, and automation
- In-depth knowledge of AWS services, infrastructure as code using Terraform, GitOps tools and methodologies, and containerization and orchestration using Docker, Helm, and Kubernetes
- Experience with the setup, administration, and maintenance of sharded MySQL database clusters without downtime or data loss
- Excellent communication skills with fluent English, and the ability to collaborate effectively across teams while articulating technical concepts to non-technical stakeholders
- The ability to get up to speed on systems, make decisions, be flexible, and execute independently with attention to detail for production systems
Additional Information
Founded in 2000 and a part of Wix since 2017, DeviantArt is the largest online social network for artists and art enthusiasts. For emerging and established artists, DeviantArt is the foremost platform to exhibit, promote, and share works with an enthusiastic, art-centric community. We have over 86 million registered users worldwide, and our users -- lovingly referred to as “deviants” -- upload tens of thousands of original pieces of art every day, from painting and sculpture to digital art, pixel art, films, and anime.