Site Reliability Engineer (SRE) Job Description Template

We are looking for a Site Reliability Engineer to join our team and develop software systems and automated solutions that keep our operations running reliably. The SRE's responsibilities include monitoring computer systems and building alerts for the operational issues those systems can encounter. If hired for this role, you will work with our IT team to ensure our organization can continue to deliver projects and services reliably across our computing environment.

Typical Duties and Responsibilities

  • Run the production environment by monitoring availability and taking a holistic view of system health
  • Build software and systems to manage platform infrastructure and applications
  • Improve reliability, quality, and time-to-market of our software solutions
  • Administer production jobs
  • Measure and optimize system performance and innovate for continuous improvement
  • Gather and analyze metrics from operating systems and applications to assist in performance tuning and fault finding
  • Provide primary operational support and engineering for multiple large-scale distributed software applications
  • Roll back faulty software releases
  • Block or rate-limit unwanted traffic
  • Use monitoring systems to build alerts and dashboards
  • Participate in system design consulting, platform management, and capacity planning
  • Create sustainable systems and services through automation
  • Document your operational activities and procedures

Education

  • Bachelor’s degree in computer science or a related field 

Required Skills and Experience

  • 3+ years of experience in a technical role
  • Experience as a site reliability engineer or a similar role
  • Experience using one or more programming or scripting languages, such as Python, Bash, Ruby, C/C++, Java, or JavaScript
  • Experience with distributed storage technologies such as NFS, HDFS, Ceph, or Amazon S3
  • Experience with dynamic resource management frameworks such as Kubernetes or Apache Mesos
  • Experience with databases such as MySQL or PostgreSQL
  • Experience with web servers such as Apache or Nginx
  • Proficiency in UNIX/Linux operating systems
  • Knowledge of network protocols such as TCP/IP, HTTP, and DNS
  • Knowledge of system design and architecture

Preferred Qualifications

  • A proven track record of success in technical engineering roles

Recruit with Nexus IT Group