Site Reliability Engineer (SRE)
We are looking for a Site Reliability Engineer (SRE) to join our team and develop software systems and automated solutions for our operations. The SRE's responsibilities include monitoring computer systems and building alerts for the operational issues those systems can experience. If hired for this role, you will work with our IT team to ensure that our organization can continue to deliver projects and services in our computing environment.
Typical Duties and Responsibilities
- Run the production environment by monitoring availability and taking a holistic view of system health
- Build software and systems to manage platform infrastructure and applications
- Improve reliability, quality, and time-to-market of our software solutions
- Administer production jobs
- Measure and optimize system performance and innovate for continuous improvement
- Gather and analyze metrics from operating systems and applications to assist in performance tuning and fault finding
- Provide primary operational support and engineering for multiple large-scale distributed software applications
- Roll back a bad software push
- Block or rate-limit unwanted traffic
- Use monitoring systems for alerting and dashboards
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation
- Document all of your activities
Education
- Bachelor’s degree in computer science or a related field
Required Skills and Experience
- 3+ years of experience in a technical role
- Experience as a site reliability engineer or in a similar role
- Experience using one or more high-level programming languages, such as Python, Bash, Ruby, C/C++, Java, or JavaScript
- Experience with distributed storage technologies such as NFS, HDFS, Ceph, or Amazon S3
- Experience with dynamic resource management frameworks
- Experience with databases such as MySQL or PostgreSQL
- Experience with web servers such as Apache and Nginx
- Proficiency with UNIX/Linux operating systems
- Knowledge of network protocols such as TCP/IP, HTTP, and DNS
- Knowledge of system design and architecture
Preferred Qualifications
- Previous success in technical engineering