Site Reliability Engineer (SRE)

This job is no longer open

Description

The Site Reliability Engineer (SRE) will play a crucial role in ensuring the reliability, scalability, and performance of our systems and services. Working closely with cross-functional teams, the SRE will design, implement, and maintain tools and processes to monitor, manage, and automate our infrastructure. The ideal candidate is passionate about building robust and resilient systems, with a strong focus on automation and continuous improvement.

Responsibilities:

1. System Monitoring and Incident Response:

Design and implement monitoring solutions to detect and mitigate system issues proactively.

Respond to alerts and incidents promptly, troubleshoot issues, and implement effective solutions to minimize downtime.

2. Infrastructure Automation:

Develop and maintain automation scripts and tools to streamline deployment, configuration, and scaling of infrastructure components.

Implement Infrastructure as Code (IaC) practices to manage and provision infrastructure resources efficiently.

3. Performance Optimization:

Identify performance bottlenecks and inefficiencies in the system and work collaboratively with development teams to optimize performance.

Conduct capacity planning and scalability assessments to ensure our systems can handle current and future demands.

4. Reliability Engineering:

Design and implement fault-tolerant and resilient architectures to ensure high availability of services.

Conduct post-mortem analysis of incidents to identify root causes and implement preventive measures.

5. Continuous Improvement:

Stay current with industry best practices and emerging technologies related to site reliability and infrastructure automation.

Drive initiatives to continuously improve the reliability, scalability, and performance of our systems.

Requirements

Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).

Proven experience in a Site Reliability Engineer, DevOps Engineer, or similar role.

Proficiency in scripting and automation using languages such as Python, Bash, or PowerShell.

Strong understanding of cloud computing platforms (e.g., AWS, Azure, GCP) and container orchestration technologies (e.g., Kubernetes).

Experience with configuration management tools (e.g., Ansible, Puppet, Chef) and version control systems (e.g., Git).

Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).

Excellent problem-solving skills and the ability to troubleshoot complex issues in a production environment.

Strong communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.

Benefits

Health, dental, vision, life, and short/long-term disability insurance
Paid vacation, holidays, and sick leave
Competitive compensation and opportunities for advancement
Retirement plan with employer contribution match
Welcoming, family-style corporate culture uniquely suited to fast-paced, entrepreneurial, and motivated individuals
One of San Antonio’s “Best Places to Work” for nine consecutive years

Life at Futurex

Thrive Here & What We Value

1. Welcoming, Family-Style Corporate Culture
2. Scenic Corporate Campus with Amenities
3. Generous Compensation and Incentive Pay Package
4. Opportunities for Growth and Rapid Advancement
5. Fast-Paced Merit-Based Culture
6. Comprehensive Benefits
7. Equal Opportunity Employer
8. Dynamic Team that values innovation and fosters creativity
