About the team:
Our vision is to empower R&D to build faster, more secure, reliable, scalable, and reusable products.
Core Infrastructure owns all of the infrastructure at Sift. We are responsible for managing and maintaining the foundational systems that support the company's engineering efforts. This involves system ownership of core data, developer infrastructure, and platform components. We manage various tools and services such as Bigtable, Spanner, Data Infrastructure, Bosun, GKE clusters, Build processes, secrets management, IaaS tools, Load Balancers, Grafana dashboards, and log management. Additionally, we oversee infrastructure provisioning, access management, and on-call rotations to address urgent issues.
As a team, we collaborate closely with other engineering teams to identify and prioritize dependencies, ensuring smooth project execution and delivery.
What we’re looking for:
Managers who are inspiring, creative, and flexible, and who can identify and bring focus to What’s Important Now (while still keeping an eye on the long term), will excel in our organization and make the biggest impact. Bring your toolbox of effective and varied communication skills and techniques. You value collaboration and transparency and have a Get Stuff Done mindset. As a leader of teams, you understand the importance of these aspects of engineering success and lead by example. You strengthen a culture of mentorship, give regular constructive feedback, set goals, and understand how to motivate your staff. You actively invest in the career development of your team and grow engineering groups by hiring effectively.
What you’ll do:
Team Leadership: Leading and mentoring a team of globally distributed infrastructure engineers, fostering a culture of reliability and continuous improvement.
System Reliability: Ensuring the availability and performance of core systems through proactive monitoring, incident response, and root cause analysis.
Automation and Efficiency: Driving automation efforts to reduce manual intervention, improve deployment processes, and enhance system efficiency.
Capacity Planning: Managing capacity and performance planning to anticipate and handle growth and demand.
Incident Management: Leading incident response efforts, coordinating with stakeholders, and implementing post-incident reviews to prevent recurrence.
Technical stack:
GCP, AWS, Terraform, Kubernetes, Vault, Jenkins, Kafka, Snowflake, Spark, Flink, Java 11, Python 3, Ruby 2.7, Ruby on Rails.
What would make you a strong fit:
You have a deep understanding of large-scale computing and approach infrastructure as code. You're passionate about building immutable infrastructure and resilient, multi-AZ/multi-region systems that can withstand failures. While you recognize the importance of monitoring and alerting, your ultimate goal is to design self-healing systems. Collaboration is key to you, and you strive to act as a force multiplier by making thoughtful trade-offs to drive success.
3+ years of experience managing infrastructure teams and working on critical infrastructure, with 10+ years of overall experience.
Experience building and managing cloud infrastructure on GCP or AWS.
Expertise in building infrastructure as code and automating provisioning processes using tools like CloudFormation or Terraform.
Proven expertise in automation and a solid understanding of configuration management tools.
We are looking for someone who genuinely cares about their team, is excited about infrastructure problems, and focuses on execution.
A little about us:
Sift is the AI-powered fraud platform securing digital trust for leading global businesses. Our deep investments in machine learning and user identity, a data network scoring 1 trillion events per year, and a commitment to long-term customer success empower more than 700 customers to grow fearlessly. Brands including DoorDash, Yelp, and Poshmark rely on Sift to unlock growth and deliver seamless consumer experiences. Visit us at sift.com and follow us on LinkedIn.