Site Reliability Engineer

Rockset
London, United Kingdom
Hybrid, Onsite

ABOUT ROCKSET

At Rockset, we’ve built the real-time analytics database for the world's data applications. Our team and technology come from a rich heritage, rooted in the experience of building massive scale data systems at the world’s leading companies, and we created Rockset to make those kinds of powerful data platforms available to real-time application developers everywhere. We are creating a world where developers can go from complex data sets to fast, interactive applications and analysis effortlessly.

We’re a fast-growing company that values curiosity, diversity, and open-mindedness.

You will solve interesting problems, surrounded by exceptional people, while making customers happy. We work hard, but also take our personal lives and experiences seriously. Our investors include Greylock Partners and Sequoia Capital. We are headquartered in San Mateo, CA with offices in Boston, MA and London, UK.

As a site reliability engineer, you will be responsible for the automation, stability, security, configuration, monitoring, alerting, and capacity planning of Rockset's network, systems, and infrastructure. You will also build tools that help the rest of the engineering team be more productive, including the ones that Rockset engineers use to deploy and manage their services.

You will have a foundational impact on shaping the team and the systems we create. The on-call pager is shared by most of the engineering team, not just SRE.

Our infrastructure is completely hosted in Amazon Web Services. We use a variety of homegrown, open source, and commercial tools, including Kubernetes, Docker, Kafka, Zookeeper, Prometheus, Grafana, Salt, Terraform, Phacility, and Buildkite. We try to deploy new code to our production environment twice a week, but as an SRE you can expect to make production changes on a daily basis.

You should expect to collaborate with all other engineering teams to develop solutions that meet reliability, security, and business requirements. Lastly, you will diagnose, triage, and build solutions for complex technical issues at scale.

You'd be a great fit if you are:

Passionate about distributed systems, database technologies, and highly scalable services
Poised under fire and willing to share an on-call rotation with the rest of the team
A self-starter who thrives in a fast-paced environment
Willing to learn new skills and technologies
Attentive to details and comfortable with ambiguity

It would be even more awesome if you also have:

Bachelor's or Master's degree in Computer Science or a related field, or relevant work experience
Experience as an SRE for 3+ years
Experience building and operating public-facing 24x7 web applications at scale
Experience working with cloud infrastructure and patterns (AWS preferred)
Strong programming skills in a scripting language (Python, Ruby, Bash)
Experience with Kubernetes, Mesos, Swarm, or similar container orchestration tools
Experience with Terraform, Salt, Chef, Packer, or similar configuration management tools
Experience with Grafana, Prometheus, Datadog, or similar monitoring tools

OUR COMMITMENT TO DIVERSITY

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Life at Rockset

Our Vision

We believe that a data-driven world has the potential to make life better for everyone. Enterprises are still struggling to use complex data, primarily because real-world data is messy and cannot be put to use easily. We are bridging the gap by changing the way data is stored, processed, and accessed to enable better, faster data-driven decisions and data-powered apps. Empowering enterprises to unleash all their data is a difficult challenge that inspires us every day.

Our Team

Rockset's team has deep expertise in storage, data management, and distributed systems. Members of our team started the Hadoop File System project back in 2006, which helped ignite the big data movement. We previously founded and led the creation of Facebook's online social graph serving engine and graph search projects, TAO and Unicorn, which power all of Facebook's user-facing and search products. Our team also helped build the original backend for Gmail at Google. On the enterprise side, members of our team have experience launching VMware's vSAN and building the industry's first nested virtualization in the cloud at Ravello. We intimately understand data, cloud, and scale, as well as the challenges and opportunities they create for enterprises.

Thrive Here & What We Value

1. Fast-paced environment
2. Emphasis on curiosity, diversity, and open-mindedness
3. Values personal lives and experiences
4. Competitive salary and stock options
5. Flexible schedule
6. Regular company-wide Hackathons
7. Monthly allowance for private health insurance and pension
8. Lunch provided daily in office
9. 25 days vacation per year (plus public holidays)
10. MacBook Pro plus one-off allowance for home office equipment
