Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.
If you'd like to build the world's best deep learning cloud, join us.
What You’ll Do
- Design and implement scalable, secure, and highly available Kubernetes clusters to support our growing application portfolio
- Bootstrap new on-prem and managed Kubernetes environments from the ground up, including networking, storage, and security configurations
- Extend our existing Kubernetes platforms with advanced features such as service mesh, serverless frameworks, and custom resource definitions (CRDs)
- Develop and maintain infrastructure-as-code (IaC) templates using Cluster API (CAPI) for automated cluster provisioning and configuration management
- Implement robust monitoring, logging, and alerting solutions using OpenTelemetry to ensure platform health and performance
- Optimize resource utilization and cost-effectiveness of Kubernetes deployments across multiple cloud providers
- Collaborate with teams to design and implement CI/CD pipelines for containerized applications
- Troubleshoot complex issues in production Kubernetes environments and lead incident response efforts
- Stay up to date with the latest Kubernetes ecosystem developments and evaluate new technologies for potential adoption
- Mentor junior engineers and contribute to the development of platform engineering best practices
You
- Have 5+ years of experience bootstrapping, extending, and operating K8s at scale (1,500+ nodes)
- Have 5+ years of experience automating the provisioning, configuration management, and deployment of production systems
- Have 5+ years of experience building resilient, scalable systems with Python/Go
- Have 5+ years of experience managing and securing infrastructure at scale (2,000+ hosts)
- Possess sound experience with Infrastructure as Code (Terraform, Ansible, etc.)
- Possess sound knowledge of DevOps, Infrastructure, and Platform concepts
- Possess strong development skills in Python or Golang
- Possess strong proficiency with Linux command line and debugging tools
Nice to Have
- Experience with building complex hybrid environments (AWS and on-premise preferred)
- Experience with service mesh technologies (e.g., Istio, Linkerd) and serverless frameworks (e.g., Knative)
- Experience with multi-cluster or multi-cloud Kubernetes deployments
- Experience in the machine learning or computer hardware industry
- Certified Kubernetes Administrator (CKA) and/or Certified Kubernetes Application Developer (CKAD) certification
- Contributions to open-source Kubernetes projects or tools
- Familiarity with GitOps principles and tools like ArgoCD or Flux
Salary Range Information
Based on market data and other factors, the salary range for this position is $153,000-$240,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.
About Lambda
- We offer generous cash & equity compensation
- Investors include Gradient Ventures, Google’s AI-focused venture fund
- We are experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability
- Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
- We have a wildly talented team of 300 that is growing fast
- Health, dental, and vision coverage for you and your dependents
- Commuter/Work from home stipends for select roles
- 401(k) plan with 2% company match
- Flexible Paid Time Off Plan that we all actually use
A Final Note:
You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.
Equal Opportunity Employer
Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.