Technical Program Manager, Evaluations

Anthropic | San Francisco, California, United States | Seattle, Washington, United States | Onsite

This job is no longer open

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role:

We are seeking a Technical Program Manager to lead our AI model evaluation initiatives across multiple workstreams. This role will be crucial in assessing the performance, capabilities, limitations, and potential risks of our AI models. Working closely with our Research, Trust & Safety, Frontier Redteaming, and Policy teams, you will drive high-priority evaluation projects to build new processes, align metrics with policy, and track measurable progress. You will help build and adapt the model evaluation program to ensure model deployments are rigorous and aligned with our commitment to responsible AI development.

The ideal candidate will have a strong technical background and experience managing cross-functional programs in AI development, ML engineering, or related fields.

You’ll be joining a team of Technical Program Managers who own and drive cross-functional programs that align to the company’s top priorities. In this role, you’ll have the opportunity to make a foundational impact as you contribute to the scaling of a centralized TPM function for the company. Extremely strong soft skills are paramount, as our team is front and center in driving many company-wide changes and top-priority initiatives that require generating buy-in, balancing various opinions, and competing for attention in our rapidly scaling environment. This role is a great fit for someone who has both seen excellence at scale and operated in rapidly scaling, high-ambiguity teams and scope. We are seeking candidates with deep TPM expertise who are also comfortable acting as adaptable generalists who add value fast. We excel at maintaining a broad view of our work while diving deep into the details when necessary.

We understand business goals, translate and organize them into technical programs and projects, and drive execution. We are adept at engaging with both non-technical and technical stakeholders at all levels of the company, including executive leadership.

In this role, you will have the opportunity to shape the development of advanced AI systems and contribute to Anthropic's mission of ensuring that AI benefits all of humanity. If you are passionate about responsible AI development, have a strong technical background, and thrive in a fast-paced, collaborative environment, we'd love to hear from you.

Responsibilities:

Partner with teams like Frontier Risk Evaluations, Security, and Trust & Safety to develop and implement comprehensive evaluation protocols for our latest frontier AI models
Build a single source of truth for tracking all types of model evaluations as required by our Responsible Scaling Policy, AI safety institutes, the White House, and others
Develop and maintain procedures for conducting evaluations, including designing test suites, coordinating red team exercises, and analyzing results
Create and manage dashboards and reporting systems to track model performance, safety metrics, and evaluation outcomes across different AI systems and versions
Lead cross-functional workshops to identify potential risks and edge cases for evaluation, ensuring thorough coverage of AI capabilities and limitations
Coordinate with external partners and industry standards bodies to align our evaluation practices with emerging best practices in responsible AI development
Provide detailed status reports, identifying technical risks, dependencies, and areas requiring additional support
Facilitate communication and coordination between technical workstreams and stakeholders
Continuously identify opportunities for technical process improvements and implement changes as needed
Stay up-to-date with the latest developments in AI safety, ML engineering, and related fields to ensure the program remains at the forefront of responsible AI development

You might be a good fit if you:

Have several years of experience in technical program management, with a track record of successfully delivering complex technical programs, preferably in AI development, ML engineering, or related fields
Have experience executing technical programs that require systems and engineering-level knowledge
Have exceptionally strong interpersonal and communication skills that enable you to influence without authority and to build cross-organizational support, cooperation, and action around initiatives and process adoption
Have experience prompt engineering on language models
Have experience designing and/or running evaluations on Large Language Models
Have knowledge of emerging AI governance frameworks and best practices
Have a high threshold for navigating ambiguity and are able to balance setting strategic priorities with rapid, high-quality execution
Thrive in unstructured environments, and have a knack for bringing order to chaos

The expected salary range for this position is: Annual Salary: $300,000 - $320,000 USD

Logistics

Location-based hybrid policy:

Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

US visa sponsorship:

We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate; operations roles are especially difficult to support. But if we make you an offer, we will make every effort to get you into the United States, and we retain an immigration lawyer to help with this.

We encourage you to apply even if you do not believe you meet every single qualification.

Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Compensation and Benefits*

Anthropic’s compensation package consists of three elements: salary, equity, and benefits. We are committed to pay fairness and aim for these three elements collectively to be highly competitive with market rates.

Equity - For eligible roles, equity will be a major component of the total compensation. We aim to offer higher-than-average equity compensation for a company of our size, and communicate equity amounts at the time of offer issuance.

US Benefits - The following benefits are for our US-based employees:

Optional equity donation matching.
Comprehensive health, dental, and vision insurance for you and all your dependents.
401(k) plan with 4% matching.
22 weeks of paid parental leave.
Unlimited PTO – most staff take between 4-6 weeks each year, sometimes more!
Stipends for education, home office improvements, commuting, and wellness.
Fertility benefits via Carrot.
Daily lunches and snacks in our office.
Relocation support for those moving to the Bay Area.

UK Benefits - The following benefits are for our UK-based employees:

Optional equity donation matching.
Private health, dental, and vision insurance for you and your dependents.
Pension contribution (matching 4% of your salary).
21 weeks of paid parental leave.
Unlimited PTO – most staff take between 4-6 weeks each year, sometimes more!
Health cash plan.
Life insurance and income protection.
Daily lunches and snacks in our office.

* This compensation and benefits information is based on Anthropic’s good faith estimate for this position as of the date of publication and may be modified in the future. Employees based outside of the UK or US will receive a different benefits package. The level of pay within the range will depend on a variety of job-related factors, including where you place on our internal performance ladders, which is based on factors including past work experience, relevant education, and performance on our interviews or in a work trial.

How we're different

We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time.

As such, we greatly value communication skills. The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.

Come work with us!

Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.

Life at Anthropic

Anthropic PBC is a U.S.-based artificial intelligence (AI) startup company, founded in 2021, researching artificial intelligence as a public-benefit company to develop AI systems to “study their safety properties at the technological frontier” and use this research to deploy safe, reliable models for the public. Anthropic has developed a family of large language models (LLMs) named Claude as a competitor to OpenAI’s ChatGPT and Google’s Gemini.

Thrive Here & What We Value

1. Mission-driven organization focused on creating safe and beneficial AI systems
2. Collaborative team working towards long-term goals of steerable, trustworthy AI
3. Emphasis on impact rather than smaller puzzles
4. View AI research as an empirical science with physics and biology parallels
5. Values communication skills and frequent research discussions to ensure highest-impact work
6. Believes in big science approach to AI research
7. Collaborative group that values impact over smaller puzzles
8. Emphasizes collaboration and alignment across internal teams
9. Commitment to creating reliable, interpretable, and steerable AI systems
10. Values representation and diverse perspectives on the team
