
Scale-out Engineer


Tenstorrent is leading the industry in cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch and shares a passion for AI and a deep desire to build the best AI platform possible. We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors of all seniorities.
We're seeking a skilled AI Scale-Out Software Engineer to build and optimize our Tenstorrent scale-out fabric (TT-fabric) for distributed inference and training infrastructure. The ideal candidate will have expertise in deep learning, distributed systems, and low-level networking. This role is hybrid, based out of Santa Clara, CA; Austin, TX; or Toronto, ON.

Responsibilities:


  • Design, develop, and maintain TT-fabric, a low-level networking library for Tenstorrent AI processors built on top of the Ethernet protocol
  • Design and implement efficient distributed training systems for large-scale deep learning models
  • Optimize network communication for multi-node AI processor clusters
  • Tune system performance for inference and training of key AI models
  • Work in the TT-Metalium team and integrate scale-out APIs into the Programming Model
  • Work with AI model builders and researchers to improve both the scale-out infrastructure and model design

Experience & Qualifications:


  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.
  • Proven experience in low-level software development.
  • Strong proficiency in programming languages such as C / C++.
  • Experience with MPI or similar distributed computing frameworks (see the illustrative sketch after this list).
  • Experience with low-level networking libraries (e.g., libfabric, libibverbs).
  • Knowledge of networking protocols, especially Ethernet and InfiniBand.
  • Knowledge of high-performance interconnects.
  • Familiarity with RDMA programming.
  • Familiarity with large-scale deep learning frameworks (e.g., PyTorch, TensorFlow).
  • Familiarity with network offload engines and SmartNICs.
  • Strong communication skills and the ability to work effectively with cross-functional teams.
  • Passion for technology and a commitment to pushing the boundaries of what is possible in AI.
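
To illustrate the distributed-computing experience called for above, here is a minimal, hypothetical C++ sketch of the allreduce collective that data-parallel training relies on, written against plain MPI. It is not TT-fabric or Tenstorrent code, and the file name and build commands are assumptions for illustration only.

    // Illustrative only: the allreduce collective at the heart of data-parallel
    // training, expressed with plain MPI (not TT-fabric or any Tenstorrent API).
    // Assumed build/run: mpicxx allreduce_demo.cpp -o allreduce_demo
    //                    mpirun -np 4 ./allreduce_demo
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        int rank = 0, world = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &world);

        // Stand-in for each rank's local gradient shard.
        std::vector<float> grads(4, static_cast<float>(rank));

        // Sum the buffers across all ranks in place; every rank ends up with the
        // same reduced values. Making this step fast across multi-node clusters
        // is the kind of work a scale-out fabric exists to accelerate.
        MPI_Allreduce(MPI_IN_PLACE, grads.data(),
                      static_cast<int>(grads.size()),
                      MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0) {
            std::printf("reduced[0] = %.1f (sum of ranks 0..%d)\n",
                        grads[0], world - 1);
        }

        MPI_Finalize();
        return 0;
    }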

Compensation for all engineers at Tenstorrent ranges from $100k to $500k, including base and variable compensation targets. Experience, skills, education, background, and location all impact the actual offer made. Tenstorrent offers a highly competitive compensation and benefits package, and we are an equal opportunity employer.

Due to U.S. Export Control laws and regulations, Tenstorrent is required to ensure compliance with licensing regulations when transferring technology to nationals of certain countries for which the U.S. government has set licensing conditions. Our engineering positions and certain engineering support positions require access to information, systems, or technologies that are subject to U.S. Export Control laws and regulations; please note that citizenship/permanent residency, asylee, and refugee information and/or documentation will be required and considered as Tenstorrent moves through the employment process. If a U.S. export license is required, employment will not begin until a license with acceptable conditions is granted by the U.S. government. If a U.S. export license with acceptable conditions is not granted by the U.S. government, the offer of employment will be rescinded.

Life at Tenstorrent

At Tenstorrent, we are creating the next generation of high-performance processor ASICs, specifically engineered for deep learning and smart hardware. Our processor is designed to excel at both learning and inference, while being software-programmable to support future innovations in the field of machine learning. The processor's architecture easily scales from battery-powered IoT devices to large cloud servers, and surpasses today's solutions by several orders of magnitude in raw performance and energy efficiency. Our team, made up of alumni from hardware industry leaders like NVIDIA and AMD, is committed to providing the core hardware necessary to increase the pace of deep learning research and enable smart devices to live untethered from the power grid and the Internet. We are based in Toronto and proudly backed by Real Ventures, the Canadian VC of the Year two years running.
Thrive Here & What We Value

  • Innovation, collaboration, and problem-solving
  • Competitive compensation package
  • Diverse team with varying seniorities
  • Hybrid work arrangement (Santa Clara, CA; Austin, TX; Toronto, ON)
  • Equal opportunity employer
  • Cutting-edge AI technology leadership
  • Passionate technologists in diverse teams
  • High-performance RISC-V CPU development
