Chaos Engineer

Solace · Ottawa, Ontario, Canada · Hybrid, Onsite

Harnessing the Power of Data, Together.
The world’s leading enterprises are using Solace’s event streaming and management platform to transform their organizations by harnessing the power of events. The more quickly an enterprise can get information about events to where it needs to be, the more effectively a business can react to opportunities and improve the customer experience. That’s where an event broker (modern message-oriented middleware) comes in.

Help Us, Help Them, Help You.
By joining our first-class team, you will be helping leading enterprises, including common household brands we all know and love, reach their full potential in this real-time, digital world. The next time you drive a luxury vehicle, do some online banking, fly on a plane, or order some furniture online, you could be getting a better experience as a direct result of our technology, and your hard work.

Wouldn’t that be great!?

In This Role, You Will

  • Design and implement chaos engineering experiments to simulate failures and disruptions in a MaaS platform to ensure system resilience.
  • Identify and assess potential weak points and vulnerabilities in the platform's architecture by running stress and fault tolerance tests.
  • Collaborate with cross-functional teams (dev, ops, product, etc.) to understand the platform's requirements and ensure proper QA testing protocols are followed.
  • Automate chaos experiments using tools and frameworks (e.g., Chaos Monkey, Gremlin) to continuously inject failures and simulate real-world conditions.
  • Monitor platform performance during chaos tests and identify any performance degradation, ensuring that services meet predefined service level objectives (SLOs).
  • Develop test scenarios that mimic real-life user behavior, environmental factors, and edge cases to simulate complex failure conditions.
  • Analyze the results of chaos engineering tests, identify system weaknesses, and provide actionable feedback to development teams to improve platform resilience.
  • Conduct post-incident analysis and write detailed reports on chaos experiments, including root cause analysis, recovery processes, and suggestions for improvement.
  • Ensure adherence to best practices for testing high availability, scalability, and disaster recovery features of the MaaS platform.
  • Collaborate on incident response and recovery testing to validate system recovery mechanisms in case of critical failures.
  • Continuously improve the platform's overall reliability by designing new test scenarios, improving automation scripts, and optimizing existing workflows.
  • Integrate feedback loops from chaos experiments into the continuous integration/continuous deployment (CI/CD) pipeline to validate fixes in real time.
  • Maintain a proactive mindset in identifying potential risks before they occur, minimizing downtime, and ensuring user satisfaction across MaaS services.

What You’ll Bring to the Role

  • B.S. degree or higher in Computer Science, Engineering, or a related field.
  • At least three years of experience in a technical role related to chaos engineering, cloud-native environments, or distributed systems.
  • Strong knowledge of chaos engineering principles and experience designing and running chaos experiments to improve system reliability.
  • Proficiency with tools such as Gremlin, Chaos Monkey, or other chaos engineering frameworks.
  • Experience with cloud platforms (AWS, Azure, GCP) and infrastructure-as-code tools (e.g., Terraform) for managing and testing distributed systems.
  • Expertise in software testing methodologies, including load testing, stress testing, and resilience testing, with a focus on high-availability systems.
  • Hands-on experience with CI/CD pipelines, integrating chaos engineering tests into automated workflows for continuous testing and feedback.
  • Solid understanding of microservices architectures and distributed systems, with an ability to simulate real-world failure scenarios in complex environments.
  • Familiarity with monitoring and observability tools like Datadog to assess system performance and visualize the impact of chaos experiments.
  • Excellent problem-solving skills to quickly identify issues, determine root causes, and develop solutions to enhance system robustness.
  • Strong collaboration and communication skills, with the ability to work cross-functionally with developers, SREs, and product teams to drive reliability improvements.
  • Ability to analyze large datasets and logs generated during chaos experiments to derive insights, identify patterns, and recommend improvements.
  • Experience in writing comprehensive test plans, test cases, and reports, providing clear and actionable feedback for development and operations teams.
  • A proactive and creative mindset, with a passion for identifying hidden risks and continuously improving platform reliability in a fast-paced, dynamic environment.
  • Experience in performance tuning and optimization of distributed systems, focusing on scalability and fault tolerance.
  • Adaptability to change, thriving in a fast-evolving technical environment and staying up to date with industry best practices and new tools in chaos engineering and QA.

Why You’ll Want to Join Us at Solace

  • We have an awesome team! You’ll get to work with some of the smartest individuals in the business
  • We believe in work-life balance and believe it’s important to love what you do
  • We have adopted a hybrid work model to create an inclusive working environment for everyone
  • Our training programs are top-notch (LinkedIn Learning, Mentorship program, Solace Academy)
  • We like to brag about our stellar customer lineup!
  • We are social – we like to keep things simple and fun!
  • We are one of the top-ranked employers on Glassdoor
  • We have a sense of humour and make cool videos on cool topics like MQTT!

Not sure you meet all the requirements? We still want to hear from you. We know experience comes in all forms, so don't let that hold you back from applying!

We believe that diversity in all of its forms drives innovation and growth, both in business and in life. This is why we strive to create an enriching and safe workplace where you can be who you are. It is only because of you that we can be us.

If you want to do the best work of your career and feel supported every step of the way, we encourage you to join us.

We thank all candidates for their interest; however, only those selected to continue in the selection process will be contacted.

Solace welcomes and encourages applications from people with disabilities. Accommodations are available on request for candidates taking part in all aspects of the selection process.

Life at Solace

Thrive Here & What We Value

1. Solace stands at the forefront of technology and innovation
2. Cutting-edge technology intelligently solves use cases like hybrid cloud integration, IoT connectivity, microservices, and big data distribution
3. Work-life balance and love for what you do
4. Hybrid work model to create an inclusive working environment for everyone
5. Values: craftsmanship, trust, courage, freedom, momentum, humility, and human experience
6. Top-notch training programs (LinkedIn Learning, Mentorship program, Solace Academy)
7. We are one of the top-ranked employers on Glassdoor
8. Social and fun work environment