Founded in 2017, Coalition combines cybersecurity and insurance to help organizations prevent digital risk before it strikes. Coalition's Active Insurance policies combine traditional coverage with a digital risk assessment and monitoring technology to help small and medium-sized businesses protect themselves in today's hyper-connected world.

The team at Coalition is made up of cybersecurity and technology experts, as well as insurance industry veterans. Our secret sauce is bringing this expertise together to create a world-class organization with a massive technological advantage. Coalition is also backed by leading global insurers like Allianz, Arch Insurance, Lloyd's of London, Swiss Re and Zurich North America. Today, Coalition is one of the world's largest commercial insurtechs, serving over 160,000 customers.

In June 2022, Coalition closed an additional $250 million in Series F Funding to accelerate its rapid growth at a time when many other companies struggled to find funding. This latest funding validated that Coalition is building a long-term business that can deliver profitable growth with a clear strategic advantage.

Coalition has experienced tremendous growth by helping organizations of all sizes solve real-world problems and by remaining true to our founding values of character, humility, responsibility, purpose, authenticity and inclusion. We are proud to have been named among Inc.'s Best Workplaces of 2021 and one of Fast Company's most innovative companies for 2022.

About The Role

We are looking for a Senior Site Reliability Engineer (Remote) who has the experience, aptitude, and mental fortitude to instrument and monitor the breadth of our full platform stack (hosts, applications, and performance). In this role you will work closely with our engineering and information security teams to enhance the automated system provisioning and deployment subsystems within codified infrastructure. You will work with developers to create more robust and scalable services ahead of cloud implementations. You will help to isolate, trap, and recover from inevitable system failures, and develop strategies for continuous monitoring and assessment to reduce both downtime and required manual intervention. You will participate in an on-call rotation to maintain platform SLAs.

Our core platform is written mostly in Python with some services in Java and Go. We prefer to use the right tool for the job and make pragmatic judgements about how to scale and decouple systems as we continue to grow. We're looking for someone who can navigate a cloud environment (AWS) with many moving pieces and systems to help the team identify how they fit into the broader puzzle.

Requirements

5+ years of combined experience in SRE/DevOps or Software Development roles in a full stack engineering environment
Experience soliciting systems requirements, designing, and implementing new platform components leveraging infrastructure or SaaS services
Must have experience with a customer-facing production environment using containerization and orchestration tools such as ECS, Kubernetes, or Swarm
Experience working with fault-tolerant services and the iterative development of highly available systems
Experience with running a production environment in one or more Infrastructure as a Service cloud providers (AWS/Azure/DigitalOcean/Google Cloud)
Solid development experience in Python and Go, or other scripting and systems languages, for both scripting and product development purposes
Some knowledge of software engineering design patterns, agile development, and architecture fundamentals
Prior experience with full-stack monitoring from system level metrics to SLOs, failure-based testing approaches, and monitoring strategies
Knowledge of CI/CD pipelines to accelerate deployments and improve both security and auditability (e.g. Jenkins, Travis, or CircleCI)
Excellent organizational, verbal, and written communication skills
Experience mentoring junior engineers in SRE best practices and software engineering
Experience working in an agile development lifecycle
Bachelor's or Master's degree in Computer Science, related field, or equivalent experience

Bonus Points

Experience with converting monolithic applications to microservices and service discovery technology
Experience automating system provisioning, configuration, and Infrastructure as Code (CloudFormation, Terraform, Ansible, etc.)
Exposure to systems security requirements, information assurance techniques, and system hardening
Exposure to Kafka, AMQP, Kinesis, job queues, and other pub/sub or queuing systems

Why Coalition?

We're a mission-driven team that is committed to building a more diverse and inclusive culture. We want to work with people of all different backgrounds and paths in life, and we trust our team members to take responsibility, share ownership and put in the work, no matter how small the task. We are always looking for collaborative, inquisitive and dedicated individuals to join #OurCoalition and help us on our mission to solve digital risk.

Coalition is proud to be an Equal Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

Summary
Company name: Coalition
Remote job title: Senior Site Reliability Engineer