Sprout Social

Senior Site Reliability Engineer

Remote job description

At Sprout Social, we are building software that is made to last. Our 25,000+ customers depend on us daily to connect them with their customers, so reliability, scalability, and performance are top of mind. Our software is used by companies like the Chicago Bulls, Sony Electronics, Indiana University, Make-A-Wish Foundation, Edelman, and Subaru to create stronger relationships with their customers through social media. Each day, our platform processes tens of millions of social media messages, and our APIs handle over 10 billion requests per year to deliver our software to customers in over 100 countries.

On our Infrastructure engineering team, we strive to create "Paved Roads": standard production-ready technology that all of our engineering teams can leverage to deliver value quickly. At the same time, we seek to empower product engineering teams to take on as much production ownership for their services as possible. We work to improve all aspects of engineering through automation, observability of metrics, and clear processes in order to build sustainable and fault-tolerant solutions. Learning from system failures and human mistakes is part of our culture.

We do not operate as lone wolves or "10x devs." Instead, we are building diverse, collaborative teams that get the best results sustainably. Our Site Reliability Engineers work in tandem with Web, Platform, and QA Engineers to drive our product initiatives to successful outcomes.

We are looking for a creative, collaborative, highly motivated, and pragmatic engineer to help us design and build reliable, scalable, performant systems that empower engineers to rapidly and safely deliver value to our customers. If this sounds like you and you want to be on a team that has a huge impact across all of engineering, we'd love to talk with you!

Within 1 month, you will:

  • Experience Sprout's in-depth onboarding, covering everything from our company mission and values, to hearing directly from executives and founders, to deep training on our products and the value that Sprout delivers to our customers
  • Make a plan with your manager to set initial priorities, align on expectations for your role, plant goalposts for your career, and learn about Sprout's approach to site reliability engineering
  • Interact with our production infrastructure and perform operational tasks
  • Collaborate with your team members and fellow developers to deliver value to our users
  • Receive feedback on pull request(s) and actively pair with teammates
  • Ramp up on our core technology stack including AWS, Chef, Terraform, and Kubernetes
  • Shadow a team member for an on-call rotation

Within 3 months, you will:

  • Complete your first end-to-end project, such as a new infrastructure deployment using Chef and Terraform
  • Gain familiarity with our platform architecture
  • Learn about and interact with some of our key storage technologies: MySQL, Elasticsearch, Cassandra, and Hadoop
  • Learn about our use of NSQ in our streaming data ingest pipeline
  • Use our observability tools to troubleshoot production performance or stability issues
  • Join our on-call rotation (don't worry, we've got your back!)
  • Focus on code quality with meaningful test coverage
  • Participate in code reviews and give feedback to team members
  • Contribute to our team's culture of continuous improvement through retros and experimentation-oriented thinking
  • Proactively identify, advocate for, and make high-impact improvements to reduce operational toil

Within 6 months, you will:

  • Accidentally break something, recover, and learn from it
  • Help complete an impactful project that is well-baked and bug-free
  • Work with services in our Kubernetes platform
  • Write design documents, gather feedback from peers, coordinate dependencies, and be a domain owner for a new project
  • Form a career growth plan with your manager and begin work towards it
  • Interact with and maintain distributed systems
  • Build effective working relationships with team members across engineering through active networking, collaboration, and community building
  • Influence other developers and model engineering best practices
  • Help promote DevOps culture by working with engineers to assume operational ownership

Within 12 months, you will:

  • Be comfortable and confident in most technical aspects of your team's core systems and services
  • Mentor junior engineers via pairing, design review, and code review
  • Continue growing your knowledge of our environment and services
  • Actively mitigate risk of poor quality or missed deadlines
  • Continually evaluate and refine your technical toolkit: teach what you learn to the team
  • Retire a service that is EOL and clean up artifacts
  • Have opportunities to contribute to in-house technical presentations and workshops that share your expertise with large groups of Sprout engineers
  • Have opportunities to advocate for Sprout Engineering in the software community by participating/speaking at conferences, user groups, etc.
  • Surprise us! Use your unique ideas and abilities to change Sprout Engineering in beneficial ways that we haven't even considered yet

Of course, what is outlined above is the ideal timeline. Things may shift based on business needs, and other projects and tasks could be added at the discretion of your manager.

Summary
Company: Sprout Social
Job title: Senior Site Reliability Engineer at Sprout Social (Chicago, IL) (allows remote)
Job tags: aws, kubernetes, linux, sre, terraform
