At Sprout Social we are building software that is made to last. Our 25,000+ customers depend on us daily to connect them with their customers, so reliability, scalability, and performance are top of mind. Our software is used by companies like the Chicago Bulls, Sony Electronics, Indiana University, Make-A-Wish Foundation, Edelman, and Subaru to create stronger relationships with their customers through social media. Each day, our platform processes tens of millions of social media messages, and our APIs handle over 10 billion requests per year to deliver our software to customers in over 100 countries.

On our Infrastructure engineering team, we strive to create "Paved Roads": standard production-ready technology that all of our engineering teams can leverage to deliver value quickly. At the same time, we seek to empower product engineering teams to take on as much production ownership for their services as possible. We work to improve all aspects of engineering through automation, observability of metrics, and clear processes in order to build sustainable and fault-tolerant solutions. Learning from system failures and human mistakes is part of our culture.

We do not operate as lone wolves or "10x devs." Instead, we are building diverse, collaborative teams that get the best results sustainably. Our Site Reliability Engineers work in tandem with Web, Platform, and QA Engineers to drive our product initiatives to successful outcomes.

We are looking for a creative, collaborative, highly motivated, and pragmatic engineer to help us design and build reliable, scalable, performant systems that empower engineers to rapidly and safely deliver value to our customers. If this sounds like you and you want to be on a team that has a huge impact across all of engineering, we'd love to talk with you!

Within 1 month, you will:

Experience Sprout's in-depth onboarding, covering everything from our company mission and values, to hearing directly from executives and founders, to deep training on our products and the value that Sprout delivers to our customers
Make a plan with your manager to set initial priorities, align on expectations for your role, plant goalposts for your career, and learn about Sprout's approach to site reliability engineering
Interact with our production infrastructure and perform operational tasks
Collaborate with your team members and fellow developers to deliver value to our users
Receive feedback on pull request(s) and actively pair with teammates
Ramp up on our core technology stack including AWS, Chef, Terraform, and Kubernetes
Shadow a team member for an on-call rotation

Within 3 months, you will:

Complete your first end-to-end project, such as a new infrastructure deployment using Chef and Terraform
Gain familiarity with our platform architecture
Learn about and interact with some of our key storage technologies: MySQL, Elasticsearch, Cassandra, and Hadoop
Learn about our use of NSQ in our streaming data ingest pipeline
Use our observability tools to troubleshoot production performance or stability issues
Join our on-call rotation (don't worry, we've got your back!)
Focus on code quality with meaningful test coverage
Participate in code reviews and give feedback to team members
Contribute to our team's culture of continuous improvement through retros and experimentation-oriented thinking
Proactively identify, advocate for, and make high-impact improvements to reduce operational toil

Within 6 months, you will:

Accidentally break something, recover, and learn from it
Help complete an impactful project that is well-baked and bug-free
Work with services in our Kubernetes platform
Write design documents, gather feedback from peers, coordinate dependencies, and be a domain owner for a new project
Form a career growth plan with your manager and begin work towards it
Interact with and maintain distributed systems
Build effective working relationships with team members across engineering through active networking, collaboration, and community building
Influence other developers and model engineering best practices
Help promote DevOps culture by working with engineers to assume operational ownership

Within 12 months, you will:

Be comfortable and confident in most technical aspects of your team's core systems and services
Mentor junior engineers via pairing, design review, and code review
Continue growing your knowledge of our environment and services
Actively mitigate risk of poor quality or missed deadlines
Continually evaluate and refine your technical toolkit: teach what you learn to the team
Retire a service that is EOL and clean up artifacts
Have opportunities to contribute to in-house technical presentations and workshops that share your expertise with large groups of Sprout engineers
Have opportunities to advocate for Sprout Engineering in the software community by participating in or speaking at conferences, user groups, etc.
Surprise us! Use your unique ideas and abilities to change Sprout Engineering in beneficial ways that we haven't even considered yet

Of course, what is outlined above is the ideal timeline. Things may shift based on business needs, and other projects and tasks could be added at the discretion of your manager.

Summary
Company: Sprout Social
Job title: Senior Site Reliability Engineer at Sprout Social (Chicago, IL) (allows remote)
Job tags: aws, kubernetes, linux, sre, terraform