As a Site Reliability Engineer, your primary responsibility is to ensure that our cloud infrastructure can scale sustainably and securely through the use of automation. You will be maintaining services once they are live by measuring and monitoring availability, latency, and overall system health.
We use a serverless technology stack rooted in Cloud Dataflow, Cloud Functions, App Engine, Pub/Sub, and IoT Core, with plans to expand into GKE. We have a growing multi-tenant infrastructure, all managed by Terraform, Jenkins, and other tooling. As a Tausight SRE, you will collaborate with engineering to implement changes that improve reliability and velocity for these technologies.

One of your main goals will be to minimize toil, the manual effort required to sustain production infrastructure. By managing load and canarying releases, you'll work to make engineering throughout Tausight more efficient, productive, and secure.

What you will do:
· Work in a Site Reliability Engineering role with an additional focus on resiliency and testing.
· Implement tooling that enables us to continue scaling our operations, using container orchestration technologies such as Kubernetes.
· Help evolve and streamline CI/CD systems across all facets of the Tausight stack, from our build servers to our backend services.
· Advocate for best practices around code quality, design and engineering productivity, including Source Control Management.
· Enhance and add to our existing Stackdriver monitoring and alerting suite to ensure security and performance, leveraging tools such as Grafana or Prometheus.
· Define disaster recovery best practices.

What you need:
· 2+ years of experience in DevOps Engineering/SRE/Systems Administration.
· Sound knowledge of provisioning, securing, monitoring and load balancing resources on a cloud platform such as GCP, AWS, or Azure.
· Familiarity with a general-purpose programming language, such as Go or Python.
· Proficiency with a shell scripting language, such as Bash.
· Familiarity with infrastructure-as-code and configuration management tools such as Terraform, CloudFormation, or Ansible.
· Experience with source control management using Git, and provisioning a CI system such as Jenkins, GitLab, or CircleCI.
· Exposure to Kubernetes, ECS or other container orchestration tools.
· Analytical approach coupled with strong communication skills.
· Shared values of ownership, diversity of thought, and empathy for our users and coworkers.

Nice to Have:
1. Previous experience managing Windows clusters with VMware ESXi.
2. Previous experience with either SQL or NoSQL database administration.
3. Exposure to automated testing frameworks, such as Selenium.
4. Previous experience with machine learning.

Summary
Company: Tausight
Job title: Site Reliability Engineer at Tausight (allows remote)
Job tags: jenkins, ci/cd, python, cloudformation, serverless