Site Reliability Engineer, ML Ops
Abacus.ai
Remote job description
Responsible for building, tuning, and operating the entire infrastructure that powers Abacus.AI's multi-cloud SaaS products. We have a modern technology stack built on Kubernetes, Spark, TensorFlow, Python, Go, MySQL, and Redis.
Requirements:
- BS or MS from a top-notch CS program
- 2+ years of professional experience in hands-on engineering roles, including operating production environments in public clouds (AWS, GCP, Azure)
- Strong Linux/Unix systems fundamentals
- Python programming experience in production environments
- Experience with modern cloud environments: containerization, infrastructure-as-code, DevOps, CI/CD pipelines, and automation
Preferred:
- Operating Kubernetes clusters
- Experience with ML Ops: Spark, TensorFlow, GPUs
- Experience with Terraform
- Hands-on experience with network security and database systems
Summary
Company name: Abacus.ai
Remote job title: Site Reliability Engineer, ML Ops
Category: DevOps and SysAdmin
Posted: 647 days ago
https://www.remote.io/remote-devops-and-sysadmin-jobs/site-reliability-engineer-ml-ops-33761