Abacus.ai

Site Reliability Engineer, ML Ops

Remote job description

You will be responsible for building, tuning, and operating the entire infrastructure that powers Abacus.AI's multi-cloud SaaS products. We have a modern technology stack built on Kubernetes, Spark, TensorFlow, Python, Go, MySQL, and Redis.

Requirements:

  • BS or MS from a top-notch CS program
  • 2+ years of professional experience in hands-on engineering roles, including operating production environments in public clouds (AWS, GCP, Azure)
  • Strong Linux/Unix systems fundamentals
  • Python programming experience in production environments
  • Experience with modern cloud environments: containerization, infrastructure-as-code, DevOps, CI/CD pipelines, and automation

Preferred:

  • Operating Kubernetes clusters
  • Experience with ML Ops: Spark, TensorFlow, GPUs
  • Experience with Terraform
  • Hands-on experience with network security and database systems



Summary
Company name: Abacus.ai
Remote job title: Site Reliability Engineer, ML Ops
