Senior Site Reliability Engineer


Remote job description

What You'll Be a Part Of:

ActionIQ is a leader in the massive and fast-growing category of Customer Data Platforms (CDPs). Our product brings order to Customer Experience (CX) chaos. ActionIQ's CX Hub empowers everyone to be a CX champion by giving business teams the freedom to explore and take action on customer data, while helping technical teams regain control of where data lives and how it is used. We are backed by top-tier VCs Andreessen Horowitz, Sequoia Capital, and March Capital. Enterprise brands such as Autodesk, Bloomberg, Morgan Stanley, The Washington Post, Hertz, Atlassian, and many more use our CX Hub to achieve growth through extraordinary customer experiences.

Join the AIQ SysOps Team:

As the Sr. Site Reliability Engineer, you will report to the Engineering Manager on our Infrastructure Team. The "Infra" Team at ActionIQ plays an important role in operations and software engineering to support our production environment. We handle incident management, a 24/7 on-call rotation, and troubleshooting of AIQ's SaaS environment. As the lead member of the team, you will be the bridge that allows for seamless collaboration between SysOps, Engineering, and Professional Services. Your impact will be felt throughout the company, visible both internally and externally. Because this is a critical team in a startup, your input on improving the tools and processes the team uses will be valued.

One Year From Now You Will Have:

  • Modified existing systems to detect and report symptoms in addition to disruptions, so that we are aware of potential problems.
  • Designed and maintained monitoring, log centralization, and alerting for all services to facilitate observability and incident management.
  • Used log analysis to troubleshoot performance problems and system outages.
  • Partnered with the product development teams to design and enhance software architecture to improve scalability, service reliability, cost, and performance.
  • Worked with engineers to re-architect and rebuild the core services their teams rely on to be more efficient and cost-effective.

How You'll Contribute:

  • Standardize our approach to observability so it is easy for a developer to do the right thing.
  • Manage low- and mid-severity incidents; escalate high-severity incidents to the resolution team as appropriate, ensuring each incident is declared and has a JIRA ticket assigned.
  • Introduce SLOs, error budgets, and actionable alerts, along with automation such as auto-scaling and self-healing.
  • Work with your team to monitor and ensure the health of the platform, which includes a 24/7 on-call rotation, to deliver a great customer experience.
  • Monitor system health, latency, and availability to maintain services after they are in production.
  • Provide assistance with blameless post-mortems and troubleshoot priority incidents.

What You Bring:

  • 5+ years' experience in a Site Reliability Engineering (SRE) or Software Engineering role.
  • Experience mentoring and leading a team in a high-growth organization.
  • Experience with SRE topics like SLOs, Error Budgets, resiliency, auto-scaling, self-healing, performance, and more.
  • Experience in one or more of the following: Node.js, Java, Linux, Python, Go, PHP, or Scala.
  • Understanding of monitoring and alerting tools (Datadog, PagerDuty, Alertmanager, etc.).

Our work is broad and complex in nature - please don't rule yourself out if you do not meet every requirement.

Compensation and Benefits:

  • Our compensation package includes base salary, stock options, and the great benefits shown below.
    • The salary range for this role is $215,460 - $239,400.
  • Enjoy leading Medical, Dental, and Vision benefits, 401k, FSA, Commuter Benefits, Gym Reimbursement, flexible PTO, and 12 weeks of paid parental leave.
  • Work with a fun, inclusive, and smart team of people as we build a New York City-based enterprise software company.
  • We have a beautiful office in NYC right on Madison Square Park, and local employees come into the office on a hybrid schedule, three days a week (M, W, Th).
  • Office perks include catered lunches, a stocked kitchen with beverages and snacks, and monthly social hours.
  • Learn more about the next chapter for us, our customers and the future of customer experience here.
  • To find out more about our people and Life At AIQ, be sure to visit our Medium Tech and Life blogs.

ActionIQ is committed to building an inclusive, equitable, and diverse organization. We embrace equal opportunities for all applicants and want to foster a culture of belonging for our employees. We recognize and appreciate that the more inclusive we are, the better we will function as a team. AIQ welcomes applicants of any race, color, ancestry, religion, sex, national origin, gender identity, gender expression, age, marital or family status, disability, military veteran status, and any other status or background.

Company name: ActionIQ
Remote job title: Senior Site Reliability Engineer
Job tags: Golang, Node.js, Java