[Remote] Sr. Staff Site Reliability Engineer
Note: This is a remote position open to candidates in the USA. Obsidian Security is a company focused on securing SaaS applications for enterprises. As a Sr. Staff Site Reliability Engineer, you will define the company's reliability vision and partner with DevOps to ensure system issues are detected and resolved before they impact customers.
Responsibilities
- Define and lead long-term reliability strategy across services
- Establish end-to-end system visibility frameworks and guide architecture for observability, detection, and resilience
- Partner across teams to embed reliability, standardize SLIs and SLOs, and serve as a technical escalation expert
- Build intelligent detection systems (anomaly detection, connector health models) and enable self-service observability
- Define and evolve a tiered incident communication strategy, improve response practices, and lead postmortems to strengthen reliability and customer trust
- Contribute hands-on to system design, monitoring, and debugging across distributed systems and data pipelines
Skills
- 5+ years in SRE, Production Engineering, or related roles
- 3+ years operating at a senior or technical leadership level (Staff or equivalent scope)
- Deep expertise in AWS and/or GCP
- Deep expertise in Kubernetes and Helm
- Deep expertise in observability stacks (Prometheus, Grafana, or equivalent)
- Deep expertise in CI/CD systems (GitLab CI/CD, ArgoCD, etc.)
- Proven experience designing and scaling reliability systems for multi-tenant SaaS platforms
- Strong debugging and systems thinking across distributed microservices and legacy systems
- Demonstrated ability to lead initiatives that improve incident detection, response, and system resilience
- Hands-on engineering approach with a track record of building—not just configuring—reliability systems
- Experience in B2B SaaS serving enterprise or financial customers
- Familiarity with third-party SaaS connector architectures and ingestion patterns
- Experience building anomaly detection or intelligent alerting systems
- Experience designing customer-facing status pages and incident communication frameworks
Benefits
- Competitive compensation with equity and 401k
- Comprehensive healthcare with dental and vision coverage
- Flexible paid time off and paid holidays
- 12 weeks of new parent or family leave
- Personal and professional development resources
- In addition to a competitive base salary, this position is eligible for equity awards and may be eligible for sales commission or incentive compensation based on the role or function within the company
Company Overview