[Remote] DevOps Engineer
Note: This is a remote position open to candidates in the USA. Vehlo provides repair shop technology that improves vehicle service outcomes through software and financial solutions. The role delivers IT performance visibility and risk management strategies, including implementing New Relic and migrating to GitHub Enterprise, while handling operational tasks that keep the platform stable.
Responsibilities
- Deploy and manage New Relic APM, infrastructure, and browser agents across AWS services (ECS, Elastic Beanstalk, Lambda, EC2, EKS). Establish standardized alert policies and dashboards using Terraform. Optimize telemetry ingest, manage drop rules, and control observability costs
- Lead migration to GitHub Enterprise, implementing SSO, branch protections, CODEOWNERS, and Advanced Security features. Develop reusable GitHub Actions workflows to streamline CI/CD. Operationalize vulnerability data into actionable Jira workflows with SLA tracking and brand-level reporting
- Design and manage Jira projects, workflows, automation rules, and permissions. Administer Confluence spaces, templates, and backups. Build centralized reporting to provide leadership with visibility into delivery performance, risk, and application health (APM)
- Create domain-level cost dashboards leveraging AWS, New Relic, and SaaS data. Drive cost optimization initiatives (e.g., S3 Intelligent Tiering, lifecycle policies, telemetry drop rules, resource decommissioning). Support vendor renewal evaluations and cost analysis
- Develop reusable Terraform modules, GitHub Actions workflows, and engineering templates. Author reference documentation and promote adoption of best practices across teams
- Own shared AWS infrastructure, including provisioning, access management, networking, and ongoing maintenance. Triage Dependabot PRs, fine-tune alerts, support team migrations, participate in on-call rotations, and create/run operational runbooks
Skills
- 3–6 years of experience in Cloud Engineering, DevOps, Site Reliability Engineering (SRE), Platform Engineering, or Developer Productivity
- Hands-on experience with observability platforms at scale (e.g., New Relic, Datadog, or similar), including agent deployment, alerting, dashboards, ingest management, and integrations
- Experience with GitHub at an organizational level, including teams, SSO, branch protection, OIDC, and reusable GitHub Actions workflows
- Working knowledge of Jira and Confluence as a user; familiarity with project configuration, workflows, and collaboration
- Production experience with AWS services such as IAM, S3, Lambda, and at least one compute platform (e.g., ECS, EC2, EKS)
- Experience using Terraform (or equivalent IaC tools), including authoring and maintaining reusable modules from scratch
- Proficiency in at least one scripting or programming language such as Python or Bash
- Strong written communication skills with experience creating runbooks, technical design documents, and stakeholder-facing reports
- Exposure to GitHub Advanced Security is a plus; willingness to grow into admin-level ownership
- Administrative experience with Jira and Confluence is a plus but not required
Benefits
- Medical, dental, vision, and life insurance
- 401(k) with company match
- Paid time off and holidays
Company Overview