Embedded SRE

Who we are

DigiCert is a global leader in intelligent trust. We protect the digital world by ensuring the security, privacy, and authenticity of every interaction. Our AI-powered DigiCert ONE platform unifies PKI, DNS, and certificate lifecycle management to secure infrastructure, software, devices, messages, AI content, and agents. Learn why more than 100,000 organizations, including 90% of the Fortune 500, choose DigiCert to stop today’s threats and prepare for a quantum-safe future at www.digicert.com.

Job summary

We are looking for an experienced Embedded Site Reliability Engineer (SRE) to join our engineering teams. As an Embedded SRE, you will be embedded directly within a product development team to bridge the gap between software development and operations. You will drive reliability, scalability, and operational excellence from inside the team, not from the outside. You will work closely with developers, architects, and platform engineers to design resilient systems, automate toil, define SLOs/SLIs, and keep production systems running smoothly.

What you will do

Design, build, and maintain scalable infrastructure on vSphere or cloud platforms, preferably AWS.
Manage and optimize Kubernetes clusters and container workloads for production reliability.
Administer and optimize CI/CD pipelines (Harness, GitHub Actions, etc.) to support safe, fast, and frequent deployments.
Manage load balancers (F5), networking components, and service mesh configurations.
Support OS-level operations including Linux patching.
Identify and eliminate operational toil through automation using Python, Bash, Terraform, or similar tools.
Build and maintain Infrastructure as Code (IaC) using Terraform or Salt.
Build and maintain comprehensive monitoring, alerting, and dashboards using tools like New Relic.
Work directly within a product team as an embedded SRE, attending standups, sprint planning, and design reviews.
Represent the team during high-priority incidents and contribute to root cause analysis.

What you will have

3+ years of experience in SRE, DevOps, or Platform Engineering roles.
Proven experience being embedded within or closely partnering with product development teams.
Hands-on experience managing production Kubernetes environments at scale.
Strong background in Linux system administration and OS-level troubleshooting.

Technical Skills

Container orchestration: Kubernetes, Helm, Docker.
Cloud platforms: AWS.
Infrastructure as Code: Terraform, Salt.
Scripting & automation: Python, Bash, Terraform or similar.
Monitoring & observability: Splunk and New Relic.
CI/CD tooling: Harness, GitHub Actions, Jenkins.
Networking fundamentals: DNS, TCP/IP, TLS/SSL, load balancing (F5).
Incident management: PagerDuty, OpsGenie, or equivalent.

Nice to have

Experience with service mesh technologies (Istio).
Experience with certificate lifecycle management tools.
Knowledge of security practices: RBAC, network policies, secrets management (Vault).
Experience using AI to automate day-to-day tasks and reduce manual effort and toil.

Benefits

Generous time-off policies
Top-shelf benefits
Education, wellness and lifestyle support

To protect candidate information and maintain a secure hiring process, all applications must be submitted through our careers portal. Resumes or CVs sent directly via email will not be reviewed or considered.
