Platform Engineer

It's fun to work in a company where people truly BELIEVE in what they're doing!

We're committed to bringing passion and customer focus to the business.

Job Summary

About the Role

We are looking for a Platform Engineer with 5 to 10 years of experience focused on the reliable operation of our Kubernetes-based platform. In this role, you will be responsible for day-to-day platform operations, environment provisioning, and maintaining platform services that support our customers and engineering teams.

You will work closely with senior platform engineers and application teams to ensure that our infrastructure and platform services remain stable, secure, and scalable. This role is ideal for engineers who enjoy operational problem solving, troubleshooting production systems, and improving reliability through automation.

We value engineers who take a customer-focused and result-oriented approach, understanding that the platform exists to serve customers and development teams and, ultimately, to deliver value to end users. Our goal is to build a platform that enables teams to move faster while maintaining high reliability and operational excellence.

What You Will Do

Platform Operations

  • Provision and maintain Kubernetes environments for development, staging, and production

  • Perform day-2 operations for platform services (databases, message queues, caching systems)

  • Manage cluster upgrades, patching, and routine maintenance

  • Monitor platform health and respond to alerts

Environment Provisioning

  • Set up new environments and namespaces for application teams

  • Configure networking, ingress, secrets, and storage

  • Maintain environment consistency across clusters

Platform Service Operations

Operate and maintain shared platform services such as:

  • PostgreSQL / MySQL

  • Redis

  • Kafka / RabbitMQ

  • Object storage

  • CI/CD infrastructure

Responsibilities include:

  • Provisioning new instances

  • Monitoring and capacity management

  • Backup verification

  • Incident troubleshooting

Reliability & Incident Response

  • Investigate and resolve platform incidents

  • Participate in the on-call rotation

  • Perform root cause analysis and implement improvements

Automation & Improvements

  • Automate operational tasks where possible

  • Improve monitoring, alerting, and operational runbooks

  • Contribute to infrastructure-as-code repositories

Required Skills & Experience

  • Experience operating Kubernetes clusters in production

  • Strong Linux and container fundamentals

  • Experience with Kubernetes troubleshooting and debugging

  • Understanding of:

    • Kubernetes deployments, services, and networking

    • Resource management and scaling

    • Persistent storage in Kubernetes

  • Experience operating at least some of the following:

    • Databases (PostgreSQL, MySQL)

    • Message queues (Kafka, RabbitMQ)

    • Caching systems (Redis)

  • Familiarity with infrastructure automation tools (Terraform, Helm, or similar)

  • Experience with monitoring and logging systems

  • Strong troubleshooting and incident response skills

Nice to Have

  • Experience with CI/CD platforms

  • Familiarity with GitOps workflows

  • Cloud provider experience (AWS, GCP, Azure)

  • Experience with observability platforms (Prometheus, Grafana, etc.)

  • Experience supporting multi-team Kubernetes environments

What We Value

Customer-Focused Mindset

The platform serves developers and internal teams. Understanding their needs and helping them succeed is a key part of the role.

Result-Oriented Approach

We value engineers who focus on outcomes: solving real problems, improving reliability, and delivering practical solutions.

Operational Excellence

You keep critical infrastructure reliable and well-maintained.

Ownership

You take responsibility for issues and follow them through to resolution.

Continuous Improvement

You automate repetitive tasks and improve operational processes.

Collaboration

You work closely with developers and platform engineers to keep systems running smoothly.

What Success Looks Like

  • Platform environments are reliable and well-maintained

  • Engineers can request and receive environments quickly and consistently

  • Platform services run without operational surprises

  • Incidents are handled efficiently and professionally

  • Operational tasks are gradually automated and standardized

  • Internal teams experience the platform as stable, predictable, and easy to use
