Senior AI Ops Architect

We are seeking a highly experienced Senior AI Ops Architect with exceptional expertise in Gen-AI-enabled Cloud Engineering, Observability, Operational Intelligence, and AI-driven automation. The ideal candidate will bring 10+ years of enterprise-level architecture experience, with a focus on building innovative Gen-AI-enabled platforms, data-driven automation frameworks, and enterprise-grade AIOps solutions to advance operational efficiency.

Responsibilities
- Design and deliver scalable Gen-AI-powered AIOps solutions for large enterprise platforms to improve MTTR, achieve automated incident resolution, and drive operational excellence
- Architect and implement Gen-AI and LLM Engineering solutions using tools such as Amazon Bedrock, Azure OpenAI, Vertex AI, Anthropic, and LangChain
- Develop and optimize MLOps pipelines and model deployment workflows leveraging SageMaker, Azure ML, clustering, topic modeling, and anomaly detection techniques
- Implement RAG, Vector DBs, and advanced semantic search across platforms using PGVector, Elasticsearch, and Bedrock Knowledge Sources
- Create and automate solutions for Cloud Platforms and Infrastructure with AWS, Azure, GCP, Terraform, CloudFormation, and Helm, alongside Python and Shell Scripting
- Lead Kubernetes-based container orchestration and DevSecOps initiatives, including CI/CD pipelines, Istio, and KEDA deployment strategies
- Design and integrate serverless and cloud-native architectures using API Gateway, Lambda, Step Functions, DynamoDB, S3, and Kinesis
- Implement end-to-end Observability solutions using Datadog, OpenTelemetry, Dynatrace, New Relic, Splunk, Moogsoft, and BigPanda
- Ensure seamless ITSM and ServiceNow integration for AI-driven operations and automation
- Work with ITSM tools like ServiceNow, Jira Service Management, and ManageEngine to streamline incident management workflows
- Provide thought leadership in AIOps, automation, and AI-powered operational intelligence to leadership and engineering teams

Requirements
- 19+ years of overall IT experience
- 10+ years of professional experience in Enterprise Cloud, Infrastructure Engineering, SRE, Automation, and Architecture roles
- Proven track record of delivering Gen-AI-powered AIOps solutions in production environments, driving efficiencies like MTTR improvement and operational automation
- Expertise in Gen-AI and LLM Engineering tools such as Amazon Bedrock, Azure OpenAI, Vertex AI, Anthropic, LangChain, and Bedrock Agents
- Proficiency in RAG, Vector Databases, and semantic search solutions like PGVector, Elasticsearch, and Bedrock Knowledge Sources
- Background in MLOps, model development, and machine learning techniques using SageMaker, Azure ML, clustering, topic modeling, and anomaly detection
- Skills in cloud engineering and automation technologies, including AWS, Azure, GCP, Terraform, CloudFormation, Helm, Python, and Shell Scripting
- Capability to design and operate Kubernetes-based infrastructure, CI/CD pipelines, security automation, Istio, and KEDA
- Familiarity with serverless computing and cloud-native tools like API Gateway, Lambda, Step Functions, DynamoDB, S3, and Kinesis
- Knowledge of Observability platforms such as Datadog, OpenTelemetry, Dynatrace, New Relic, Splunk, Moogsoft, and BigPanda
- Understanding of ITSM platforms, including ServiceNow, Jira Service Management, and ManageEngine
- Demonstrated AI and Machine Learning expertise in areas like anomaly detection, GenAI implementation, and agentic AI solutions
- Ability to communicate effectively in both written and spoken English (B2 level or higher)

Nice to have
- Experience leading AIOps/Cloud Practices or platform engineering organizations
- Certifications in AWS ML, Cloud Architecture, or AI Leadership
