Agentic Infrastructure Observability Engineer

ABOUT XENONSTACK

XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling enterprises to gain real-time and intelligent business insights. We deliver innovation through:

- Agentic Systems for AI Agents → akira.ai
- Vision AI Platform → xenonstack.ai
- Inference AI Infrastructure for Agentic Systems → nexastack.ai

Our mission is to accelerate the world’s transition to AI + Human Intelligence by building platforms that are scalable, reliable, and observable by design.

THE OPPORTUNITY

We are seeking an Agentic Infrastructure Observability Engineer to design and implement end-to-end observability frameworks for AI-native and multi-agent systems. This role sits at the heart of AgentOps and Reliability Engineering, ensuring that agents, pipelines, and infrastructure are monitored, measurable, and continuously optimized.

If you thrive on metrics, monitoring, and making complex systems transparent and reliable, this role offers a chance to define observability for the next generation of enterprise AI.

KEY RESPONSIBILITIES

Observability Frameworks
- Design and implement observability pipelines covering metrics, logs, traces, and cost telemetry for agentic systems.
- Build dashboards and alerting systems to monitor reliability, performance, and drift in real-time.

Agentic AI Monitoring
- Track LLM usage, context windows, token allocation, and multi-agent interactions.
- Build monitoring hooks into LangChain, LangGraph, MCP, and RAG pipelines.

Reliability & Performance
- Define and monitor SLOs, SLIs, and SLAs for agentic workflows and inference infrastructure.
- Conduct root cause analysis of agent failures, latency issues, and cost spikes.

Automation & Tooling
- Integrate observability into CI/CD and AgentOps pipelines.
- Develop custom plugins/scripts to extend observability for LLMs, agents, and data pipelines.

Collaboration & Reporting
- Work with AgentOps, DevOps, and Data Engineering teams to ensure system-wide observability.
- Provide executive-level reporting on reliability, efficiency, and adoption metrics.

Continuous Improvement
- Implement feedback loops to improve agent performance and reduce downtime.
- Stay updated with state-of-the-art observability and AI monitoring frameworks.

SKILLS & QUALIFICATIONS

Must-Have
- 3–6 years of experience in SRE, DevOps, or Observability Engineering.
- Strong knowledge of observability tools (Prometheus, Grafana, ELK, OpenTelemetry, Jaeger).
- Experience with cloud-native infrastructure (AWS, GCP, Azure) and Kubernetes monitoring.
- Proficiency in Python, Go, or Bash for scripting and automation.
- Understanding of AI/LLM pipelines, RAG systems, and vector databases.
- Hands-on with CI/CD pipelines and monitoring-as-code.

Good-to-Have
- Experience with AgentOps tools (LangSmith, PromptLayer, Arize AI, Weights & Biases).
- Exposure to AI-specific observability (token usage, model latency, hallucination tracking).
- Knowledge of Responsible AI monitoring frameworks.
- Background in BFSI, GRC, SOC, or other regulated industries.

WHY SHOULD YOU JOIN US?

XENONSTACK CULTURE – JOIN US & MAKE AN IMPACT!

At XenonStack, we believe in shaping the future of intelligent systems. We foster a culture of cultivation built on bold, human-centric leadership principles, where deep work, simplicity, and adoption define everything we do.

Our Cultural Values
- Agency – Be self-directed and proactive.
- Taste – Sweat the details and build with precision.
- Ownership – Take responsibility for outcomes.
- Mastery – Commit to continuous learning and growth.
- Impatience – Move fast and embrace progress.