Lead Engineer - AI
What You'll Do
Architecture & Platform Ownership
Own architecture and scaling decisions for core AI Studio platform components (e.g., App Generation Engine, React Build & Render Pipeline, Domain & Publishing Pipeline, AI Content Systems)
Lead design and implementation of cross-cutting initiatives to improve system responsiveness, generation accuracy, and platform robustness
Build scalable, fault-tolerant LLM pipelines for content generation, layout creation, experimentation, and AI-driven user guidance
AI & Distributed Systems Engineering
Work hands-on with technologies like Go, NestJS, Node, PostgreSQL, Firestore, vector databases, Cloudflare Workers, and Cloudflare KV in a microservices architecture
Design distributed systems that handle high-throughput generative and analytical workloads while ensuring correctness and low latency
Build and optimize embeddings pipelines, retrieval-augmented generation (RAG), and multi-agent orchestration frameworks
Quality, Observability & Reliability
Drive improvements in observability using Prometheus/Grafana, OpenTelemetry, and structured logging
Establish and maintain SLOs for generation latency, correctness, model safety, and system uptime
Strengthen resiliency and failover strategies to ensure seamless user experiences during traffic spikes and model load variations
AI Guardrails, Safety & Tooling
Implement hallucination mitigation strategies (code constraints, structured outputs, model verification layers, retrieval guards, test scaffolding)
Collaborate with infra and security to enforce data governance, access control, privacy boundaries, and compliance around user-generated data
Leverage LLMs and AI tools to write, test, and debug code, while driving standards that ensure reliability and consistency across engineering teams
Technical Leadership & Collaboration
Mentor engineers through code reviews, design discussions, and pairing, leading by influence to promote technical excellence, code quality, and a high-ownership culture (this is an individual contributor role with no direct reports)
Partner with PMs, designers, and other engineering leaders to define the long-term roadmap and deliver high-performing AI-powered application-building features
Participate in reviews, deep dives, and on-call rotations to maintain a culture of accountability and operational excellence
What You'll Bring
- 6+ years of backend engineering experience, including distributed system design and high-scale platform development
- Strong proficiency in Go (Golang) for building high-performance, concurrent backend services, alongside Node/NestJS experience
- Hands-on experience building and operating edge/serverless services with Cloudflare Workers and Cloudflare KV (or comparable edge compute and distributed key-value stores)
- Deep expertise in event-driven architectures, asynchronous workflows, and high-throughput data pipelines
- Strong command of relational (PostgreSQL) and NoSQL data models, query optimization, and complex transactional data
- Fluency in LLM integrations, vector search, embeddings, RAG patterns, and generative AI frameworks
- Familiarity with frontend architecture using Vue, and a solid understanding of UI/UX principles
- Experience with monitoring, alerting, and incident response for production systems
- Strong instincts for scaling, latency optimization, and reliability under real-world load
- Experience implementing structured generation, verification layers, retrieval-based grounding, and guardrails to reduce hallucinations
- Understanding of prompt engineering patterns, context injection, and model evaluation
- Exceptional communication and cross-functional leadership capabilities