Staff Engineer — Agentic AI

About the Role

This is a senior technical leadership role at a well-funded Series A startup building AI agents for hardware engineers. You'll own the core agent intelligence layer — the system that translates mechanical engineers' intent into reliable, cost-efficient multi-step workflows across complex desktop engineering tools (CAD, simulation, PLM, and more).

You'll report directly to the CTO and serve as technical lead for a small team of AI engineers, a user researcher, and domain expert contractors. This role sits at the intersection of applied agentic AI, user research, and product delivery — and it directly determines the product's real-world value to enterprise customers.

The company serves Fortune 100 customers and has significant backing from top-tier investors. This is a high-impact, on-site role based in San Francisco.

What You'll Do

Drive agent task success rate. Own the metric that matters most — define the eval framework, establish baselines, and systematically improve whether the agent can complete the workflows engineers actually need.
Set and enforce token budgets. Define per-task token budgets, track cost per completed workflow, and ensure the product is commercially viable, not just technically impressive.
Build rigorous evaluation infrastructure. Design benchmarks grounded in real user stories with SWE-bench-level rigor — reproducible, adversarial, and tied to measurable customer value.
Lead user story mapping and validation. Work directly with the user researcher and domain experts to interview engineers, document workflows in detail, and validate that what you're building against reflects reality.
Expand workflow coverage. Systematically grow the percentage of top user story steps the agent handles end-to-end, prioritizing by customer value and technical feasibility.
Translate user stories into evals. Close the loop between user research and agent benchmarking — every validated user story becomes a test case.
Own the agent architecture. Make foundational decisions on tool-calling strategies, state management, error recovery, model routing, and context management.
Lead as a player-coach. Set technical direction, review architecture decisions, write production code, unblock the team, and raise the engineering bar.
Collaborate cross-functionally with the integrations and product teams, and with customers during proofs of concept (POCs), to align agent behavior with real-world enterprise usage.

What We're Looking For

Dealbreakers (must-haves):

7+ years in software engineering, with at least 2 years building agentic LLM-based systems — agents that call tools, manage multi-step workflows, handle failures, and operate under cost constraints.
Deep experience with LLM application architecture: model selection, context window management, retrieval strategies, tool-calling frameworks, and orchestration patterns.
Strong evaluation and benchmarking instincts for agentic systems — task completion, cost efficiency, failure mode analysis; familiarity with benchmarks such as SWE-bench, GAIA, or τ-bench.

Required:

Proven track record of shipping AI systems with measurable outcomes (agent task success rate, cost efficiency) — not just demos.
Strong Python skills and working knowledge of the LLM tooling ecosystem: function calling, tool use APIs, tracing/observability tools (e.g., Logfire, LangSmith), and evaluation frameworks.
Experience leading a small technical team (3–6 engineers): setting technical direction, performing code reviews, driving architecture decisions.

Nice to Have:

Published work or open-source contributions in agentic AI systems.
Familiarity with enterprise deployment constraints, such as how agents behave on locked-down corporate workstations.
Experience with desktop automation, COM, or programmatic control of applications (beyond web APIs).
Background in mechanical engineering, CAD/CAE, PLM, or adjacent industries.
Experience building or contributing to public AI agent benchmarks.

Compensation & Benefits

Salary: $160,000 – $250,000 USD annually, depending on experience.
Equity participation in a well-funded, early-stage company with significant enterprise traction.
Visa sponsorship: Not available.

Location

This is an on-site role based in San Francisco, CA. Remote work is not available for this position.