Web Crawling Engineer

We are partnering with a fast-growing AI infrastructure company building the foundational technology behind the next generation of AI-native search and retrieval systems. The team is building large-scale infrastructure to crawl and index the web, train advanced embedding and retrieval models, and operate high-performance distributed systems at massive scale.

They are looking for experienced engineers who enjoy solving deep infrastructure and internet-scale engineering problems.

The Role

As a Web Crawling Engineer, you will help design and build internet-scale crawling infrastructure that processes hundreds of millions of webpages efficiently and reliably.

This role sits at the intersection of distributed systems, performance engineering, browser automation, and large-scale data infrastructure.

You will work on problems such as:

Distributed web crawling at massive scale
Intelligent crawl scheduling and prioritisation
JavaScript-heavy and dynamic site rendering
Handling anti-bot detection and evasion
Crawl politeness, rate limiting, and domain-aware orchestration
High-throughput data pipelines and infrastructure optimisation

Responsibilities

Build and scale distributed web crawling systems processing 100M+ pages daily
Design highly performant infrastructure for crawl orchestration and scheduling
Improve handling of dynamic content and JavaScript-rendered websites
Work with browser automation tooling and the Chrome DevTools Protocol (CDP)
Optimise crawling efficiency, reliability, and resource utilisation
Develop systems for rate limiting, crawl politeness, and domain management
Contribute to low-level performance optimisation across the stack

Requirements

Strong experience building scalable backend or distributed systems
Prior experience working on web crawlers, scraping infrastructure, search infrastructure, browser automation, or adjacent systems
Experience with high-performance languages such as Rust, C++, or Go
Familiarity with TypeScript, Playwright, Puppeteer, or other browser automation tooling
Understanding of modern web technologies and dynamic rendering
Strong systems thinking and performance optimisation mindset
Interest in AI infrastructure, search, and knowledge retrieval systems

Nice to Have

Experience with the Chrome DevTools Protocol (CDP)
Experience handling anti-bot systems at scale
Experience with distributed job orchestration systems
Exposure to search engines, indexing systems, or retrieval infrastructure
Experience with Kubernetes or cloud infrastructure