Web Crawling Engineer

We are partnering with a fast-growing AI infrastructure company that is developing the foundational technology behind the next generation of AI-native search and retrieval systems. The team builds large-scale infrastructure to crawl and index the web, trains advanced embedding and retrieval models, and operates high-performance distributed systems at massive scale.

They are looking for experienced engineers who enjoy solving deep infrastructure and internet-scale engineering problems.

The Role

As a Web Crawling Engineer, you will help design and build internet-scale crawling infrastructure capable of processing hundreds of millions of webpages efficiently and reliably.

This role sits at the intersection of distributed systems, performance engineering, browser automation, and large-scale data infrastructure.

You will work on problems such as:

  • Distributed web crawling at massive scale
  • Intelligent crawl scheduling and prioritisation
  • JavaScript-heavy and dynamic site rendering
  • Anti-bot detection handling and evasion
  • Crawl politeness, rate limiting, and domain-aware orchestration
  • High-throughput data pipelines and infrastructure optimisation

Responsibilities

  • Build and scale distributed web crawling systems processing 100M+ pages daily
  • Design highly performant infrastructure for crawl orchestration and scheduling
  • Improve handling of dynamic content and JavaScript-rendered websites
  • Work with browser automation tooling and the Chrome DevTools Protocol (CDP)
  • Optimise crawling efficiency, reliability, and resource utilisation
  • Develop systems for rate limiting, crawl politeness, and domain management
  • Contribute to low-level performance optimisation across the stack

Requirements

  • Strong experience building scalable backend or distributed systems
  • Prior experience working on web crawlers, scraping infrastructure, search infrastructure, browser automation, or adjacent systems
  • Experience with high-performance languages such as Rust, C++, or Go
  • Familiarity with TypeScript, Playwright, Puppeteer, or similar browser automation tooling
  • Understanding of modern web technologies and dynamic rendering
  • Strong systems thinking and performance optimisation mindset
  • Interest in AI infrastructure, search, and knowledge retrieval systems

Nice to Have

  • Experience with the Chrome DevTools Protocol (CDP)
  • Experience handling anti-bot systems at scale
  • Experience with distributed job orchestration systems
  • Exposure to search engines, indexing systems, or retrieval infrastructure
  • Kubernetes / cloud infrastructure experience