AWS Data Platform Engineer

Xebia is a global AI-first, digital transformation, and engineering partner. With over 25 years of experience and a team of 5,000 professionals across 16 countries, we help organizations design and build scalable products, platforms, and data-driven solutions.

We specialize in Artificial Intelligence, Data and Cloud, Intelligent Automation, and Digital Products, combining deep technical expertise with a strong focus on engineering excellence and a people-first culture.

In the CEE region, we’re a team of nearly 1,000 experts delivering modern applications, data platforms, and AI solutions for clients such as McLaren, Aviva, Deloitte, Spotify, Disney, ING, UPS, Tesco, Truecaller, AllSaints, Volotea, Schmitz Cargobull, Allegro, InPost, and many, many more. We work with leading technologies including AWS, Azure, GCP, Databricks, and Snowflake, and combine strong engineering culture with a consulting mindset and a continuous focus on growth and knowledge sharing.

You will be:

designing and provisioning the lakehouse foundation on AWS: S3 lake zones (raw, canonical, curated), Apache Iceberg tables, Glue Data Catalog, Athena, and Lake Formation,
delivering all infrastructure as code with Terraform, reproducible from the client's GitHub repositories and CI/CD, across isolated environments (ci, dev, staging, prod),
building and enforcing the tenant-isolation security gate: Lake Formation row-level security and physical partitioning by CompanyId, separate IAM roles for customer versus internal access, and fail-closed handling that quarantines and alerts on rows with missing or unresolved CompanyId,
implementing the upstream entitlement mapping (principal to allowed CompanyIds) that drives access control,
setting up ingestion infrastructure: CDC paths from DynamoDB Streams, AWS DMS extracts from Aurora PostgreSQL, and bulk export to Parquet, supporting the near-real-time (5 min) and batch (4 hr) SLAs,
standing up workflow orchestration with Apache Airflow and Astronomer Cosmos, including profile-based connections, model-level retries, and lineage emission,
building CI/CD pipelines (GitHub Actions): dbt compile and slim builds, SQLFluff linting, DAG validation, branch protection, environment promotion, and Git-revert rollback,
configuring end-to-end governance and observability: catalog and data contracts, OpenLineage capture, and org-wide audit through CloudTrail,
owning cost governance and monitoring: resource tagging, usage alerting, and right-sizing of compute,
collaborating with the Data Platform Architect, Data Engineers, and Analytics Engineer, and supporting knowledge transfer to the client team.

Your profile:

strong experience with AWS data and platform services: S3, Glue (Data Catalog and Jobs), Athena, Lake Formation, IAM, VPC networking, DMS, DynamoDB, Aurora PostgreSQL,
experience with infrastructure as code using Terraform, including modular design and YAML-driven configuration,
knowledge of Apache Iceberg and open table formats on S3,
experience with security and access control,
knowledge of CI/CD engineering with GitHub Actions, trunk-based development, and environment promotion,
solid Python and SQL skills,
experience with dbt on AWS (dbt-athena, dbt-glue adapters) and analytics-engineering workflows,
previous exposure to Airflow orchestration, ideally with Astronomer Cosmos,
experience with data lineage and observability (OpenLineage, dbt-elementary) and data quality tooling (dbt tests, dbt-expectations),
experience with CDC and streaming ingestion patterns.

Working from within the European Union and holding a valid work permit are required.

Nice to have:

exposure to BI serving layers such as QuickSight or Sigma,
Snowflake experience.

Recruitment Process:

CV review – HR call – Interview – Client Interview – Decision