Data Engineer

This is a remote position.

We are seeking a Data Engineer to build and maintain our cutting-edge ELT pipelines that transform raw, diverse data sources into reliable, structured datasets powering our analytics.

You will play a pivotal role in transforming unstructured data into structured, actionable formats, enabling efficient and reliable data processing across Yirifi's operations. This includes designing and maintaining data architecture, building robust pipelines, and collaborating with cross-functional teams to deliver scalable, high-quality data solutions.

Requirements

Experience:
- Strong proficiency in Python, including libraries such as Pandas and NumPy and frameworks for parsing unstructured data.
- Proficiency in designing and implementing ETL/ELT workflows with dbt (transformations) and Airflow (orchestration).
- Experience using SQL and relational databases to build structured data repositories.
- In-depth experience with AWS services such as S3, Glue, Lambda, and CloudWatch.

Operational Competencies:
- Analytical Thinking: Strong analytical and problem-solving skills, particularly when working with unstructured data sources.
- Adaptability: Ability to work effectively in a fast-paced, dynamic environment with competing priorities.
- Communication: Excellent communication skills for partnering with data scientists, analysts, and stakeholders.

Job Responsibilities

Design and Build Scalable Data Pipelines
- Develop scalable ETL/ELT pipelines that transform unstructured data into structured formats using Airflow, AWS Glue, Python, and dbt.
- Ensure pipelines handle diverse data sources (e.g., JSON, XML, plain text, and raw logs) and produce structured outputs such as relational database tables or Parquet files.

Data Quality and Validation
- Implement automated validation checks for data consistency, completeness, and accuracy using dbt tests, AWS Glue Data Quality, or custom Python scripts.
- Enforce consistency and completeness rules at every data ingestion point.
- Build a reporting dashboard to monitor data quality metrics and pipeline health.

Metadata Management
- Enhance and maintain metadata in the AWS Glue Data Catalog, ensuring all datasets have clear descriptions, schema definitions, and lineage tracking.
- Create a centralized metadata repository for easy data discovery and governance.

Pipeline Reliability and Performance Optimization
- Establish monitoring (e.g., with AWS CloudWatch) to track pipeline performance and proactively detect bottlenecks or failures.
- Optimize data workflows for speed and cost efficiency, particularly for high-volume unstructured data processing.

Collaborate with Stakeholders
- Translate business and compliance requirements into technical specifications for data ingestion and transformation.
- Partner with data scientists and analysts to create tailored data structures for machine learning models and analytics.

Emerging Technologies
- Identify and pilot emerging tools and frameworks for handling unstructured data (e.g., text parsing with NLP libraries or distributed processing with Spark).

Benefits

Be part of a rapidly growing company revolutionizing crypto risk and compliance. As a Data Engineer at Yirifi, you will have the opportunity to build innovative data solutions that directly impact our mission. Work with cutting-edge technologies, solve complex challenges, and drive meaningful change in the digital assets space. Join us to accelerate your career and make a significant impact!