Data Engineer - AWS
AWS Data Engineer (PySpark)
Location: Northampton (Hybrid, 3 days onsite)
Experience: 5+ Years
Employment Type: Contract
Job Summary
We are looking for an experienced AWS Data Engineer with strong PySpark expertise to design, develop, and optimize scalable data pipelines and cloud-based data platforms. The ideal candidate will have hands-on experience building ETL/ELT solutions on AWS, processing large datasets with Spark, and applying data engineering best practices.
Key Responsibilities
- Develop and maintain scalable data pipelines using PySpark and AWS services.
- Build robust ETL/ELT workflows for ingesting, transforming, and loading data from multiple sources.
- Design and manage data lakes and data warehouse solutions on AWS.
- Work with AWS services such as S3, Glue, EMR, Redshift, Lambda, Athena, IAM, and CloudWatch.
- Optimize Spark jobs for performance, scalability, and cost efficiency.
- Implement data quality, validation, and monitoring processes.
- Collaborate with business stakeholders, analysts, and architects to deliver data solutions.
- Support production deployments, troubleshooting, and performance tuning.
- Maintain technical documentation and follow data governance standards.
Required Skills
- 5+ years of Data Engineering experience.
- Strong hands-on experience with PySpark and Apache Spark.
- Extensive experience with AWS Cloud Services:
  - S3
  - Glue
  - EMR
  - Redshift
  - Athena
  - Lambda
  - IAM
  - CloudWatch
- Strong programming skills in Python.
- Advanced SQL development and query optimization skills.
- Experience building large-scale ETL/ELT pipelines.
- Knowledge of data warehousing and dimensional data modeling.
- Experience with Git and CI/CD practices.
Preferred Skills
- Experience with Databricks.
- Knowledge of Apache Airflow or AWS Step Functions.
- Experience with Kafka or other real-time data processing technologies.
- Exposure to Terraform and Infrastructure as Code (IaC) practices.
- Experience with Snowflake or lakehouse architectures.
- AWS Certification is highly desirable.