As a Data Engineer, you will be responsible for designing, building, and maintaining large-scale data systems, as well as working with cross-functional teams to ensure efficient data processing and integration. You will leverage your knowledge of Apache Spark to create robust ETL processes, optimize data workflows, and manage high volumes of structured and unstructured data.
How will you make an impact?
- Design, implement, and maintain data pipelines using Apache Spark for processing large datasets.
- Work with data engineering teams to optimize data workflows for performance and scalability.
- Integrate data from various sources, ensuring clean, reliable, and high-quality data for analysis.
- Develop and maintain data models, databases, and data lakes.
- Build and manage scalable ETL solutions to support business intelligence and data science initiatives.
- Monitor and troubleshoot data processing jobs, ensuring they run efficiently and effectively.
- Collaborate with data scientists, analysts, and other stakeholders to understand business needs and deliver data solutions.
- Implement data security best practices to protect sensitive information.
- Maintain a high level of data quality and ensure timely delivery of data to end-users.
- Continuously evaluate new technologies and frameworks to improve data engineering processes.
Have you got what it takes?
- 8-11 years of experience as a Data Engineer, with a strong focus on Apache Spark and big data technologies.
- Expertise in Spark SQL, DataFrames, and RDDs for data processing and analysis.
- Proficiency in programming languages such as Python, Scala, or Java for data engineering tasks.
- Hands-on experience with cloud platforms such as AWS or GCP, specifically with data processing and storage services (e.g., S3, Redshift, BigQuery, Databricks).
- Experience with data pipeline and orchestration tools such as Apache Kafka, Airflow, or NiFi.
- Strong knowledge of data warehousing concepts and technologies (e.g., Redshift, Snowflake, BigQuery).
- Familiarity with containerization technologies like Docker and Kubernetes.
- Knowledge of SQL and relational databases, with the ability to design and query databases effectively.
- Solid understanding of distributed computing, data modeling, and data architecture principles.
- Strong problem-solving skills and the ability to work with large and complex datasets.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.
Reporting into: Tech Manager
Role Type: Individual Contributor