• Utilize data management and processing capabilities in PySpark/SparkSQL to design, build, and optimize scalable data pipelines.
• Leverage big data platforms for large-scale data processing, ensuring efficient data workflows and integration.
• Implement robust ETL processes to extract, transform, and load data from various sources into data lakes and warehouses.
• Identify, troubleshoot, and resolve data quality issues, ensuring the integrity and reliability of data across all pipelines.
• Optimize data storage and retrieval for both batch and real-time data processing.
• Work with diverse datasets, ensuring data availability and consistency for stakeholders.
• Collaborate with data scientists and analysts to enable advanced analytics and machine learning models through well-engineered data pipelines.
• Collaborate with the client and other vendor teams on complex projects and lead the technical solution for data management & migration projects.
Required Technical and Professional Expertise
• Bachelor’s degree or higher in Computer Science, Information Technology, or a related field.
• 4+ years of hands-on experience in data engineering and data pipeline development.
• Proficiency in PySpark/SparkSQL/SQL for big data processing and optimization.
• Familiarity with cloud platforms, particularly Azure, for deploying and managing data infrastructure.
• Solid understanding of ETL processes, data warehousing, and data lake architectures.
• Experience with Databricks or similar big data platforms.
• Experience in managing data quality.
• Experience in development with MS SQL Server and SSIS.
Preferred Technical and Professional Expertise
• Working experience on banking sector projects.
• Azure or Databricks certification, or equivalent experience, is highly preferred.
• Experience with version control systems (e.g. Git) and CI/CD pipelines for data engineering.
• Cluster tuning experience for performance optimization.