What you’ll do: As a Data Engineer – Data Platform Services, you will be responsible for:
Data Migration & Modernization
• Leading the migration of ETL workflows from IBM DataStage to PySpark, ensuring performance optimization and cost efficiency.
• Designing and implementing data ingestion frameworks using Kafka and PySpark to replace legacy DataStage-based ETL pipelines (see the ingestion sketch after this list).
• Migrating the analytical platform from IBM Integrated Analytics System (IIAS) to Cloudera Data Lake on CDP.
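As a rough illustration of the Kafka-to-PySpark ingestion pattern referenced above, the following is a minimal Structured Streaming sketch. The broker address, topic name, payload schema, and landing paths are illustrative assumptions, not details taken from this role.

```python
# Minimal sketch: stream events from Kafka, parse JSON, land them on the data platform.
# Broker, topic, schema, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-ingestion-sketch").getOrCreate()

# Hypothetical payload schema for the example topic.
schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # assumed broker
       .option("subscribe", "transactions")                  # assumed topic
       .option("startingOffsets", "latest")
       .load())

parsed = (raw
          .select(from_json(col("value").cast("string"), schema).alias("payload"))
          .select("payload.*"))

query = (parsed.writeStream
         .format("parquet")                                  # or an Iceberg sink on CDP
         .option("path", "/data/landing/transactions")       # assumed landing path
         .option("checkpointLocation", "/data/checkpoints/transactions")
         .outputMode("append")
         .start())

query.awaitTermination()
```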
Data Engineering & Pipeline Development
• Developing and maintaining scalable, fault-tolerant, and optimized data pipelines on Cloudera Data Platform.
• Implementing data transformations, enrichment, and quality checks to ensure accuracy and reliability (see the quality-check sketch after this list).
• Leveraging Denodo for data virtualization and enabling seamless access to distributed datasets.
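A minimal sketch of the transformation-plus-quality-check pattern mentioned above is shown below. Column names, rules, and paths are illustrative assumptions; in practice these checks might be driven by a tool such as Talend DQ rather than hand-coded.

```python
# Minimal sketch: enrich a curated dataset and split rows by simple quality rules.
# Input columns, thresholds, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks-sketch").getOrCreate()

df = spark.read.parquet("/data/landing/transactions")        # assumed input path

# Enrichment: derive a business date and normalise currency codes.
enriched = (df
            .withColumn("business_date", F.to_date("event_time"))
            .withColumn("currency", F.upper(F.col("currency"))))

# Quality checks: flag nulls in key columns and non-positive amounts.
checked = enriched.withColumn(
    "dq_failed",
    F.col("txn_id").isNull() | F.col("account_id").isNull() | (F.col("amount") <= 0)
)

valid = checked.filter(~F.col("dq_failed")).drop("dq_failed")
rejected = checked.filter(F.col("dq_failed"))

valid.write.mode("append").parquet("/data/curated/transactions")       # assumed curated zone
rejected.write.mode("append").parquet("/data/quarantine/transactions")  # assumed quarantine zone
```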
Performance Tuning & Optimization
• Optimizing PySpark jobs for efficiency, scalability, and reduced cost on Cloudera.
• Fine-tuning query performance on Iceberg tables and ensuring efficient data storage and retrieval (see the maintenance sketch after this list).
• Collaborating with Cloudera ML engineers to integrate machine learning workloads into data pipelines.
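The sketch below shows one common form of Iceberg table tuning: compacting small files and expiring old snapshots via Iceberg's Spark procedures. It assumes the Iceberg SQL extensions and a Hive-backed catalog are configured on the CDP cluster; the table name and retention timestamp are hypothetical.

```python
# Minimal sketch: routine Iceberg table maintenance from PySpark.
# Catalog configuration, table name, and retention window are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("iceberg-maintenance-sketch")
         .config("spark.sql.extensions",
                 "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.iceberg.spark.SparkSessionCatalog")
         .config("spark.sql.catalog.spark_catalog.type", "hive")
         .getOrCreate())

db_table = "curated.transactions"   # hypothetical database.table

# Compact small files so scans read fewer, larger files.
spark.sql(f"CALL spark_catalog.system.rewrite_data_files(table => '{db_table}')")

# Expire snapshots older than the retention window to keep metadata lean.
spark.sql(
    f"CALL spark_catalog.system.expire_snapshots(table => '{db_table}', "
    "older_than => TIMESTAMP '2024-01-01 00:00:00')"
)

# Inspect file-level statistics to confirm the compaction had an effect.
spark.sql(
    f"SELECT file_path, record_count, file_size_in_bytes "
    f"FROM spark_catalog.{db_table}.files"
).show(10)
```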
Security & Compliance
• Implementing Thales CipherTrust encryption and tokenization mechanisms for secure data processing (a generic tokenization pattern is sketched after this list).
• Ensuring compliance with Bank/regulatory body security guidelines, data governance policies, and best practices.
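As a generic illustration of column-level tokenization in a PySpark pipeline, see the sketch below. The call_tokenization_service function is a hypothetical placeholder for whatever client the platform's tokenization service exposes; it is not a CipherTrust API call, and the salted hash stands in purely for demonstration.

```python
# Generic sketch: tokenize a sensitive column before writing to the protected zone.
# call_tokenization_service is a hypothetical placeholder, not a real CipherTrust call.
import hashlib
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("tokenization-sketch").getOrCreate()

def call_tokenization_service(value):
    # Placeholder: a real implementation would call the enterprise tokenization
    # service; a salted hash is used here only for illustration.
    if value is None:
        return None
    return hashlib.sha256(("demo-salt:" + value).encode("utf-8")).hexdigest()

tokenize = F.udf(call_tokenization_service, StringType())

df = spark.read.parquet("/data/curated/transactions")            # assumed input
protected = df.withColumn("account_id", tokenize(F.col("account_id")))
protected.write.mode("overwrite").parquet("/data/protected/transactions")
```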
Collaboration & Leadership
• Working closely with business stakeholders, architects, and data scientists to align solutions with business goals.
• Leading and mentoring junior data engineers, conducting code reviews, and promoting best practices.
• Collaborating with DevOps teams to streamline CI/CD pipelines, using GitLab and Nexus Repository for efficient deployments.
• 12+ years of experience in Data Engineering, ETL, and Data Platform Modernization.
• Hands-on experience in IBM DataStage and PySpark, with a track record of migrating legacy ETL workloads.
• Expertise in Apache Iceberg, Cloudera Data Platform, and big data processing frameworks.
• Strong knowledge of Kafka, Airflow, and cloud-native data processing solutions.
• Experience with Denodo for data virtualization and Talend DQ for data quality.
• Proficiency in SQL, NoSQL, and Graph DBs (DGraph Enterprise).
• Strong understanding of data security, encryption, and compliance standards (Thales CipherTrust).
• Experience with DevOps, CI/CD pipelines, GitLab, and Sonatype Nexus Repository.
• Excellent problem-solving, analytical, and communication skills.
• Experience with Cloudera migration projects in the banking or financial services domain.
• Experience working with banking data models.
• Knowledge of Cloudera ML, Qlik Sense/Tableau reporting, and integration with data lakes.
• Hands-on experience with QuerySurge for automated data testing.
• Understanding of code quality and security best practices using CheckMarx.
• IBM, Cloudera, or AWS/GCP certifications in Data Engineering, Cloud, or Security.
• “Meghdoot” Cloud platform knowledge.
• Experience with architectural design and the ability to recommend the best possible solutions.