Expoint - all jobs in one place


Citigroup Data Engineer - Controls Technology
India, Tamil Nadu, Chennai 
840885075

01.04.2025

Data Pipeline Development:

  • Design, build, and optimize ETL/ELT pipelines for structured and unstructured data.
  • Develop real-time and batch data ingestion pipelines using distributed data processing frameworks.
  • Ensure pipelines are highly performant, cost-efficient, and secure.
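As a rough illustration of the batch side of this responsibility, the sketch below shows the extract/transform/load stages as plain Python functions. All names and the CSV-to-NDJSON flow are illustrative, not part of the role's actual stack.

```python
import csv
import io
import json

def extract(raw_csv: str) -> list:
    """Parse raw CSV text into a list of row dicts (the 'extract' step)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list) -> list:
    """Normalize types and drop rows missing a primary key (the 'transform' step)."""
    out = []
    for row in rows:
        if not row.get("id"):
            continue  # skip malformed rows rather than failing the whole batch
        out.append({"id": int(row["id"]), "amount": float(row["amount"])})
    return out

def load(rows: list) -> str:
    """Serialize to newline-delimited JSON, a common lake-ingestion format."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in rows)

raw = "id,amount\n1,10.5\n,3.0\n2,7.25\n"
result = load(transform(extract(raw)))
```

In a real pipeline each stage would run under an orchestrator (the posting mentions Apache Airflow) with retries and metrics, but the stage boundaries stay the same.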

Apache Iceberg & Starburst Integration:

  • Work extensively with Apache Iceberg for data lake storage optimization and schema evolution.
  • Manage Iceberg Catalogs and ensure seamless integration with query engines.
  • Configure and maintain Hive MetaStore (HMS) for Iceberg-backed tables and ensure proper metadata management.
  • Utilize Starburst and Stargate to enable distributed SQL-based analytics and seamless data federation.
  • Tune query performance for large-scale and federated access to structured and semi-structured data.
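To make the Iceberg work concrete: the kind of DDL a Starburst/Trino session would run is sketched below as plain string composition. The `iceberg.sales.orders` names and the `PARQUET` property are placeholders; a real deployment would use its own catalog and table properties.

```python
def create_iceberg_ddl(catalog: str, schema: str, table: str, columns: dict) -> str:
    """Compose a Trino/Starburst CREATE TABLE statement for an Iceberg-backed table.

    Columns are passed as a name -> type mapping; partitioning is omitted for brevity.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE {catalog}.{schema}.{table} ({cols}) "
        "WITH (format = 'PARQUET')"
    )

def add_column_ddl(catalog: str, schema: str, table: str, name: str, dtype: str) -> str:
    """Schema evolution in Iceberg is metadata-only: ADD COLUMN rewrites no data files."""
    return f"ALTER TABLE {catalog}.{schema}.{table} ADD COLUMN {name} {dtype}"

ddl = create_iceberg_ddl("iceberg", "sales", "orders",
                         {"order_id": "BIGINT", "amount": "DOUBLE"})
evolve = add_column_ddl("iceberg", "sales", "orders", "region", "VARCHAR")
```

The metadata-only nature of Iceberg schema evolution is what makes the `ALTER TABLE` above cheap even on petabyte-scale tables.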

Data Mesh Implementation:

  • Implement Data Mesh principles by developing domain-specific data products that are discoverable, interoperable, and governed.
  • Collaborate with data domain owners to enable self-service data access while ensuring consistency and quality.
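A minimal sketch of what "discoverable, interoperable, and governed" can mean in code: each domain publishes a small descriptor for its data product into a shared catalog. The `DataProduct` fields, the `payments` domain, and the in-memory registry are all hypothetical; real platforms back this with a metadata store.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProduct:
    """Minimal descriptor making a domain data product discoverable and governed."""
    domain: str
    name: str
    owner: str
    output_port: str   # where consumers read it, e.g. an Iceberg table
    tags: tuple = ()

class Registry:
    """Toy in-memory catalog standing in for a real metadata service."""
    def __init__(self):
        self._products = {}

    def publish(self, product: DataProduct) -> None:
        self._products[(product.domain, product.name)] = product

    def discover(self, domain: str) -> list:
        return [p for (d, _), p in self._products.items() if d == domain]

registry = Registry()
registry.publish(DataProduct("payments", "daily_settlements", "payments-team",
                             "iceberg.payments.daily_settlements", ("pii-free",)))
found = registry.discover("payments")
```

The key Data Mesh idea captured here is that the domain team, not a central platform team, owns and publishes the product and its contract.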

Hybrid Cloud Data Integration:

  • Develop and manage data storage, processing, and retrieval solutions across AWS and on-premise environments.
  • Work with cloud-native tools such as AWS S3, RDS, Lambda, Glue, Redshift, and Athena to support scalable data architectures.
  • Ensure hybrid cloud data flows are optimized, secure, and compliant with organizational standards.

Data Governance & Security:

  • Implement data governance, lineage tracking, and metadata management solutions.
  • Enforce security best practices for data encryption, role-based access control (RBAC), and compliance with policies such as GDPR and CCPA.
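The RBAC piece of this responsibility reduces to a deny-by-default permission check, sketched below. The role names and action sets are invented for illustration; in production the mapping would come from an IAM or policy service rather than a hard-coded dict.

```python
# Role -> permitted actions; illustrative only, would be sourced from IAM/policy in practice.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "grant"},
}

def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles or actions get no access."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

Deny-by-default is the property auditors look for under frameworks like GDPR and SOC2: an unrecognized role must never fall through to access.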

Performance Optimization & Monitoring:

  • Monitor and optimize data workflows, performance tuning of queries, and resource utilization.
  • Implement logging, alerting, and monitoring solutions using CloudWatch, Prometheus, or Grafana to ensure system health.
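One common pattern behind these bullets is emitting one structured log line per health check and letting the monitoring stack (CloudWatch, Prometheus via an exporter, etc.) alert on it. The pipeline name, threshold, and averaging rule below are assumptions for the sketch.

```python
import json
from statistics import mean

def health_event(pipeline: str, latencies_ms: list, threshold_ms: float = 500.0) -> str:
    """Emit one structured log line per check; a shipping agent would forward
    these for alerting. Empty windows never alert."""
    avg = mean(latencies_ms) if latencies_ms else 0.0
    event = {
        "pipeline": pipeline,
        "avg_latency_ms": round(avg, 1),
        "alert": bool(latencies_ms) and avg > threshold_ms,
    }
    return json.dumps(event, sort_keys=True)
```

Keeping the event machine-parseable (JSON, stable keys) is what lets the same line drive dashboards, alerts, and ad-hoc queries.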

Collaboration & Documentation:

  • Work closely with data architects, application teams, and business units to ensure seamless integration of data solutions.
  • Maintain clear documentation of data models, transformations, and architecture for internal reference and governance.

Programming & Scripting:

  • Strong proficiency in Python, SQL, and Shell scripting.
  • Experience with Scala or Java is a plus.

Data Processing & Storage:

  • Hands-on experience with Apache Spark, Kafka, Flink, or similar distributed processing frameworks.
  • Strong knowledge of relational (PostgreSQL, MySQL, Oracle) and NoSQL databases (DynamoDB, MongoDB).
  • Expertise in Apache Iceberg for managing large-scale data lakes, schema evolution, and ACID transactions.
  • Experience working with Iceberg Catalogs, Hive MetaStore (HMS), and integrating Iceberg-backed tables with query engines.
  • Familiarity with Starburst and Stargate for federated querying and cross-platform data access.

Cloud & Hybrid Architecture:

  • Experience working with AWS data services (S3, Redshift, Glue, Athena, EMR, RDS).
  • Understanding of hybrid data storage and integration between on-prem and cloud environments.

Infrastructure as Code (IaC) & DevOps:

  • Experience with Terraform, AWS CloudFormation, or Kubernetes for provisioning infrastructure.
  • CI/CD pipeline experience using GitHub Actions, Jenkins, or GitLab CI/CD.

Data Governance & Security:

  • Familiarity with data cataloging, lineage tracking, and metadata management.
  • Understanding of RBAC, IAM roles, encryption, and compliance frameworks (GDPR, SOC2, etc.).

Required Soft Skills:

  • Problem-Solving & Analytical Thinking - Ability to troubleshoot complex data issues and optimize workflows.
  • Collaboration & Communication - Comfortable working with cross-functional teams and articulating technical concepts to non-technical stakeholders.
  • Ownership & Proactiveness - Self-driven, detail-oriented, and able to take ownership of tasks with minimal supervision.
  • Continuous Learning - Eager to explore new technologies, improve skill sets, and stay ahead of industry trends.

Qualifications:

  • 4-6 years of experience in data engineering, cloud infrastructure, or distributed data processing.
  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Technology, or a related field.
  • Hands-on experience with data pipelines, cloud services, and large-scale data platforms.
  • Strong foundation in SQL, Python, Apache Iceberg, Starburst, cloud-based data solutions (AWS preferred), and Apache Airflow orchestration.
Technology Project Management


Time Type:

Full time
