Data Pipeline Development:
- Design, build, and optimize ETL/ELT pipelines for structured and unstructured data.
- Develop real-time and batch data ingestion pipelines using distributed data processing frameworks (a batch example is sketched after this list).
- Ensure pipelines are highly performant, cost-efficient, and secure.
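For context, a minimal sketch of what the batch side of such a pipeline can look like, assuming PySpark; the bucket paths and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical locations and fields, for illustration only.
RAW_PATH = "s3a://raw-bucket/events/"
CURATED_PATH = "s3a://curated-bucket/events/"

spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

# Extract: semi-structured JSON landed by an upstream producer.
raw = spark.read.json(RAW_PATH)

# Transform: normalize timestamps, drop malformed rows, derive a partition key.
curated = (
    raw.withColumn("event_ts", F.to_timestamp("event_ts"))
       .dropna(subset=["event_id", "event_ts"])
       .withColumn("event_date", F.to_date("event_ts"))
)

# Load: partitioned Parquet for downstream query engines.
curated.write.mode("append").partitionBy("event_date").parquet(CURATED_PATH)
```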
Apache Iceberg & Starburst Integration:
- Work extensively with Apache Iceberg for data lake storage optimization and schema evolution (see the sketch after this list).
- Manage Iceberg Catalogs and ensure seamless integration with query engines.
- Configure and maintain Hive MetaStore (HMS) for Iceberg-backed tables and ensure proper metadata management.
- Utilize Starburst and Stargate to enable distributed SQL-based analytics and seamless data federation.
- Tune performance for large-scale querying and federated access to structured and semi-structured data.
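As an illustration of the Iceberg and HMS responsibilities above, a minimal Spark SQL sketch, assuming the Iceberg Spark runtime is on the classpath; the catalog name, HMS URI, and table are hypothetical:

```python
from pyspark.sql import SparkSession

# Hypothetical Iceberg catalog backed by a Hive MetaStore.
spark = (
    SparkSession.builder.appName("iceberg-demo")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.hms", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.hms.type", "hive")
    .config("spark.sql.catalog.hms.uri", "thrift://hms.example.com:9083")
    .getOrCreate()
)

# Create an Iceberg-backed table registered in the Hive MetaStore,
# with hidden partitioning on the order timestamp.
spark.sql("""
    CREATE TABLE IF NOT EXISTS hms.sales.orders (
        order_id BIGINT,
        amount   DECIMAL(10, 2),
        order_ts TIMESTAMP
    ) USING iceberg
    PARTITIONED BY (days(order_ts))
""")

# Schema evolution in Iceberg is a metadata-only operation.
spark.sql("ALTER TABLE hms.sales.orders ADD COLUMN currency STRING")
```

Once registered in HMS, the same table is visible to any engine configured against that metastore, including Starburst, without copying data.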
Data Mesh Implementation:
- Implement Data Mesh principles by developing domain-specific data products that are discoverable, interoperable, and governed.
- Collaborate with data domain owners to enable self-service data access while ensuring consistency and quality.
Hybrid Cloud Data Integration:
- Develop and manage data storage, processing, and retrieval solutions across AWS and on-premise environments.
- Work with cloud-native tools such as AWS S3, RDS, Lambda, Glue, Redshift, and Athena to support scalable data architectures (an Athena example follows this list).
- Ensure hybrid cloud data flows are optimized, secure, and compliant with organizational standards.
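As a small example of the cloud-native tooling above, a boto3 sketch that runs an Athena query; the database, query, and result bucket are hypothetical:

```python
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Submit a query against a hypothetical curated database.
run = athena.start_query_execution(
    QueryString="SELECT event_date, COUNT(*) FROM events GROUP BY event_date",
    QueryExecutionContext={"Database": "curated"},
    ResultConfiguration={"OutputLocation": "s3://query-results-bucket/athena/"},
)

# Poll until the query finishes; production code would add timeouts and backoff.
while True:
    status = athena.get_query_execution(QueryExecutionId=run["QueryExecutionId"])
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

print("Athena query finished with state:", state)
```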
Data Governance & Security:
- Implement data governance, lineage tracking, and metadata management solutions.
- Enforce security best practices for data encryption, role-based access control (RBAC), and compliance with policies such as GDPR and CCPA.
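For a concrete feel of RBAC enforcement, one hedged sketch using the trino Python client against a Starburst coordinator; the endpoint, role, user, and table names are hypothetical, and exact GRANT syntax depends on the configured access-control plugin:

```python
import trino

# Hypothetical coordinator endpoint; authentication omitted for brevity.
conn = trino.dbapi.connect(
    host="starburst.example.com", port=8080, user="admin", catalog="iceberg"
)
cur = conn.cursor()

# Role-based access control: a read-only role granted to one user.
for stmt in (
    "CREATE ROLE analyst",
    "GRANT SELECT ON sales.orders TO ROLE analyst",
    "GRANT analyst TO USER alice",
):
    cur.execute(stmt)
    cur.fetchall()  # drain the response so each statement completes
```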
Performance Optimization & Monitoring:
- Monitor and optimize data workflows, query performance, and resource utilization.
- Implement logging, alerting, and monitoring solutions using CloudWatch, Prometheus, or Grafana to ensure system health.
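A minimal sketch of custom pipeline metrics exposed for Prometheus scraping (metric names and the simulated workload are illustrative):

```python
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

# Illustrative pipeline metrics, scraped by Prometheus and graphed in Grafana.
ROWS_PROCESSED = Counter(
    "pipeline_rows_processed_total", "Rows processed by the pipeline"
)
INGEST_LAG = Gauge("pipeline_ingest_lag_seconds", "Ingestion lag in seconds")

if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics on port 8000
    while True:
        ROWS_PROCESSED.inc(random.randint(100, 1000))  # stand-in for real work
        INGEST_LAG.set(random.uniform(0, 30))
        time.sleep(5)
```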
Collaboration & Documentation:
- Work closely with data architects, application teams, and business units to ensure seamless integration of data solutions.
- Maintain clear documentation of data models, transformations, and architecture for internal reference and governance.
Programming & Scripting:
- Strong proficiency in Python, SQL, and Shell scripting.
- Experience with Scala or Java is a plus.
Data Processing & Storage:
- Hands-on experience with Apache Spark, Kafka, Flink, or similar distributed processing frameworks.
- Strong knowledge of relational databases (PostgreSQL, MySQL, Oracle) and NoSQL databases (DynamoDB, MongoDB).
- Expertise in Apache Iceberg for managing large-scale data lakes, schema evolution, and ACID transactions.
- Experience working with Iceberg Catalogs, Hive MetaStore (HMS), and integrating Iceberg-backed tables with query engines.
- Familiarity with Starburst and Stargate for federated querying and cross-platform data access.
Cloud & Hybrid Architecture:
- Experience working with AWS data services (S3, Redshift, Glue, Athena, EMR, RDS).
- Understanding of hybrid data storage and integration between on-prem and cloud environments.
Infrastructure as Code (IaC) & DevOps:
- Experience with Terraform, AWS CloudFormation, or Kubernetes for provisioning infrastructure.
- CI/CD pipeline experience using GitHub Actions, Jenkins, or GitLab CI/CD.
Data Governance & Security:
- Familiarity with data cataloging, lineage tracking, and metadata management.
- Understanding of RBAC, IAM roles, encryption, and compliance frameworks (GDPR, SOC2, etc.).
Required Soft Skills:
- Problem-Solving & Analytical Thinking - Ability to troubleshoot complex data issues and optimize workflows.
- Collaboration & Communication - Comfortable working with cross-functional teams and articulating technical concepts to non-technical stakeholders.
- Ownership & Proactiveness - Self-driven, detail-oriented, and able to take ownership of tasks with minimal supervision.
- Continuous Learning - Eager to explore new technologies, improve skill sets, and stay ahead of industry trends.
Qualifications:
- 4-6 years of experience in data engineering, cloud infrastructure, or distributed data processing.
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Technology, or a related field.
- Hands-on experience with data pipelines, cloud services, and large-scale data platforms.
- Strong foundation in SQL, Python, Apache Iceberg, Starburst, Apache Airflow orchestration, and cloud-based data solutions (AWS preferred).
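For illustration, the Airflow orchestration noted above might take the shape of a minimal DAG; this sketch assumes Airflow 2.4+ and uses placeholder task bodies:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("extract step (placeholder)")

def load():
    print("load step (placeholder)")

with DAG(
    dag_id="daily_events",           # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # 'schedule' parameter, Airflow 2.4+
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_load
```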
Technology Project Management
Time Type:
Full time