Big Data Platform Engineer / Big Data Site Reliability Engineer (SRE) - Job Requirements
Position Overview
Key Responsibilities
- Customer Site Support: Provide ongoing technical support for customer big data clusters and installations
- Installation & Upgrades: Lead and support installation and upgrade projects, both remotely (back-office support) and on-site at customer locations
- Travel Requirements: Willingness to travel to customer sites as needed for installations, upgrades, and critical issue resolution
- Emergency Response: Availability for weekend and after-hours support during customer site crises
- Installation Code Development: Develop, maintain, and fix installation automation scripts and tools
- CI/CD Pipeline Management: Build and maintain continuous integration/deployment pipelines
- Monitoring Solutions: Develop and enhance monitoring tools for big data infrastructure
Required Technical Skills
Core Infrastructure & Automation
- Ansible: Advanced proficiency in playbook development and infrastructure automation
- Bash: Strong shell scripting capabilities for system administration and deployment
- Python: Solid programming skills for automation tools and utilities development
- Red Hat Linux: Deep knowledge of the Red Hat Enterprise Linux (RHEL) distribution, system administration, and package management
CI/CD & Virtualization
- Jenkins: Experience building and maintaining CI/CD pipelines
- VMware vSphere: Virtual infrastructure management and deployment
Big Data Ecosystem (Required)
- Apache Spark: Cluster configuration, tuning, and troubleshooting
- YARN: Resource management and cluster administration
- HDFS: Distributed file system management and optimization
- Apache Kafka: Streaming platform deployment and maintenance
- Apache ZooKeeper: Coordination service configuration and management
Preferred (Bonus) Skills
- Kubernetes: Container orchestration and cloud-native deployments (vanilla Kubernetes, OpenShift, RKE)
- Harbor: Container registry management
- Longhorn: Distributed block storage for Kubernetes