Job Responsibilities:
- Design and implement scalable data processing pipelines using Apache Kafka, Apache Spark, and Structured Streaming.
- Develop and maintain Java applications for data ingestion, transformation, and storage.
- Integrate data processing solutions with AWS services such as Amazon MSK (Managed Streaming for Apache Kafka), Amazon S3, AWS Lambda, and Amazon EMR.
- Implement real-time data processing solutions to handle large volumes of data efficiently.
- Develop solutions for data enrichment and transformation to create meaningful insights.
- Optimize data processing pipelines for performance and scalability.
- Monitor and troubleshoot performance issues in Kafka and Spark applications.
- Collaborate with cross-functional teams to understand data requirements and deliver solutions.
- Ensure data processing solutions adhere to security and compliance standards.
- Document data processing workflows, architecture, and best practices.
Required Qualifications, Capabilities, and Skills:
- Formal training or certification in software engineering concepts and 5+ years of applied experience.
- Proven expertise in Java, Kafka, Spark, Structured Streaming, and Spark SQL.
- Strong experience with AWS services and cloud-based architectures (e.g., Lambda, EC2, S3, Glue, EKS).
- Hands-on practical experience in system design, application development, testing, and operational stability.
- Experience in developing, debugging, and maintaining code in a large corporate environment with one or more modern programming languages and database querying languages.
- Proficiency in designing and implementing real-time data processing solutions.
- Experience with data enrichment, transformation, and optimization techniques.
- Excellent problem-solving skills and attention to detail.
- Ability to mentor junior developers on the team.
- Strong communication and collaboration skills.
- Ability to work independently and as part of a team.
Preferred Qualifications, Capabilities, and Skills:
- Experience with Python/shell scripting and working in a Linux environment.
- Experience building distributed systems at internet scale.