Develop and Maintain Streaming Infrastructure: Design and support a global streaming platform, creating software libraries that streamline development and ensure reliability and observability for thousands of services.
Automate Deployment and Operation: Oversee the automated deployment and operation of numerous Kafka and RabbitMQ clusters, adhering to various deployment constraints to support our backend services.
Monitor and Support Production Systems: Manage global Kafka clusters, provide on-call production support, and assist in troubleshooting and incident resolution.
Improve Infrastructure Observability: Develop monitoring tools, enhance alerting systems, and help developers better monitor and understand their streaming applications.
Optimize System Performance: Conduct research, build proofs of concept, and perform benchmark tests to identify and implement technology improvements.
Provide Developer Support and Training: Guide developers on best practices, create documentation, and conduct workshops to ensure efficient use of the infrastructure.
Requirements
Extensive Experience: Minimum of 8 years in software engineering, with a focus on streaming services, backend infrastructure, and large-scale distributed systems.
Expertise in Kafka: Deep knowledge of Kafka internals, including the Kafka protocol, consumer and producer mechanisms, and the Kafka ecosystem, together with hands-on experience deploying, maintaining, and troubleshooting Kafka. Experience with Confluent Cloud and Confluent for Kubernetes is a plus.
Proficiency in RabbitMQ (Optional): Experience with RabbitMQ, including its deployment and maintenance, is a plus.
Strong Programming Skills: Proficiency in languages such as Java, Python, or Scala, with experience in developing and maintaining software libraries.
Infrastructure Management: Demonstrated ability to design, build, and maintain scalable infrastructure, including automated deployment and operations.
Containerization and Orchestration: Extensive knowledge of and hands-on experience with Docker and Kubernetes.
Observability and Monitoring: Expertise in enhancing system observability, creating monitoring tools, and implementing alerting and self-healing mechanisms.
Problem-Solving Abilities: Proven track record in troubleshooting, incident resolution, and optimizing system performance through research and benchmarking.
Ownership and Accountability: A strong sense of ownership and accountability, ensuring the highest quality and performance in all aspects of the work.
Collaboration and Communication: Excellent communication skills and experience in supporting and training developers, conducting peer reviews, and creating documentation and best practices.
Bonus Qualities
Mentorship and Leadership: Proven ability to mentor and lead teams, fostering a collaborative and growth-oriented environment.
Open Source Contributions: Active contributions to open source libraries, showcasing a commitment to community and innovation.
Community Involvement: Engagement in related software communities, such as forums, user groups, or special interest groups.
Public Speaking: Experience presenting sessions or workshops at industry summits, conferences, or meetups, demonstrating thought leadership and expertise.
Published Works: Authoring articles, blogs, or papers on relevant topics, contributing to the broader knowledge base of the field.