Job responsibilities
- Regularly provides technical guidance and direction to support the business and its technical teams, contractors, and vendors
- Develops secure and high-quality production code, and reviews and debugs code written by others
- Drives decisions that influence the product design, application functionality, and technical operations and processes
- Actively contributes to the engineering community as an advocate of firmwide frameworks, tools, and practices of the Software Development Life Cycle
- Develops the reliability process to ensure the highest level of systems availability, stability, security, and performance, including maintenance and support, root cause analysis, systems validation, performance tuning, and capacity management
- Works closely with Engineering and Product teams to identify and implement solutions that meet production, development, and administrative needs in Linux-based environments
- Identifies product architecture changes that improve reliability, performance, and availability, using a data-driven approach
- Owns the deployment process and tools, and suggests improvements to them
- Drives technical innovation and efficiency in application and infrastructure operations through simplification and automation
- Coordinates between infrastructure, platform, and application subject matter experts to promote reliability efforts through communication and best-practice sharing
Required qualifications, capabilities, and skills
- Formal training or certification in engineering concepts and 5+ years of applied experience
- Advanced experience reading and writing C++ and Python code
- Proficient in solving design and functionality problems independently, with little to no oversight; expertise in taking a task from requirements gathering through deployment and maintenance
- Proficient with configuration management tools, build tools, and continuous integration environments such as Jenkins, Weave, and AIM
- Experience championing new technologies and processes, and architecting their design, implementation, and delivery to production
- Expertise in implementing and managing high-availability infrastructure solutions with automatic failover
- Ability to identify problem or opportunity areas, and to develop and implement fixes and changes as necessary
- Ability to collaborate effectively with teams such as Site Reliability Engineering, Product Management, DevOps, and Leadership
- Experience mentoring and guiding fellow team members effectively and constructively across a wide variety of disciplines, both technical and business-related
- Demonstrated proficiency in application development and support of the associated infrastructure, with experience primarily in Linux-based trading applications
- Expertise in designing, building, and troubleshooting large-scale distributed systems
Preferred qualifications, capabilities, and skills
- Strong interest in Site Reliability Engineering topics such as SLOs, resilience, scaling, performance, and automation
- Knowledge of electronic trading and of equities, futures, and options exchange connectivity in global markets