Job responsibilities
- Writes high-quality, maintainable, and well-tested software to develop reliable and repeatable solutions to complex problems
- Collaborates with product development teams to design, implement, and manage CI/CD pipelines to support reliable, scalable, and efficient software delivery
- Partners with product development teams to capture and define meaningful service level indicators (SLIs) and service level objectives (SLOs)
- Develops and maintains monitoring, alerting, and tracing systems that provide comprehensive visibility into system health and performance
- Participates in design reviews to evaluate and strengthen architectural resilience, fault tolerance, and scalability
- Upholds incident response management best practices and champions blameless postmortems and continuous improvement
- Debugs, tracks, and resolves complex technical issues to maintain system integrity and performance
- Champions and drives the adoption of reliability, resiliency, and software engineering best practices, leading by example
Required qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts and 3+ years of applied experience
- Experience building production-grade software in at least one programming language such as Java, Python, or Go
- Experience writing unit and integration tests using tools such as JUnit, Mockito, or PyTest
- Experience working with relational and NoSQL databases
- Proficient knowledge of distributed systems and of reliability patterns for achieving redundancy, fault tolerance, and graceful degradation
- Proficient knowledge of networking concepts, including TCP/IP, routing, firewalls, and DNS
- Proficient knowledge of Unix/Linux, including performance tuning, process and memory management, and filesystem operations
- Familiarity with DevOps practices and tools for continuous integration and continuous deployment (CI/CD), such as GitHub Actions, Docker, Kubernetes, Git, branching strategies, and SemVer
- Experience analyzing, troubleshooting, and supporting large-scale systems
Preferred qualifications, capabilities, and skills
- Proficient knowledge of the Java programming language, including object-oriented programming concepts and the JVM
- Familiarity with observability and monitoring tools such as Prometheus, Grafana, or OpenTelemetry
- Familiarity with cloud technologies such as AWS or GCP, including deployment, management, and optimization of cloud-based applications
- Experience working in the financial/fintech industry