Take your AI/ML engineering skills to the next level. You will achieve state-of-the-art throughput for critical models using advanced techniques such as model parallelism and distributed training, and reduce inference time for new model architectures using optimizations such as quantization and pruning. You will collaborate closely with Applied AI engineering to optimize the internal inference stack, leveraging technologies such as TensorRT and ONNX. In this role you will help interview top-tier talent and mentor AI systems engineers, while fostering a culture of continuous learning and innovation. You will also coordinate the inference needs of JPMorgan Chase’s research teams, ensuring alignment with business goals.
As a Senior Lead Software - Machine Learning Engineer at JPMorgan Chase within the Corporate Sector, specifically the AIML Technology division, you will play a crucial role in an agile team. Your responsibilities will include enhancing, developing, and delivering a company-wide AI/ML/Data Platform in a secure, stable, and scalable manner. As a technical contributor, you will architect, design, and build AI/ML-related capabilities using cloud technology. You will have the opportunity to work with both traditional AI/ML and Generative AI.
Job responsibilities
- Architects and implements distributed ML infrastructure, including inference, training, scheduling, orchestration, and storage.
- Develops advanced monitoring and management tools for high reliability and scalability.
- Optimizes system performance by identifying and resolving inefficiencies and bottlenecks.
- Collaborates with product teams to deliver tailored, technology-driven solutions.
- Drives the adoption and execution of ML Platform tools across various teams.
- Integrates Generative AI within the ML Platform using state-of-the-art techniques.
Required qualifications, capabilities, and skills
- Formal training or certification on software engineering concepts and 5+ years of applied experience. In addition, 2+ years of experience leading technologists to manage and solve complex technical items within your domain of expertise.
- Extensive hands-on experience with ML frameworks (TensorFlow, PyTorch, JAX, scikit-learn).
- Extensive experience with a Public Cloud provider (AWS, Azure, GCP) and addressing non-functional requirements such as scalability and cross-region resiliency.
- Strong coding skills and experience in developing large-scale ML systems and applying software best practices.
- Experience with prompt engineering and interacting with various LLM vendors and models.
- Proven track record in contributing to and optimizing open-source ML frameworks.
- Strategic thinker with the ability to craft and drive a technical vision for maximum business impact.
- Demonstrated leadership in working effectively with engineers, data scientists, and ML practitioners.
- Proven ability to identify trade-offs, clarify project ambiguities, and drive decision-making.
Preferred qualifications, capabilities, and skills
- Expertise in Kubernetes ecosystem, including EKS, Helm, and custom operators.
- Background in High Performance Computing, ML Hardware Acceleration (e.g., GPU, TPU, RDMA), or ML for Systems.