With a growing industry focus on AI/ML, you will get to combine your Kubernetes expertise with a solid understanding of machine learning and related frameworks. You will use your engineering skills and experience to deliver AI/ML-based solutions that improve the operation of Kubernetes clusters and provide meaningful platform insight.
Job responsibilities
- Develops secure and high-quality production code, and reviews and debugs code written by others.
- Automates the installation, upgrade, scaling, and management of a large and rapidly growing fleet of Kubernetes clusters. Develops custom platform control plane webhooks, CRDs, operators, and more that provide a secure, opinionated platform.
- Delivers AI/ML-based solutions to improve Kubernetes platforms for JPMC.
- Automates infrastructure as code using Terraform, VRA, Ansible, and other infrastructure automation tools. Automates networking and OS configuration, from building images, configuring file system layouts, and automating BGP peering up through CRI, CNI, and CSI configuration for Kubernetes clusters.
- Produces architecture and design artifacts for complex applications while being accountable for ensuring design constraints are met by software code development.
- Drives decisions that influence the product design, application functionality, and technical operations and processes.
- Regularly provides technical guidance and direction to support the business and its technical teams, contractors, and vendors.
- Adds to the team culture of diversity, equity, inclusion, and respect.
Required qualifications, capabilities, and skills
- Formal training or certification in Computer Science/Software Engineering and 5+ years of applied experience.
- Hands-on practical experience delivering system design, application development, testing, and operational stability.
- Proficiency in multiple modern programming languages, ideally including one of Go, Python, React, or Java.
- Experience applying data engineering and machine learning techniques.
- Experience in generative AI, LLMs, and AI agents.
- Experience in one or more major machine learning frameworks: TensorFlow, PyTorch, scikit-learn.
- Experience working with modern private and public cloud infrastructure platforms: Kubernetes, AWS EKS, Terraform, Ansible, and other automation tools.
- Experience in developing, debugging, and maintaining code in a large corporate environment with one or more modern programming languages and database querying languages.
- Ability to tackle design and functionality problems independently with little to no oversight.
- Practical cloud-native experience.
Preferred qualifications, capabilities, and skills
- Experience with multiple cloud platforms, such as GCP GKE, is a plus.
- Knowledge of deep learning and MLOps workflows.
- Experience in time series analysis, including forecasting, anomaly detection, and trend analysis.
- Knowledge of industry-wide technology trends and best practices.