Expoint - all jobs in one place

JPMorgan Senior Lead Software Engineer - AI Platform Engineer
United States, California, Palo Alto
391301167

01.04.2025

Job Responsibilities:

  • Provide technical guidance and direction to support business objectives, collaborating with technical teams, contractors, and vendors.
  • Develop secure, high-quality production code, and review and debug code written by others.
  • Influence product design, application functionality, and technical operations through informed decision-making.
  • Advocate for firmwide frameworks, tools, and practices within the Software Development Life Cycle.
  • Promote a culture of diversity, equity, inclusion, and respect within the team.
  • Architect and deploy secure, scalable cloud infrastructure platforms optimized for AI and machine learning workloads.
  • Collaborate with AI teams to translate computational needs into infrastructure requirements.
  • Monitor, manage, and optimize cloud resources for performance and cost efficiency.
  • Design and implement continuous integration and delivery pipelines for machine learning workloads.
  • Develop automation scripts and infrastructure as code to streamline deployment and management tasks.

Required Qualifications:

  • Formal training or certification in software engineering concepts with 5+ years of applied experience.
  • Hands-on experience in system design, application development, testing, and operational stability.
  • Proficiency in programming languages such as Python and/or Golang.
  • Ability to independently tackle design and functionality problems with minimal oversight.
  • Background in Computer Science, Computer Engineering, Mathematics, or a related technical field.
  • Strong knowledge of cloud computing delivery models (IaaS, PaaS, SaaS) and deployment models (Public, Private, Hybrid Cloud).
  • Proficiency in Linux environments, including scripting and administration.
  • Foundational understanding of machine learning concepts, including transformer architecture, ML training, and inference.
  • Experience with solutions design and engineering, containerization (Docker, Kubernetes), and cloud service providers (AWS, Azure, GCP).
  • Experience with Infrastructure as Code (Terraform, CloudFormation) and automation tools (Ansible, Chef, Puppet).
  • Deep understanding of cloud component architecture: Microservices, Containers, IaaS, Storage, Security, and routing/switching technologies.

Preferred Qualifications:

  • Foundational understanding of NVIDIA GPU infrastructure software (e.g., NVIDIA DCGM, BCM, Triton Inference Server).
  • Hands-on experience with ML frameworks and tooling such as PyTorch and TensorBoard.
  • Experience with observability tools such as Prometheus and Grafana.
  • Experience in MLOps and associated tooling such as MLflow.
  • Experience with high-performance computing and machine learning frameworks and tools such as vLLM, Ray, and Slurm.
  • Strong background in network architecture, database programming (SQL/NoSQL), and data modeling.
  • Familiarity with cloud data services and big data processing tools.