Expoint - all jobs in one place

The place where the best experts and companies meet

JPMorgan Lead Software Engineer - AI Platform Engineer
United States, California, Palo Alto 
178599868

01.04.2025

Job Responsibilities:

  • Execute creative software solutions, including design, development, and technical troubleshooting, with the ability to think beyond conventional approaches to build solutions or resolve technical problems.
  • Develop secure, high-quality production code, and review and debug code written by others.
  • Identify opportunities to eliminate or automate the remediation of recurring issues to enhance the overall operational stability of software applications and systems.
  • Lead evaluation sessions with external vendors, startups, and internal teams to drive outcomes-oriented assessments of architectural designs, technical credentials, and their applicability within existing systems and information architecture.
  • Lead communities of practice across Software Engineering to promote awareness and adoption of new and leading-edge technologies.
  • Contribute to a team culture of diversity, equity, inclusion, and respect.
  • Develop and deploy cloud infrastructure platforms that are secure, scalable, and optimized for AI and machine learning workloads.
  • Collaborate with AI teams to understand computational needs and translate these into infrastructure requirements.
  • Monitor, manage, and optimize cloud resources to maximize performance and minimize costs.
  • Design and implement continuous integration and delivery pipelines for machine learning workloads.
  • Develop automation scripts and infrastructure as code to streamline deployment and management tasks.

Required Qualifications, Capabilities, and Skills:

  • Formal training or certification in software engineering concepts with 5+ years of applied experience.
  • Hands-on practical experience in delivering system design, application development, testing, and ensuring operational stability.
  • Advanced proficiency in one or more programming languages such as Python and/or Golang.
  • Proficiency in automation and continuous delivery methods.
  • Proficiency in all aspects of the Software Development Life Cycle.
  • Demonstrated proficiency in software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, machine learning, mobile).
  • Proficiency in Linux environments, including scripting and administration.
  • Foundational understanding of machine learning concepts, including transformer architecture, ML training, and inference.
  • Experience in solutions design and engineering, containerization (Docker, Kubernetes), and cloud service providers (AWS, Azure, GCP).
  • Experience with Infrastructure as Code (Terraform, CloudFormation) and automation tools (Ansible, Chef, Puppet).
  • Deep understanding of cloud component architecture: Microservices, Containers, IaaS, Storage, Security, and routing/switching technologies.

Preferred Qualifications, Capabilities, and Skills:

  • Foundational understanding of NVIDIA GPU infrastructure software (e.g., NVIDIA DCGM, BCM, Triton Inference Server).
  • Hands-on experience with ML frameworks such as PyTorch and TensorBoard.
  • Experience with observability tools such as Prometheus and Grafana.
  • Experience in MLOps and associated tooling such as MLflow.
  • Experience with High Performance Computing and Machine Learning frameworks such as vLLM, Ray.io, and Slurm.
  • Strong background in network architecture, database programming (SQL/NoSQL), and data modeling.
  • Familiarity with cloud data services and big data processing tools.