Nvidia Senior Software Architect AI Networking 
Israel, Tel Aviv District, Tel Aviv-Yafo 
719479266

Today
Israel, Tel Aviv
Israel, Yokneam
Time type: Full time
Posted on: Posted 6 Days Ago

You’ll help define how AI models are deployed and scaled in production, driving decisions on everything from memory orchestration and compute scheduling to inter-node communication and system-level optimizations. This is an opportunity to work with top engineers, researchers, and partners across NVIDIA and leave a mark on the way generative AI reaches real-world applications.

What You’ll Be Doing:

  • Design and evolve scalable architectures for multi-node LLM inference across GPU clusters.

  • Develop infrastructure to optimize latency, throughput, and cost-efficiency of serving large models in production.

  • Collaborate with model, systems, compiler, and networking teams to ensure holistic, high-performance solutions.

  • Prototype novel approaches to KV cache handling, tensor/pipeline parallel execution, and dynamic batching.

  • Evaluate and integrate new software and hardware technologies relevant to Core Spectrum-X technologies, such as load balancing, telemetry, congestion control, and vertical application integration.

  • Work closely with internal teams and external partners to translate high-level architecture into reliable, high-performance systems.

  • Author design documents, internal specs, and technical blog posts, and contribute to open-source efforts when appropriate.

What We Need to See:

  • Bachelor’s, Master’s, or PhD in Computer Science, Electrical Engineering, or equivalent experience.

  • 8+ years of experience building large-scale distributed systems or performance-critical software.

  • Deep understanding of deep learning systems, GPU acceleration, and AI model execution flows and/or high-performance networking.

  • Solid software engineering skills in C++ and/or Python, preferably with strong familiarity with CUDA or similar platforms.

  • Strong system-level thinking across memory, networking, scheduling, and compute orchestration.

  • Excellent communication skills and ability to collaborate across diverse technical domains.

Ways to Stand Out from the Crowd:

  • Experience working on LLM training or inference pipelines, transformer model optimization, or model-parallel deployments.

  • Demonstrated success in profiling and optimizing performance bottlenecks across the LLM training or inference stack.

  • Experience with AI accelerators and distributed communication patterns, congestion control, and/or load balancing.

  • Proven process for optimizing complex systems deployed at scale, with measurable impact.

  • Passion for solving tough technical problems and shipping high-impact solutions.