

What you will be doing:
Architect multi-GPU system topologies for scale-up and scale-out configurations, balancing AI throughput, scalability, and resilience.
Define, modify and evaluate future architectures for high-speed interconnects such as NVLink and Ethernet co-designed with the GPU memory system.
Collaborate with other teams to architect RDMA-capable hardware and define transport-layer optimizations for GPU-based, large-scale AI workload deployments.
Use and modify system models, run simulations, and perform bottleneck analyses to guide design trade-offs (see the sketch after this list).
Work with GPU ASIC, compiler, library and software stack teams to enable efficient hardware-software co-design across compute, memory, and communication layers.
Contribute to interposer, package, PCB and switch co-design for novel high-density multi-die, multi-package, multi-node rack-scale systems consisting of hundreds of GPUs.
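To give a flavor of the bottleneck analyses mentioned above, here is a minimal Python sketch of a two-tier all-reduce bandwidth model; every link speed, message size, and topology parameter below is a hypothetical placeholder chosen for illustration, not a product specification.

```python
"""Back-of-envelope bottleneck model for a two-tier all-reduce.

All numbers below are illustrative placeholders, not product specs.
"""

def ring_allreduce_time(message_bytes: float, n_ranks: int, bw_bytes_per_s: float) -> float:
    """Bandwidth-only lower bound for a ring all-reduce (ignores latency)."""
    if n_ranks <= 1:
        return 0.0
    return 2.0 * (n_ranks - 1) / n_ranks * message_bytes / bw_bytes_per_s

# Hypothetical configuration: 8 GPUs per node, 16 nodes, 1 GiB gradient buffer.
gpus_per_node = 8
nodes = 16
message_bytes = 1 * 2**30

scale_up_bw = 400e9   # per-GPU scale-up (NVLink-class) bandwidth, bytes/s (placeholder)
scale_out_bw = 50e9   # per-GPU share of NIC (scale-out) bandwidth, bytes/s (placeholder)

# Hierarchical all-reduce: reduce inside each node first, then across nodes.
t_intra = ring_allreduce_time(message_bytes, gpus_per_node, scale_up_bw)
t_inter = ring_allreduce_time(message_bytes / gpus_per_node, nodes, scale_out_bw)

bottleneck = "scale-up fabric" if t_intra > t_inter else "scale-out fabric"
print(f"intra-node: {t_intra*1e3:.2f} ms, inter-node: {t_inter*1e3:.2f} ms -> {bottleneck}")
```

Even a model this small makes the scale-up versus scale-out trade-off concrete before committing to detailed simulation.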
What we need to see:
BS/MS/PhD in Electrical Engineering, Computer Engineering, or equivalent area.
8 years or more of relevant experience in system design and/or ASIC/SoC architecture for GPU, CPU or networking products.
Deep understanding of communication interconnect protocols such as NVLink, Ethernet, InfiniBand, CXL and PCIe.
Experience with RDMA/RoCE or InfiniBand transport offload architectures.
Proven ability to architect multi-GPU/multi-CPU topologies, with awareness of bandwidth scaling, NUMA, memory models, coherency and resilience.
Experience with hardware-software interaction, drivers and runtimes, and performance tuning for modern distributed computing systems.
Strong analytical and system modeling skills (Python, SystemC, or similar).
Excellent cross-functional collaboration skills with silicon, packaging, board, and software teams.
Ways to stand out from the crowd:
Background in system design for AI and HPC.
Experience with NICs or DPU architecture and other transport offload engines.
Expertise in chiplet interconnect architectures or multi-node fabrics and protocols for distributed computing.
Hands-on experience with interposer or 2.5D/3D package co-design.

What you'll be doing:
Modeling and analysis of graphics and/or SoC algorithms and features.
Work in a matrixed environment, across the different modeling teams, to document, design, and develop tools that analyze, simulate, validate, and verify models.
Familiarize yourself with the different models (functional and performance) used at NVIDIA and help with feature implementation as required.
Develop tests, test plans, and testing infrastructure for new architectures/features.
Mentor junior engineers and help sustain good coding practices.
Learn about newer modeling techniques and frameworks, evaluate the best solution for our needs, and work with your manager to drive the change.
Help develop AI-based tools to increase efficiency.
What we need to see:
Bachelor's degree (or equivalent experience) in a relevant discipline (Computer Science, Electrical Engineering, or Computer Engineering)
8+ years of relevant work experience or MS with 5+ years of experience or PhD with 2+ years of experience
Strong programming ability in C++ and C, along with a good understanding of build systems (CMake, make), toolchains (GCC, MSVC), and libraries (STL, Boost)
Computer Architecture background with experience in performance modeling with C++ and SystemC preferred
Familiarity with Docker, Jenkins, Python, Perl
Excellent communication and interpersonal skills and ability to work in a distributed team environment.

What will you be doing:
Responsibilities include the development of test plans and strategies, simulation environments, system bring-up, validation, and automation to deliver best-in-class CPUs.
Develop and maintain CPU simulator infrastructure as well as hardware CPU test and performance infrastructure.
Analyze and validate CPU and fabric performance, helping to understand current products and guide the development of future CPU products (a small model-versus-silicon sketch follows this list).
Define and develop toolchains and workflows that enable full-system performance alignment.
Perform silicon-based competitive analysis of NVIDIA CPUs.
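As an illustration of correlating silicon measurements against model predictions (one facet of the performance analysis work above), the short Python sketch below flags benchmarks where measured IPC diverges from the model; the benchmark names and all numbers are hypothetical.

```python
"""Compare measured silicon benchmark scores against model predictions.

Benchmark names and all numbers are hypothetical, for illustration only.
"""

model_ipc = {"specint_like": 4.1, "memlat_like": 0.9, "stream_like": 1.6}
silicon_ipc = {"specint_like": 3.8, "memlat_like": 0.95, "stream_like": 1.2}

TOLERANCE = 0.05  # flag anything more than 5% away from the model

for bench, predicted in model_ipc.items():
    measured = silicon_ipc[bench]
    error = (measured - predicted) / predicted
    status = "OK" if abs(error) <= TOLERANCE else "INVESTIGATE"
    print(f"{bench:>14}: model {predicted:.2f}, silicon {measured:.2f}, "
          f"delta {error:+.1%} [{status}]")
```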
What we need to see:
Master's or Bachelor's degree in EE/CS or equivalent experience
5+ years of experience, preferably in the areas of CPU/SoC performance verification and analysis
Strong understanding of computer system architecture and operating system fundamentals.
Hands-on experience with HDLs such as Verilog/System Verilog.
Knowledge of verification methodologies and tools for IP and SoC level verification.
Experience with System Verilog, C/C++, and Python, and with relevant frameworks.
Background in silicon debug.
Ways to stand out from the crowd:
Detailed knowledge of the ARM and/or x86 architectures.
Prior experience with performance analysis of CPUs.
Experience with analysis and characterization of CPU workloads.

What you will be doing:
Develop benchmarks and end-to-end customer applications running at scale, instrumented for performance measurement, tracking, and sampling, to measure and optimize the performance of important applications and services.
Construct carefully designed experiments to analyze performance bottlenecks and dependencies and develop critical insights from an end-to-end perspective.
Develop ideas on how to improve end-to-end system performance and usability by driving changes in the HW or SW (or both).
Collaborate with AI researchers, developers, and application service providers to understand internal developer and external customer pain points and requirements, project future needs, and share best practices.
Develop the necessary modeling framework and TCO (total cost of ownership) analysis to enable efficient exploration and sweeps of the architecture and design space (a toy example follows this list).
Develop the methodology needed to drive the engineering analysis that informs the architecture, design, and roadmap of DGX Cloud.
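A TCO-driven sweep of the design space, as mentioned above, can start from something as small as the following Python sketch; every cost, power, and throughput figure is a made-up placeholder intended only to show the shape of the analysis.

```python
"""Toy TCO sweep over hypothetical system configurations.

All prices, power figures, and throughputs are made-up placeholders.
"""
from itertools import product

GPU_COSTS = {"gpu_a": 20_000, "gpu_b": 30_000}   # $ per GPU (placeholder)
GPU_PERF = {"gpu_a": 1.0, "gpu_b": 1.7}          # relative training throughput (placeholder)
GPU_POWER_KW = {"gpu_a": 0.7, "gpu_b": 1.0}      # per GPU (placeholder)
ENERGY_COST_PER_KWH = 0.08                       # $ (placeholder)
LIFETIME_HOURS = 4 * 365 * 24                    # 4-year depreciation window

def tco_per_unit_perf(gpu: str, gpus: int) -> float:
    """Dollars of total cost of ownership per unit of delivered throughput."""
    capex = GPU_COSTS[gpu] * gpus
    opex = GPU_POWER_KW[gpu] * gpus * LIFETIME_HOURS * ENERGY_COST_PER_KWH
    # Assume a flat 90% scaling efficiency (placeholder scaling model).
    throughput = GPU_PERF[gpu] * gpus * 0.9
    return (capex + opex) / throughput

configs = list(product(GPU_COSTS, [512, 1024, 2048]))
for gpu, gpus in configs:
    print(f"{gpu} x {gpus:>4}: ${tco_per_unit_perf(gpu, gpus):,.0f} per unit of throughput")

best = min(configs, key=lambda c: tco_per_unit_perf(*c))
print("lowest TCO per unit of throughput:", best)
```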
What we need to see:
Expertise in working with large-scale parallel and distributed accelerator-based systems
Expertise in optimizing the performance of AI workloads on large-scale systems
Experience with performance modeling and benchmarking at scale
Strong background in Computer Architecture, Networking, Storage systems, Accelerators
Familiarity with popular AI frameworks (PyTorch, TensorFlow, JAX, Megatron-LM, TensorRT-LLM, vLLM), among others
Experience with AI/ML models and workloads, in particular LLMs as well as an understanding of DNNs and their use in emerging AI/ML applications and services
Bachelor's/Master's in Engineering or equivalent experience (preferably Electrical Engineering, Computer Engineering, or Computer Science)
10+ years of experience in the above areas
Proficiency in Python, C/C++
Expertise with at least one public CSP infrastructure (GCP, AWS, Azure, OCI, …)
Ways to stand out from the crowd:
PhD in the relevant areas
Very high intellectual curiosity; confidence to dig in as needed; not afraid of confronting complexity; able to pick up new areas quickly.
Proficiency in CUDA, XLA
Excellent interpersonal skills

What you'll be doing:
Performance and bottleneck analysis of complex, high-performance GPUs and Systems-on-Chip (SoCs).
Work on hardware models at different levels of abstraction, including performance models, RTL test benches, and emulators, to find performance bottlenecks in the system.
Work closely with the architecture and design teams to explore architecture trade-offs related to system performance, area, and power consumption.
Understand key performance use cases for the product. Develop workloads and test suites targeting graphics, machine learning, automotive, video, and computer vision applications running on these products.
Drive methodologies for improving turnaround time, finding representative data-sets and enabling performance analysis early in the product development cycle.
Develop required infrastructure, including performance simulators, testbench components, and analysis tools (see the small queueing sketch after this list).
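As a purely illustrative example of a lightweight performance model of the kind referenced above, the Python sketch below simulates a single queue in front of a shared resource to show how latency grows with utilization; the arrival rates and service time are hypothetical.

```python
"""Tiny queueing sketch: latency vs. utilization at a shared resource.

Purely illustrative; arrival rates and service times are hypothetical.
"""
import random

def simulate(arrival_rate: float, service_time: float, n_requests: int = 100_000) -> float:
    """Average latency of an M/D/1-style queue via a simple event loop."""
    random.seed(0)
    clock = 0.0
    server_free_at = 0.0
    total_latency = 0.0
    for _ in range(n_requests):
        clock += random.expovariate(arrival_rate)   # next arrival
        start = max(clock, server_free_at)          # wait if the resource is busy
        server_free_at = start + service_time
        total_latency += server_free_at - clock
    return total_latency / n_requests

service = 1.0  # cycles per request (placeholder)
for load in (0.3, 0.6, 0.9):
    print(f"utilization {load:.0%}: avg latency {simulate(load / service, service):.2f} cycles")
```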
What we need to see:
BE/BTech or MS/MTech in a relevant area, or equivalent experience; a PhD is a plus.
3+ years of relevant experience dealing with system level architecture and performance issues.
Strong understanding of System-on-Chip (SoC) architecture, graphics pipeline, CPU architecture, memory subsystem architecture and Network-on-Chip (NoC)/Interconnect architecture.
Solid programming (C/C++) and scripting (Bash/Perl/Python) skills. Exposure to Verilog/System Verilog, SystemC/TLM is a strong plus.
Strong debugging and analysis skills (including data and statistical analysis), with the ability to use RTL dumps to debug failures.
Exposure to performance simulators, cycle accurate/approximate models or emulators for pre-silicon performance analysis is a plus.
Excellent communication and organization skills.
Ability to work in a global team environment.
Ways to stand out from the crowd:
Strong background in system-level performance aspects for graphics and high-performance computing.
Exposure to GPU application programming interfaces like CUDA, OpenGL, DirectX.
Expertise in data analysis and visualization.

What you'll be doing:
Develop AI performance tools for large-scale AI systems, providing real-time insight into application performance and system bottlenecks.
Conduct in-depth hardware-software performance studies
Define performance and efficiency evaluation methodologies
Automate performance data analysis and visualization to convert profiling data into actionable optimizations (see the sketch after this list)
Support deep learning software engineers and GPU architects in their performance analysis efforts
Work with various teams at NVIDIA to incorporate and influence the latest technologies for GPU performance analysis
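Turning profiling data into actionable optimizations, as described above, often begins with simple aggregation; the Python sketch below ranks kernels by total time to surface hot spots, using a made-up CSV format and hypothetical kernel names in place of a real profiler export.

```python
"""Aggregate hypothetical kernel-level profiling data into a hot-spot report.

The input format and kernel names are made up for illustration.
"""
import csv
import io
from collections import defaultdict

# Stand-in for a profiler export; real data would come from a trace file.
PROFILE_CSV = """kernel,duration_us
gemm_fp16,1200
gemm_fp16,1180
allreduce_ring,800
layernorm,95
layernorm,90
"""

totals = defaultdict(lambda: {"calls": 0, "total_us": 0.0})
for row in csv.DictReader(io.StringIO(PROFILE_CSV)):
    entry = totals[row["kernel"]]
    entry["calls"] += 1
    entry["total_us"] += float(row["duration_us"])

grand_total = sum(e["total_us"] for e in totals.values())
for kernel, e in sorted(totals.items(), key=lambda kv: -kv[1]["total_us"]):
    share = e["total_us"] / grand_total
    print(f"{kernel:>15}: {e['calls']:>3} calls, {e['total_us']:>7.0f} us ({share:.0%})")
```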
What we need to see:
8+ years of experience in software infrastructure and tools
BS or higher degree in computer science or similar (or equivalent experience)
Adept programming skills in multiple languages including C++ and Python
Solid foundation in operating systems and computer architecture
Outstanding ability to understand users, prioritize among many contending requests, and build consensus
Passion for “it just works” automation, eliminating repetitive tasks, and enabling team members
Ways to stand out from the crowd:
Experience working with large-scale AI clusters
Experience with CUDA and GPU computing systems
Hands-on experience with deep learning frameworks (TensorFlow, PyTorch, JAX/XLA etc.)
Deep understanding of the software performance analysis and optimization process

What you will be doing:
Contribute to advancing GPU Streaming Multiprocessor (SM) architecture, simulators, compilers, and testing infrastructure.
Understand and model architecture and compiler features for AI and graphics in the GPU SM.
Develop test plans and testing infrastructure for next generation GPU SM.
Develop tools to validate performance models and verify functional models.
What we need to see:
A Master's/Bachelor's degree in CS or Math, with 3+ years of relevant experience in compilers, parallel programming, computer architecture, or a related field.
Strong programming ability in C, C++, Python.
Strong understanding of compilers and data structures is a must.
Ability to work with teams spanning both hardware and software boundaries.
Ways to stand out from the crowd:
Experience in GPU compilers and architectural exploration.
Solid programming skills and CPU core architecture background.
Understanding of CPU ISAs (GPU ISA knowledge will be a big plus).
