

What you will be doing:
Architect multi-GPU system topologies for scale-up and scale-out configurations, balancing AI throughput, scalability, and resilience.
Define, modify and evaluate future architectures for high-speed interconnects such as NVLink and Ethernet co-designed with the GPU memory system.
Collaborate with other teams to architect RDMA-capable hardware and define transport layer optimizations for GPU-based large-scale AI workload deployments.
Use and modify system models, perform simulations and bottleneck analyses to guide design trade-offs.
Work with GPU ASIC, compiler, library and software stack teams to enable efficient hardware-software co-design across compute, memory, and communication layers.
Contribute to interposer, package, PCB and switch co-design for novel high-density multi-die, multi-package, multi-node rack-scale systems consisting of hundreds of GPUs.
What we need to see:
BS/MS/PhD in Electrical Engineering, Computer Engineering, or equivalent area.
8 years or more of relevant experience in system design and/or ASIC/SoC architecture for GPU, CPU or networking products.
Deep understanding of communication interconnect protocols such as NVLink, Ethernet, InfiniBand, CXL and PCIe.
Experience with RDMA/RoCE or InfiniBand transport offload architectures.
Proven ability to architect multi-GPU/multi-CPU topologies, with awareness of bandwidth scaling, NUMA, memory models, coherency and resilience.
Experience with hardware-software interaction, drivers and runtimes, and performance tuning for modern distributed computing systems.
Strong analytical and system modeling skills (Python, SystemC, or similar).
Excellent cross-functional collaboration skills with silicon, packaging, board, and software teams.
Ways to stand out from the crowd:
Background in system design for AI and HPC.
Experience with NICs or DPU architecture and other transport offload engines.
Expertise in chiplet interconnect architectures or multi-node fabrics and protocols for distributed computing.
Hands-on experience with interposer or 2.5D/3D package co-design.

What you'll be doing:
Modeling and analysis of graphics and/or SoC algorithms and features.
Work in a matrixed environment across the different modeling teams to document, design, and develop tools that analyze, simulate, validate, and verify models.
Become familiar with the different models (functional and performance) used at NVIDIA and help with feature implementation as required.
Develop tests, test plans, and testing infrastructure for new architectures/features.
Mentor junior engineers and help sustain good coding practices.
Learn about newer modeling techniques and frameworks, evaluate the best solution for our needs, and work with your manager to drive the change.
Help develop AI-based tools to increase efficiency.
What we need to see:
Bachelor's degree (or equivalent experience) in a relevant discipline (Computer Science, Electrical Engineering or Computer Engineering).
8+ years of relevant work experience, or MS with 5+ years of experience, or PhD with 2+ years of experience.
Strong programming ability in C++ and C, along with a good understanding of build systems (CMake, make), toolchains (GCC, MSVC) and libraries (STL, Boost).
Computer Architecture background with experience in performance modeling with C++ and SystemC preferred
Familiarity with Docker, Jenkins, Python, Perl
Excellent communication and interpersonal skills and ability to work in a distributed team environment.

What you'll be doing:
Performance analysis/bottleneck analysis of complex, high-performance GPUs and System-on-Chips (SoCs).
Work on hardware models at different levels of abstraction, including performance models, RTL test benches and emulators, to find performance bottlenecks in the system.
Work closely with the architecture and design teams to explore architecture trade-offs related to system performance, area, and power consumption.
Understand key performance use cases for the product. Develop workloads and test suites targeting graphics, machine learning, automotive, video, and computer vision applications running on these products.
Drive methodologies for improving turnaround time, finding representative data-sets and enabling performance analysis early in the product development cycle.
Develop required infrastructure including performance simulators, testbench components and analysis tools.
What we need to see:
BE/BTech or MS/MTech, or equivalent experience in relevant area, PhD is a plus.
3+ years of relevant experience dealing with system level architecture and performance issues.
Strong understanding of System-on-Chip (SoC) architecture, graphics pipeline, CPU architecture, memory subsystem architecture and Network-on-Chip (NoC)/Interconnect architecture.
Solid programming (C/C++) and scripting (Bash/Perl/Python) skills. Exposure to Verilog/SystemVerilog, SystemC/TLM is a strong plus.
Strong debugging and analysis (including data and statistical analysis) skills, including use of RTL dumps to debug failures.
Exposure to performance simulators, cycle-accurate/approximate models or emulators for pre-silicon performance analysis is a plus.
Excellent communication and organization skills.
Ability to work in a global team environment.
Ways to stand out from the crowd:
Strong background in System Level Performance aspects for Graphics and High Performance Computing.
Exposure to GPU application programming interfaces like CUDA, OpenGL, DirectX.
Expertise in data analysis and visualization.

What you will be doing:
Contribute to advancing GPU Streaming Multiprocessor (SM) architecture, simulators, compilers and testing infrastructure.
Understand and model, in the architecture compiler, features for AI and graphics GPU SMs.
Develop test plans and testing infrastructure for next generation GPU SM.
Develop tools to validate performance models and verify functional models.
What we need to see:
A Master's/Bachelor's degree in CS or Math, with 3+ years of relevant experience in compilers, parallel programming, computer architecture or a related field.
Strong programming ability in C, C++, Python.
A strong understanding of compilers and data structures is a must.
Ability to work with teams spanning both hardware and software boundaries.
Ways to stand out from the crowd:
Experience in GPU compilers and architectural exploration.
Solid programming skills and CPU core architecture background.
Understanding of CPU ISAs (knowledge of a GPU ISA is a big plus).

What you'll be doing:
System-level performance analysis/bottleneck analysis of complex, high-performance GPUs and System-on-Chips (SoCs).
Work on hardware models at different levels of abstraction, including performance models, RTL test benches, emulators and silicon, to analyze performance and find performance bottlenecks in the system.
Understand key performance use cases of the product. Develop workloads and test suites targeting graphics, machine learning, automotive, video, and computer vision applications running on these products.
Work closely with the architecture and design teams to explore architecture trade-offs related to system performance, area, and power consumption.
Develop required infrastructure including performance models, testbench components, performance analysis and visualization tools.
Drive methodologies for improving turnaround time, finding representative data-sets and enabling performance analysis early in the product development cycle.
What we need to see:
BE/BTech, or MS/MTech in relevant area, PhD is a plus, or equivalent experience.
3+ years of experience with exposure to performance analysis and complex system on chip and/or GPU architectures.
Strong understanding of System-on-Chip (SoC) architecture, graphics pipeline, memory subsystem architecture and Network-on-Chip (NoC)/Interconnect architecture.
Expert hands-on competence in programming (C/C++) and scripting (Perl/Python). Exposure to Verilog/SystemVerilog, SystemC/TLM is a strong plus.
Strong debugging and analysis (including data and statistical analysis) skills, including use of RTL dumps to debug failures.
Hands-on experience developing performance simulators and cycle-accurate/approximate models for pre-silicon performance analysis is a strong plus.

What you'll be doing:
Performance analysis/bottleneck analysis of complex, high-performance GPUs and System-on-Chips (SoCs).
Work on hardware models at different levels of abstraction, including performance models, RTL test benches and emulators, to find performance bottlenecks in the system.
Work closely with the architecture and design teams to explore architecture trade-offs related to system performance, area, and power consumption.
Understand key performance use cases of the product. Develop workloads and test suites targeting graphics, machine learning, automotive, video, and computer vision applications running on these products.
Drive methodologies for improving turnaround time, finding representative data-sets and enabling performance analysis early in the product development cycle.
Develop required infrastructure including performance simulators, testbench components and analysis tools.
What we need to see:
BE/BTech or MS/MTech in relevant area, PhD is a plus.
2+ years of experience with exposure to performance analysis and complex system on chip and/or GPU architectures.
Demonstrated history of technical leadership.
Strong understanding of System-on-Chip (SoC) architecture, graphics pipeline, memory subsystem architecture and Network-on-Chip (NoC)/Interconnect architecture.
Expert hands-on competence in programming (C/C++) and scripting (Perl/Python). Exposure to Verilog/SystemVerilog, SystemC/TLM is a strong plus.
Strong debugging and analysis (including data and statistical analysis) skills, including use of RTL dumps to debug failures.

What you will be doing:
Understand various HW features related to the performance and stability of various platforms and their compatibility with our GPUs.
Take part in validation of GPU-specific low-power features on these platforms, involving work on pioneering GPUs for mobile, desktop and server-class configurations during the validation phase.
Drive the debug of silicon, board or software issues involving many multi-functional teams across the globe.
Develop new methodologies to improve the silicon validation process and take it to the next level!
Engage in cross-team collaborations to ensure our solutions meet the highest standards and compete in the global market.
What we need to see:
BTech/BE or MTech/ME degree in Electronics or equivalent experience.
3+ years of experience in a related field.
Good knowledge of board and system design considerations.
An understanding of PC architecture and various commonly used buses.
Familiarity with scripting languages like Perl and/or Python.
Must be a standout colleague and ready to work with global teams from diverse cultural backgrounds.
