

What you will be doing:
Architect multi-GPU system topologies for scale-up and scale-out configurations, balancing AI throughput, scalability, and resilience.
Define, modify and evaluate future architectures for high-speed interconnects such as NVLink and Ethernet co-designed with the GPU memory system.
Collaborate with other teams to architect RDMA-capable hardware and define transport layer optimizations for GPU-based large scale AI workload deployments.
Use and modify system models, perform simulations and bottleneck analyses to guide design trade-offs.
Work with GPU ASIC, compiler, library and software stack teams to enable efficient hardware-software co-design across compute, memory, and communication layers.
Contribute to interposer, package, PCB and switch co-design for novel high-density multi-die, multi-package, multi-node rack-scale systems consisting of hundreds of GPUs.
What we need to see:
BS/MS/PhD in Electrical Engineering, Computer Engineering, or equivalent area.
8 years or more of relevant experience in system design and/or ASIC/SoC architecture for GPU, CPU or networking products.
Deep understanding of communication interconnect protocols such as NVLink, Ethernet, InfiniBand, CXL and PCIe.
Experience with RDMA/RoCE or InfiniBand transport offload architectures.
Proven ability to architect multi-GPU/multi-CPU topologies, with awareness of bandwidth scaling, NUMA, memory models, coherency and resilience.
Experience with hardware-software interaction, drivers and runtimes, and performance tuning for modern distributed computing systems.
Strong analytical and system modeling skills (Python, SystemC, or similar).
Excellent cross-functional collaboration skills with silicon, packaging, board, and software teams.
Ways to stand out from the crowd:
Background in system design for AI and HPC.
Experience with NICs or DPU architecture and other transport offload engines.
Expertise in chiplet interconnect architectures or multi-node fabrics and protocols for distributed computing.
Hands-on experience with interposer or 2.5D/3D package co-design.
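The simulation and bottleneck-analysis work described above often starts from simple analytical models. A minimal Python sketch, with hypothetical bandwidth figures and a deliberately crude ring all-reduce cost model (the topology, link rate, and memory rate below are illustrative assumptions, not product numbers):

```python
# Illustrative back-of-the-envelope model: estimate per-GPU ring
# all-reduce time on a hypothetical topology and report whether the
# interconnect or the memory system limits it.

def ring_allreduce_time(message_bytes, n_gpus, link_gb_s, mem_gb_s):
    """Return (seconds, limiter) for a bandwidth-only ring all-reduce.

    A ring all-reduce moves 2*(n-1)/n of the message over each link;
    this crude model charges the same traffic against memory bandwidth
    and ignores latency, overlap, and protocol overheads.
    """
    traffic = 2 * (n_gpus - 1) / n_gpus * message_bytes
    t_link = traffic / (link_gb_s * 1e9)   # seconds limited by the link
    t_mem = traffic / (mem_gb_s * 1e9)     # seconds limited by memory
    if t_link >= t_mem:
        return t_link, "interconnect"
    return t_mem, "memory"

# Example: a 1 GiB gradient buffer reduced across 8 GPUs.
t, limiter = ring_allreduce_time(2**30, 8, link_gb_s=450, mem_gb_s=3000)
```

Models like this are only a first-order screen; they point at which subsystem to simulate in detail, not at a final answer.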

What you'll be doing:
Modeling and analysis of graphics and/or SoC algorithms and features.
Work in a matrixed environment across the different modeling teams to design, develop, and document tools that analyze, simulate, validate, and verify models.
Familiarize yourself with the different models (functional and performance) used at NVIDIA and help with feature implementation as required.
Develop tests, test plans, and testing infrastructure for new architectures/features.
Mentor junior engineers and help sustain good coding practices.
Learn about newer modeling techniques and frameworks, evaluate the best solutions for our needs, and work with your manager to drive the change.
Help develop AI-based tools to increase efficiency.
What we need to see:
Bachelor's degree (or equivalent experience) in a relevant discipline (Computer Science, Electrical Engineering, or Computer Engineering).
8+ years of relevant work experience, or an MS with 5+ years of experience, or a PhD with 2+ years of experience.
Strong programming ability in C++ and C, along with a good understanding of build systems (CMake, make), toolchains (GCC, MSVC), and libraries (STL, Boost).
Computer architecture background; experience in performance modeling with C++ and SystemC preferred.
Familiarity with Docker, Jenkins, Python, and Perl.
Excellent communication and interpersonal skills and ability to work in a distributed team environment.

What you'll be doing:
Performance and bottleneck analysis of complex, high-performance GPUs and System-on-Chips (SoCs).
Work on hardware models at different levels of abstraction, including performance models, RTL test benches, and emulators, to find performance bottlenecks in the system.
Work closely with the architecture and design teams to explore architecture trade-offs related to system performance, area, and power consumption.
Understand key performance use cases for the product. Develop workloads and test suites targeting graphics, machine learning, automotive, video, and computer vision applications running on these products.
Drive methodologies for improving turnaround time, finding representative data-sets and enabling performance analysis early in the product development cycle.
Develop required infrastructure including performance simulators, testbench components and analysis tools.
What we need to see:
BE/BTech or MS/MTech (or equivalent experience) in a relevant area; a PhD is a plus.
3+ years of relevant experience dealing with system level architecture and performance issues.
Strong understanding of System-on-Chip (SoC) architecture, graphics pipeline, CPU architecture, memory subsystem architecture and Network-on-Chip (NoC)/Interconnect architecture.
Solid programming (C/C++) and scripting (Bash/Perl/Python) skills. Exposure to Verilog/SystemVerilog or SystemC/TLM is a strong plus.
Strong debugging and analysis (including data and statistical analysis) skills, including use of RTL dumps to debug failures.
Exposure to performance simulators, cycle accurate/approximate models or emulators for pre-silicon performance analysis is a plus.
Excellent communication and organization skills.
Ability to work in a global team environment.
Ways to stand out from the crowd:
Strong background in System Level Performance aspects for Graphics and High Performance Computing.
Exposure to GPU application programming interfaces like CUDA, OpenGL, DirectX.
Expertise in data analysis and visualization.
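The bottleneck analysis listed above frequently begins with a roofline-style classification of a workload. A minimal Python sketch; the peak rates in the example are hypothetical placeholder values, not figures for any real chip:

```python
# Roofline-style check: given a kernel's FLOP count and bytes moved,
# decide whether it is compute-bound or memory-bound against
# (hypothetical) peak compute and memory-bandwidth rates.

def roofline(flops, bytes_moved, peak_flops, peak_bw_bytes):
    """Return (attainable FLOP/s, limiter) for one kernel."""
    intensity = flops / bytes_moved        # arithmetic intensity, FLOP/byte
    ridge = peak_flops / peak_bw_bytes     # ridge-point intensity
    if intensity < ridge:
        return peak_bw_bytes * intensity, "memory-bound"
    return peak_flops, "compute-bound"

# Example: 1 FLOP per byte on a machine whose ridge point sits at
# 100 FLOP/byte is firmly memory-bound.
perf, limiter = roofline(1e9, 1e9, peak_flops=1e14, peak_bw_bytes=1e12)
```

In practice this screen is followed by cycle-accurate simulation of the units the roofline flags, but it cheaply narrows where to look first.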

What you will be doing:
Contribute to advancing GPU Streaming Multiprocessor (SM) architecture, simulators, compilers, and testing infrastructure.
Understand and model compiler and architecture features for AI and graphics in the GPU SM.
Develop test plans and testing infrastructure for next generation GPU SM.
Develop tools to validate performance models and verify functional models.
What we need to see:
A Bachelor's or Master's degree in CS or Math, with 3+ years of relevant experience in compilers, parallel programming, computer architecture, or a related field.
Strong programming ability in C, C++, and Python.
A strong understanding of compilers and data structures is a must.
Ability to work with teams spanning both hardware and software boundaries.
Ways to stand out from the crowd:
Experience in GPU compilers and architectural exploration.
Solid programming skills and CPU core architecture background.
Understanding of CPU ISAs (GPU ISA experience will be a big plus).

What you'll be doing:
System-level performance and bottleneck analysis of complex, high-performance GPUs and System-on-Chips (SoCs).
Work on hardware models at different levels of abstraction, including performance models, RTL test benches, emulators, and silicon, to analyze performance and find performance bottlenecks in the system.
Understand key performance use cases of the product. Develop workloads and test suites targeting graphics, machine learning, automotive, video, and computer vision applications running on these products.
Work closely with the architecture and design teams to explore architecture trade-offs related to system performance, area, and power consumption.
Develop required infrastructure including performance models, testbench components, performance analysis and visualization tools.
Drive methodologies for improving turnaround time, finding representative data-sets and enabling performance analysis early in the product development cycle.
What we need to see:
BE/BTech or MS/MTech (or equivalent experience) in a relevant area; a PhD is a plus.
3+ years of experience with exposure to performance analysis and complex system on chip and/or GPU architectures.
Strong understanding of System-on-Chip (SoC) architecture, graphics pipeline, memory subsystem architecture and Network-on-Chip (NoC)/Interconnect architecture.
Expert hands-on competence in programming (C/C++) and scripting (Perl/Python). Exposure to Verilog/SystemVerilog or SystemC/TLM is a strong plus.
Strong debugging and analysis (including data and statistical analysis) skills, including use of RTL dumps to debug failures.
Hands-on experience developing performance simulators and cycle-accurate/approximate models for pre-silicon performance analysis is a strong plus.

What you'll be doing:
Performance and bottleneck analysis of complex, high-performance GPUs and System-on-Chips (SoCs).
Work on hardware models at different levels of abstraction, including performance models, RTL test benches, and emulators, to find performance bottlenecks in the system.
Work closely with the architecture and design teams to explore architecture trade-offs related to system performance, area, and power consumption.
Understand key performance use cases of the product. Develop workloads and test suites targeting graphics, machine learning, automotive, video, and computer vision applications running on these products.
Drive methodologies for improving turnaround time, finding representative data-sets and enabling performance analysis early in the product development cycle.
Develop required infrastructure including performance simulators, testbench components and analysis tools.
What we need to see:
BE/BTech or MS/MTech in relevant area, PhD is a plus.
2+ years of experience with exposure to performance analysis and complex system on chip and/or GPU architectures.
Demonstrated history of technical leadership.
Strong understanding of System-on-Chip (SoC) architecture, graphics pipeline, memory subsystem architecture and Network-on-Chip (NoC)/Interconnect architecture.
Expert hands-on competence in programming (C/C++) and scripting (Perl/Python). Exposure to Verilog/SystemVerilog or SystemC/TLM is a strong plus.
Strong debugging and analysis (including data and statistical analysis) skills, including use of RTL dumps to debug failures.

What you'll be doing:
Architecting the memory system for next-gen GPUs to support next-generation computing applications.
Develop detailed performance models and craft creative workloads to bring out the best of memory system components.
Analyzing and optimizing memory performance, ensuring our solutions are highly efficient.
Ensure that the hardware design perfectly meets the requirements set by high-impact features that span multiple units.
Collaborating closely with cross-functional teams to ensure flawless integration of memory systems into broader GPU architectures.
Leading and mentoring a team of engineers, encouraging an inclusive and collaborative environment that drives outstanding results.
What we need to see:
A Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.
At least 2 years of experience in memory system architecture or related areas.
Proven track record of crafting and implementing high-performance memory systems.
Strong analytical and problem-solving skills demonstrated through attention to detail.
Excellent teamwork and communication skills, with a determination to drive projects to successful completion.
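As a flavor of the memory-system modeling involved, early-stage analysis often uses small trace-driven simulators before committing to detailed models. A minimal sketch, with an arbitrary line size and a plain fully associative LRU policy (real GPU cache hierarchies are far more involved):

```python
from collections import OrderedDict

def lru_hit_rate(trace, capacity_lines, line_bytes=64):
    """Hit rate of a fully associative LRU cache over a byte-address trace."""
    cache = OrderedDict()              # line address -> None, kept in LRU order
    hits = 0
    for addr in trace:
        line = addr // line_bytes
        if line in cache:
            hits += 1
            cache.move_to_end(line)    # mark as most recently used
        else:
            cache[line] = None
            if len(cache) > capacity_lines:
                cache.popitem(last=False)   # evict least recently used line
    return hits / len(trace)

# Revisiting two lines that both fit in a 2-line cache hits on each revisit.
rate = lru_hit_rate([0, 64, 0, 64], capacity_lines=2)   # 0.5
```

Sweeping capacity or line size over a representative trace with a model like this is a cheap way to bound design choices before detailed simulation.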