

What you will be doing:
Architect multi-GPU system topologies for scale-up and scale-out configurations, balancing AI throughput, scalability, and resilience.
Define, modify, and evaluate future architectures for high-speed interconnects, such as NVLink and Ethernet, co-designed with the GPU memory system.
Collaborate with other teams to architect RDMA-capable hardware and define transport layer optimizations for GPU-based large scale AI workload deployments.
Use and modify system models, perform simulations and bottleneck analyses to guide design trade-offs.
Work with GPU ASIC, compiler, library and software stack teams to enable efficient hardware-software co-design across compute, memory, and communication layers.
Contribute to interposer, package, PCB and switch co-design for novel high-density multi-die, multi-package, multi-node rack-scale systems consisting of hundreds of GPUs.
What we need to see:
BS/MS/PhD in Electrical Engineering, Computer Engineering, or equivalent area.
8 years or more of relevant experience in system design and/or ASIC/SoC architecture for GPU, CPU or networking products.
Deep understanding of communication interconnect protocols such as NVLink, Ethernet, InfiniBand, CXL and PCIe.
Experience with RDMA/RoCE or InfiniBand transport offload architectures.
Proven ability to architect multi-GPU/multi-CPU topologies, with awareness of bandwidth scaling, NUMA, memory models, coherency and resilience.
Experience with hardware-software interaction, drivers and runtimes, and performance tuning for modern distributed computing systems.
Strong analytical and system modeling skills (Python, SystemC, or similar).
Excellent cross-functional collaboration skills with silicon, packaging, board, and software teams.
Ways to stand out from the crowd:
Background in system design for AI and HPC.
Experience with NICs or DPU architecture and other transport offload engines.
Expertise in chiplet interconnect architectures or multi-node fabrics and protocols for distributed computing.
Hands-on experience with interposer or 2.5D/3D package co-design.

What you'll be doing:
Model and analyze graphics and/or SoC algorithms and features.
Work in a matrixed environment, across the different modelling teams, to document, design, and develop tools that analyze, simulate, validate, and verify models.
Become familiar with the different models (functional and performance) used at NVIDIA and help with feature implementation as required.
Develop tests, test plans, and testing infrastructure for new architectures/features.
Mentor junior engineers and help sustain good coding practices.
Learn about newer modelling techniques and frameworks, evaluate the best solution for our needs, and work with your manager to drive the change.
Help develop AI-based tools to increase efficiency.
What we need to see:
Bachelor's degree (or equivalent experience) in a relevant discipline (Computer Science, Electrical Engineering, or Computer Engineering)
8+ years of relevant work experience or MS with 5+ years of experience or PhD with 2+ years of experience
Strong programming ability in C++ and C, along with a good understanding of build systems (CMake, make), toolchains (GCC, MSVC), and libraries (STL, Boost)
Computer architecture background; experience in performance modeling with C++ and SystemC preferred
Familiarity with Docker, Jenkins, Python, Perl
Excellent communication and interpersonal skills and ability to work in a distributed team environment.

What you will be doing:
Lead design and development of NVIDIA’s Assembler and Disassembler for GPU compute.
Work on binary analysis & instrumentation features such as call graph generation, program register usage analysis, and patching of GPU binaries.
Work with GPU architecture and debugger/profiler development teams to understand their requirements and deliver new features & product improvements.
Collaborate closely with teams developing other related components to ensure compatibility, reliability, and high-quality code generation.
Work with customers/partners to collect feedback and drive innovative ideas and features to incorporate into the product.
What we need to see:
BS or MS degree in Computer Science, Computer Engineering, or related fields with 5+ years of experience in low-level system SW development and a minimum of 3 years related to assemblers, binary analysis tools, or debuggers
Good analytical and C/C++ programming skills
Experience in at least one area of compiler development, including feature support, code generation, and compiler infrastructure
Understanding of Assembly Language / Processor ISA (GPU ISA not required but a plus)
Knowledge of object file formats such as ELF and debugging formats (DWARF).
Ways to stand out from the crowd:
Understanding of debugger/profiler tools, bintools, or linker internals; experience with binary analysis/instrumentation tools such as BOLT.
Use of AI tools such as Cursor and Windsurf in everyday work.
Knowledge of GPU development and compute APIs such as CUDA and OpenCL

What you'll be doing:
Enable NVIDIA Cumulus Linux on next generation ASICs.
Define, design and develop features for NVIDIA Cumulus Linux.
Sustain the existing deployments of NVIDIA Cumulus Linux.
Work closely with customers to understand pain points, new use cases, and deployment strategies, and come up with innovative solutions.
Translate requirements to the SDK and ASIC engineers to enable end-to-end solutions.
What we need to see:
Strong knowledge of the L2 and L3 forwarding path, including concepts such as ECMP.
Strong and proven experience in C and Python programming.
Hands-on experience with VxLAN and the EVPN routing protocol.
Strong knowledge of QoS, ACLs, and VxLAN, and working knowledge of hardware resource management (tables, TCAMs, etc.).
Battle scars from troubleshooting production network deployments.
BS or MS degree in Computer Engineering, Computer Science, or related degree, or equivalent experience.
5+ years of hands-on experience.
Ways to stand out from the crowd:
Experience with Merchant Silicon for Switching/Routing.
Contributions to SONiC, SwitchDev or Switch Abstraction Interface (SAI) projects.

What you'll be doing:
Performance and bottleneck analysis of complex, high-performance GPUs and Systems-on-Chip (SoCs).
Work on hardware models at different levels of abstraction, including performance models, RTL test benches, and emulators, to find performance bottlenecks in the system.
Work closely with the architecture and design teams to explore architecture trade-offs related to system performance, area, and power consumption.
Understand key performance use cases for the product. Develop workloads and test suites targeting graphics, machine learning, automotive, video, and computer vision applications running on these products.
Drive methodologies for improving turnaround time, finding representative data-sets and enabling performance analysis early in the product development cycle.
Develop required infrastructure including performance simulators, testbench components and analysis tools.
What we need to see:
BE/BTech or MS/MTech, or equivalent experience, in a relevant area; PhD is a plus.
3+ years of relevant experience dealing with system level architecture and performance issues.
Strong understanding of System-on-Chip (SoC) architecture, graphics pipeline, CPU architecture, memory subsystem architecture and Network-on-Chip (NoC)/Interconnect architecture.
Solid programming (C/C++) and scripting (Bash/Perl/Python) skills. Exposure to Verilog/System Verilog, SystemC/TLM is a strong plus.
Strong debugging and analysis (including data and statistical analysis) skills, including use of RTL dumps to debug failures.
Exposure to performance simulators, cycle accurate/approximate models or emulators for pre-silicon performance analysis is a plus.
Excellent communication and organization skills.
Ability to work in a global team environment.
Ways to stand out from the crowd:
Strong background in system-level performance aspects of graphics and high-performance computing.
Exposure to GPU application programming interfaces like CUDA, OpenGL, DirectX.
Expertise in data analysis and visualization.

In this position, you will be encouraged to make architectural trade-offs based on features, performance requirements and system limitations, come up with micro-architecture, implement in RTL, and deliver a fully verified, synthesis/timing clean design. You will work with architects, other designers, pre- and post-silicon verification teams, synthesis, timing and backend teams to accomplish your tasks.
What you'll be doing:
Own micro-architecture and RTL development of design modules.
Micro-architect features to meet performance, power and area requirements.
Work with HW architects to define critical features.
Collaborate with verification teams to verify the correctness of implemented features.
Interact with timing, VLSI and Physical design teams to ensure design meets timing, interface requirements and is routable.
Cooperate with FPGA and software teams to prototype the design and ensure that the software is tested.
Work on post-silicon verification and debug.
What we need to see:
BS/MS or equivalent experience.
4+ years of design experience.
Experience in micro-architecture and RTL development of complex designs.
Exposure to design and verification tools (VCS or equivalent simulation tools, debug tools like Debussy, GDB).
Deep understanding of ASIC design flow including RTL design, verification, logic synthesis, prototyping, timing analysis, floor-planning, ECO, bring-up & lab debug.
Expertise in Verilog.
Ways to stand out from the crowd:
Design experience in memory subsystem or network interconnect IP.
Good debugging and problem solving skills.
Scripting knowledge (Python/Perl/shell).
Good interpersonal skills and ability & desire to work as a part of a team.

What you will be doing:
Contribute to advancing GPU Streaming Multiprocessor (SM) architecture, simulators, compilers, and testing infrastructure.
Understand and model, in the architecture compiler, features for AI and graphics in the GPU SM.
Develop test plans and testing infrastructure for next generation GPU SM.
Develop tools to validate performance models and verify functional models.
What we need to see:
A Master's or Bachelor's degree in CS or Math, with 3+ years of relevant experience in compilers, parallel programming, computer architecture, or a related field.
Strong programming ability in C, C++, Python.
A strong understanding of compilers and data structures is a must.
Ability to work with teams spanning both hardware and software boundaries.
Ways to stand out from the crowd:
Experience in GPU compilers and architectural exploration.
Solid programming skills and CPU core architecture background.
Understanding of CPU ISAs (GPU ISA knowledge will be a big plus).