

What you will be doing:
Responsibilities will include developing test plans and strategies, developing simulation environments, system bring-up, validation, and automation to deliver best-in-class CPUs.
Develop and maintain CPU simulator infrastructure, hardware CPU test and performance infrastructure.
Analyze and validate CPU and fabric performance, helping to understand current CPU products and guide the development of future ones.
Define and develop the toolchain and workflows that enable full-system performance alignment.
Silicon-based competitive analysis of NVIDIA CPUs.
What we need to see:
Master's or Bachelor's degree in EE/CS or equivalent experience
5+ years of experience, preferably in CPU/SoC performance verification and analysis
Strong understanding of computer system architecture and operating system fundamentals.
Hands-on experience with HDLs such as Verilog/SystemVerilog.
Knowledge of verification methodologies and tools for IP and SoC level verification.
Experience with SystemVerilog, C/C++, and Python, and their relevant frameworks.
Background in silicon debug.
Ways to stand out from the crowd:
Detailed knowledge of the ARM and/or x86 architecture.
Prior experience with performance analysis of CPUs.
Experience with analysis and characterization of CPU workloads.

What you will be doing:
Develop benchmarks and end-to-end customer applications that run at scale and are instrumented for performance measurement, tracking, and sampling, in order to measure and optimize the performance of important applications and services.
Design careful experiments to analyze performance bottlenecks and dependencies from an end-to-end perspective and develop critical insights into them.
Develop ideas for improving end-to-end system performance and usability by driving changes in the HW, the SW, or both.
Collaborate with AI researchers, developers, and application service providers to understand the pain points and requirements of internal developers and external customers, project future needs, and share best practices.
Develop the necessary modeling framework and the TCO (total cost of ownership) analysis to enable efficient exploration and sweep of the architecture and design space
Develop the methodology needed to drive the engineering analysis that informs the architecture, design, and roadmap of DGX Cloud
What we need to see:
Expertise in working with large-scale parallel and distributed accelerator-based systems
Expertise in optimizing the performance of AI workloads on large-scale systems
Experience with performance modeling and benchmarking at scale
Strong background in Computer Architecture, Networking, Storage systems, Accelerators
Familiarity with popular AI frameworks (PyTorch, TensorFlow, JAX, Megatron-LM, TensorRT-LLM, vLLM, among others)
Experience with AI/ML models and workloads, in particular LLMs as well as an understanding of DNNs and their use in emerging AI/ML applications and services
Bachelor's/Master's in Engineering or equivalent experience (preferably Electrical Engineering, Computer Engineering, or Computer Science)
10 years of experience in the above areas
Proficiency in Python, C/C++
Expertise with at least one public cloud service provider (CSP) infrastructure (GCP, AWS, Azure, OCI, …)
Ways to stand out from the crowd:
PhD in the relevant areas
Very high intellectual curiosity; Confidence to dig in as needed; Not afraid of confronting complexity; Able to pick up new areas quickly;
Proficiency in CUDA, XLA
Excellent interpersonal skills

What you'll be doing:
Performance and bottleneck analysis of complex, high-performance GPUs and Systems-on-Chip (SoCs).
Work on hardware models at different levels of abstraction, including performance models, RTL testbenches, and emulators, to find performance bottlenecks in the system.
Work closely with the architecture and design teams to explore architecture trade-offs related to system performance, area, and power consumption.
Understand key performance use cases for the product. Develop workloads and test suites targeting graphics, machine learning, automotive, video, and computer vision applications running on these products.
Drive methodologies for improving turnaround time, finding representative data-sets and enabling performance analysis early in the product development cycle.
Develop required infrastructure including performance simulators, testbench components and analysis tools.
What we need to see:
BE/BTech or MS/MTech, or equivalent experience in relevant area, PhD is a plus.
3+ years of relevant experience dealing with system level architecture and performance issues.
Strong understanding of System-on-Chip (SoC) architecture, graphics pipeline, CPU architecture, memory subsystem architecture and Network-on-Chip (NoC)/Interconnect architecture.
Solid programming (C/C++) and scripting (Bash/Perl/Python) skills. Exposure to Verilog/SystemVerilog and SystemC/TLM is a strong plus.
Strong debugging and analysis (including data and statistical analysis) skills, including use of RTL dumps to debug failures.
Exposure to performance simulators, cycle accurate/approximate models or emulators for pre-silicon performance analysis is a plus.
Excellent communication and organization skills.
Ability to work in a global team environment.
Ways to stand out from the crowd:
Strong background in System Level Performance aspects for Graphics and High Performance Computing.
Exposure to GPU application programming interfaces like CUDA, OpenGL, DirectX.
Expertise in data analysis and visualization.


We are now looking for passionate, highly motivated, and creative individuals to be part of our automotive verification team. As a verification owner, you will work on projects that will define the next generation of automotive chips and systems. You will get firsthand exposure to high-performance CPU and memory subsystems, NoC-based interconnect fabric, high-speed IOs, and many other leading technologies deployed in our Tegra chips.
What you will be doing:
You will be responsible for creating state-of-the-art UVM-based verification testbenches and methodologies to verify complex IPs and subsystems. You will also work on system-level verification using C/C++. Over the course of a project, you will drive the following aspects of verification for your unit:
Architect the testbenches and build verification environments using UVM methodology
Define test plans, tests and verification infrastructure for modules, clusters and system
Build efficient and reusable bus functional models, monitors, checkers and scoreboards
Implement functional coverage and own verification closure
Work with architects, designers, FPGA and post-silicon teams to ensure that your unit is robust
What we need to see:
You should have a BTech/MTech with 5+ years of experience in verification closure of complex unit, subsystem, or SoC-level designs. If you have experience in at least a few of the following domains, you will be an excellent match for our needs:
CPU verification, Memory controller verification, Interconnect verification
High-speed IO verification (UFS/PCIe/XUSB)
10G/1G Ethernet MAC and Switch
Bus protocols (AXI/APB)
System functions like Safety, Security, Virtualization and sensor processing
Experience in the latest verification methodologies like UVM/VMM
Exposure to industry standard verification tools for simulation and debug is a requirement
Exposure to Formal verification would be excellent
Good debugging and analytical skills.
Good interpersonal skills and the ability to work as an excellent teammate with e

What you'll be doing:
Develop AI performance tools for large-scale AI systems that provide real-time insight into application performance and system bottlenecks.
Conduct in-depth hardware-software performance studies
Define performance and efficiency evaluation methodologies
Automate performance data analysis and visualization to convert profiling data into actionable optimizations
Support deep learning software engineers and GPU architects in their performance analysis efforts
Work with various teams at NVIDIA to incorporate and influence the latest technologies for GPU performance analysis
What we need to see:
8+ years of experience in software infrastructure and tools
BS or higher degree in computer science or similar (or equivalent experience)
Adept programming skills in multiple languages including C++ and Python
Solid foundation in operating systems and computer architecture
Outstanding ability to understand users, prioritize among many contending requests, and build consensus
Passion for “it just works” automation, eliminating repetitive tasks, and enabling team members
Ways to stand out from the crowd:
Experience working with large-scale AI clusters
Experience with CUDA and GPU computing systems
Hands-on experience with deep learning frameworks (TensorFlow, PyTorch, JAX/XLA etc.)
Deep understanding of the software performance analysis and optimization process

What you will be doing:
Contribute to advancing GPU Streaming Multiprocessor (SM) Architecture, Simulators, compilers and testing infrastructure.
Understand and model architecture and compiler features for AI and graphics in the GPU SM.
Develop test plans and testing infrastructure for next generation GPU SM.
Develop tools to validate performance models and verify functional models.
What we need to see:
A Master's or Bachelor's degree in CS or Math, with 3+ years of relevant experience in compilers, parallel programming, computer architecture, or a related field.
Strong programming ability in C, C++, Python.
Strong understanding of compilers and data structures is a must.
Ability to work with teams spanning both hardware and software boundaries.
Ways to stand out from the crowd:
Experience in GPU compilers and architectural exploration.
Solid programming skills and CPU core architecture background.
Understanding of CPU ISAs (knowledge of GPU ISAs is a big plus).