NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.
What you’ll be doing
Develop efficient infrastructure and tools for automating complex software processes.
Drive Performance Optimization: Implement advanced test harnesses, benchmarking frameworks, and analytical tools to rigorously characterize and optimize the performance and efficiency of our software and hardware platforms.
Apply deep knowledge of operating systems, kernel internals, device drivers, memory management, storage, networking, and high-speed interconnects to build and troubleshoot highly performant systems.
Work with engineering teams to understand needs, define requirements, and deliver efficient solutions.
Set performance goals, monitor feedback, analyze data, and make continuous improvements for system reliability.
Influence Technical Strategy: Contribute to defining technical strategies and roadmaps for our platform automation initiatives, ensuring alignment with company-wide goals and standard methodologies.
Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related technical field, or equivalent experience.
6+ years of industry experience in software development, focusing on infrastructure, distributed systems, automation, and/or performance engineering.
Expertise in System-Level Programming: Proven ability to develop robust tools and automation using programming languages such as C++, Python, or Go.
Deep Understanding of System Software: Experience with operating system internals, device drivers, memory management, and debugging performance issues in complex compute applications.
Distributed Systems: Experience in designing, building, and operating large-scale distributed systems, with knowledge of networking protocols, cluster management, and high-performance interconnects.
Automation and CI/CD Proficiency: Experience building and maintaining automated testing, benchmarking, and continuous integration/continuous deployment pipelines.
Problem-Solving and Analytical Skills: Outstanding analytical, problem-solving, and debugging skills, with a track record of resolving complex technical challenges.
Collaboration and Communication: Excellent interpersonal and communication skills, with the ability to articulate complex technical concepts to diverse audiences and collaborate effectively across teams.
Experience optimizing performance for AI/Machine Learning workloads, especially inference applications, on diverse hardware platforms.
Prior experience building or contributing to large-scale compute infrastructure solutions in cloud environments or on-premises data centers.
Experience with containerization and orchestration technologies, such as Docker and Kubernetes.
Familiarity with performance profiling tools and methodologies for hardware and software systems.
Track record of driving significant efficiency gains or architectural improvements in large-scale systems.