

What you will be doing:
Investigate opportunities to improve communication performance by identifying bottlenecks in today's systems.
Design and implement new communication technologies to accelerate AI and HPC workloads.
Explore innovative solutions in HW and SW for our next generation platforms as part of co-design efforts involving GPU, Networking, and SW architects.
Build proofs-of-concept, conduct experiments, and perform quantitative modeling to evaluate and drive new innovations.
Use simulation to explore performance of large GPU clusters (think scales of 100s of 1000s of GPUs).
What we need to see:
M.S./Ph.D. degree in CS/CE or equivalent experience.
5+ years of relevant experience.
Excellent C/C++ programming and debugging skills.
Experience with parallel programming models (MPI, SHMEM) and at least one communication runtime (MPI, NCCL, NVSHMEM, OpenSHMEM, UCX, UCC).
Deep understanding of operating systems, computer and system architecture.
Solid grasp of the fundamentals of network architecture, topology, algorithms, and communication scaling relevant to AI and HPC workloads.
Strong experience with Linux.
Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
Ways to stand out from the crowd:
Expertise in related technology and passion for what you do. Experience with CUDA programming and NVIDIA GPUs. Knowledge of high-performance networks like InfiniBand, RoCE, NVLink, etc.
Experience with Deep Learning Frameworks such as PyTorch, TensorFlow, etc. Knowledge of deep learning parallelisms and mapping to the communication subsystem. Experience with HPC applications.
Strong collaborative and interpersonal skills and a proven track record of effectively guiding and influencing within a dynamic and multi-functional environment.
More jobs that may interest you

What You Will Be Doing:
Collaborate with lead robotics and simulation engineers to develop a new feature or integration prototype
Test and validate the prototype in experiments using our robotics software stacks and/or robotic hardware
Work in a large code base in a team of senior engineers
What We Need to See:
You are pursuing an MS or PhD in Computer Science, Robotics, Physics, or a related field
Strong analytical abilities, especially in analysis, linear algebra, numerical algorithms, and physics
Advanced skills in Python and C++ programming, ideally with experience using AI-assisted development tools
The ability to independently and efficiently debug code
Solid foundation in software engineering best practices
The capability for self-directed work and persistence when faced with complex problems
Ways to Stand Out from the Crowd:
In-depth knowledge of physics engines like PhysX, MuJoCo, or Newton
Hands-on experience with robotic manipulation and deep learning techniques for robotics
Experience with NVIDIA’s robotics software stack, such as Isaac Sim and Isaac Lab
Familiarity with NVIDIA Warp and CUDA programming
Please note: We will be reviewing applications on a rolling basis as they are submitted. We encourage you to apply early.

NVIDIA has been defining computer graphics, PC gaming, and accelerated computing for more than 25 years. With an outstanding legacy of innovation, driven by phenomenal technology and extraordinary people, NVIDIA is looking for a strong technical principal architect to join us in shaping the future. Principal Architects are innovators who can translate business needs into workable technology solutions. Their expertise is deep and broad. They are hands-on, producing both detailed technical work and high-level architectural designs. As a principal architect in the Advanced Development team, you will explore technological challenges in accelerated networking and in building AI data centers, and research new transport functions and semantics for optimizing AI workloads. You will also lead architectural and development efforts across numerous technological fields related to the modern data center, such as distributed AI and deep learning solutions, data analytics, High Performance Computing (HPC), Software Defined Networking (SDN), virtualization, storage, and more.
What you’ll be doing:
Enhance NVIDIA's future GPU Networking offerings for accelerating AI workloads.
Lead vision, architecture and design of such technologies.
Lead proof-of-concept development to evaluate and drive such technologies.
Identify and evaluate new technologies, innovations and partner relationships for alignment with our technology roadmap and business value.
Work with the community and maintainers to drive strategic technologies.
What we need to see:
Hold an M.Sc. or Ph.D. in Computer Science, Electrical Engineering, or Computer Engineering from a leading university (or equivalent experience).
8+ years of industry experience (or equivalent) in systems architecture or related fields.
Experienced in virtualization, networking and storage.
Experienced in either Windows or Linux drivers, with a strong background in the other OS.
Deep understanding of performance profiling and optimization techniques, together with defining and using HW offloads.
A teammate with a can-do attitude, high energy and excellent interpersonal skills.
Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
Ways to stand out from the crowd:
Proven research track record.
Experience with and passion for system architecture: CPU/GPU/memory/storage/networking.
Stellar communication skills.
Knowledge of deep learning frameworks.

What you’ll be doing:
Develop algorithms, protocols, and network controllers
Conduct simulations ranging from specific components to complete data center environments
Design and implement services and runtime systems
Define HW acceleration interfaces
Evaluate and optimize application performance
Partner and collaborate with industry leaders and external researchers
Participate and speak at conferences and events
Publish original research
What we need to see:
MSc/PhD in Electrical Engineering, Computer Science/Engineering, or Post-doc
5+ years of proven experience
Track record of independent research excellence
Solid foundation in networking, operating systems, simulation, and systems
Proficiency with firmware, system software, and embedded systems development to rapidly prototype

What you'll be doing:
You will play a crucial role in ensuring the success of the Omniverse on DGX Cloud platform by helping to build our deployment infrastructure processes, creating world-class SRE measurement and automation tools to improve the efficiency of operations, and maintaining a high standard of service operability and reliability.
Design, build, and implement scalable cloud-based systems for PaaS/IaaS.
Work closely with other teams on new products or features/improvements of existing products.
Develop, maintain and improve cloud deployment of our software.
Participate in the triage and resolution of complex infra-related issues.
Collaborate with developers, QA, and Product teams to establish, refine, and streamline our software release process and software observability to ensure service operability, reliability, and availability.
Maintain services once live by measuring and monitoring availability, latency, and overall system health using metrics, logs, and traces
Develop, maintain and improve automation tools that can help improve efficiency of SRE operations
Practice balanced incident response and blameless postmortems
Be part of an on-call rotation to support production systems
What we need to see:
BS or MS in Computer Science or equivalent program from an accredited University/College.
8+ years of hands-on software engineering or equivalent experience.
Demonstrate understanding of cloud design in the areas of virtualization and global infrastructure, distributed systems, and security.
Expertise in Kubernetes (K8s) & KubeVirt and building RESTful web services.
Understanding of building agentic AI solutions, preferably with NVIDIA open-source AI solutions.
Demonstrated working experience with SRE principles such as metrics emission for observability, and monitoring and alerting using logs, traces, and metrics.
Hands-on experience working with Docker, containers, and Infrastructure as Code (e.g., Terraform), deployment, and CI/CD.
Knowledge of working with CSPs, for example AWS (Fargate, EC2, IAM, ECR, EKS, Route53, etc.), Azure, etc.
Ways to stand out from the crowd:
Expertise in technologies such as StackStorm, OpenStack, Red Hat OpenShift, and AI databases like Milvus.
A track record of solving complex problems with elegant solutions.
Prior experience with Go, Python, and React.
Demonstrated delivery of complex projects in previous roles.
Demonstrated ability to develop frontend applications using concepts such as SSA and RBAC.

What You'll Be Doing:
Contributing to the development of CUDA Quantum by building core infrastructure for inter-device communication and efficient execution across multiple processors
Partnering with architects, product managers, and collaborators to create an extensible toolchain integrating quantum architecture specific components
Solving difficult problems at the intersection of compilers, HPC and quantum computing to enable ground-breaking research and technology
Discussing and refining software designs and implementation strategies with peers
Improving processes and infrastructure to accelerate our development
What We Need To See:
Bachelor's degree in Computer Science, Physics, or a related engineering field (Ph.D. or Master's preferred), or equivalent experience
5+ years of experience
Ability to work on large-scale software projects, and a proven track record of building performant and robust production software
Proficiency in GPU-programming and a solid understanding of performance profiling, multi-processor systems, and compiler fundamentals
Ability to quickly develop expertise in new domains and products, and eagerness to master new challenges
Strong communication and collaboration skills
Extensive knowledge of quantum computing hardware and control systems and/or prior experience implementing optimization and code generation components for various quantum computing architectures
A passion for system design and a focus on improving extensibility
Familiarity with FPGA programming and HDLs
Deep understanding of compiler toolchains, specifically LLVM/MLIR

What you will be doing:
Engage with our partners and customers to root-cause functional and performance issues reported with NCCL
Conduct performance characterization and analysis of NCCL and DL applications on groundbreaking GPU clusters
Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP, etc.)
Guide our customers and support teams on HPC knowledge and standard methodologies for running applications on multi-node clusters
Document and conduct trainings/webinars for NCCL
Engage with internal teams in different time zones on networking, GPUs, storage, infrastructure and support.
What we need to see:
B.S./M.S. degree in CS/CE or equivalent experience, with 5+ years of relevant experience
Experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)
Excellent C/C++ programming skills, including debugging, profiling, code optimization, performance analysis, and test design
Experience working with engineering or academic research community supporting HPC or AI
Practical experience with high-performance networking: InfiniBand/RoCE/Ethernet networks, RDMA, topologies, congestion control
Expert in Linux fundamentals and a scripting language, preferably Python
Familiar with containers, cloud provisioning and scheduling tools (Docker, Docker Swarm, Kubernetes, SLURM, Ansible)
Adaptability and passion to learn new areas and tools
Flexibility to work and communicate effectively across different teams and time zones
Ways to stand out from the crowd:
Experience conducting performance benchmarking and developing infrastructure on HPC clusters. Prior system administration experience, especially for large clusters. Experience debugging network configuration issues in large-scale deployments
Familiarity with CUDA programming and/or GPUs. Good understanding of Machine Learning concepts and experience with Deep Learning Frameworks such as PyTorch, TensorFlow
Deep understanding of technology and passion for what you do
