

You will collaborate closely with researchers to design and scale agents - enabling them to reason, plan, call tools and code just like human engineers. You will work on building and maintaining the core infrastructure for deploying and running these agents in production, powering all our agentic tools and applications and ensuring their seamless and efficient performance. If you're passionate about the latest research and cutting-edge technologies shaping generative AI, this role and team offer an exciting opportunity to be at the forefront of innovation.
What you'll be doing:
Design, develop, and improve scalable infrastructure to support the next generation of AI applications, including copilots and agentic tools.
Drive improvements in architecture, performance, and reliability, enabling teams to deploy LLMs and advanced agent frameworks at scale.
Collaborate across hardware, software, and research teams, mentoring and supporting peers while encouraging best engineering practices and a culture of technical excellence.
Stay informed of the latest advancements in AI infrastructure and contribute to continuous innovation across the organization.
What we need to see:
Master's or PhD in Computer Science or a related field, or equivalent experience, with a minimum of 5 years in large-scale distributed systems or AI infrastructure.
Advanced expertise in Python (required), strong experience with JavaScript, and deep knowledge of software engineering principles, OOP/functional programming, and writing high-performance, maintainable code.
Demonstrated expertise in building scalable microservices and web apps with SQL and NoSQL databases (especially MongoDB and Redis) in production, using containers, Kubernetes, and CI/CD.
Solid experience with distributed messaging systems (e.g., Kafka), and integrating event-driven or decoupled architectures into robust enterprise solutions.
Practical experience integrating and fine-tuning LLMs or agent frameworks (e.g., LangChain, LangGraph, AutoGen, OpenAI Functions, RAG, vector databases, prompt engineering); a minimal agent-loop sketch follows this list.
Demonstrated end-to-end ownership of engineering solutions, from architecture and development to deployment, integration, and ongoing operations/support.
Excellent communication skills and a collaborative, proactive approach.
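To make the agent-framework expectations above concrete, here is a minimal, framework-agnostic sketch of a tool-calling agent loop. The `call_llm` stub, the tool names, and the JSON action format are illustrative assumptions rather than a prescribed design; a production system would use a real LLM client and a framework such as LangChain or LangGraph.

```python
import json

# --- Hypothetical tools the agent may call (illustrative only) ---
def search_docs(query: str) -> str:
    """Stand-in for a retrieval step (e.g., a vector-database lookup)."""
    return f"Top result for '{query}': ..."

def run_code(snippet: str) -> str:
    """Stand-in for a sandboxed code-execution tool."""
    return f"Executed {len(snippet)} characters of code."

TOOLS = {"search_docs": search_docs, "run_code": run_code}

def call_llm(messages: list[dict]) -> str:
    """Stub for an LLM call; a real system would call a model API here.
    Returns a JSON 'action' the loop below can dispatch."""
    return json.dumps({"tool": "search_docs",
                       "args": {"query": messages[-1]["content"]},
                       "final": "Here is what I found."})

def agent_loop(user_request: str, max_steps: int = 3) -> str:
    """Minimal reason-act loop: ask the model, dispatch a tool, feed the
    observation back, and stop once the model produces a final answer."""
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        action = json.loads(call_llm(messages))
        tool = TOOLS.get(action.get("tool"))
        if tool is None:                      # no tool requested -> done
            return action.get("final", "")
        observation = tool(**action["args"])  # execute the requested tool
        messages.append({"role": "tool", "content": observation})
        if action.get("final"):               # model already has an answer
            return action["final"]
    return "Stopped after reaching the step limit."

if __name__ == "__main__":
    print(agent_loop("How do we deploy agents to production?"))
```

The same structure extends naturally to RAG: the retrieval tool would query a vector database, and its observation would be injected into the model's context before the next call.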
You will also be eligible for equity and benefits.

What you'll be doing:
Working with NVIDIA AI Native customers on data center GPU server and networking infrastructure deployments.
Guiding customer discussions on network topologies, compute/storage, and supporting the bring-up of server/network/cluster deployments.
Identifying new project opportunities for NVIDIA products and technology solutions in data center and AI applications.
Conducting regular technical meetings with customers as a trusted advisor, discussing product roadmaps, cluster debugging, and new technology introductions.
Building custom demonstrations and proofs of concept to address critical business needs.
Analyzing and debugging compute/network performance issues.
What we need to see:
BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or related fields, or equivalent experience.
5+ years of experience in Solution Engineering or similar roles.
System-level understanding of server architecture, NICs, Linux, system software, and kernel drivers.
Practical knowledge of networking - switching & routing for Ethernet/InfiniBand, and data center infrastructure (power/cooling).
Familiarity with DevOps/MLOps technologies such as Docker/containers and Kubernetes.
Effective time management and ability to balance multiple tasks.
Excellent communication skills for articulating ideas and code clearly through documents and presentations.
Ways to stand out from the crowd:
External customer-facing skills and experience.
Experience with the bring-up and deployment of large clusters.
Proficiency in systems engineering, coding, and debugging, including C/C++, Linux kernel, and drivers.
Hands-on experience with NVIDIA systems/SDKs (e.g., CUDA), NVIDIA networking technologies (e.g., DPUs, RoCE, InfiniBand), and/or ARM CPU solutions.
Familiarity with virtualization technology concepts.
You will also be eligible for equity and benefits.

As part of the NVIDIA Solutions Architecture team, you will navigate uncharted waters and gray space to drive successful market adoption by balancing strategic alignment, data-driven analysis, and tactical execution across engineering, product, and sales teams. You will serve as a critical liaison between product strategy and large-scale customer deployment.
What you’ll be doing:
Lead end-to-end execution for key hyperscaler customers to bring NVIDIA data center products (e.g., GB200) to market rapidly and at scale.
Partner with the hyperscaler Product Customer Lead to understand strategy, define metrics, and ensure alignment.
Data-Driven Execution: Collect, maintain, and analyze complex data trends to assess the product's market health, identify themes, challenges, and opportunities, and guide the customer to resolution of technical roadblocks.
Problem Solving & Navigation: Navigate complex issues effectively, acting as a pragmatic leader who balances short-term unblocking with long-term process and product improvements.
Executive Communication: Deliver concise, direct executive-level updates and regular status communications to multi-functional leadership on priorities, progress, and vital actions.
Process Improvement: Integrate insights from deployment challenges and customer feedback into future developments for processes and products through close partnership with Product and Engineering teams.
What we need to see:
BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields or equivalent experience.
8+ years of combined experience in Solutions Architecture, Technical Program Management, Product Management, System Reliability Engineering, or other complex multi-functional roles.
Proven track record of leading and influencing without direct authority across technical and business functions.
Proven analytical skills, with experience establishing benchmarks, collecting and analyzing intricate data, and distilling it into strategic themes, action items, and executive summaries.
Skilled in reviewing logs and deployment data, and aiding customers in resolving technical concerns (e.g., identifying performance issues associated with AI/ML and system architecture).
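As a small illustration of the log and deployment-data review described in the bullet above, the following sketch aggregates hypothetical cluster telemetry and flags clusters whose recent error rate is trending high. The file name, column names, and threshold are assumptions made for the example, not an NVIDIA tool or data schema.

```python
import pandas as pd

# Hypothetical telemetry export: one row per cluster per day.
# Assumed columns: date, cluster, jobs_run, failed_jobs
df = pd.read_csv("deployment_telemetry.csv", parse_dates=["date"])
df["error_rate"] = df["failed_jobs"] / df["jobs_run"]

# 7-day rolling error rate per cluster to smooth out daily noise.
df = df.sort_values(["cluster", "date"])
df["rolling_error_rate"] = (
    df.groupby("cluster")["error_rate"]
      .transform(lambda s: s.rolling(window=7, min_periods=3).mean())
)

# Flag clusters whose latest rolling error rate exceeds a working threshold.
THRESHOLD = 0.02  # 2% -- an illustrative cutoff, tuned per product in practice
latest = df.groupby("cluster").tail(1)
flagged = latest[latest["rolling_error_rate"] > THRESHOLD]

print(flagged[["cluster", "date", "rolling_error_rate"]]
      .sort_values("rolling_error_rate", ascending=False)
      .to_string(index=False))
```

In practice the output of a script like this would feed the themes, action items, and executive summaries described above, rather than being consumed raw.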
Ways to stand out from the crowd:
Lead multi-functional teams and influence stakeholders to address challenges in customer data center deployments, ensuring cluster health and performance at scale.
Established track record of driving a product from the pilot phase to at-scale deployment in a data center environment.
Hands-on experience with NVIDIA hardware (e.g., H100, GB200) and software libraries, with an understanding of performance tuning and error diagnostics.
Knowledge of DevOps/MLOps technologies such as Docker/containers and Kubernetes, and their relationship to data center deployments.
Demonstrated ability to align on, adopt, and share insights across internal teams (e.g., collaborating with other program leads).
You will also be eligible for equity and benefits.

What you'll be doing:
Path-find technical innovations in Quantum Error Correction and Fault Tolerance, working with multi-functional teams in Product, Engineering, and Applied Research
Develop novel approaches to quantum error correction codes and their logical operations, including methods for implementation and logical operation synthesis (an illustrative decoding sketch follows this list)
Research and co-design improved methods to achieve fault tolerance, such as techniques for logical operations, concatenation, synthesis, distillation, cultivation, or others
Collaborate with internal teams and external partners on developing technology components to enable a fault-tolerant software stack integrated with quantum hardware
Adopt a culture of collaboration, rapid innovation, technical depth, and creative problem solving
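As a minimal, concrete touchstone for the error-correction work described above, here is a classical simulation of the three-qubit bit-flip repetition code with majority-vote decoding. Real QEC research of course targets far richer codes (surface codes, distillation, cultivation), so this is purely illustrative.

```python
import random

def simulate_bit_flip_code(p: float, trials: int = 100_000) -> float:
    """Estimate the logical error rate of the 3-qubit repetition code
    under independent bit-flip noise of probability p, using
    majority-vote decoding. Expected result ~ 3p^2 - 2p^3."""
    failures = 0
    for _ in range(trials):
        logical = random.randint(0, 1)
        qubits = [logical] * 3                                # encode b -> (b, b, b)
        qubits = [q ^ (random.random() < p) for q in qubits]  # bit-flip channel
        # Parity checks, shown only to mirror how stabilizer measurements
        # localize the error without reading out the logical bit directly.
        s1, s2 = qubits[0] ^ qubits[1], qubits[1] ^ qubits[2]
        decoded = int(sum(qubits) >= 2)                       # majority vote
        failures += (decoded != logical)
    return failures / trials

if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10):
        print(f"p={p:.2f}  logical error ~ {simulate_bit_flip_code(p):.4f} "
              f"(theory {3*p**2 - 2*p**3:.4f})")
```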
What we need to see:
Master's degree in Physics, Computer Science, Chemistry, Applied Mathematics, or a related engineering field, or equivalent experience (Ph.D. preferred)
Extensive background in Quantum Information Science with 12+ overall years of experience in the Quantum Computing industry
A demonstrated ability to deliver high impact value in quantum error correction and fault tolerance
Ways to stand out from the crowd:
Hands-on experience in scientific computing, high-performance computing, applied machine learning, or deep learning
Experience with co-design of quantum error correction with quantum hardware or quantum applications
Experience with CUDA and NVIDIA GPUs
Passion to drive technology innovations into NVIDIA software and hardware products to support Quantum Computing
You will also be eligible for equity and benefits.

What you will be doing:
The team provides its services 24/7 in a follow-the-sun model that spans continents. You will report directly to a manager in the United States.
Some CIS shifts require working either a Saturday or a Sunday each week. Hours may include an early or late start (10 hours per day, 4 days per week) to ensure that the US and India teams together provide 24/7 coverage.
Every CIS team member will use alerts and alarms to help prevent issues and incidents when possible. You may also work with the developer community to develop and implement predictive support or diagnostic routines.
Perform systems administration, network administration, and security incident monitoring to drive our actions.
CIS team members will work with developers to learn how the service works, then translate that understanding into runbooks which the entire team will use. As new features and functionality are added, you will also update and evolve the runbooks as needed.
Help discover incidents and issues, including initiating the incident management procedure.
Bring in subject matter authorities or service owners as needed to resolve issues. Feedback will help us continually improve our service.
Your interpersonal skills will help keep the team engaged through resolution and ensure our clients feel we value their time and effort. You may also perform other tasks that help us provide extraordinary service levels for our customers.
What we need to see:
Highly motivated with strong communication skills, you have the ability to work successfully with multi-functional teams, principals, and architects, coordinating effectively across organizational boundaries and geographies.
5+ years of experience administering large-scale production systems. 3+ years of experience in high-availability Internet, Cloud, or Data Center environments (Systems Administration, SRE, or NOC).
BS in Computer Science, Engineering, Physics, Mathematics, or equivalent experience.
Expert-level knowledge of Linux system administration and automation using Ansible and/or Python.
Strong experience with shell scripting, DNS, DHCP, storage systems, and core networking (iptables, routing, firewalls).
Experience with at least one workload manager (Slurm preferred) or job scheduling system in a production environment; a minimal queue-monitoring sketch follows this list.
Strong experience troubleshooting and maintaining large-scale bare-metal infrastructure. Strong cross-team collaboration, documentation, and mentoring skills.
Experience improving processes for automation, reliability, and operational excellence.
Expertise using monitoring tools and problem ticketing systems. Strong problem-solving, analytical, and troubleshooting abilities.
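To illustrate the Slurm-oriented automation referenced in the workload-manager bullet above, here is a small monitoring sketch that summarizes job states from `squeue`. The alert threshold and the idea of feeding the counts into a ticketing or alerting system are assumptions for the example, not an existing runbook.

```python
import subprocess
from collections import Counter

def job_state_counts() -> Counter:
    """Return a count of Slurm job states (RUNNING, PENDING, FAILED, ...).
    Uses `squeue -h -o %T`: -h drops the header, %T prints the job state."""
    out = subprocess.run(
        ["squeue", "-h", "-o", "%T"],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(state for state in out.split() if state)

def main() -> None:
    counts = job_state_counts()
    total = sum(counts.values())
    print(f"{total} jobs:", dict(counts))

    # Illustrative alert rule: a large pending backlog may indicate
    # down/drained nodes or a scheduling problem worth a runbook entry.
    pending = counts.get("PENDING", 0)
    if total and pending / total > 0.5:   # assumed threshold
        print("ALERT: more than half of queued jobs are pending")

if __name__ == "__main__":
    main()
```

A script like this would typically run on a schedule and push its counts into the monitoring and ticketing tools described above, rather than printing to stdout.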
Ways to Stand Out from the Crowd:
Advanced hands-on experience with Kubernetes, Slurm, and large-scale cluster management.
Familiarity with GPU hardware and high-performance computing environments.
Experience with observability and incident management tools (Grafana, OpenTelemetry, PagerDuty, JIRA). Cloud experience (AWS, Azure, GCP) is a plus; strong preference for on-prem expertise.
You will also be eligible for equity and benefits.

What you'll be doing:
Define and drive architecture for complex, high-volume GPU products, ensuring they meet our ambitious performance and scalability goals.
Perform and guide power and performance evaluation, trade-off assessments, and architectural modeling to identify optimal chip, package and system construction.
Lead improvements in architecture, methodology and tools to improve the scalability of our system, collaborating closely with cross-functional engineering teams.
Specify and optimize SoC subsystems such as memory architecture, test infrastructure and power management.
Collaborate with RTL, verification, physical design, firmware, and software teams to successfully implement and integrate system components.
Produce high-quality technical documentation of SoC architecture, specifications, and development trade-offs.
Provide technical leadership and mentorship to junior architects and engineers, encouraging a culture of excellence and innovation.
What we need to see:
Over 15 years in SoC architecture development or similar technical leadership roles.
Proven track record of defining and delivering multiple high-volume SoC systems (such as CPU, GPU, modem, networking or similar).
Proficiency in power/performance evaluation and architectural modeling in high-level programming languages; a first-order modeling sketch follows this list.
Strong understanding of SoC system fundamentals, including memory hierarchy, coherency, clocking, power domains, boot and reset, test, and debug methodologies.
Hands-on experience with silicon bring-up, debug and tuning.
Excellent interpersonal, leadership, and collaboration skills, with the ability to influence across organizations.
Outstanding documentation, written, and verbal communication skills.
Master's degree (or equivalent experience) in a relevant subject area: Computer Science, Electrical Engineering or Computer Engineering
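As a minimal sketch of the first-order power/performance trade-off modeling referenced above, the following uses the textbook dynamic-power relation P ≈ α·C·V²·f, with performance taken as proportional to frequency. The operating points and constants are illustrative assumptions; a real architectural model would add leakage, memory behavior, and workload effects.

```python
from dataclasses import dataclass

@dataclass
class OperatingPoint:
    name: str
    voltage: float    # volts
    freq_ghz: float   # clock frequency

# First-order dynamic power: P = alpha * C * V^2 * f (leakage ignored here).
ALPHA_C = 2.0  # illustrative combined activity-factor * capacitance constant

def dynamic_power_w(op: OperatingPoint) -> float:
    return ALPHA_C * op.voltage ** 2 * op.freq_ghz

def perf_per_watt(op: OperatingPoint) -> float:
    # Performance modeled as proportional to frequency (compute-bound case),
    # so perf/W reduces to 1 / (ALPHA_C * V^2): lower voltage wins efficiency.
    return op.freq_ghz / dynamic_power_w(op)

if __name__ == "__main__":
    points = [
        OperatingPoint("low",  voltage=0.70, freq_ghz=1.2),
        OperatingPoint("mid",  voltage=0.85, freq_ghz=1.8),
        OperatingPoint("high", voltage=1.00, freq_ghz=2.4),
    ]
    for op in points:
        print(f"{op.name:>4}: {dynamic_power_w(op):5.2f} W, "
              f"perf/W = {perf_per_watt(op):.3f}")
```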
Ways to stand out from the crowd:
Experience with GPU or AI accelerator architecture, including platform aspects, off-chip I/O technologies, and networked multi-GPU systems
Knowledgeable in modern packaging technologies, and their costs and benefits
Knowledgeable in AI workload characteristics
Outstanding analytical and problem-solving skills with a focus on optimizing performance, power, area, and complexity.
You will also be eligible for equity and benefits.

What you'll be doing:
Act as the subject matter expert (SME) for material management processes supporting data center infrastructure hardware across its full lifecycle.
Be responsible for the planning and execution of operational hardware sparing strategies to ensure availability and minimal downtime.
Own the end-of-life (EOL) management process for infrastructure hardware, including decommission planning and material disposition.
Ensure inventory accuracy through ongoing audits, reconciliation processes, and alignment with data center operational needs.
Apply ABC inventory classification methodology to prioritize and optimize stock levels based on usage, cost, and criticality; a minimal classification sketch follows this list.
Maintain and improve material planning models to support forecasting and capacity planning initiatives.
Analyze data trends to drive continuous improvements in inventory optimization, cost control, and operational efficiency.
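As a minimal illustration of the ABC classification mentioned above, the sketch below ranks parts by annual consumption value and assigns classes at the conventional ~80%/15%/5% cumulative-value cutoffs. The part numbers, usage figures, and cutoffs are illustrative assumptions and would be tuned to real spares data.

```python
# part -> (annual usage in units, unit cost in USD); illustrative data only
parts = {
    "NIC-400G":  (120, 1800.0),
    "PSU-3kW":   (300,  450.0),
    "FAN-TRAY":  (900,   60.0),
    "DIMM-64G":  (250,  320.0),
    "CABLE-DAC": (2000,  35.0),
}

# Annual consumption value = usage * unit cost, ranked high to low.
valued = sorted(
    ((name, usage * cost) for name, (usage, cost) in parts.items()),
    key=lambda item: item[1],
    reverse=True,
)
total_value = sum(value for _, value in valued)

# Conventional cutoffs: A ~ top 80% of value, B ~ next 15%, C ~ remainder.
classes, cumulative = {}, 0.0
for name, value in valued:
    cumulative += value
    share = cumulative / total_value
    classes[name] = "A" if share <= 0.80 else ("B" if share <= 0.95 else "C")

for name, value in valued:
    print(f"{name:10s} ${value:>10,.0f}  class {classes[name]}")
```

Class A parts would then get the tightest sparing and audit cadence, while class C parts can tolerate looser stock controls.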
What we need to see:
12+ years of experience in material management, inventory operations, or hardware lifecycle support within a data center infrastructure, manufacturing, or supply chain environment.
Solid grasp of data center hardware components (servers, networking, storage, etc.) and their lifecycle (deployment, sparing, EOL).
Demonstrable experience with inventory control practices, including ABC classification, stock audits, and accuracy initiatives.
Excellent organizational and documentation skills; attention to detail is a must.
Bachelor’s degree in Supply Chain Management, Operations, Logistics, Information Technology, or related field; or equivalent experience.
You will also be eligible for equity and benefits.