

computing for more than 25 years. a unique legacy of innovation fueled by great technology and amazing people. Today,
You will define how AI models are deployed and scaled in production using the NVIDIA Spectrum-X Networking Platform, influencing decisions from inter-node communication and
What You’ll Be Doing:
Lead research and development of end-to-end networking solutions for distributed AI training and inference at scale, with a focus on job completion time, failure resiliency, telemetry, scheduling, and placement.
Analyze current deployments, develop prototypes, and recommend architectural improvements.
Stay abreast of the latest research; become the team’s authority in emerging networking techniques and technologies.
Design, simulate, and validate new systems using NSX, a novel, scalable network simulator.
Develop and test prototypes on large-scale GPU clusters (e.g., Israel-1).
Collaborate across hardware, firmware, and software teams to translate ideas into real networking product features.
Publish patents and present research at leading conferences.
What We Need to See:
M.Sc. or PhD (preferred) in Computer Science, Electrical/Computer Engineering, or a related field, or B.Sc. with research experience and publications.
5+ years of relevant experience.
Deep expertise in networking and communication internals (NCCL, RDMA, congestion control, routing).
Strong software engineering skills in C++ and/or Python.
Excellent system-level design and problem-solving abilities.
Outstanding communication and collaboration skills across technical domains.
Ways to Stand Out from the Crowd:
Proven passion for solving sophisticated technical problems and delivering impactful solutions.
Record of publications in top-tier conferences.
Experience in designing and building large-scale AI training clusters.
Post-PhD research experience.
Practical understanding of deep learning systems, GPU acceleration, and AI model execution flows.

What you’ll be doing:
Technically lead the features you own, working with customers and R&D on their architecture and design.
Clearly define the requirements, research the existing hardware, firmware, and software support, and define a solution that meets those requirements.
Simulations ranging from specific components to complete data center environments
Develop SDKs for novel HW capabilities
Designing and implementing services, runtime systems, and applications over SDK
Evaluate and optimize application performance
Partner and collaborate with other forward-thinking team members and external researchers
Work with intelligent networking machines powered by AI systems that can learn, reason and interact with other network components
What we need to see:
BSc/MSc graduate in Electrical Engineering, Computer Science/Engineering, Math/Physics/Statistics, or a related field
0-2 years of relevant experience.
Knowledge in networking, operating systems, accelerator programming, and systems
Track record of research excellence
Good communication skills
Ways to stand out from the crowd:
Experience in networking and operating systems
Knowledge of or experience with LLMs

What you'll be doing:
Conduct research and analysis on networking solutions and end-to-end algorithms.
Work with a creative and experienced team to outline the next generation of our RDMA load-balancing and congestion-control algorithms.
Work in a simulation environment and on real HW systems.
Engage with other research teams to develop Proof of Concepts using our technology.
What we need to see:
2+ years of experience.
B.Sc. in Electrical Engineering or Computer Engineering.
High motivation to learn and explore new fields.
Proven problem-solving skills.
Excellent interpersonal skills.
Knowledge and understanding of compute and networking systems is an advantage.
Passion and attention to detail, with a strong focus on building quality.
Ways to stand out from the crowd:
Passion and love for system architecture, including CPU/GPU/Memory/Storage/Networking.
Background with AI workloads.
Background in networking.
Experience in the development of simulation environments.


What You’ll Be Doing:
Define the architecture of next-generation SmartNIC software and firmware stacks to match expected and future workloads.
Closely collaborate with HW architects to define new HW features and SW-HW interfaces for diverse use cases.
Conduct research in network protocols, explore new network technologies, and extend networking drivers, driver offloads, and accelerators for various SmartNIC-related tasks.
What We Need to See:
B.Sc. or M.Sc. in Computer Engineering, Computer Science, Electrical Engineering, or equivalent experience.
5+ years of proven experience in the field.
Proven ability to adapt quickly to new technologies and delve deep into new areas.
Outstanding ability to work independently, interact with customers, and guide R&D teams.
Excellent communication and presentation skills.

As the Sr. SDM for the Inference Technology Team, you will lead a strong team of managers and engineers to build fundamental inference technology building blocks and libraries that enable AI developers to optimize models for inference on Trainium and Inferentia devices. You will be responsible for the full development life cycle of inference library and feature development, including reliability and scalability. You will develop the Neuronx_Distributed Inference Libraries and contribute to other popular open-source inference libraries, enabling customers to optimize LLMs, multimodal, and generative models.
A day in the life
You will work with the executive leadership and other senior management and technical leaders to define product directions and deliver them to customers. We build massive-scale distributed training and inference solutions, developing the full stack of software, servers and chips together with teams across the Annapurna organization to run the largest machine learning workloads.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Work/Life Balance
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
- 10+ years of engineering experience
- 5+ years of engineering team management experience
- 10+ years of experience planning, designing, developing, and delivering consumer software
- Experience partnering with product or program management teams
- Experience managing multiple concurrent programs, projects and development teams in an Agile environment

Annapurna Labs, part of AWS, is looking for a System Software student to help us develop a semiconductor platform based on a revolutionary architecture.
Key job responsibilities
- Experience with Python development.
