


computing for more than 25 years. a unique legacy of innovation fueled by great technology and amazing people. Today,
You will define how AI models are deployed and scaled in production using the NVIDIA Spectrum-X Networking Platform, influencing decisions from inter-node communication and
What You'll Be Doing:
Lead research and development of end-to-end networking solutions for distributed AI training and inference at scale, with a focus on job completion time, failure resiliency, telemetry, scheduling, and placement.
Analyze current deployments, develop prototypes, and recommend architectural improvements.
Stay abreast of the latest research; become the team’s authority in emerging networking techniques and technologies.
Design, simulate, and validate new systems using NSX, a novel, scalable network simulator.
Develop and test prototypes on large-scale GPU clusters (e.g., Israel-1).
Collaborate across hardware, firmware, and software teams to translate ideas into real networking product features.
Publish patents and present research at leading conferences.
What We Need to See:
M.Sc. or PhD (preferred) in Computer Science, Electrical/Computer Engineering, or a related field, or a B.Sc. with research experience and publications.
5+ years of relevant experience.
Deep expertise in networking and communication internals (NCCL, RDMA, congestion control, routing).
Strong software engineering skills in C++ and/or Python.
Excellent system-level design and problem-solving abilities.
Outstanding communication and collaboration skills across technical domains.
Ways to Stand Out from the Crowd:
Proven passion for solving sophisticated technical problems and delivering impactful solutions.
Record of publications in top-tier conferences.
Experience in designing and building large-scale AI training clusters.
Post-PhD research experience.
Practical understanding of deep learning systems, GPU acceleration, and AI model execution flows.

Your expertise will transform our infrastructure and deployment. You'll design scalable cloud architectures to accelerate innovation, champion a world-class DevOps culture to empower developers, and build the foundation for our future growth.
This position requires the incumbent to have a sufficient knowledge of English to have professional verbal and written exchanges in this language since the performance of the duties related to this position requires frequent and regular communication with colleagues and partners located worldwide and whose common language is English.

Unify online/offline for features: Drive Flink adoption and patterns that keep features consistent and low-latency for experimentation and production.
Make self-serve real: Build golden paths, templates, and guardrails so product/analytics/DS engineers can move fast safely.
Run multi-tenant compute efficiently: EMR on EKS powered by Karpenter on Spot instances; right-size Trino/Spark/Druid for performance and cost.
Cross-cloud interoperability: BigQuery + BigLake/Iceberg interop where it makes sense (analytics, experimentation, partnership).


You’ll lead a group of talented senior engineers, providing both technical and people leadership while staying hands-on in design and implementation. Together, you’ll drive key architectural decisions and evolve our ingestion and processing layers to support real-time pipelines, improve observability, and enable greater horizontal scalability.
Our Stack:
On a typical day you’ll
What you’ll be doing:
Technically lead the features you own, working with customers and R&D on their architecture and design.
Clearly define the requirements, research the existing hardware, firmware, and software support, and define a solution that meets those requirements.
Run simulations ranging from specific components to complete data center environments.
Develop SDKs for novel HW capabilities.
Design and implement services, runtime systems, and applications on top of the SDK.
Evaluate and optimize application performance.
Partner and collaborate with other forward-thinking team members and external researchers.
Work with intelligent networking machines powered by AI systems that can learn, reason, and interact with other network components.
What we need to see:
BSc/MSc in Electrical Engineering, Computer Science/Engineering, Math/Physics/Statistics, or a related field.
0-2 years of relevant experience.
Knowledge in networking, operating systems, accelerator programming, and systems
Track record of research excellence
Good communication skills
Ways to stand out from the crowd:
Experience with networking and operating systems
Knowledge of or experience with LLMs
