What you’ll be doing:
Architect and Build Scalable Systems: Drive the design and implementation of the AON profiling service's core systems. You'll master inter-process communication (IPC), memory management, and building low-overhead architectures to handle profiling data from complex multi-node, multi-process, multi-GPU, and cluster environments.
Elevate Software Engineering Excellence: Promote high standards in software development, including design patterns, concurrency, parallelism, and advanced debugging for asynchronous systems. Our commitment to code quality and robust testing ensures a reliable profiling service.
Lead, Mentor, and Innovate: Guide and mentor engineers, provides impactful code reviews, and shape technical roadmaps. Proactively identify complex technical issues within the AON project, break them down, and craft innovative solutions. Your problem-solving prowess will be crucial for AON's success with ML workloads.
Drive Full-Stack Development: Transform user needs into clear requirements and design documents. Explore diverse approaches to problems, making well-reasoned recommendations. Lead end-to-end feature development—from planning and prototyping to implementation, testing, and customer evaluation. This involves hands-on development across user applications, drivers, performance counter libraries, and lower-level platform/hardware abstraction layers.
Collaborate Across Boundaries
What we need to see:
BS or MS degree or equivalent experience in Computer Engineering, Computer Science, or related degree.
8+ years of meaningful software development experience in C, C++, and Python
10+ years in system software design, operating systems fundamentals, computer architectures, performance analysis, and delivering production-quality software.
Strong interpersonal, verbal, and written communication, demonstrating the ability to build cross-organizational partnerships and lead technical teams through complex challenges.
Profiling & Performance Tools Expert: Extensive knowledge of profiling technologies (sampling, tracing), overhead analysis, and diverse profiling data (CPU/GPU events, performance counters, API traces, event correlation). Familiarity with existing profiling ecosystems and their limitations is a plus.
GPU & CUDA Proficiency: In-depth knowledge of CUDA APIs, runtime, streams, kernels, and GPU architecture.
ML Ecosystem & Performance Analysis:Familiarity with ML frameworks such as PyTorch and JAX, and knowledge of performance analysis for AI training/inference applications.
Large-Scale System Development & Debugging: Experience developing and debugging across complex multi-layered software systems, including user mode and kernel drivers, with a proven ability to contribute to and extend substantial codebases (100s of millions of lines).
Proficiency in Designing APIs and Interfaces for Profiling Tools: Designs robust, flexible APIs and interfaces enabling seamless integration of profiling tools with various frameworks and custom code.
Mastery of Problem Simplification
Ways to stand out from the crowd:
Pioneering Low-Overhead Profiling Systems: A track record of designing and implementing profiling systems with minimal performance impact on target workloads, especially in complex multi-process and distributed environments.
Deep Understanding of PyTorch Internals & CUDA Usage: A comprehensive grasp of how PyTorch uses CUDA, including tensor memory, operations, and distributed training functionalities.
GPU Performance Analysis & Optimization Acuity: The ability to analyze profiling data and translate it into concrete, actionable insights, particularly within CUDA and ML Frameworks like PyTorch.
Translating Customer Needs: Skilled at redefining customer requests into actionable use cases and requirements.
Strong understanding of system security principles
You will also be eligible for equity and .
משרות נוספות שיכולות לעניין אותך