What you’ll be doing:
Analyze and optimize performance across application, middleware, runtime, and infrastructure layers—networking, storage, GPU utilization, and beyond
Develop tooling and metrics that provide deep observability into system performance
Collaborate closely with infra, platform, runtime, and product teams to identify key performance goals and drive systemic improvements
Lead investigations into high-impact performance regressions or scalability issues in production
Influence architecture and design decisions to prioritize latency, throughput, and efficiency at scale
Drive performance testing strategies and help define SLAs/SLOs around latency and throughput for critical systems
What we need to see:
Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field (or equivalent experience)
5+ yearsin software engineering with a strong track record in performance or scalability of high-scale distributed systems
Are deeply comfortable with performance profiling tools and tracing systems
Be able to identify performance issues, root cause problems, and be able to come up with potential solutions
Experience optimizing performance across one or more layers of the stack (e.g., database, networking, storage, application runtime, GC tuning, Golang internals, GPU utilization)
Contributed to observability, benchmarking, or performance-focused infrastructure at scale
Strong understanding of OS internals, scheduling, memory management, and IO patterns
Have demonstrated success navigating ambiguity and aligning stakeholders around performance goals
Proficient in container-based infrastructure (Docker, Kubernetes, Helm)
Ways to stand out from the crowd:
Demonstrated ability to handle sophisticated technical environments while meeting or exceeding all security, reliability, scalability, and availability metrics
Strong and confirmed knowledge of modern architectures at scale
You will also be eligible for equity and .
משרות נוספות שיכולות לעניין אותך