Design and implement scalable telemetry frameworks for data center infrastructure across hardware, networking, and software layers.
Translate technology requirements into platform architecture, encompassing hardware, software, tools and other components.
Develop and deploy telemetry solutions for virtualized data center environments, multi-chiplet architectures and AI-scale infrastructure.
Develop in-band and out-of-band data collection mechanisms to monitor system health, power management, battery life, thermal performance, RAS events and CPU/Accelerator usage.
Work with industry-standard frameworks such as OpenTelemetry, DMTF Redfish, IPMI, and Bulk Telemetry (OCP) to build a pioneering, standardized telemetry solution with broad compatibility and compliance.
Collaborate with hardware (Arm platform architecture, SoC architects) and software teams to integrate telemetry solutions across different system layers, including firmware, kernel, drivers, OS, and remote/cloud access.
Develop secure telemetry data management models, both on-chip and off-chip.
Work with Arm cloud provider customers to enable enterprise-wide observability solutions and ensure alignment with industry needs.
Required Skills and Experience:
Proven experience in infrastructure system/SoC architecture or a related field, with a deep technical background and credibility.
Deep understanding of data center architecture, compute/storage/networking telemetry, and hardware monitoring.
Understanding of power management, thermal profiling and system health diagnostics strategies in computing platforms.
Expertise in one or more of these technology domains is desirable: Telemetry, Power, Thermal & Limits Management, Platform Security, Reset & Boot, 2.5D/3D Advanced Packaging and Chiplets, Chipset Platform Architecture.
Working experience with Python or C/C++ for telemetry data collection, processing, and integration tools.
"Nice to have" Skills and Experience:
You have proven experience in firmware development, platform architecture, or telemetry solutions.
You have experience with OpenTelemetry, Prometheus, Grafana, Redfish, IPMI, or other observability frameworks.
You have excellent communication and interpersonal skills and the ability to work across multiple teams.