Job Overview:
Arm is seeking an experienced SoC Reliability, Availability, and Serviceability (RAS) Architect to drive the RAS strategy for our next-generation SoCs. In this pivotal role, you will collaborate closely with design, verification, manufacturing, and product engineering teams to develop robust solutions that meet stringent reliability, aging, and lifecycle expectations across a diverse set of workloads and deployment environments.
You will be responsible for defining architectural specifications and delivering end-to-end RAS solutions for SoCs targeting data center applications. This includes setting RAS goals, developing strategy and roadmaps, and partnering with hardware and software teams to realize innovative and efficient architectures.
Responsibilities:
- Define reliability, availability, and serviceability (RAS) requirements for next-generation SoCs
- Architect scalable RAS solutions that balance power, performance, and area (PPA) while meeting customer and market needs
- Develop and guide implementation of reliability-aware design techniques such as ECC, parity, error logging, detection, and mitigation strategies
- Lead RAS efforts throughout the product lifecycle, collaborating with front-end design, physical implementation, and verification teams
- Align with cross-functional teams to ensure compliance with data center-class reliability and availability standards
Required Skills and Experience:
- Master’s degree (or higher) in Computer Engineering, Computer Science, Electrical Engineering, or a related discipline
- 10+ years of experience in SoC development, with a focus on RAS architecture
- Deep understanding of data center-class availability and reliability expectations
- Expertise in fault detection, error handling, and resiliency techniques for large-scale compute platforms
“Nice To Have” Skills and Experience:
- Experience designing and deploying RAS strategies for Arm-based architectures
- Familiarity with reliability modeling, stress and aging analysis, and silicon health monitoring
- Understanding of firmware and software roles in RAS implementations
- Exposure to industry RAS standards and specifications, such as those for PCIe, CXL, or JEDEC memory
- Hands-on experience with failure analysis and silicon debug workflows
- Background in safety-critical or high-availability systems (e.g., automotive, aerospace, cloud infrastructure)
In Return:
Arm is proud to have a set of behaviors that reflect our culture and guide our decisions, defining how we work collaboratively to defy ordinary and shape extraordinary!
- Partner and customer focus
- Collaboration and communication
- Creativity and innovation
- Team and personal development
- Impact and influence
- Deliver on your promises
Salary Range: $253,300-$342,700 per year