

The Product: AWS Machine Learning accelerators are at the forefront of AWS innovation. The Inferentia chip delivers best-in-class ML inference performance at the lowest cost in the cloud. Trainium will deliver best-in-class ML training performance with the most teraflops (TFLOPS) of compute power for ML in the cloud. This is all enabled by a cutting-edge software stack, the AWS Neuron Software Development Kit (SDK), which includes an ML compiler, the Neuron Kernel Interface (NKI) compiler, and a runtime that natively integrates into popular ML frameworks such as PyTorch and TensorFlow.
Neuron Kernel Interface (NKI) is a bare-metal language and compiler for directly programming NeuronDevices available on AWS Trn/Inf instances. You can use NKI to develop, optimize, and run new operators directly on NeuronCores while making full use of available compute and memory resources.
You have knowledge of resource management, scheduling, code generation, optimization, and instruction architectures including CPU, NPU, GPU, and novel forms of compute.
- 5+ years of engineering team management experience
- 9+ years of working directly within engineering teams experience
- 4+ years of designing or architecting (design patterns, reliability and scaling) of new and existing systems experience
- Experience partnering with product or program management teams
- Understanding of compilers (resource management, instruction scheduling, code generation, and compute graph optimization)
- Strong software design fundamentals and excellent system-level coding skills
- M.S. or Ph.D. in Computer Science or related technical field

Key job responsibilities
In this role you'll develop, design, maintain, deploy, monitor, and support a very important component of the Nitro firmware, while enjoying every step of the journey.
About the team
Diverse Experiences
Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Mentorship and Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- Experience programming with at least one software programming language
- 3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Bachelor's degree in computer science or equivalent

Key job responsibilities
- Develop and maintain integrations using RESTful services, SOAP, and database connections
- Develop endpoints in systems (e.g. NetSuite) that connect to AWS
- Architect robust error management and control systems
- Conduct code reviews and maintain high code quality standards
- Create comprehensive technical documentation
Technical Expertise:
- Deep knowledge of AWS services, including: Lambda, S3, Redshift, API Gateway, CloudWatch, EventBridge, SQS, SNS, B2B Data Interchange
- Experience building APIs and system integrations
- Excellent background in systems integration and data processing
- Experience in automating, deploying, and supporting large-scale infrastructure
- Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C++, C#, Rust
- Experience with CI/CD pipelines and build processes
- Experience with distributed systems at scale
Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Custom SoCs (Systems on Chip) live at the heart of AWS Machine Learning servers. As a member of the Cloud-Scale Machine Learning Acceleration team, you’ll be responsible for the design and optimization of hardware in our data centers, including AWS Inferentia and Trainium systems (our custom-designed machine learning inference and training data center servers). Our success depends on our world-class server infrastructure; we’re handling massive scale and rapid integration of emergent technologies. We’re looking for an ASIC Physical Design Engineer to help us trail-blaze new technologies and architectures, while ensuring high design quality and making the right trade-offs.
Key job responsibilities
- Work with RTL/logic designers to drive architectural feasibility studies and explore power-performance-area trade-offs for physical design closure
- Drive IO/Core block physical implementation through synthesis, floor planning, bus / pin planning, place and route, power/clock distribution, congestion analysis, timing closure, IR drop analysis, physical verification, ECO and sign-off
- Develop physical design methodologies
- Evaluate 3rd party IP and provide recommendations
- BS + 8yrs or MS + 6yrs in EE/CS
- 6+ years in ASIC physical design, from RTL to GDSII, in 7nm, 14/16nm, 20nm, or 28nm
- Block Design using EDA tools (examples: Cadence, Mentor Graphics, Synopsys, or Others) including synthesis, equivalency verification, floor planning, bus / pin planning, place and route, power/clock distribution, congestion analysis, timing closure, IR drop analysis, physical verification, and ECO
- Deep understanding of sign-off activities (timing, IR/EM, physical verification)
- Scripting experience with Tcl, Perl or Python
- Expertise using CAD tools (examples: Cadence, Mentor Graphics, Synopsys, or Others) to develop flows for synthesis, formal verification, floor planning, bus / pin planning, place and route, power/clock distribution, congestion analysis, timing closure, IR drop analysis, physical verification, and ECO
- 4+ years of experience integrating IP, with the ability to specify and drive IP requirements in the physical domain
- Thorough knowledge of device physics, custom/semi-custom implementation techniques
- Experience solving physical design challenges across various technologies such as DDR, PCIe, fabrics etc.
- Experience in extraction of design parameters, QOR metrics, and analyzing trends
- Ability to mentor and guide junior engineers and to be a very effective team player
- Meets/exceeds Amazon’s leadership principles requirements for this role
- Meets/exceeds Amazon’s functional/technical depth and complexity for this role

As a member of the Machine Learning Acceleration team, you’ll be responsible for the design and optimization of hardware in our data centers. You’ll provide leadership in the application of new technologies to large-scale server deployments in a continuous effort to deliver a world-class customer experience. This is a fast-paced, intellectually challenging position, and you’ll work with thought leaders in multiple technology areas. You’ll have high standards for yourself and everyone you work with, and you’ll be constantly looking for ways to improve your product’s performance, quality, and cost. We’re changing an industry, and we want individuals who are ready for this challenge and want to reach beyond what is possible today.
Mentorship and Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
Diverse Experiences
Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
- Deep knowledge of the PCIe interface (Gen4 or above), both electrical and functional, at the chip level and at the PCB level.
- Deep understanding of transmission line theory and electromagnetics and their application in SerDes, single-ended signal, and parallel bus interfaces.
- Work with ODMs, IP silicon vendors, component suppliers, and internal design teams on cross-boundary triaging, debugging, and resolving issues.
- Hands-on lab equipment skills (VNA, real-time scope, sampling scope, and their accessories) for electrical validation and characterization.
- Scripting skills to automate tests, log parsing, and data collection.
- Strong technical communication skills (verbal and written) to interface with cross-functional technical leads within and/or outside of the organization.

Key job responsibilities
You will lead efforts to build distributed training support into PyTorch and JAX using XLA, the Neuron compiler, and runtime stacks. You will optimize models to achieve peak performance and maximize efficiency on AWS custom silicon, including Trainium and Inferentia, as well as Trn2, Trn1, Inf1, and Inf2 servers. Strong software development skills, the ability to deep dive, work effectively within cross-functional teams, and a solid foundation in Machine Learning are critical for success in this role.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
- Bachelor's degree in computer science or equivalent
- 5+ years of non-internship professional software development experience
- 5+ years of programming with at least one software programming language experience
- 5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Experience as a mentor, tech lead or leading an engineering team
- Experience in machine learning, data mining, information retrieval, statistics or natural language processing
- Master's degree in computer science or equivalent
- Experience in computer architecture
- Previous software engineering expertise with PyTorch/JAX/TensorFlow, distributed libraries and frameworks, and end-to-end model training

Key job responsibilities
• Drive a safety-centric culture and ensure a safe workplace for builders and visitors to our sites.
• Oversee the performance of the data center's critical physical infrastructure. Ensure that all work performed is completed to the highest quality and without impact to customers.
• Lead a team of 24x7 engineering technicians with an emphasis on career growth.
• Drive improvement projects from conception to completion, often reaching out to a variety of support teams.
• Coordinate daily with third-party vendors, ensuring adherence to contracted SLAs.
• Effectively and efficiently manage the operations budget and expenditures.
• Routinely operate as the after-hours on-call Data Center Facility Manager for the data centers in the region. This includes responding to any issues within the data centers and managing the investigation, mitigation, and recovery of the issue(s).
A day in the life
As the Facility Manager, your role demonstrates a strong commitment to prioritizing the development and well-being of team members, as well as fostering diversity and inclusion. You will oversee all facets of the data center's critical infrastructure with a focus on continuous availability and optimal performance, while upholding high-quality standards and minimizing any impact on internal and external customers. Additionally, you will play a crucial role in process optimization, staff management, setting performance metrics, and driving continuous improvement initiatives, all while ensuring a supportive and inclusive work environment.
- Experience in people management and team development
- Experience in engineering work, managing large-scale services
- Experience maintaining SLAs through the implementation of proactive issue detection and reporting
- Experience operating a mission-critical team or product
- High school diploma or equivalent
- This role requires you to be a national of an EU member state.
- Bachelor's degree in Electrical Engineering, Mechanical Engineering, or a related field
- Knowledge of the electrical and mechanical systems involved in critical data center operations including systems such as feeders, transformers, generators, switchgear, UPS systems, ATS units, PDU units, chillers, pumps, air handling units, and CRAC units
- Experience in a management position with 5 or more direct reports
- Experience working in data centers with an emphasis on building and equipment operation