

The Trainium Manufacturing, Quality and Reliability (MQR) Team is part of AWS Annapurna Labs. The team focuses on Machine Learning products and designs cutting-edge AI platforms for the world’s largest Cloud Services provider. As a Senior Reliability Engineer, you will engage with an experienced cross-disciplinary staff to conceive and design infrastructure technologies. You will work closely with an internal interdisciplinary team and outside partners to drive key aspects of product definition, execution, and test in manufacturing. A successful candidate will be responsive, flexible, and able to succeed within an open, collaborative peer environment. You will:
* Be responsible for the test validation of future technologies.
* Drive manufacturing process improvements to address reliability issues and concerns.
* Qualify manufacturing lines and mechanisms for mass production
* Have a fundamental understanding of reliability statistics and reliability tests, and/or a solid understanding of computer systems, to influence design for reliability.
* Lead the identification and validation of product/component risks, work with design teams to mitigate them, and define the test methodology and test coverage to assure product reliability.
* Deep-dive into technologies aligned with the product roadmap.
* Provide technical leadership and mentor engineers.
* Perform Reliability prediction of failure mechanisms, products under development and products in the field.
* Work with multiple vendors and ODMs to standardize component manufacturing and reliability expectations.
Key job responsibilities
* Define reliability tests to be implemented during manufacturing.
* Drive manufacturing process improvements to address reliability issues and concerns.
* Perform Reliability prediction of failure mechanisms, products under development and products in the field.
* Work with multiple vendors and ODMs to standardize component manufacturing and reliability expectations.
- Bachelor's or Master’s degree in Reliability Engineering, Physics or related field, or equivalent experience
- 7+ years of Reliability Engineering work experience with server compute platforms or on high-tech hardware

You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.
Key job responsibilities
The successful candidate will be operationally responsible for a Data Center. Some high-level responsibilities include:
- Prioritize and assign trouble tickets to data center technicians and operators
- Recruit and train data center technicians to ensure appropriate staffing levels
- Ensure effective and efficient management of day-to-day data center operations, including queue management, 24/7 shift scheduling, and hardware logistics
- Learn quickly, or act as the subject matter expert, across all aspects of data center operations
- Ensure all operational KPIs and metrics are being measured and met
- Manage Large Scale Events (outages) and act as the call leader
- Manage and improve the workflows and throughput for data center operations
- Recommend, document, and oversee policies and procedures to meet industry best practices and to meet required SLAs
- Maintain the on-call schedule, coordinating absences and vacations
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
- 4+ years of Information Technology (IT) experience, or Bachelor's degree in computer science, engineering, mathematics or equivalent
- 2+ years of experience managing people in a technical environment.
- 2+ years of experience participating in on-call rotations and providing after-hours support, covering networking and computer hardware, in an environment that operates 24/7.
- Experience in technical writing in a relevant field
- Experience in project management
- In-depth knowledge of Linux systems administration, Networking and Cabling best practices
- In-depth knowledge of hardware architectures, with troubleshooting experience across system management tools and client/server environments

We’re searching for an experienced Circuit Design & Analysis engineer with a background in custom circuit design & analysis and system-level thermal & power analysis, and a proven track record of handling challenges at scale. In this role, you’ll be working directly with product engineers, signal & power integrity engineers, and physical design experts: defining best practices, driving correlation of pre-silicon simulation of thermal & power integrity to post-silicon analysis, and developing custom circuits that help raise the bar in implementing state-of-the-art machine learning hardware.
Key job responsibilities
- Design and implement custom cells / IP.
- Develop & run characterization flows for custom cells / IP developed.
- Own integration & post-silicon qualification of IPs like PLL, PCIE, UCIE, HBM, sensors/monitors.
- Develop scripts to automate running analysis and collect reports.
- Develop test-plan and perform measurements in the lab to correlate with simulation data.
A day in the life
Depending on the state of the project, you may find yourself working on the following:
- Evaluate IPs (like sensors, process monitors) from a 3rd party
- Develop and characterize custom IPs like ganged buffers, custom logic cells for specialized operations (like MACs)
- Work with designers and architects to identify pain-points and areas where custom solutions can improve PPAS
- Do post-silicon quality checks for key IP like PLLs, UCIE/PCIE, HBM
- Do post-silicon measurements of jitter, sensor calibration, and power, and correlate them with simulation
Mentorship and Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
Diverse Experiences
Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
- BS + 8 years, MS + 6 years, or PhD + 3 years in EE/CS
- Expertise in circuit-level analysis using tools like SPICE / SPECTRE
- Expertise in interconnect & transistor fundamentals in deep sub-micron processes
- Understanding of ASIC Physical Design from RTL-to-GDSII
- Understanding of other sign-off activities (IR/EM, physical verification, timing closure, DFT)
- 3+ years of scripting experience with Tcl, Perl or Python

Key job responsibilities
Our work is characterized by high scale, complexity, and the need for invention. We offer great opportunities to work on low-latency distributed systems in the machine learning space.
- 5+ years of non-internship professional software development experience
- 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- Experience in object-oriented programming in an enterprise environment
- Experience with distributed cloud systems
- Experience with CI/CD systems and build automation, and familiarity with the DevOps approach
- Knowledge and experience with various processes in the full SDLC (coding standards, code reviews, source control, build systems, integration and deployment, maintenance, updates, etc.)

Export Control Requirement:
Key job responsibilities
- Provide escalation support for troubleshooting complex networking issues, including switching, routing, interconnectivity, performance, and platform configurations
- Establish and maintain network engineering standards, partnering with cross-functional teams to ensure alignment with industry best practices and security requirements
- Develop and enhance network automation tools to improve operational efficiency and reduce manual intervention
- Engage proactively with technology vendors to drive bug fixes, feature enhancements, and product improvements, ensuring our network infrastructure remains leading-edge and reliable.
- 4+ years of experience with major internet routing protocols
- 4+ years of experience working in a Linux/Unix environment
- 1+ years of experience with automation scripting using Python, Bash, Shell and/or Perl
- Experience with AWS and AWS networking products such as Direct Connect and Transit Gateway

You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.
You will be a part of the global BIM technology team responsible for implementing, managing, and adopting new BIM technologies and systems to support the design, construction, and operation of Amazon-owned data center facilities.
You will be responsible for administering and implementing new BIM systems. You will oversee the day-to-day operations of such tools and propose process improvements. You will work with BIM stakeholders to collect requirements and coordinate development with the engineering team.
Key job responsibilities
- Administer BIM Common Data Environments, such as Autodesk Construction Cloud (formerly BIM 360), and other technology solutions.
- Use technology to design, develop, deploy, test, and maintain BIM solutions.
- Produce comprehensive, usable BIM systems documentation.
- Evangelize, educate, and support primary design technology applications.
- Contribute to the development of technical solutions, programs, or scripts that use BIM APIs and SDKs, such as those for Revit and Autodesk Platform Services.
- Coordinate information management, smart object metadata and classification codes with asset information databases.
- Perform BIM data QA/QC and analysis.
- Assist with establishing and maintaining BIM standards, processes, and workflows across regions.
Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Mentorship and Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
- 5+ years of non-internship professional experience administering and implementing BIM programs and systems.
- 5+ years of working with Revit and Autodesk Construction Cloud.
- 3+ years of using BIM API and SDKs.

The VMR Operations organization is looking for a Senior Security Engineer with deep technical expertise in security operations and vulnerability management to join us in building and maturing our VMR Operations programs. In this role, you will be responsible for defining and holding the security bar for VMR’s Campaign Management Program, which enables builders and drives remediation actions across Amazon. You will leverage relationships with engineers across Stores Security, business teams, and leaders throughout Amazon to ensure security, compliance, and privacy risks are correctly assessed and mitigations properly designed. You will build trust with senior engineers and leaders by leveraging your technical expertise to understand technical challenges, design improved solutions, and hold a high program bar. Finally, you will use your information security expertise to champion and drive risk-based decisions across complex, multi-disciplinary programs to ensure Amazon properly manages security risk.
Key job responsibilities
- You will be responsible for defining and enforcing the quality bar for all security, privacy, and compliance campaigns launched across Amazon.
- You will leverage relationships with engineers and managers across Stores Security to intake, prioritize, and launch security campaigns.
- You will build trust with Amazon builders by ensuring all campaigns we launch are of the highest quality bar, including clear descriptions of the security risk, steps for remediation, and insight to prevent future issues.
- You will partner with your peers to continuously improve the Campaign Management Program, with an emphasis on constant improvement through small, iterative changes to our processes and tooling.
About the team
Diverse Experiences
Amazon Security values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why flexible work hours and arrangements are part of our culture.
Mentorship and Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, training, and other career-advancing resources here to help you develop into a better-rounded professional.
- Bachelor's Degree in Computer Science, Computer Engineering, Software Engineering, Cybersecurity, or related technical degree; or 4+ years equivalent technology experience.
- 7 years of engineering experience in system, network, and/or application security or the development of security products.
- 5 years of experience improving the accuracy of vulnerability detection mechanisms across a diverse technical ecosystem.
- 5 years of experience with, and deep knowledge of, vulnerabilities, exploits, and vulnerability management systems.
- Experience building applications or systems on cloud-based services.
- Understanding of networking, operating system internals, and system design.
- Experience across the vulnerability management and remediation lifecycle from assessment through remediation.
- Experience developing exploits and host-based and container-based detections.
- Experience with AWS services.
