eBay AI Hardware Systems Engineer
United States, California, San Jose 
937933014

08.08.2024

Key Responsibilities:

  • You will work as part of the Hardware Engineering team to reduce the cost of purchasing and operating eBay’s fleet of servers, saving millions of dollars a year.

  • At eBay, we believe that AI will fundamentally change the way we work and do business. As such, your primary focus will be our AI hardware platforms.

  • You will translate internal customer requests into requirements, and develop benchmarks and test suites to ensure our platforms meet their needs.

  • Evaluate the performance and reliability of new hardware platforms and hardware components using automated tests, with a strong focus on AI accelerators.

  • Expand and maintain the automation we use daily for testing and reliability work.

  • Develop performance test plans and experiments with our customer teams to ensure we can use our hardware to its full capability.

  • Work with our customers to debug and address any reliability or performance issues they have with our server products.

  • Identify and suggest the ideal OS and BIOS settings for our systems.

  • Explore and propose new hardware and software technologies that improve performance or reduce the cost of our products, particularly new AI accelerators.

  • You will improve our monitoring and data collection tooling to ensure we're recording relevant information.

What you need:

  • You have 5-8 years of systems engineering experience with Linux.

  • You should understand how to configure servers to expose AI accelerators.

  • You have experience with AI frameworks and platforms such as PyTorch and DeepSpeed, ideally including benchmarking services or accelerators (for example, with MLPerf).

  • You should be able to explain how Linux uses various hardware components and what tunables it provides.

  • We primarily use Python and Bash for automating tasks; you must be proficient in at least one of these languages.

  • You should have used a revision control system such as Git and be familiar with concepts like branching and merging.

  • You must be able to build and use containers using Docker or another technology.

  • You should understand how to compile and build source code, especially the Linux kernel.

  • BS in EE or CS, with continued formal or informal education.

Desired Skills:

  • It would be a bonus if you understood the differences between AI accelerators from multiple vendors, including the differences in their architectures.

  • We’d like you to be familiar with extending a monitoring framework like Prometheus, so we can collect additional data from our testing.

  • We'd like you to be familiar with Kubernetes and cloud computing concepts.

  • It would be a bonus if you've used profiling and performance tools such as perf, VTune, or Performance Co-Pilot.

  • It would be great if you have experience analyzing logs and working with data repositories to help drive technical decisions.

  • It would be a bonus if you've deployed and configured systems at scale using standard technologies such as PXE, Ansible, Salt, or Puppet.

  • Position ideally based in San Jose, CA with minimal travel required.

The pay range for this position at commencement of employment in California, Washington, or New York is expected to fall within the range below.

$149,200 - $234,850

This website uses cookies to enhance your experience. By continuing to browse the site, you agree to our use of cookies. Visit our for more information.