In this role, you will collaborate closely with cross-functional teams, including software engineers, data scientists, and operations, to monitor, analyze, and optimize our systems. Your primary responsibility will be to collect, analyze, and present key performance indicators (KPIs) that drive operational excellence and inform strategic decisions.
What you’ll be doing:
Develop, test, and deploy data collectors, pipelines, and services to enhance use of our AI/ML and chip development infrastructure
Participate in the full life-cycle of tool development, test, and deployment.
Work in a diverse team to provide operational and strategic metrics which empower our engineers to develop at the speed of light.
Continuously improve our chip development process through better observability
Directly contribute to the overall quality and improve time to market for our next generation chips.
What we need to see:
Experience in applying data analysis principles and influencing data-driven decisions
Experience with turning raw data into actionable reports
Hands-on experience with observability platforms such as Apache Spark, Elastic/OpenSearch, Grafana, Prometheus, and other similar open source tools
Expert-level Python programming experience, including the use of API calls
Extensive experience with CI/CD pipelines such as Jenkins and/or GitLab
Passion for improving the productivity of others
Excellent planning and interpersonal skills
Flexibility and adaptability when working in a dynamic environment with changing requirements
MS (preferred) or BS in Computer Science, Electrical Engineering, or related field or equivalent experience.
5+ years of relevant experience.
Ways to stand out from the crowd:
Hands-on experience running GPU-based workloads in a batch computing environment
Passion for gathering and visualizing metrics and data
Experience with chip design workflows, such as front end verification, back end workflows, or mixed signal workflows
Experience with job schedulers (in particular IBM Spectrum LSF and/or SLURM)
Mastery of distributed system principles
You will also be eligible for equity.