Expoint - all jobs in one place

Nvidia Senior Software Engineer – Simulation Virtualization 
United States, Texas 
49293862

18.02.2025

a Senior or observability solutions for multiple


What You'll Be Doing:

  • Collaborate with AI, HW, and SW engineering and research teams to define a vision and roadmap for AI/HPC cluster observability.

  • Architect and lead teams to develop, test, and deploy data collectors, pipelines, visualization and retrieval services.

  • Define data collection and retention policies to balance network bandwidth, system load, and storage capacity costs with data analysis requirements.

  • Work in a diverse team to provide operational and strategic data to empower our engineers and researchers to improve performance, productivity, and efficiency.

  • Continuously improve quality, workloads, and processes through better observability.

What We Need To See:

  • Experience designing and building large scale, distributed observability systems.

  • Ability to collaborate with data scientists, researchers, and engineering teams to identify high value data for collection and analysis.

  • Experience with turning raw data into actionable reports

  • Experience with observability platforms such as Apache Spark, Elastic/OpenSearch, Grafana, Prometheus, and other similar open-source tools

  • Technical lead level Python programming experience and use of API calls

  • Passion for improving the productivity of others

  • Excellent planning and interpersonal skills

  • Flexibility/adaptability for working in a dynamic environment with changing requirements

  • MS (preferred) or BS in Computer Science, Electrical Engineering, or related field or equivalent experience

  • 12+ years of relevant experience.

Ways To Stand Out From The Crowd:

  • Background in computer science, machine learning, deep learning, open-source software, infrastructure technologies, and GPU technology.

  • Prior experience in infrastructure software, production application software development, release and support methodology, and DevOps

  • Experience in the management of datacenters and large-scale distributed computing

  • Experience in working with AI researchers and/or EDA developers

  • Consistent track record of driving process improvements and measuring efficiency and a passion for sharing knowledge and experience driving complex projects end-to-end.

You will also be eligible for equity and benefits.