Nvidia Senior SRE Software Engineer Storage Data
China, Shanghai
390879333

14.04.2025

time type: Full time

posted on: 30+ days ago

SRE is a mindset and a set of engineering approaches to running efficient production systems, with a focus on eliminating manual work through modern automation practices and performance tuning. We promote self-direction to work on meaningful projects while striving to build an environment that provides the support and mentorship needed to learn and grow.

What You Will Be Doing:

Develop strategies to ensure the reliability and availability of storage systems, including redundancy, failover, and disaster recovery plans.
Continuously analyze and fine-tune storage systems for optimal performance, including throughput optimization, caching, and latency reduction. Identify and resolve performance bottlenecks to enhance overall system efficiency.
Develop and maintain automation scripts and tools to streamline storage provisioning, configuration, and maintenance tasks.
Implement monitoring and alerting systems to proactively identify and address issues.
Participate in the on-call rotation to respond promptly to storage-related incidents, conduct root cause analysis of outages, and implement preventive measures.
Collaborate with cross-functional teams, including Compute SRE, development, and networking, to ensure seamless integration of large-scale storage solutions.
Work with AI/ML workloads to capture and correlate behavior in large clusters and workflows, which are otherwise hard to understand.

What We Need To See:

BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), with 5+ years equivalent practical experience.
Proven experience in storage system administration and site reliability engineering.
Experience with Git, RESTful APIs, Linux service operation, networking, complexity analysis, AWS S3, software design, and maintaining large-scale Linux-based systems.
Experience in one or more of the following languages: Ansible, Bash, Python, Go, YAML, Java.
Good knowledge of infrastructure configuration management tools like Ansible, Chef, Puppet, and Terraform.
Experience using observability and tracing tools such as InfluxDB, Prometheus, Grafana, and the Elastic (OpenSearch) stack.

Ways to stand out from the crowd:

Experience with storage solutions such as OpenStack Swift (object), AWS S3 (object), DDN, and Lustre.
Strong Linux and network troubleshooting skills, including hands-on use of diagnostic commands and tools.
Demonstrated SRE mindset, with a customer-first approach and a passion for ensuring customer satisfaction and success.
Interest in crafting, analyzing, and fixing large-scale distributed systems. Strong debugging skills with a systematic problem-solving approach to identify complex problems.
Experience in using or running large private and public cloud systems based on Kubernetes, OpenStack, and Docker.
