Network Site Reliability Engineer jobs at NVIDIA in Southampton, United Kingdom


UK, Cambridge

Germany, Remote

time type: Full time

posted on: Posted 6 Days Ago

job requisition id

What you'll be doing:

Research novel AI techniques to secure next-generation networks and apply them to existing NVIDIA products.
Investigate and analyze network telemetry for indicators of vulnerabilities in secure environments.
Collaborate with NVIDIA researchers to explore innovative ways to improve security in networking products.
Contribute to projects that have potential real-world impact on NVIDIA's product portfolio.

What we need to see:

PhD or MSc (or equivalent experience) in Electrical Engineering, Computer Science, or a related field with a focus on AI.
5+ years of relevant experience.
Experience with innovative AI tools, frameworks, and methods related to cybersecurity incident detection and prevention.
Background in cybersecurity, networking (TCP/IP), and network security (TLS/IPSec).
Solid programming skills and a deep understanding of secure system design.

Ways to stand out from the crowd:

PhD with a track record of publication in top peer-reviewed AI conferences or equivalent experience leading AI projects.
Expertise with LLMs and recent advancements in neural networks.
Architectural knowledge of system security.
Understanding of common attack vectors targeting network devices and methods to mitigate them.
A proven track record of translating sophisticated research into practical solutions.

More jobs that may interest you

Nvidia Senior Security Research Architect United Kingdom, England, Southampton

Nvidia Senior Solutions Architect HPC AI United Kingdom, England, Southampton

Amazon Senior Security Engineer AI United Kingdom, England, London


JPMorgan Senior Product Security Architect United Kingdom, England, London

09.11.2025

Nvidia Senior Systems Engineer Artificial Intelligence Operations United Kingdom, England, Southampton


time type: Full time

posted on: Posted 13 Days Ago

job requisition id

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.

What you will be doing:

You will bring together and understand internal and external customer requirements to improve AI cluster resiliency and design AIOps-based solutions that address these needs.
Develop automated workflows for issue detection and root cause analysis, and collaborate closely with operators to debug sophisticated, full-stack AI cluster problems. We will bring the findings to bear on product improvements.
Deliver compelling technical presentations and lead hands-on demos or training. You'll also handle evaluation deployments (POC/POV) and ensure smooth, reliable installations by staying engaged and encouraging throughout the customer journey.

What we need to see:

Bachelor of Science or equivalent experience
12+ years of networking experience in enterprise or service provider environments, with strong hands-on expertise in routing and switching.
Proficient in scripting and automation using Python or similar languages, with strong Linux expertise.
Proven experience working directly with customers to resolve issues and ensure success in Systems Engineer or SRE roles.
Exceptional oral, written, and presentation skills for clearly communicating complex technical topics.
Demonstrated ability to collaborate effectively across teams, partnering with operations, engineering, and product development

Ways to stand out from the crowd:

Experience with data center infrastructure and cloud architectures
Background in network performance monitoring or observability
Previous experience working at a technological start-up


08.11.2025

Nvidia Software Configuration Management Engineer – Hardware United Kingdom, England, Southampton


UK, Cambridge

time type: Full time

posted on: Posted 11 Days Ago

job requisition id

For two decades, we have pioneered visual computing, the art and science of computer graphics. With our invention of the GPU - the engine of modern visual computing - the field has expanded to encompass video games, movie production, product design, medical diagnosis and scientific research. Today, we stand at the beginning of the next era, the AI computing era, ignited by a new computing model, GPU deep learning. This new model - where deep neural networks are trained to recognize patterns from massive amounts of data - has proven to be deeply effective at solving some of the most complex problems in everyday life.

NVIDIA runs one of the largest Perforce installations in the world, and a very large Git installation as well. Our Software Configuration Management (SCM) Tools and Infrastructure group is looking for a top SCM architect. You will tackle the challenges that we face when operating at scale to produce a best-in-industry solution and enable us to continue to provide unprecedented performance and reliability for our users. You will work in our team to engineer new solutions to scale our Perforce and Git infrastructure to handle a large and ever-growing load and data volume. You will design and code processes and automation tools to improve productivity in managing and administering the SCM systems and applications used by our globally distributed engineering teams.

What you'll be doing:

Responsible for the full SCM environment including application, OS, and server hardware components, developing the continued automation and innovation needed for our large environment
Create new solutions to improve the reliability and performance of our ever-growing infrastructure, and work with automated orchestration tools to deploy those improvements to hundreds of systems worldwide
Be part of a global team: evaluate technology alternatives, work closely with other project members to specify solutions, craft schedules, and lead ongoing enhancements and support
Learn and greatly improve the daily productivity of the world’s top chip designers and software engineers

What we need to see:

MS (preferred) or BS in Computer Science or a related field (or equivalent experience), with at least 3 years of experience
Deep understanding of Software Configuration Management (SCM) processes and tools such as Perforce, Git, Subversion, or ClearCase for large, multi-site development
You've configured/deployed Continuous Integration (CI) and Continuous Deployment (CD) systems in your past experience
Excellent interpreted language skills highly desired: Object Oriented Perl or Python preferred
Strong software engineering process skills required
Strong object-oriented programming and design pattern knowledge and background - Object Oriented Perl, Python, C++, or Java preferred
Experience with databases, MySQL or Postgres preferred, experience with NoSQL databases a plus
Experience with DevOps or system administration with Linux systems required (CentOS/RHEL and Ubuntu preferred)
Strong experience with automation required, Ansible or Puppet preferred
Excellent interpersonal skills, including written and verbal communication
You are comfortable with, and enjoy working in, dynamic and ever-evolving environments

Ways to stand out from the crowd:

Meticulous organizer with an ever positive, can-do attitude
Demonstrated out-of-the-box thinking that yields creative solutions to highly sticky problems
Fun and enthusiastic teammate who enjoys a challenge and celebrates success


25.10.2025

Nvidia Senior System Software Engineer Platform Operations United Kingdom, England, Southampton


France, Remote

Germany, Remote

time type: Full time

posted on: Posted 4 Days Ago

job requisition id

What you’ll be doing:

Architect, build, and evolve the scalable technology stack for global learner and instructor technical support.
Lead the global operationalization of support systems, to ensure high availability, performance, and efficient resource utilization.
Provide technical leadership and mentorship to a distributed operations team, driving excellence in the use of support technologies and processes.
Collaborate cross-functionally to translate support insights and user feedback into systemic improvements to shared NVIDIA services, the DLI platform, and overall experience for enterprises, learners, and instructors.

What we need to see:

Bachelor’s degree in Computer Science, a related technical field, or equivalent experience
Over 6 years of DevOps experience optimizing, deploying and running containerized applications (Docker, Kubernetes) across AWS, Azure, and GCP, including hands-on work with EKS, AKS, and GKE.
Proficient in Python and Linux shell scripting for automation, application development, system administration, and troubleshooting.
Validated experience architecting, implementing, and managing cloud infrastructure using Terraform.
Demonstrated ability as a meticulous problem-solver with strong analytical skills, capable of diagnosing and resolving complex technical challenges under pressure.
Excellent communication, teamwork, and collaboration skills, with an ability to articulate technical concepts clearly to diverse audiences and lead technical responses during incidents.

Ways to stand out from the crowd:

Proven experience designing and implementing event-driven architectures using pub/sub patterns with platforms like AWS SNS/SQS, Google Pub/Sub, or Azure Service Bus.
Knowledge of generative AI architectures (LLMs, diffusion models) and concepts such as Retrieval Augmented Generation (RAG) and vector databases.
Hands-on experience with the NVIDIA AI stack (NeMo, Triton Inference Server, TensorRT) for model development, serving, and optimization. Production experience with NVIDIA NIM is a strong plus.
Experienced in building and running CI/CD pipelines (Jenkins, GitLab CI) and managed software development environments, applying SRE principles to automate, enhance reliability, and improve performance.
Familiarity with Python-based Learning Management Systems (LMS) such as Open edX.


14.10.2025

Nvidia Network Site Reliability Engineer United Kingdom, England, Southampton


UK, Reading

time type: Full time

posted on: Posted 25 Days Ago

job requisition id

This crucial role focuses on user satisfaction and excellence in network operations. The engineer will tackle significant projects and is committed to fostering a supportive atmosphere that offers the mentorship necessary for professional development and growth. They will bring a wealth of skills and experience and be a sought-after mentor who leads by example.

What you'll be doing:

Owning the operational aspect of the network infrastructure, ensuring its high availability and reliability.
Partnering with architecture and deployment teams to guarantee that new implementations are supportable and align with production standards.
Advocating for and implementing automation to reduce toil and enhance operational efficiency.
Monitoring network performance, identifying areas for improvement, and coordinating with relevant teams to execute enhancements.
Collaborating with SMEs to resolve production issues swiftly and effectively, maintaining customer satisfaction.
Identifying opportunities for operational improvements and partnering with teams to develop solutions that drive excellence and sustainability in network operations.

What we need to see:

BS degree in Computer Science, Electrical Engineering, or a related technical field, or equivalent experience.
Minimum of 8 years of industry experience in network site reliability engineering, network automation, network operations, or related areas. Experience on both campus and data center networks.
Familiarity with network management tools such as Prometheus, Grafana, Alertmanager, Nautobot/NetBox, and BigPanda.
Expertise in automating networks using frameworks such as Salt, Ansible, or similar.
In-depth experience with one or more of the following: Python, Go.
Knowledge of network technologies such as TCP/UDP, IPv4/IPv6, Wireless, BGP, VPN, L2 switching, Firewalls, Load Balancers, EVPN, VXLAN, and Segment Routing. Proven track record in network operations.
Skills with ServiceNow and Jira
Knowledge of Linux system fundamentals is a plus.
Systematic problem-solving approach, coupled with excellent communication skills and a sense of ownership and drive.

Ways to stand out from the crowd:

Track record of using operational signals from sources such as SNMP, syslog, and streaming telemetry to solve operational challenges.
History of debugging and optimizing code; automating routine tasks.
Experience with Mellanox/Cumulus Linux, Palo Alto firewalls, NetScaler and F5 load balancers.
Previous SRE experience


26.08.2025

Nvidia Senior HPC AI Cluster Engineer United Kingdom, England, Southampton


time type: Full time

posted on: Posted 14 Days Ago

job requisition id

What you will be doing:

Designing, implementing and maintaining large scale HPC/AI clusters with monitoring, logging and alerting
Managing Linux job/workload schedulers and orchestration tools
Developing and maintaining continuous integration and delivery pipelines
Developing tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources
Deploying monitoring solutions for the servers, network and storage
Troubleshooting and fixing, bottom up from bare metal, operating system, software stack and application level
Being a technical resource, developing, re-defining and documenting standard methodologies to share with internal teams
Supporting Research & Development activities and engaging in POCs/POVs for future improvements

What we need to see:

Bachelor's Degree in Computer Science, Engineering, or a related field; or equivalent experience
5+ years of experience
Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software
Experience with job scheduling and workload orchestration tools such as Slurm and Kubernetes
Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalls, iptables, Wireshark, etc.) and internals, ACLs, OS-level security protection, and common protocols such as TCP, DHCP, and DNS
Experience with multiple storage solutions such as Lustre, GPFS, ZFS, and XFS. Familiarity with newer and emerging storage technologies.
Python programming and Bash scripting experience.
Comfortable with automation and configuration management tools such as Jenkins, Ansible, and Puppet/Chef
Deep knowledge of networking technologies such as InfiniBand and Ethernet
Deep understanding and experience with virtual systems (for example VMware, Hyper-V, KVM, or Citrix)
Familiarity with cloud computing platforms (e.g. AWS, Azure, Google Cloud)

Ways to stand out from the crowd:

Knowledge of CPU and/or GPU architecture
Knowledge of Kubernetes, container related microservice technologies
Experience with GPU-focused hardware/software (DGX, CUDA)
Background with RDMA (InfiniBand or RoCE) fabrics


26.07.2025

Nvidia Senior System Software Engineer Defined Networking United Kingdom, England, Southampton


Poland, Remote

time type: Full time

posted on: Posted 2 Days Ago

job requisition id

What you’ll be doing:

Design, develop, deploy and operate next generation multi-tenant cloud SDN control and data planes software
DevOps automation tasks for SDN stack - CI/CD, GitOps for secure and seamless integration with cloud infrastructure components.
To complement the efficient networking architecture, you will help design Infrastructure-as-a-Service virtual network orchestration API-driven services to support tenant workload security and performance SLAs for BMaaS, VMaaS, and Kubernetes.
You will also develop software for network observability (monitoring and telemetry) to enable intelligent metering and performance analysis for KPI enforcement on tenant workloads.

What we need to see:

BA/BS degree in Computer Science, related technical discipline (or equivalent experience), MS preferred
10+ years of experience developing software for large-scale distributed environments according to industry-standard DevOps best practices
Deep understanding of the modern network stack and protocols
Hands-on experience developing secure and performant API-driven services (gRPC, REST with transport encryption and strong authentication)
Background in private cloud/large distributed systems architecture design.
Experience with modern data center servers and network equipment (out-of-band management, provisioning, monitoring: IPMI, Redfish, zero-touch provisioning)
Hands-on experience with SDN: OpenFlow, Open vSwitch, or equivalent solutions
Hands-on experience with one or more SDN solutions (control and data planes)

Ways to stand out from the crowd:

Experience with RDMA (InfiniBand or RoCE) protocols and with fabric design and deployment
Understanding of container networking (CNI) APIs and implementations
SRE/DevOps: top-level expertise
Hands-on background with Tier 1 CSPs (AWS, Azure and others) services and tools
Hands-on experience with networking hardware acceleration

Nvidia Senior AI Engineer Security Architect


United Kingdom, England, Southampton

593740946

24.11.2025


Description:

UK, Cambridge

UK, Remote

Germany, Remote

time type: Full time

posted on: Posted 6 Days Ago

job requisition id

What you'll be doing:

Research novel AI techniques to secure next-generation networks and apply them to existing NVIDIA products.
Investigate and analyze network telemetry for indicators of vulnerabilities in secure environments.
Collaborate with NVIDIA researchers to explore innovative ways to improve security in networking products.
Contribute to projects that have potential real-world impact on NVIDIA's product portfolio.

What we need to see:

PhD or MSc (or equivalent experience) in Electrical Engineering, Computer Science, or a related field with a focus on AI.
5+ years of relevant experience.
Experience with innovative AI tools, frameworks, and methods related to cybersecurity incident detection and prevention.
Background in cybersecurity, networking (TCP/IP), and network security (TLS/IPSec).
Solid programming skills and a deep understanding of secure system design.

Ways to stand out from the crowd:

PhD with a track record of publication in top peer-reviewed AI conferences or equivalent experience leading AI projects.
Expertise with LLMs and recent advancements in neural networks.
Architectural knowledge of system security.
Understanding of common attack vectors targeting network devices and methods to mitigate them.
A proven track record of translating sophisticated research into practical solutions.