What you will do:
Own the resilience testing roadmap for vLLM and llm-d: define resilience indicators, prioritize fault scenarios, and establish go/no-go gates for releases and CI/CD
Design GPU/accelerator-aware fault experiments that target vLLM and the stack beneath it (drivers, GPU Operator/DevicePlugin, NCCL/collectives, storage/network paths, NUMA/topology)
Build an automated harness (preferably by extending krkn-chaos: https://github.com/krkn-chaos/krkn) to run controlled experiments with a scoped blast radius and evidence capture (logs, traces, metrics)
Integrate fault signals into pipelines (GitHub Actions or otherwise) as resilience gates alongside performance gates
Develop detection and diagnostics: dashboards and alerts for pre-fault signals (e.g., vLLM queue depth, GPU throttling, P2P downgrades, KV-cache pressure, allocator fragmentation)
Triage and root-cause resilience regressions from field/customer issues; upstream bugs and fixes to vLLM and llm-d
Explore and experiment with emerging AI technologies relevant to software development and testing, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling.
Publish learnings (internal/external): failure patterns, playbooks, SLO templates, experiment libraries, and reference architectures; present at internal/external forums
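The go/no-go resilience gates described above could take many forms; as one hypothetical sketch (the metric names, thresholds, and helper below are illustrative, not part of any existing vLLM or krkn-chaos API), a gate might compare post-fault metrics against steady-state baselines:

```python
from dataclasses import dataclass

@dataclass
class GateThresholds:
    """Illustrative resilience thresholds; real values would come from SLOs."""
    max_p99_latency_ratio: float = 1.5   # post-fault p99 vs. steady-state p99
    max_error_rate: float = 0.01         # fraction of failed requests during fault
    max_recovery_seconds: float = 120.0  # time to return to steady state

def evaluate_gate(steady_p99: float, fault_p99: float,
                  error_rate: float, recovery_seconds: float,
                  t: GateThresholds = GateThresholds()) -> tuple[bool, list[str]]:
    """Return (go, reasons); go is False if any threshold is violated."""
    reasons = []
    if fault_p99 > steady_p99 * t.max_p99_latency_ratio:
        reasons.append(f"p99 degraded {fault_p99 / steady_p99:.2f}x "
                       f"(limit {t.max_p99_latency_ratio}x)")
    if error_rate > t.max_error_rate:
        reasons.append(f"error rate {error_rate:.2%} exceeds {t.max_error_rate:.2%}")
    if recovery_seconds > t.max_recovery_seconds:
        reasons.append(f"recovery took {recovery_seconds:.0f}s "
                       f"(limit {t.max_recovery_seconds:.0f}s)")
    return (not reasons, reasons)
```

A CI job would run the fault experiment, collect these numbers from its metrics store, and fail the pipeline whenever the gate returns False.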
What you will bring:
3+ years in reliability and/or performance engineering on large-scale distributed systems
Expertise in systems‑level software design
Expertise with Kubernetes and modern LLM inference server stacks (e.g., vLLM, TensorRT-LLM, TGI)
Observability and forensics skills: experience with Prometheus/Grafana, OpenTelemetry tracing, eBPF/bpftrace/perf, Nsight Systems, and the PyTorch Profiler; adept at converting raw signals into actionable narratives
Fluency in Python (data & ML), strong Bash/Linux skills
Exceptional communication skills - able to translate raw data into customer value and executive narratives
Commitment to open‑source values and upstream collaboration
The following is considered a plus:
Master’s or PhD in Computer Science, AI, or a related field
History of upstream contributions and community leadership; public talks or blogs on resilience or chaos engineering
Competitive benchmarking and failure characterization at scale
The salary range for this position is $127,890.00 - $211,180.00. Actual offer will be based on your qualifications.
Pay Transparency
● Comprehensive medical, dental, and vision coverage
● Flexible Spending Account - healthcare and dependent care
● Health Savings Account - high deductible medical plan
● Retirement 401(k) with employer match
● Paid time off and holidays
● Paid parental leave plans for all new parents
● Leave benefits including disability, paid family medical leave, and paid military leave
What you will do:
Strategic Enablement Leadership: Drive the development and execution of technical enablement plans for key partners, ensuring alignment with Red Hat’s business priorities and regional growth objectives.
Solution Design & Innovation: Architect and deliver customized, scalable solutions that address complex business and technical challenges across hybrid cloud environments.
Cross-Functional Coordination: Lead collaboration across Red Hat’s technical ecosystem — including product specialists, consulting, and support — to accelerate customer adoption and ensure long-term success.
Technical Mentorship: Coach and develop junior architects and partner engineers, fostering a culture of excellence, innovation, and best practices within the team.
Hands-On Evangelism: Facilitate technical workshops, RHUGs, community events, proof-of-concepts, and joint innovation initiatives with partners and customers to showcase the value of Red Hat’s technologies.
The salary range for this position is $202,380.00 - $323,780.00 (inclusive of base pay + target incentive compensation). Actual offer will be based on your qualifications.
Pay Transparency
● Comprehensive medical, dental, and vision coverage
● Flexible Spending Account - healthcare and dependent care
● Health Savings Account - high deductible medical plan
● Retirement 401(k) with employer match
● Paid time off and holidays
● Paid parental leave plans for all new parents
● Leave benefits including disability, paid family medical leave, and paid military leave
What you will do:
Provide an exceptional customer experience through professional communication, applying product knowledge and deep troubleshooting to take direct action in cluster environments and resolve issues.
Contribute to global initiatives and projects that reduce customer effort and improve tooling, designing and writing automation software to increase efficiency.
Act as the direct contact and advisor for customer inquiries and issues with their Cloud Services through our Customer Portal, conference calls, and remote access.
Proactively analyze cluster status, identify single points of failure and other high-risk architecture issues; propose and implement more resilient resolutions.
Record customer interactions, including investigation and troubleshooting steps, documenting diagnoses and resolutions to create reusable solutions for future incidents.
Create and maintain knowledge articles aligned with the KCS (Knowledge-Centered Service) methodology.
Partner with internal teams and external parties to deliver seamless infrastructure support for Red Hat’s Cloud Services.
Manage incident and issue workloads to ensure that all customer issues are handled and resolved in a timely manner.
Maintain a strong work ethic, work effectively as part of a team, and stay focused on customers and resolving their issues.
Be available to perform weekend shift duties on a rotational schedule.
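The proactive single-point-of-failure analysis above lends itself to automation. As a hypothetical sketch (the `nodes` field and the helper itself are illustrative; the other field names mirror Kubernetes Deployment objects as returned by `oc get deploy -o json`):

```python
def find_single_points_of_failure(deployments: list[dict]) -> list[str]:
    """Flag deployments that would go fully down if one pod or node is lost.

    Each dict carries metadata.name and spec.replicas (as in a Kubernetes
    Deployment), plus an illustrative `nodes` set: the nodes its pods run on.
    """
    risky = []
    for d in deployments:
        name = d["metadata"]["name"]
        replicas = d["spec"].get("replicas", 1)
        nodes = d.get("nodes", set())
        if replicas < 2:
            risky.append(f"{name}: only {replicas} replica(s)")
        elif len(nodes) < 2:
            risky.append(f"{name}: all replicas on one node")
    return risky
```

A real check would also consider PodDisruptionBudgets, anti-affinity rules, and zone spread, but the shape of the analysis is the same: enumerate workloads, then flag any whose availability hinges on a single pod or node.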
What you will bring:
5+ years of experience in a customer-facing technical support or solutions engineering role.
Proven experience in Infrastructure Implementation, Deployment, Administration, and Production Support of container technologies and orchestration platforms (e.g., CRI-O, Kubernetes, xKS, Docker, OpenShift Container Platform).
Experience with developer workflows, Continuous Integration (e.g., Jenkins), and Continuous Deployment paradigms.
Exceptional technical, analytical, and troubleshooting skills, using tools such as curl, strace, oc (kubectl), and Wireshark to investigate issues and form precise remediation plans across networking, system performance, Kubernetes, OpenShift Container Platform, Service Mesh, and RESTful API calls.
Experience working with tools surrounding the Kubernetes ecosystem such as Prometheus, Grafana, FluentD, etc.
Experience working with configuration management and automation tools (e.g., Ansible, Terraform) and monitoring tools (e.g., Splunk).
Proficient scripting and automation skills (e.g., Python, Bash, Go) for converting manual and maintenance tasks into fully orchestrated automation are a plus.
Ability to operate in complex, highly secure, and highly available environments and interact with Site Reliability Engineering (SRE) domain experts maintaining those environments.
Familiarity with established ITIL practices such as Incident, Change, Problem, and Release Management.
Excellent English communication skills (written and verbal) and interpersonal skills, with a desire to mentor other members of the support team and share technical knowledge in a helpful and timely fashion.
Experience logging issues and working with issue tracking tools such as Jira.
Ability to work effectively as part of an agile team, actively communicate status, and complete deliverables on schedule with a strong sense of initiative and ownership.
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
Ability to work effectively and collaborate within a geographically distributed, global team.
The salary range for this position is $84,400.00 - $134,970.00. Actual offer will be based on your qualifications.
Pay Transparency
● Comprehensive medical, dental, and vision coverage
● Flexible Spending Account - healthcare and dependent care
● Health Savings Account - high deductible medical plan
● Retirement 401(k) with employer match
● Paid time off and holidays
● Paid parental leave plans for all new parents
● Leave benefits including disability, paid family medical leave, and paid military leave
About the Job
In this role, you will work with a diverse team of highly motivated engineers on designing, implementing, and integrating AI Core Platform capabilities, and contribute directly to upstream communities such as PyTorch, FSDP, vLLM, and Triton.
You will work closely with product engineering groups within Red Hat focused on integrating and delivering enterprise-ready software that’s hardened, tested, and securely distributed with our AI/MLOps platforms.
What you will do
Develop and maintain a high-quality, high-performing AI Core platform open source upstream stack enabling Red Hat's AI/MLOps platform offerings
Maintain CI/CD build pipelines for container images that allow faster, more secure, reliable, and frequent releases
Contribute directly to upstream runtime communities such as PyTorch, FSDP, vLLM, and Triton
Consistently participate in community meetings and take on leadership opportunities in foundation/project governance topics
Share upstream contributions at events and conferences and via technical blogs and publications
Coordinate and communicate with various Red Hat product and open source stakeholders
Apply a growth mindset by staying up to date on the latest advancements in AI frameworks, hardware accelerators, and ML techniques
What you will bring
Highly experienced with programming in Python and PyTorch
Experience with hardware accelerators (e.g., GPUs, FPGAs) for AI workloads
Experience with Python packaging, such as publishing libraries to PyPI
Development experience with C++ and CUDA APIs is a big plus
Solid understanding of the fundamentals of model training and inferencing architectures
Experience with Git, shell scripting, and related technologies
Experience with the development of containerized applications in Kubernetes
Experience with cloud computing using at least one of the following cloud infrastructures: AWS, GCP, Azure, or IBM Cloud
Ability to work across a large distributed hybrid engineering team
Experience with open-source development is a plus
The salary range for this position is $170,600.00 - $281,370.00. Actual offer will be based on your qualifications.
Pay Transparency
● Comprehensive medical, dental, and vision coverage
● Flexible Spending Account - healthcare and dependent care
● Health Savings Account - high deductible medical plan
● Retirement 401(k) with employer match
● Paid time off and holidays
● Paid parental leave plans for all new parents
● Leave benefits including disability, paid family medical leave, and paid military leave