Expoint – all jobs in one place
Finding the best job has never been easier

Linux Software Engineer - Applications Engineering jobs at Red Hat

Advance your career in high tech with Expoint. Discover job opportunities as a Linux Software Engineer - Applications Engineering and join top companies in the industry such as Red Hat. Sign up today and take control of your future.
280 jobs found
07.09.2025

Red Hat: Senior Performance Resilience Engineer - LLM Inference (United States, Colorado, Denver)

Limitless High-tech career opportunities - Expoint
Description:

What you will do:

  • Win the confidence of our customers through the successful delivery of discovery, analysis and design workshops that shape and influence the customer’s architecture design decisions to align with the Red Hat solution.

  • Lead consulting teams through successful customer pilot and production deployments, workload onboarding and ongoing lifecycle management

  • Work closely with product business units, product engineering, consulting, support, and sales to ensure a world-class customer experience with Red Hat’s products.

  • Contribute to the development of repeatable methodologies and tools designed to scale Red Hat’s services capabilities, promote repeatable customer engagements, and lower delivery risk.

What you will bring:

  • Prior experience working in consulting and architecture roles

  • Good customer-facing skills; ability to present to customers and lead customer interactions

  • Excellent knowledge of modern telco networks and protocols and their implementation in software-based solutions at the architecture and hands-on delivery levels

  • Experience designing and delivering cloud and automation technology for telco network platforms via distributed cross-functional teams and with telco-wide deployments

  • Knowledge of Red Hat’s infrastructure solutions like Red Hat OpenShift Container Platform, Red Hat OpenStack Platform, Red Hat Enterprise Linux (RHEL)

  • Excellent written and verbal language skills in French and English

  • Willingness to travel across the region

  • Red Hat Certifications are considered a plus

06.09.2025

Red Hat: Senior Performance Resilience Engineer - LLM Inference (United States, Illinois, Springfield)

Description:

What you will do:

  • Own the resilience testing roadmap for vLLM and llm-d: define resilience indicators, prioritize fault scenarios, and establish go/no-go gates for releases and CI/CD

  • Design GPU/accelerator-aware fault experiments that target vLLM and the stack beneath it (drivers, GPU Operator/DevicePlugin, NCCL/collectives, storage/network paths, NUMA/topology)

  • Build an automated harness (preferably extending krkn-chaos, https://github.com/krkn-chaos/krkn) to run controlled experiments with a scoped blast radius and evidence capture (logs, traces, metrics)

  • Integrate fault signals into pipelines (GitHub Actions or otherwise) as resilience gates alongside performance gates

  • Develop detection and diagnostics: dashboards and alerts for pre-fault signals (e.g., vLLM queue depth, GPU throttling, P2P downgrades, KV-cache pressure, allocator fragmentation)

  • Triage and root-cause resilience regressions from field/customer issues; upstream bugs and fixes to vLLM and llm-d

  • Explore and experiment with emerging AI technologies relevant to software development and testing, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling.

  • Publish learnings (internal/external): failure patterns, playbooks, SLO templates, experiment libraries, and reference architectures; present at internal/external forums

What you will bring:

  • 3+ years in reliability and/or performance engineering on large-scale distributed systems

  • Expertise in systems‑level software design

  • Expertise with Kubernetes and modern LLM inference server stacks (e.g., vLLM, TensorRT-LLM, TGI)

  • Observability and forensics skills, with experience in Prometheus/Grafana, OpenTelemetry tracing, eBPF/bpftrace/perf, Nsight Systems, and PyTorch Profiler; adept at converting raw signals into actionable narratives.

  • Fluency in Python (data & ML), strong Bash/Linux skills

  • Exceptional communication skills - able to translate raw data into customer value and executive narratives

  • Commitment to open‑source values and upstream collaboration

The following is considered a plus:

  • Master’s or PhD in Computer Science, AI, or a related field

  • History of upstream contributions and community leadership, or public talks or blogs on resilience or chaos engineering

  • Competitive benchmarking and failure characterization at scale.

The salary range for this position is $127,890.00 - $211,180.00. Actual offer will be based on your qualifications.

Pay Transparency

● Comprehensive medical, dental, and vision coverage

● Flexible Spending Account - healthcare and dependent care

● Health Savings Account - high deductible medical plan

● Retirement 401(k) with employer match

● Paid time off and holidays

● Paid parental leave plans for all new parents

● Leave benefits including disability, paid family medical leave, and paid military leave

06.09.2025

Red Hat: Context Engineer / Prompt Engineer - AI Testing (United States, North Carolina, Raleigh)

Description:

What you will do:

  • Win the confidence of our customers through the successful delivery of discovery, analysis and design workshops that shape and influence the customer’s architecture design decisions to align with the Red Hat solution.

  • Lead consulting teams through successful customer pilot and production deployments, workload onboarding and ongoing lifecycle management

  • Work closely with product business units, product engineering, consulting, support, and sales to ensure a world-class customer experience with Red Hat’s products.

  • Contribute to the development of repeatable methodologies and tools designed to scale Red Hat’s services capabilities, promote repeatable customer engagements, and lower delivery risk.

What you will bring:

  • Prior experience working in consulting and architecture roles

  • Good customer-facing skills; ability to present to customers and lead customer interactions

  • Excellent knowledge of modern telco networks and protocols and their implementation in software-based solutions at the architecture and hands-on delivery levels

  • Experience designing and delivering cloud and automation technology for telco network platforms via distributed cross-functional teams and with telco-wide deployments

  • Knowledge of Red Hat’s infrastructure solutions like Red Hat OpenShift Container Platform, Red Hat OpenStack Platform, Red Hat Enterprise Linux (RHEL)

  • Excellent written and verbal language skills in French and English

  • Willingness to travel across the region

  • Red Hat Certifications are considered a plus

06.09.2025

Red Hat: Senior Performance Resilience Engineer - LLM Inference (United States, District of Columbia, Washington)

Description:

What you will do:

  • Own the resilience testing roadmap for vLLM and llm-d: define resilience indicators, prioritize fault scenarios, and establish go/no-go gates for releases and CI/CD

  • Design GPU/accelerator-aware fault experiments that target vLLM and the stack beneath it (drivers, GPU Operator/DevicePlugin, NCCL/collectives, storage/network paths, NUMA/topology)

  • Build an automated harness (preferably extending krkn-chaos, https://github.com/krkn-chaos/krkn) to run controlled experiments with a scoped blast radius and evidence capture (logs, traces, metrics)

  • Integrate fault signals into pipelines (GitHub Actions or otherwise) as resilience gates alongside performance gates

  • Develop detection and diagnostics: dashboards and alerts for pre-fault signals (e.g., vLLM queue depth, GPU throttling, P2P downgrades, KV-cache pressure, allocator fragmentation)

  • Triage and root-cause resilience regressions from field/customer issues; upstream bugs and fixes to vLLM and llm-d

  • Explore and experiment with emerging AI technologies relevant to software development and testing, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling.

  • Publish learnings (internal/external): failure patterns, playbooks, SLO templates, experiment libraries, and reference architectures; present at internal/external forums

What you will bring:

  • 3+ years in reliability and/or performance engineering on large-scale distributed systems

  • Expertise in systems‑level software design

  • Expertise with Kubernetes and modern LLM inference server stacks (e.g., vLLM, TensorRT-LLM, TGI)

  • Observability and forensics skills, with experience in Prometheus/Grafana, OpenTelemetry tracing, eBPF/bpftrace/perf, Nsight Systems, and PyTorch Profiler; adept at converting raw signals into actionable narratives.

  • Fluency in Python (data & ML), strong Bash/Linux skills

  • Exceptional communication skills - able to translate raw data into customer value and executive narratives

  • Commitment to open‑source values and upstream collaboration

The following is considered a plus:

  • Master’s or PhD in Computer Science, AI, or a related field

  • History of upstream contributions and community leadership, or public talks or blogs on resilience or chaos engineering

  • Competitive benchmarking and failure characterization at scale.

The salary range for this position is $127,890.00 - $211,180.00. Actual offer will be based on your qualifications.

Pay Transparency

● Comprehensive medical, dental, and vision coverage

● Flexible Spending Account - healthcare and dependent care

● Health Savings Account - high deductible medical plan

● Retirement 401(k) with employer match

● Paid time off and holidays

● Paid parental leave plans for all new parents

● Leave benefits including disability, paid family medical leave, and paid military leave

06.09.2025

Red Hat: Senior Performance Resilience Engineer - LLM Inference (United States, New York, City of Albany)

Description:

What you will do:

  • Own the resilience testing roadmap for vLLM and llm-d: define resilience indicators, prioritize fault scenarios, and establish go/no-go gates for releases and CI/CD

  • Design GPU/accelerator-aware fault experiments that target vLLM and the stack beneath it (drivers, GPU Operator/DevicePlugin, NCCL/collectives, storage/network paths, NUMA/topology)

  • Build an automated harness (preferably extending krkn-chaos, https://github.com/krkn-chaos/krkn) to run controlled experiments with a scoped blast radius and evidence capture (logs, traces, metrics)

  • Integrate fault signals into pipelines (GitHub Actions or otherwise) as resilience gates alongside performance gates

  • Develop detection and diagnostics: dashboards and alerts for pre-fault signals (e.g., vLLM queue depth, GPU throttling, P2P downgrades, KV-cache pressure, allocator fragmentation)

  • Triage and root-cause resilience regressions from field/customer issues; upstream bugs and fixes to vLLM and llm-d

  • Explore and experiment with emerging AI technologies relevant to software development and testing, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling.

  • Publish learnings (internal/external): failure patterns, playbooks, SLO templates, experiment libraries, and reference architectures; present at internal/external forums

What you will bring:

  • 3+ years in reliability and/or performance engineering on large-scale distributed systems

  • Expertise in systems‑level software design

  • Expertise with Kubernetes and modern LLM inference server stacks (e.g., vLLM, TensorRT-LLM, TGI)

  • Observability and forensics skills, with experience in Prometheus/Grafana, OpenTelemetry tracing, eBPF/bpftrace/perf, Nsight Systems, and PyTorch Profiler; adept at converting raw signals into actionable narratives.

  • Fluency in Python (data & ML), strong Bash/Linux skills

  • Exceptional communication skills - able to translate raw data into customer value and executive narratives

  • Commitment to open‑source values and upstream collaboration

The following is considered a plus:

  • Master’s or PhD in Computer Science, AI, or a related field

  • History of upstream contributions and community leadership, or public talks or blogs on resilience or chaos engineering

  • Competitive benchmarking and failure characterization at scale.

The salary range for this position is $127,890.00 - $211,180.00. Actual offer will be based on your qualifications.

Pay Transparency

● Comprehensive medical, dental, and vision coverage

● Flexible Spending Account - healthcare and dependent care

● Health Savings Account - high deductible medical plan

● Retirement 401(k) with employer match

● Paid time off and holidays

● Paid parental leave plans for all new parents

● Leave benefits including disability, paid family medical leave, and paid military leave

06.09.2025

Red Hat: Senior Software Engineer (United States, North Carolina, Raleigh)

Description:

Job Summary:

During this internship, you will help apply data management and analysis techniques to quantify the health and sustainability of open source software communities. You will have regular contact with AI/ML engineers and other data scientists, gaining important insights into how Red Hat fosters its open source communities to grow its business. You'll also collaborate with open source community builders to provide timely reports of upstream activities to Red Hat stakeholders.

Job Responsibilities:

  • Collect and document input from Red Hat technology users, customers, and other stakeholders to understand customer needs and requirements.

  • Develop and carry out competitive analysis by researching competing solutions and documenting their relative strengths and weaknesses.

  • Develop, prioritize, and document requirements, epics, and user stories for new releases of our offerings.

  • Translate key findings into visualized presentations

  • Network with other talented interns in an inclusive workplace where you can be yourself and thrive

Required Skills:

  • Excellent written and verbal communication skills in English

  • Ability to manage tasks, meet deadlines and analyze data to foster data-driven decisions

  • Ability to effectively establish and maintain communication with both internal and external stakeholders

  • Strong organizational and logistical skills

  • Passion, curiosity, and desire to create new things and examine how things work internally

  • Willingness to learn and proactively work as a part of a wider team

06.09.2025

Red Hat: Senior Performance Resilience Engineer - LLM Inference (United States, California, Sacramento)

Description:

What you will do:

  • Own the resilience testing roadmap for vLLM and llm-d: define resilience indicators, prioritize fault scenarios, and establish go/no-go gates for releases and CI/CD

  • Design GPU/accelerator-aware fault experiments that target vLLM and the stack beneath it (drivers, GPU Operator/DevicePlugin, NCCL/collectives, storage/network paths, NUMA/topology)

  • Build an automated harness (preferably extending krkn-chaos, https://github.com/krkn-chaos/krkn) to run controlled experiments with a scoped blast radius and evidence capture (logs, traces, metrics)

  • Integrate fault signals into pipelines (GitHub Actions or otherwise) as resilience gates alongside performance gates

  • Develop detection and diagnostics: dashboards and alerts for pre-fault signals (e.g., vLLM queue depth, GPU throttling, P2P downgrades, KV-cache pressure, allocator fragmentation)

  • Triage and root-cause resilience regressions from field/customer issues; upstream bugs and fixes to vLLM and llm-d

  • Explore and experiment with emerging AI technologies relevant to software development and testing, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling.

  • Publish learnings (internal/external): failure patterns, playbooks, SLO templates, experiment libraries, and reference architectures; present at internal/external forums

What you will bring:

  • 3+ years in reliability and/or performance engineering on large-scale distributed systems

  • Expertise in systems‑level software design

  • Expertise with Kubernetes and modern LLM inference server stacks (e.g., vLLM, TensorRT-LLM, TGI)

  • Observability and forensics skills, with experience in Prometheus/Grafana, OpenTelemetry tracing, eBPF/bpftrace/perf, Nsight Systems, and PyTorch Profiler; adept at converting raw signals into actionable narratives.

  • Fluency in Python (data & ML), strong Bash/Linux skills

  • Exceptional communication skills - able to translate raw data into customer value and executive narratives

  • Commitment to open‑source values and upstream collaboration

The following is considered a plus:

  • Master’s or PhD in Computer Science, AI, or a related field

  • History of upstream contributions and community leadership, or public talks or blogs on resilience or chaos engineering

  • Competitive benchmarking and failure characterization at scale.

The salary range for this position is $127,890.00 - $211,180.00. Actual offer will be based on your qualifications.

Pay Transparency

● Comprehensive medical, dental, and vision coverage

● Flexible Spending Account - healthcare and dependent care

● Health Savings Account - high deductible medical plan

● Retirement 401(k) with employer match

● Paid time off and holidays

● Paid parental leave plans for all new parents

● Leave benefits including disability, paid family medical leave, and paid military leave

Discover your dream career in the high tech industry with Expoint. Our platform offers a wide range of Linux Software Engineer - Applications Engineering job opportunities, giving you access to the best companies in the field, like Red Hat. With our easy-to-use search engine, you can quickly find the right job for you and connect with top companies. No more endless scrolling through countless job boards: with Expoint you can focus on finding your perfect match. Sign up today and follow your dreams in the high tech industry with Expoint.