Expoint – all jobs in one place
Where the best experts and the best companies meet

Software/Senior Software Engineer - Ansible jobs at Red Hat in the United States

Find your perfect match with Expoint! Search for job opportunities as a Software/Senior Software Engineer - Ansible in the United States and join the network of leading companies in the high-tech industry, such as Red Hat. Sign up now and find your dream job with Expoint!
384 jobs found
07.09.2025

Red Hat Senior Performance Resilience Engineer - LLM Inference United States, Colorado, Denver

Description:

What you will do:

  • Win the confidence of our customers through the successful delivery of discovery, analysis and design workshops that shape and influence the customer’s architecture design decisions to align with the Red Hat solution.

  • Lead consulting teams through successful customer pilot and production deployments, workload onboarding and ongoing lifecycle management

  • Work closely with product business units, product engineering, consulting, support, and sales to ensure a world-class customer experience with Red Hat’s products.

  • Contribute to the development of repeatable methodologies and tools designed to scale Red Hat’s services capabilities, promote repeatable customer engagements, and lower delivery risk.

What you will bring:

  • Prior experience working in consulting and architecture roles

  • Good customer-facing skills; ability to present to customers and lead customer interactions

  • Excellent knowledge of modern telco networks and protocols and their implementation in software-based solutions at the architecture and hands-on delivery levels

  • Experience designing and delivering cloud and automation technology for telco network platforms via distributed cross-functional teams and with telco-wide deployments

  • Knowledge of Red Hat’s infrastructure solutions like Red Hat OpenShift Container Platform, Red Hat OpenStack Platform, Red Hat Enterprise Linux (RHEL)

  • Excellent written and verbal language skills in French and English

  • Willingness to travel across the region

  • Red Hat Certifications are considered a plus

06.09.2025

Red Hat Senior Performance Resilience Engineer - LLM Inference United States, Illinois, Springfield

Description:

What you will do:

  • Own the resilience testing roadmap for vLLM and llm-d: define resilience indicators, prioritize fault scenarios, and establish go/no-go gates for releases and CI/CD

  • Design GPU/accelerator-aware fault experiments that target vLLM and the stack beneath it (drivers, GPU Operator/DevicePlugin, NCCL/collectives, storage/network paths, NUMA/topology)

  • Build an automated harness (preferably extending krkn-chaos, https://github.com/krkn-chaos/krkn) to run controlled experiments with a scoped blast radius and evidence capture (logs, traces, metrics)

  • Integrate fault signals into pipelines (GitHub Actions or otherwise) as resilience gates alongside performance gates

  • Develop detection and diagnostics: dashboards and alerts for pre-fault signals (e.g., vLLM queue depth, GPU throttling, P2P downgrades, KV-cache pressure, allocator fragmentation)

  • Triage and root-cause resilience regressions from field/customer issues; upstream bugs and fixes to vLLM and llm-d

  • Explore and experiment with emerging AI technologies relevant to software development and testing, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling.

  • Publish learnings (internal/external): failure patterns, playbooks, SLO templates, experiment libraries, and reference architectures; present at internal/external forums

What you will bring:

  • 3+ years in reliability and/or performance engineering on large-scale distributed systems

  • Expertise in systems‑level software design

  • Expertise with Kubernetes and modern LLM inference server stack (e.g., vLLM, TensorRT-LLM, TGI)

  • Observability and forensics skills, with experience in Prometheus/Grafana, OpenTelemetry tracing, eBPF/bpftrace/perf, Nsight Systems, and PyTorch Profiler; adept at converting raw signals into actionable narratives.

  • Fluency in Python (data & ML), strong Bash/Linux skills

  • Exceptional communication skills - able to translate raw data into customer value and executive narratives

  • Commitment to open‑source values and upstream collaboration

The following is considered a plus:

  • Master’s or PhD in Computer Science, AI, or a related field

  • History of upstream contributions and community leadership, or public talks or blogs on resilience or chaos engineering

  • Competitive benchmarking and failure characterization at scale.

The salary range for this position is $127,890.00 - $211,180.00. Actual offer will be based on your qualifications.

Pay Transparency

● Comprehensive medical, dental, and vision coverage

● Flexible Spending Account - healthcare and dependent care

● Health Savings Account - high deductible medical plan

● Retirement 401(k) with employer match

● Paid time off and holidays

● Paid parental leave plans for all new parents

● Leave benefits including disability, paid family medical leave, and paid military leave

06.09.2025

Red Hat Context Engineer / Prompt Engineer - AI Testing United States, North Carolina, Raleigh

Description:

What you will do:

  • Win the confidence of our customers through the successful delivery of discovery, analysis and design workshops that shape and influence the customer’s architecture design decisions to align with the Red Hat solution.

  • Lead consulting teams through successful customer pilot and production deployments, workload onboarding and ongoing lifecycle management

  • Work closely with product business units, product engineering, consulting, support, and sales to ensure a world-class customer experience with Red Hat’s products.

  • Contribute to the development of repeatable methodologies and tools designed to scale Red Hat’s services capabilities, promote repeatable customer engagements, and lower delivery risk.

What you will bring:

  • Prior experience working in consulting and architecture roles

  • Good customer-facing skills; ability to present to customers and lead customer interactions

  • Excellent knowledge of modern telco networks and protocols and their implementation in software-based solutions at the architecture and hands-on delivery levels

  • Experience designing and delivering cloud and automation technology for telco network platforms via distributed cross-functional teams and with telco-wide deployments

  • Knowledge of Red Hat’s infrastructure solutions like Red Hat OpenShift Container Platform, Red Hat OpenStack Platform, Red Hat Enterprise Linux (RHEL)

  • Excellent written and verbal language skills in French and English

  • Willingness to travel across the region

  • Red Hat Certifications are considered a plus

06.09.2025

Red Hat Senior Performance Resilience Engineer - LLM Inference United States, District of Columbia, Washington

Description:

What you will do:

  • Own the resilience testing roadmap for vLLM and llm-d: define resilience indicators, prioritize fault scenarios, and establish go/no-go gates for releases and CI/CD

  • Design GPU/accelerator-aware fault experiments that target vLLM and the stack beneath it (drivers, GPU Operator/DevicePlugin, NCCL/collectives, storage/network paths, NUMA/topology)

  • Build an automated harness (preferably extending krkn-chaos, https://github.com/krkn-chaos/krkn) to run controlled experiments with a scoped blast radius and evidence capture (logs, traces, metrics)

  • Integrate fault signals into pipelines (GitHub Actions or otherwise) as resilience gates alongside performance gates

  • Develop detection and diagnostics: dashboards and alerts for pre-fault signals (e.g., vLLM queue depth, GPU throttling, P2P downgrades, KV-cache pressure, allocator fragmentation)

  • Triage and root-cause resilience regressions from field/customer issues; upstream bugs and fixes to vLLM and llm-d

  • Explore and experiment with emerging AI technologies relevant to software development and testing, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling.

  • Publish learnings (internal/external): failure patterns, playbooks, SLO templates, experiment libraries, and reference architectures; present at internal/external forums

What you will bring:

  • 3+ years in reliability and/or performance engineering on large-scale distributed systems

  • Expertise in systems‑level software design

  • Expertise with Kubernetes and modern LLM inference server stack (e.g., vLLM, TensorRT-LLM, TGI)

  • Observability and forensics skills, with experience in Prometheus/Grafana, OpenTelemetry tracing, eBPF/bpftrace/perf, Nsight Systems, and PyTorch Profiler; adept at converting raw signals into actionable narratives.

  • Fluency in Python (data & ML), strong Bash/Linux skills

  • Exceptional communication skills - able to translate raw data into customer value and executive narratives

  • Commitment to open‑source values and upstream collaboration

The following is considered a plus:

  • Master’s or PhD in Computer Science, AI, or a related field

  • History of upstream contributions and community leadership, or public talks or blogs on resilience or chaos engineering

  • Competitive benchmarking and failure characterization at scale.

The salary range for this position is $127,890.00 - $211,180.00. Actual offer will be based on your qualifications.

Pay Transparency

● Comprehensive medical, dental, and vision coverage

● Flexible Spending Account - healthcare and dependent care

● Health Savings Account - high deductible medical plan

● Retirement 401(k) with employer match

● Paid time off and holidays

● Paid parental leave plans for all new parents

● Leave benefits including disability, paid family medical leave, and paid military leave

06.09.2025

Red Hat Senior Performance Resilience Engineer - LLM Inference United States, New York, City of Albany

Description:

What you will do:

  • Own the resilience testing roadmap for vLLM and llm-d: define resilience indicators, prioritize fault scenarios, and establish go/no-go gates for releases and CI/CD

  • Design GPU/accelerator-aware fault experiments that target vLLM and the stack beneath it (drivers, GPU Operator/DevicePlugin, NCCL/collectives, storage/network paths, NUMA/topology)

  • Build an automated harness (preferably extending krkn-chaos, https://github.com/krkn-chaos/krkn) to run controlled experiments with a scoped blast radius and evidence capture (logs, traces, metrics)

  • Integrate fault signals into pipelines (GitHub Actions or otherwise) as resilience gates alongside performance gates

  • Develop detection and diagnostics: dashboards and alerts for pre-fault signals (e.g., vLLM queue depth, GPU throttling, P2P downgrades, KV-cache pressure, allocator fragmentation)

  • Triage and root-cause resilience regressions from field/customer issues; upstream bugs and fixes to vLLM and llm-d

  • Explore and experiment with emerging AI technologies relevant to software development and testing, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling.

  • Publish learnings (internal/external): failure patterns, playbooks, SLO templates, experiment libraries, and reference architectures; present at internal/external forums

What you will bring:

  • 3+ years in reliability and/or performance engineering on large-scale distributed systems

  • Expertise in systems‑level software design

  • Expertise with Kubernetes and modern LLM inference server stack (e.g., vLLM, TensorRT-LLM, TGI)

  • Observability and forensics skills, with experience in Prometheus/Grafana, OpenTelemetry tracing, eBPF/bpftrace/perf, Nsight Systems, and PyTorch Profiler; adept at converting raw signals into actionable narratives.

  • Fluency in Python (data & ML), strong Bash/Linux skills

  • Exceptional communication skills - able to translate raw data into customer value and executive narratives

  • Commitment to open‑source values and upstream collaboration

The following is considered a plus:

  • Master’s or PhD in Computer Science, AI, or a related field

  • History of upstream contributions and community leadership, or public talks or blogs on resilience or chaos engineering

  • Competitive benchmarking and failure characterization at scale.

The salary range for this position is $127,890.00 - $211,180.00. Actual offer will be based on your qualifications.

Pay Transparency

● Comprehensive medical, dental, and vision coverage

● Flexible Spending Account - healthcare and dependent care

● Health Savings Account - high deductible medical plan

● Retirement 401(k) with employer match

● Paid time off and holidays

● Paid parental leave plans for all new parents

● Leave benefits including disability, paid family medical leave, and paid military leave

06.09.2025

Red Hat Senior Software Engineer United States, North Carolina, Raleigh

Description:

Job Summary:

During this internship, you will help apply data management and analysis techniques to quantify the health and sustainability of open source software communities. You will have regular contact with AI/ML engineers and other data scientists, gaining important insights into how Red Hat fosters its open source communities to grow its business. You will also collaborate with open source community builders to provide timely reports of upstream activities to Red Hat stakeholders.

Job Responsibilities:

  • Collect and document input from Red Hat technology users, customers, and other stakeholders to understand customer needs and requirements.

  • Execute and develop a competitive analysis through researching competitive solutions and documenting their relative strengths and weaknesses.

  • Develop, prioritize, and document requirements, epics, and user stories for new releases of our offerings.

  • Translate key findings into visualized presentations

  • Network with other talented interns in an inclusive workplace where you can be yourself and thrive

Required Skills:

  • Excellent written and verbal communication skills in English

  • Ability to manage tasks, meet deadlines and analyze data to foster data-driven decisions

  • Ability to effectively establish and maintain communication with both internal and external stakeholders

  • Strong organizational and logistical skills

  • Passion, curiosity, and desire to create new things and examine how things work internally

  • Willingness to learn and proactively work as a part of a wider team

06.09.2025

Red Hat Senior Performance Resilience Engineer - LLM Inference United States, California, Sacramento

Description:

What you will do:

  • Own the resilience testing roadmap for vLLM and llm-d: define resilience indicators, prioritize fault scenarios, and establish go/no-go gates for releases and CI/CD

  • Design GPU/accelerator-aware fault experiments that target vLLM and the stack beneath it (drivers, GPU Operator/DevicePlugin, NCCL/collectives, storage/network paths, NUMA/topology)

  • Build an automated harness (preferably extending krkn-chaos, https://github.com/krkn-chaos/krkn) to run controlled experiments with a scoped blast radius and evidence capture (logs, traces, metrics)

  • Integrate fault signals into pipelines (GitHub Actions or otherwise) as resilience gates alongside performance gates

  • Develop detection and diagnostics: dashboards and alerts for pre-fault signals (e.g., vLLM queue depth, GPU throttling, P2P downgrades, KV-cache pressure, allocator fragmentation)

  • Triage and root-cause resilience regressions from field/customer issues; upstream bugs and fixes to vLLM and llm-d

  • Explore and experiment with emerging AI technologies relevant to software development and testing, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling.

  • Publish learnings (internal/external): failure patterns, playbooks, SLO templates, experiment libraries, and reference architectures; present at internal/external forums

What you will bring:

  • 3+ years in reliability and/or performance engineering on large-scale distributed systems

  • Expertise in systems‑level software design

  • Expertise with Kubernetes and modern LLM inference server stack (e.g., vLLM, TensorRT-LLM, TGI)

  • Observability and forensics skills, with experience in Prometheus/Grafana, OpenTelemetry tracing, eBPF/bpftrace/perf, Nsight Systems, and PyTorch Profiler; adept at converting raw signals into actionable narratives.

  • Fluency in Python (data & ML), strong Bash/Linux skills

  • Exceptional communication skills - able to translate raw data into customer value and executive narratives

  • Commitment to open‑source values and upstream collaboration

The following is considered a plus:

  • Master’s or PhD in Computer Science, AI, or a related field

  • History of upstream contributions and community leadership, or public talks or blogs on resilience or chaos engineering

  • Competitive benchmarking and failure characterization at scale.

The salary range for this position is $127,890.00 - $211,180.00. Actual offer will be based on your qualifications.

Pay Transparency

● Comprehensive medical, dental, and vision coverage

● Flexible Spending Account - healthcare and dependent care

● Health Savings Account - high deductible medical plan

● Retirement 401(k) with employer match

● Paid time off and holidays

● Paid parental leave plans for all new parents

● Leave benefits including disability, paid family medical leave, and paid military leave

Description:

What you will do:

  • Win the confidence of our customers through the successful delivery of discovery, analysis and design workshops that shape and influence the customer’s architecture design decisions to align with the Red Hat solution.

  • Lead consulting teams through successful customer pilot and production deployments, workload onboarding and ongoing lifecycle management

  • Work closely with product business units, product engineering, consulting, support, and sales to ensure a world-class customer experience with Red Hat’s products.

  • Contribute to the development of repeatable methodologies and tools designed to scale Red Hat’s services capabilities, promote repeatable customer engagements, and lower delivery risk.

What you will bring:

  • Prior experience working in consulting and architecture roles

  • Good customer-facing skills; ability to present to customers and lead customer interactions

  • Excellent knowledge of modern telco networks and protocols and their implementation in software-based solutions at the architecture and hands-on delivery levels

  • Experience designing and delivering cloud and automation technology for telco network platforms via distributed cross-functional teams and with telco-wide deployments

  • Knowledge of Red Hat’s infrastructure solutions like Red Hat OpenShift Container Platform, Red Hat OpenStack Platform, Red Hat Enterprise Linux (RHEL)

  • Excellent written and verbal language skills in French and English

  • Willingness to travel across the region

  • Red Hat Certifications are considered a plus

Come find your dream high-tech job with Expoint. With our platform you can easily search for Software/Senior Software Engineer - Ansible opportunities at Red Hat in the United States. Whether you are looking for a new challenge or want to work with a specific organization in a particular role, Expoint makes it easy to find the perfect job match for you. Connect with leading companies in your area today and advance your high-tech career! Sign up today and take the next step in your career journey with Expoint.