Support Engineer jobs in United States, California, Sacramento

What you will do:

Own the resilience testing roadmap for vLLM and llm-d: define resilience indicators, prioritize fault scenarios, and establish go/no-go gates for releases and CI/CD
Design GPU/accelerator-aware fault experiments that target vLLM and the stack beneath it (drivers, GPU Operator/DevicePlugin, NCCL/collectives, storage/network paths, NUMA/topology)
Build an automated harness (preferably extending krkn-chaos, https://github.com/krkn-chaos/krkn) to run controlled experiments with scoped blast radius and evidence capture (logs, traces, metrics)
Integrate fault signals into pipelines (GitHub Actions or otherwise) as resilience gates alongside performance gates
Develop detection and diagnostics: dashboards and alerts for pre-fault signals (e.g., vLLM queue depth, GPU throttling, P2P downgrades, KV-cache pressure, allocator fragmentation)
Triage and root-cause resilience regressions from field/customer issues; upstream bugs and fixes to vLLM and llm-d
Explore and experiment with emerging AI technologies relevant to software development and testing, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling.
Publish learnings (internal/external): failure patterns, playbooks, SLO templates, experiment libraries, and reference architectures; present at internal/external forums

What you will bring:

3+ years in reliability and/or performance engineering on large-scale distributed systems
Expertise in systems‑level software design
Expertise with Kubernetes and modern LLM inference server stack (e.g., vLLM, TensorRT-LLM, TGI)
Observability and forensics skills, with experience in Prometheus/Grafana, OpenTelemetry tracing, eBPF/bpftrace/perf, Nsight Systems, and PyTorch Profiler; adept at converting raw signals into actionable narratives.
Fluency in Python (data & ML), strong Bash/Linux skills
Exceptional communication skills, with the ability to translate raw data into customer value and executive narratives
Commitment to open‑source values and upstream collaboration

The following is considered a plus:

Master’s or PhD in Computer Science, AI, or a related field
History of upstream contributions and community leadership, public talks or blogs on resilience or chaos engineering
Competitive benchmarking and failure characterization at scale.

The salary range for this position is $127,890.00 - $211,180.00. Actual offer will be based on your qualifications.

Pay Transparency

● Comprehensive medical, dental, and vision coverage

● Flexible Spending Account - healthcare and dependent care

● Health Savings Account - high deductible medical plan

● Retirement 401(k) with employer match

● Paid time off and holidays

● Paid parental leave plans for all new parents

● Leave benefits including disability, paid family medical leave, and paid military leave

More jobs that may interest you

Red Hat Senior Performance Resilience Engineer - LLM Inference United States, District of Columbia, Washington

Red Hat Senior Performance Resilience Engineer - LLM Inference United States, Colorado, Denver

Red Hat Senior Performance Resilience Engineer - LLM Inference United States, New York, City of Albany

Red Hat Senior Performance Resilience Engineer - LLM Inference United States, Illinois, Springfield

04.09.2025

Jacobs Project Tunnel Engineer United States, California, Sacramento

At Jacobs, we're challenging today to reinvent tomorrow by solving the world's most critical problems for thriving cities, resilient environments, mission-critical outcomes, operational advancement, scientific discovery and cutting-edge manufacturing, turning abstract ideas into realities that transform the world for good.

23.08.2025

Jacobs Facilities Support Intermediate Level United States, California, Sacramento

At Jacobs, we're challenging today to reinvent tomorrow by solving the world's most critical problems for thriving cities, resilient environments, mission-critical outcomes, operational advancement, scientific discovery and cutting-edge manufacturing, turning abstract ideas into realities that transform the world for good.

04.07.2025

Red Hat Senior Technical Support Engineer United States, California, Sacramento

What you will do:

Provide an exceptional customer experience by using professional communication and applying product knowledge and deep troubleshooting to perform direct actions in cluster environments to resolve various issues.
Contribute to global initiatives and projects to constantly reduce customer effort, improve tooling, and design and write automation software to improve efficiency.
Act as the direct contact and advisor for customer inquiries and issues with their Cloud Services through our Customer Portal, conference calls, and remote access.
Proactively analyze cluster status, identify single points of failure and other high-risk architecture issues; propose and implement more resilient resolutions.
Record customer interactions including investigation, troubleshooting, and resolution of issues, to document diagnostic steps and issue resolution to create reusable solutions for future incidents.
Create and maintain knowledge articles aligned with the KCS (Knowledge-Centered Service) methodology.
Partner with internal teams and external parties to deliver seamless infrastructure support for Red Hat’s Cloud Services.
Manage incident and issue workloads to ensure that all customer issues are handled and resolved in a timely manner.
Maintain a strong work ethic, work effectively as part of a team, and focus on customers and resolving their issues.
Be available to perform weekend shift duties on a rotational schedule.

What you will bring:

5+ years of experience in a customer-facing technical support or solutions engineering role.
Proven experience in Infrastructure Implementation, Deployment, Administration, and Production Support of container technologies and orchestration platforms (e.g., CRI-O, Kubernetes, xKS, Docker, OpenShift Container Platform).
Experience with developer workflows, Continuous Integration (e.g., Jenkins), and Continuous Deployment paradigms.
Exceptional technical, analytical, and troubleshooting skills using tools like curl, strace, oc (kubectl), and Wireshark analysis to investigate and form precise action plans for issue remediation with components such as networking, system performance issues, Kubernetes, OpenShift Container Platform, Service Mesh, and RESTful API calls.
Experience working with tools surrounding the Kubernetes ecosystem such as Prometheus, Grafana, FluentD, etc.
Experience working with configuration management tools (e.g., Ansible, Terraform) and monitoring and automation tools (e.g., Ansible, Splunk).
Proficient scripting and automation skills (e.g., Python, Bash, Go) for converting manual and maintenance functions into fully orchestrated automation are a plus.
Ability to operate in complex, highly secure, and highly available environments and interact with Site Reliability Engineering (SRE) domain experts maintaining those environments.
Familiarity with established ITIL practices such as Incident, Change, Problem, and Release Management.
Excellent English communication skills (written and verbal) and interpersonal skills, with a desire to mentor other members of the support team and share technical knowledge in a helpful and timely fashion.
Experience logging issues and working with issue tracking tools such as Jira.
Ability to work effectively as part of an agile team, actively communicate status, and complete deliverables on schedule with a strong sense of initiative and ownership.
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
Ability to work effectively and collaborate within a geographically distributed, global team.

The salary range for this position is $84,400.00 - $134,970.00. Actual offer will be based on your qualifications.

Pay Transparency

● Comprehensive medical, dental, and vision coverage

● Flexible Spending Account - healthcare and dependent care

● Health Savings Account - high deductible medical plan

● Retirement 401(k) with employer match

● Paid time off and holidays

● Paid parental leave plans for all new parents

● Leave benefits including disability, paid family medical leave, and paid military leave

18.05.2025

Jacobs Wildlife Crossing Engineer United States, California, Sacramento

Your impact

Key responsibilities of this position include:

Collaboration with multidisciplinary teams of engineers, scientists, planners, and other professionals to develop project approvals and designs.
Supporting proposal preparation, client relationships, and business development activities.
Preparing scopes of work and estimating level of effort for task completion.
Acting as a project manager and lead engineer on projects.
Maintaining excellent communication with clients, regulatory agencies, local government officials, employees, and subcontractors.
Providing management support and fostering the career growth of junior-level environmental staff.

09.05.2025

Fortinet Systems Engineer SLED United States, California, Sacramento

• Pre-sales - assist in qualifying sales leads from a technical standpoint.
• Sales calls - be the main technical resource on sales calls and answer/educate the customer on issues ranging from features, specifications and functionality to integration.
• Conversant with networking applications and solutions.
• Post-sales - be the lead technical contact for identified accounts on technical issues, working closely with the technical support team and engineering to answer, elevate and resolve customers' technical issues.
• Provide assistance to identified customers with post-sales training.

Required Skills:

• 5-8 years of experience in technical/pre-sales support as a sales or systems engineer
• 5-8 years of experience in LAN/WAN/Internet services administration
• Strong understanding of DNS and NFS, SMTP, HTTP, TCP/IP
• Knowledge of the following technologies: Routing, Switching, VPN, LAN, WAN, Network Security, Intrusion Detection, and Antivirus.
• Strong understanding of the following technologies and protocols: RADIUS, PKI, IKE, Certificates, L2TP, IPSEC, FIREWALL, 802.1Q, MD5, SSH, SSL, SHA1, DES, 3DES
• Experience with encryption and authentication technologies required
• Strong presentation skills

• The Systems Engineer, SLED is required to customarily and regularly work outside of their office or home office engaged in selling, including travel as needed to make a sale.

• Bachelor’s Degree or equivalent experience. Graduate degree preferred.

Wage ranges are based on various factors including the labor market, job type, and job level. Earnings for this position are expected to be $215,400 - $278,700. Need to talk to recruiter. Exact salary offers will be determined by factors such as the candidate's subject knowledge, skill level, qualifications, experience, and geographic location.

04.05.2025

Jacobs Bridge Engineer United States, California, Sacramento

Your impact

Our Bridge Engineers:

Serve in a lead technical support role on a variety of bridge project sizes
Lead successful delivery of high-quality projects within budget and on schedule
Effectively collaborate with others and lead transportation and bridge teams in all aspects of bridge analysis and design, from conceptual planning and preliminary design to final design and construction
Identify creative and innovative engineering solutions based on client, project, and site constraints
Provide technical guidance and oversight, and perform quality reviews of work by others
Commit to quality and continuous improvement as individuals as well as part of a team
Complete assigned tasks with a high level of quality within schedule and budget constraints while collaborating with teams of professionals from multiple disciplines
Lead and support project execution, quality management, and safety plans
Develop plans, specifications, cost estimates, and final bid packages for bridges and other transportation structures
Train, mentor, and direct the work of less-experienced engineers
Identify schedule and cost variances and develop/implement recommendations for corrective action in a timely manner
Demonstrate leadership by organizing and actively participating in technical development and other networking activities both internally and externally
Assist in marketing activities to procure new opportunities, coordinating with client account management leads
Have strong written and oral communication skills and a team-oriented attitude

This position will be based out of any of our Northern CA offices including Sacramento, CA, Redding, CA, San Francisco, CA, Oakland, CA and San Jose, CA, and may include limited travel.