What will you do?
Develop and maintain a high-quality ML inference runtime platform for multi-modal model serving.
Contribute to upstream inference runtime communities such as vLLM, DeepSpeed, OpenVINO, and others.
Implement and maintain CI/CD pipelines that allow faster, more secure, reliable, and frequent releases
Identify and deploy cybersecurity measures by continuously performing vulnerability assessments and risk management
Coordinate and communicate with various stakeholders
Apply a growth mindset by staying up to date with AI and ML advancements
What will you bring?
Proficiency with programming in Python
Familiarity with model parallelization, quantization, and memory optimization using vLLM, DeepSpeed, and other inference libraries.
3+ years of experience in DevOps, automation, and modern software deployment practices
Solid understanding of the fundamentals of model inferencing architectures
Experience with Jenkins, Git, shell scripting, and related technologies
Experience with the development of containerized applications in Kubernetes
Experience with Agile development methodologies
Experience with cloud computing on at least one of the following cloud platforms: AWS, GCP, Azure, or IBM Cloud
Ability to work across a large distributed hybrid engineering team
Development experience with C++, especially with the CUDA APIs, is a plus
Experience with open-source development is a plus