Infrastructure Design & Optimization
Architect, develop, and deploy scalable AI/ML infrastructure using Kubernetes, Docker, and other containerized solutions.
Configure and optimize GPU, TPU, and high-performance compute resources for efficient training and inference.
Implement cloud-based AI solutions and integrate them with on-premises environments.
Design high-speed networking and storage solutions to support AI workloads, drawing on a strong understanding of the RDMA and RoCE v2 protocols.
Hands-on experience working with and managing NVIDIA SuperPOD.
Deep understanding of, and hands-on experience with, parallel file systems (Lustre).
AI/ML Pipeline Performance & Automation
Optimize distributed training workflows.
Implement automated deployment pipelines.
Ensure cost-efficient resource utilization by tuning cloud auto-scaling, spot instances, and job scheduling.
Monitoring, Security & Compliance
Develop observability and monitoring tools using Prometheus and Grafana.
Ensure AI workloads comply with security best practices (RBAC, IAM, encryption).
Maintain high availability, fault tolerance, and disaster recovery strategies for AI infrastructure.
Collaborate with AI/ML engineers, data scientists, and DevOps teams to streamline AI workflows.
Stay ahead of emerging AI infrastructure trends.
Qualifications
3-7 years of experience in AI/ML infrastructure, DevOps, or HPC environments.
Strong expertise in Linux, Kubernetes, and container orchestration.
Hands-on experience with GPUs, TPUs, or other AI accelerators.
Knowledge of storage architectures, distributed file systems (Lustre), and data lakes.
Experience with monitoring, logging, and automation tools (Prometheus, ELK, Grafana, Python, Bash).
Strong understanding of AI/ML frameworks (TensorFlow, PyTorch).
Excellent problem-solving skills and the ability to work in a fast-paced AI/ML environment.
Expert-level C programming skills.