Prime Video is looking for a driven and talented ML engineer with prior expertise in deploying, optimizing, and maintaining ML and DL-based workloads. The team relies on ML/DL solutions for content-adaptive processing and encoding of video, as well as on models that measure video quality. You will help deploy proven algorithms and architectures; optimize and re-train them; expand coverage to additional encoding profiles or codecs; quantize the models as necessary; and integrate these workloads at scale, with the help of other orchestration teams, on instances that offer the best cost and turnaround times. You will develop suitable monitoring dashboards and guardrails to ensure proper operation.
Key job responsibilities
As an ML engineer, you will help the Research/Applied Scientists on the team collect ground-truth data, clean data and labels, set up scalable model training that uses multiple GPUs efficiently, deploy pre-trained models for inference with optimal performance on appropriate EC2 instances, work with SDEs to define suitable job queues and APIs so that the inference workloads integrate into the larger orchestration, and develop suitable monitoring dashboards to keep track of the different training/inference jobs. You will triage operational bottlenecks and failures related to ML/DL workloads. You will identify the evolving best practices for running such workloads at scale with optimum performance. You will define and refine suitable processes for maintaining large datasets, framework versions, and code, for identifying the right instance type for a given algorithm, and for maximizing utilization of provisioned compute instances while meeting SLA guarantees.
A day in the life
You will extract and maintain features from a large set of training videos to train classical ML models. You may obtain and maintain ground-truth labels required for training ML models. You will develop or adopt tools to monitor progress during training. You may perform cross-validation in multiple folds to verify the performance of different ML models. You may benchmark readily available ML/DL solutions (open or proprietary) and compare them against internal solutions. You will work with stakeholders (e.g. product, studios, Applied Scientists, Engineering team members) to facilitate fully automated as well as human-in-the-loop workflows. You will create appropriate tickets for known issues, and triage and root-cause such issues according to their severity.
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- Experience programming with at least one software programming language
- Prior experience in deploying training and inference workloads on cloud instances covering both CPU and multi-GPU.
- 3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Bachelor's degree in computer science or equivalent
- Work experience deploying ML/DL workloads in production for video or computer vision use cases