Responsible for the complete ML model lifecycle. Build and optimize ML models, implement scalable data pipelines, and automate workflows using AI/LLMs. Collaborate with engineers, PMs, and business users. Ensure secure, efficient ML solutions across environments. Requires Python proficiency, GCP experience, strong ML
Day-to-Day Work:
- Architect, design, and implement scalable, production-quality backend microservices and REST/gRPC APIs using your primary language and framework (e.g., Java/Spring Boot, Scala/Akka,
- Architect and support event-driven and real-time data solutions using messaging or streaming platforms such as Apache Kafka, Apache Flink, Apache Spark Structured Streaming, Pulsar, Pub/Sub, or similar.
- Integrate with and optimize relational (PostgreSQL, MySQL) or NoSQL databases, designing schemas and high-performance queries.
- Leverage containerization (Docker) and orchestration (Kubernetes) to build and deploy cloud-native, resilient applications.
- Contribute to CI/CD pipelines, infrastructure as code, and cloud-native operational practices.
- Champion secure coding, observability, monitoring, and performance optimization across all services.
- Document system architecture, real-time data workflows, and operational runbooks.
Essential Responsibilities:
- Oversee load balancers, Global Server Load Balancing (GSLB) equipment, Content Delivery Network (CDN) platforms, and troubleshoot client/server issues.
- Handle proactive and reactive outages, work with vendors, and take ownership of network incidents, escalations, and root cause analysis.
- Lead backbone network design and architecture for on-premises and public cloud migration, and build hybrid connectivity solutions.
- Work with business unit leadership, product managers, and customers to develop project objectives, timelines, and feature requirements.
- Support network system availability, manage network capacity, and ensure compliance with Information Security standards.
- Deploy equipment, manage SSL certificate hosting and renewal, and build infrastructure for data centers and cloud environments.
- Implement and continually improve operational procedures, systems, and network engineering best practices.
- Offer on-call support, monitor infrastructure and hosted services, and troubleshoot complex issues.
- Manage and influence global teams, provide guidance and support, and mentor team members in areas of expertise.
- Communicate incident impacts, risks, and mitigation plans to executives, document incident information, and create network engineering documentation and best practices.
Minimum Qualifications:
- Minimum of 5 years of relevant work experience and a Bachelor's degree or equivalent experience.
- Master's degree or higher in Computer Science, Mathematics, or a related field, with a keen interest in Machine Learning and AI.
- Proven experience in developing and implementing solutions in machine learning and AI-related spaces.
- Strong programming skills in languages such as Python, Java, or C++.
- In-depth knowledge of machine learning frameworks and libraries for analytics and text processing (e.g., TensorFlow, PyTorch).
- Experience with cloud services related to machine learning (Vertex AI, etc.).
- Excellent problem-solving skills and the ability to work in a fast-paced environment.
- Strong communication skills to effectively collaborate with team members and stakeholders.
- Strong knowledge of algorithms, statistics, data structures, distributed systems, and software engineering best practices.
- Proven experience leading and delivering complex ML projects at production scale.
- Experience integrating ML solutions into cloud environments (e.g., AWS, Azure, GCP) is highly desirable.
Travel Percent:
The total compensation for this practice may include an annual performance bonus (or other incentive compensation, as applicable), equity, and medical, dental, vision, and other benefits. For more information, visit .
The US national annual pay range for this role is $123,500 to $212,850.
Our Benefits:
Any general requests for consideration of your skills, please