Own the reference architecture for GenAI: LLM hosting, vector DBs, orchestration layer, real‑time inference, and evaluation pipelines.
Design and govern Retrieval‑Augmented Generation (RAG) pipelines—embedding generation, indexing, hybrid retrieval, and prompt assembly—for authoritative, auditable answers.
Select and integrate toolchains (LangChain, LangGraph, LlamaIndex, MLflow, Kubeflow, Airflow) and ensure compatibility with cloud GenAI services (Azure OpenAI, Amazon Bedrock, Vertex AI).
Implement MLOps / LLMOps: automated CI/CD for model fine‑tuning, evaluation, rollback, and blue‑green deployments; integrate model‑performance monitoring and drift detection.
Embed “shift‑left” security and responsible‑AI guardrails—PII redaction, model‑output moderation, lineage logging, bias checks, and policy‑based access controls—working closely with CISO and compliance teams.
Optimize cost‑to‑serve through dynamic model routing, context‑window compression, and GPU / Inferentia auto‑scaling; publish charge‑back dashboards for business units.
Mentor solution teams on prompt engineering, agentic patterns such as ReAct, frameworks such as CrewAI, and multi‑modal model integration (vision, structured data).
Establish evaluation frameworks (e.g., LangSmith, custom BLEU/ROUGE/BERTScore pipelines, human‑in‑the‑loop review) to track relevance, hallucination rate, toxicity, latency, and carbon footprint.
Report KPIs (MTTR for model incidents, adoption growth, cost per 1k tokens) and iterate on the roadmap in partnership with product, data, and infrastructure leads.
Required Qualifications:
10+ years designing cloud‑native platforms or AI/ML systems; 3+ years leading large‑scale GenAI, LLM, or RAG initiatives.
Deep knowledge of LLM internals, fine‑tuning, RLHF, and agentic orchestration patterns (ReAct, Chain‑of‑Thought) and frameworks such as LangGraph.
Proven delivery on vector‑database architectures (Pinecone, Weaviate, FAISS, pgvector, Milvus) and semantic search optimization.
Mastery of Python and API engineering; hands‑on with LangChain, LlamaIndex, FastAPI, GraphQL, gRPC.
Strong background in security, governance, and observability across distributed AI services (IAM, KMS, audit trails, OpenTelemetry).
Preferred Qualifications:
Certifications such as AWS Certified Machine Learning Engineer – Associate or Microsoft Certified: Azure AI Engineer Associate.
Experience orchestrating multimodal models (images, video, audio) and streaming inference on edge devices or medical sensors.
Published contributions to open‑source GenAI frameworks or white papers on responsible‑AI design.
Familiarity with FDA or HIPAA compliance for AI solutions in healthcare.
Demonstrated ability to influence executive stakeholders and lead cross‑functional tiger teams in a fast‑moving AI market.