Your Role and Responsibilities
In this role, you will:
- Develop and maintain scalable distributed systems in AWS.
- Develop and maintain high-performance Kubernetes (k8s) clusters across multiple regions.
- Develop and maintain telemetry infrastructure and service instrumentation (Python) for metrics, distributed tracing, and logging.
- Support infrastructure for a petabyte-scale data platform and stream analysis services.
- Work with Audio and Speech AI Engineers to accelerate development and deployment of heterogeneous analysis and distributed training pipelines
- Participate in the definition and management of SLIs, SLOs and error budgets for infrastructure and production services
- Design and implement infrastructure-as-code pipelines
Required Technical and Professional Expertise
- 2+ years of AWS experience designing, implementing, and supporting cloud-based infrastructure
- 2+ years of experience architecting, deploying, and supporting Kubernetes in cloud environments.
- 2+ years experience designing and supporting distributed systems.
- Experience writing production code in one or more languages such as Python (preferred), Java, or Go in a microservices environment.
- 2+ years of experience configuring, supporting, and optimizing Linux
Preferred Technical and Professional Expertise
- Familiarity running distributed ML workloads in cluster orchestrated environments
- Experience building and supporting telemetry and related infrastructure (OpenTelemetry, Jaeger, Grafana, Prometheus)
- Experience designing and implementing infrastructure as code pipelines
- 2+ years of pub/sub experience (Kafka, SQS, SNS, MQTT)
- Experience designing and implementing traffic routing strategies in edge and microservices environments.