Your Role and Responsibilities
- AWS Distributed Systems: Develop and maintain scalable distributed systems in AWS.
- Kubernetes Cluster Management: Develop and maintain high-performance Kubernetes (k8s) clusters across multiple regions.
- Telemetry Infrastructure: Develop and maintain telemetry infrastructure and service instrumentation (Python) for metrics, distributed tracing, and logging.
- Petabyte-Scale Data Platform: Support infrastructure for a petabyte-scale data platform and stream analysis services.
- Collaboration with Audio and Speech AI Engineers: Work with AI Engineers to accelerate development and deployment of heterogeneous analysis and distributed training pipelines.
- SLIs, SLOs, and Error Budget Management: Participate in the definition and management of SLIs, SLOs, and error budgets for infrastructure and production services.
- Infrastructure-as-Code Pipelines: Design and implement infrastructure-as-code pipelines.
Required Technical and Professional Expertise
- Bachelor's degree or higher, with more than one year of IT-related work experience
- Installation, configuration, implementation, and maintenance of the full range of IBM Power servers, storage systems, and tape libraries, especially P7/P8/P9, TS3500/TS4500, DS8K, V7000, F900, SVC, XIV, etc.
- AIX/HACMP/LinuxONE installation, configuration, and maintenance.
- Cisco CCNP certification
- Network configuration and troubleshooting.
- Good communication skills in Chinese and English