Job responsibilities
- Guide and assist peers in creating designs and gaining consensus.
- Collaborate with teams to design and implement deployment approaches using automated CI/CD pipelines.
- Design, develop, test, and implement solutions for availability, reliability, and scalability.
- Implement infrastructure, configuration, and network as code.
- Collaborate with technical experts and stakeholders to resolve complex problems.
- Utilize service level indicators and objectives to proactively resolve issues before they impact customers.
- Support the adoption of site reliability engineering best practices within the team.
Required qualifications, capabilities, and skills
- Formal training or certification in Site Reliability Engineering concepts and 3+ years of applied experience.
- Experience managing and optimizing relational, NoSQL, and columnar databases.
- Proficiency in programming languages such as Python, SQL, Spark, Ada, R, C/C++, Java, and JavaScript.
- Experience with big data platforms such as Databricks, Spark, Snowflake, and Hadoop.
- Knowledge of machine learning, deep learning, generative AI, and statistical analysis.
- Experience with containerization tools such as Docker and orchestration platforms such as Kubernetes.
- Ability to apply site reliability engineering principles, including SLAs, SLOs, and error budgets.
- Understanding of networking fundamentals, including TCP/IP, DNS, and network protocols.
- Experience with cloud services like AWS, Azure, or Google Cloud.
- Familiarity with version control systems like Git.
- Thorough understanding of encryption, access controls, and secure data transmission techniques.
Preferred qualifications, capabilities, and skills
- Experience with data platforms like Splunk, Datadog, Dynatrace, and the Elastic Stack.
- Experience implementing batch and real-time streaming data ingestion (Kafka/Cribl).
- Experience with data visualization and analytics tools like Grafana, Splunk, Tableau, Power BI, and Graph Explorer.
- Experience with data wrangling and cleansing, and with managing ETL processes (cleansing, transformation, integration/enrichment).
- Strong communication, problem-solving, critical thinking, and analytical reasoning skills, with attention to detail and a passion for innovation.